RegressionDist

Revision as of 00:10, 7 August 2009


RegressionDist(Y,B,I,K,C,S)

RegressionDist returns linear regression coefficients as a distribution.

Suppose you have data where Y was produced as:

 Y = Sum( C*B, K ) + Normal(0,S)

S is the measurement noise. You have the data (B[I,K] and Y[I]). You might or might not know the measurement noise S. So you perform a linear regression to obtain an estimate of C. Because your estimate is obtained from a finite amount of data, your estimate of C is itself uncertain. This function returns the coefficients C as a distribution (i.e., in Sample mode, it returns a sampling of coefficients indexed by Run and K), reflecting the uncertainty in the estimation of these parameters.
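The usual textbook form of the estimate and its uncertainty for this model is shown below. It assumes S is known, the columns of B are linearly independent, and a flat prior is placed on C. This page does not state RegressionDist's internal formula, so read this as the standard result such a function is presumably built on, not as a description of its implementation.

```latex
\hat{C} = (B^\top B)^{-1} B^\top Y,
\qquad
C \sim \mathcal{N}\!\left(\hat{C},\; S^{2}\,(B^\top B)^{-1}\right)
```

Here B is the I-by-K matrix of basis values and Y is the vector of observations over I. If some column of B is zero at every data point, B^T B is singular, so neither the mean nor the covariance exists. That is the singularity described under Errors That Might Result.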

Library

Multivariate Distributions.ana

Examples

If you know the noise level S in advance, then you can use historical data as a starting point for building a predictive model of Y, as follows:

 { Your model of the dependent variable: }
 Variable Y := { your historical dependent data, indexed by I }
 Variable B := { your historical independent data, indexed by I, K }
 Variable X := { indexed by K.  Maybe others.  Possibly uncertain }
 Variable S := { the known noise level }
 Chance C := RegressionDist(Y,B,I,K,S:S)
 Variable Predicted_Y := Sum(C*X,K) + Normal(0,S)
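The known-noise model above can be mimicked outside Analytica to see what the coefficient distribution does. The sketch below is a minimal Python/NumPy illustration. It assumes C is distributed as Normal(C_hat, S^2 (B^T B)^-1), with C_hat the least-squares estimate. The helper names `regression_dist` and `predicted_y` are hypothetical and are not part of any library. The synthetic data are invented purely for illustration.

```python
import numpy as np

# Hypothetical Python analogue of RegressionDist when the noise level S is known.
# Assumes C ~ Normal(C_hat, S^2 (B^T B)^-1), with C_hat the least-squares estimate.
def regression_dist(Y, B, S, n_samples, rng):
    """Y: (n,) dependent data over I; B: (n, k) basis over I and K.
    Returns an (n_samples, k) array, like a sample indexed by Run and K."""
    C_hat, *_ = np.linalg.lstsq(B, Y, rcond=None)   # point estimate of C
    cov = S**2 * np.linalg.inv(B.T @ B)             # uncertainty in that estimate
    return rng.multivariate_normal(C_hat, cov, size=n_samples)

def predicted_y(C_samples, X, S, rng):
    """Sum(C*X, K) + Normal(0, S): one prediction per coefficient sample."""
    return C_samples @ X + rng.normal(0.0, S, size=C_samples.shape[0])

rng = np.random.default_rng(0)
n, S = 200, 0.5
B = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])  # intercept + one input
true_C = np.array([1.0, 2.0])
Y = B @ true_C + rng.normal(0.0, S, n)          # Y = Sum(C*B, K) + Normal(0, S)

C = regression_dist(Y, B, S, n_samples=5000, rng=rng)
X = np.array([1.0, 4.0])                         # inputs for a new prediction
pred = predicted_y(C, X, S, rng)
print(C.mean(axis=0), pred.mean(), pred.std())
```

`np.linalg.lstsq` is used for the point estimate because it is numerically safer than forming the inverse explicitly. The inverse of B^T B is computed only for the covariance.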

If you don't know the noise level, you need to estimate it. You need it for the Normal term of Predicted_Y anyway, and estimating it requires a regression. Because that regression already gives you the coefficient estimates, you can pass them to RegressionDist through its optional C parameter. The last three lines above become:

 Variable E_C := Regression(Y,B,I,K)
 Variable S := RegressionNoise( Y,B,I,K,E_C )
 Chance C := RegressionDist(Y,B,I,K,E_C)
 Variable Predicted_Y := Sum(C*X,K) + Normal(0,S)
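The unknown-noise workflow above (regress, estimate the noise from the residuals, then build the coefficient distribution) can be sketched in Python as follows. The helpers `regression` and `regression_noise` are hypothetical stand-ins for Regression and RegressionNoise. The noise estimate uses the common residual formula sqrt(RSS / (n - k)). That this matches what RegressionNoise computes is an assumption. This version yields a single number, so it plays the role of a point estimate of S.

```python
import numpy as np

# Hypothetical analogues of Regression and RegressionNoise (names are illustrative).
def regression(Y, B):
    """Least-squares coefficients, like E_C := Regression(Y,B,I,K)."""
    C_hat, *_ = np.linalg.lstsq(B, Y, rcond=None)
    return C_hat

def regression_noise(Y, B, C_hat):
    """Point estimate of the noise standard deviation from the residuals,
    using the usual unbiased variance estimate RSS / (n - k)."""
    n, k = B.shape
    resid = Y - B @ C_hat
    return np.sqrt(resid @ resid / (n - k))

rng = np.random.default_rng(1)
n, true_S = 200, 0.5
B = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])
Y = B @ np.array([1.0, 2.0]) + rng.normal(0.0, true_S, n)

E_C = regression(Y, B)
S_est = regression_noise(Y, B, E_C)
# Coefficient uncertainty using the estimated noise in place of a known S.
C = rng.multivariate_normal(E_C, S_est**2 * np.linalg.inv(B.T @ B), size=5000)
print(E_C, S_est)
```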

If you use RegressionNoise to compute S, you should pass Mid(RegressionNoise(...)) as the S parameter of RegressionDist. However, when computing the S used in your prediction (the Normal(0,S) term of Predicted_Y), use RegressionNoise without Mid, so that the uncertainty in the noise estimate carries through. Better still, if you don't know the measurement noise in advance, don't supply S as a parameter at all.
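The point of this advice is that an estimated noise level is itself uncertain, and collapsing it to one number understates the spread of predictions. One standard way to carry that uncertainty through comes from Bayesian linear regression with a noninformative prior: sample the noise variance from a scaled inverse chi-square distribution, then sample the coefficients given each noise draw. The Python sketch below illustrates that technique. The helper name is hypothetical, and whether RegressionDist or RegressionNoise do exactly this is not stated on this page.

```python
import numpy as np

def sample_noise_and_coefficients(Y, B, n_samples, rng):
    """Sample (sigma, C) jointly under a flat prior on C and log(sigma):
    sigma^2 ~ scaled-inverse-chi-square(n - k, s^2),
    C | sigma ~ Normal(C_hat, sigma^2 (B^T B)^-1)."""
    n, k = B.shape
    C_hat, *_ = np.linalg.lstsq(B, Y, rcond=None)
    resid = Y - B @ C_hat
    s2 = resid @ resid / (n - k)                                   # point estimate of sigma^2
    sigma2 = (n - k) * s2 / rng.chisquare(n - k, size=n_samples)   # uncertain sigma^2
    L = np.linalg.cholesky(np.linalg.inv(B.T @ B))                 # (B^T B)^-1 = L L^T
    z = rng.standard_normal((n_samples, k))
    C = C_hat + np.sqrt(sigma2)[:, None] * (z @ L.T)               # one C per sigma draw
    return np.sqrt(sigma2), C

rng = np.random.default_rng(2)
n = 8                                              # small data set: noise estimate is shaky
B = np.column_stack([np.ones(n), rng.uniform(0.0, 10.0, n)])
Y = B @ np.array([1.0, 2.0]) + rng.normal(0.0, 0.5, n)

sigma, C = sample_noise_and_coefficients(Y, B, 20000, rng)
X = np.array([1.0, 4.0])
# Each prediction uses its own noise draw, so the noise uncertainty carries through.
pred = C @ X + sigma * rng.standard_normal(sigma.shape[0])
print(pred.mean(), pred.std())
```

With only 8 data points, the spread of `pred` comes out noticeably wider than the residual-based point estimate of the noise alone would suggest, which is the effect the advice is guarding against.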

Errors That Might Result

Evaluation Error in C:
Array is not symmetric in System Function Decompose.
while evaluating function Gaussian.
Call stack:
   Gaussian
   RegressionDist
   C

Possible causes:

  • One of your independent variables might be zero for every data point. As of Analytica 4.2, RegressionDist is not robust to this singularity. The singularity is inherently problematic: the mean coefficient value for that variable is undefined, and the variance of that coefficient is infinite.
    Remedy: Before calling RegressionDist, eliminate from the basis any independent variables that are zero everywhere.
  • Your data (most likely in the basis) contains NaN values.
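Both causes above can be screened for before calling RegressionDist. The Python sketch below uses hypothetical helper names (`check_regression_data`, `drop_all_zero_columns`) and a small made-up data set. It flags NaN values and all-zero basis columns, shows that an all-zero column makes B^T B rank-deficient, and applies the remedy of dropping that column. In a model, the equivalent step is removing those items from the basis before the call.

```python
import numpy as np

def check_regression_data(Y, B, names):
    """Report the conditions listed above that break the fit (hypothetical helper)."""
    problems = []
    if np.isnan(Y).any():
        problems.append("Y contains NaN")
    for j, name in enumerate(names):
        col = B[:, j]
        if np.isnan(col).any():
            problems.append(f"{name} contains NaN")
        elif np.all(col == 0):
            problems.append(f"{name} is zero for every data point")
    return problems

def drop_all_zero_columns(B, names):
    """Remove basis columns that are zero everywhere (the remedy above)."""
    keep = ~np.all(B == 0, axis=0)
    return B[:, keep], [name for name, k in zip(names, keep) if k]

B = np.array([[1.0, 0.0, 2.0],
              [1.0, 0.0, 3.0],
              [1.0, 0.0, 5.0],
              [1.0, 0.0, 7.0]])
Y = np.array([3.1, 4.0, 6.2, 7.9])
names = ["const", "unused", "x"]

print(check_regression_data(Y, B, names))
print(np.linalg.matrix_rank(B.T @ B))        # B^T B is singular: rank < 3
B2, names2 = drop_all_zero_columns(B, names)
print(names2, np.linalg.matrix_rank(B2.T @ B2))
```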

See Also
