Difference between revisions of "RegressionDist"
Revision as of 00:10, 7 August 2009
RegressionDist(Y,B,I,K,C,S)
RegressionDist returns linear regression coefficients as a distribution.
Suppose you have data where Y was produced as:
Y = Sum( C*B, K ) + Normal(0,S)
S is the measurement noise. You have the data (B[I,K] and Y[I]). You might or might not know the measurement noise S. So you perform a linear regression to obtain an estimate of C. Because your estimate is obtained from a finite amount of data, your estimate of C is itself uncertain. This function returns the coefficients C as a distribution (i.e., in Sample mode, it returns a sampling of coefficients indexed by Run and K), reflecting the uncertainty in the estimation of these parameters.
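As an illustration of what "coefficients as a distribution" means, here is a minimal Python sketch, not Analytica's implementation. It assumes the textbook result for ordinary least squares with a known noise level S: the coefficient estimate is Gaussian with mean (B'B)^-1 B'Y and covariance S^2 (B'B)^-1. The helper names (`invert`, `cholesky`, `sample_coefficients`) are hypothetical and exist only for this example.

```python
import math
import random

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: basis columns are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def cholesky(m):
    """Lower-triangular factor L with L * L' = m, for a symmetric positive-definite m."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low

def sample_coefficients(y, b, s, n_samples, rng):
    """y: n values; b: n rows of k basis values; s: known noise std (must be > 0).

    Returns (mean, samples), where samples is a list of n_samples coefficient
    vectors drawn from the assumed Gaussian distribution of the estimate.
    """
    bt = transpose(b)
    btb_inv = invert(matmul(bt, b))
    mean = [row[0] for row in matmul(btb_inv, matmul(bt, [[v] for v in y]))]
    cov = [[s * s * v for v in row] for row in btb_inv]
    low = cholesky(cov)
    k = len(mean)
    samples = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        samples.append([mean[i] + sum(low[i][j] * z[j] for j in range(i + 1))
                        for i in range(k)])
    return mean, samples
```

Each row of `samples` plays the role of one Run-indexed sample of the coefficients over K. With little data or large S, the rows spread widely around `mean`; with more data they tighten.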
Library
Multivariate Distributions.ana
Examples
If you know the noise level S in advance, then you can use historical data as a starting point for building a predictive model of Y, as follows:
{ Your model of the dependent variables: }
Variable Y := your historical dependent data, indexed by I
Variable B := your historical independent data, indexed by I,K
Variable X := { indexed by K. Maybe others. Possibly uncertain }
Variable S := { the known noise level }
Chance C := RegressionDist(Y,B,I,K)
Variable Predicted_Y := Sum(C*X,K) + Normal(0,S)
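The Predicted_Y line combines two sources of uncertainty: the uncertain coefficients C and the measurement noise S. A rough Python sketch of that one step follows. It assumes the coefficient samples are already available as a list of vectors (one per run), and `predict_samples` is a hypothetical name used only for this example.

```python
import random

def predict_samples(coef_samples, x, s, rng):
    """For each sampled coefficient vector, compute Sum(C*X, K) and add
    one independent Normal(0, s) noise draw."""
    return [
        sum(c * xi for c, xi in zip(coefs, x)) + rng.gauss(0.0, s)
        for coefs in coef_samples
    ]
```

For example, `predict_samples(samples, [1.0, 3.0], 0.5, random.Random(1))` would return one predicted value per coefficient sample, for a new point whose basis values are 1.0 and 3.0.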
If you don't know the noise level, then you need to estimate it. You'll need it for the normal term of Predicted_Y anyway, and you'll need to do a regression to find it. You can then pass the regression result into RegressionDist through its optional parameters (here, the estimate E_C is passed as C). The last three lines above become:
Variable E_C := Regression(Y,B,I,K)
Variable S := RegressionNoise( Y,B,I,K,E_C )
Chance C := RegressionDist(Y,B,I,K,E_C)
Variable Predicted_Y := Sum(C*X,K) + Normal(0,S)
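To make the noise-estimation step concrete, here is a small Python sketch of one common estimator: the residual standard deviation with n - k degrees of freedom, computed from already-fitted coefficients. Whether RegressionNoise uses exactly this divisor is an assumption, not something this page states, and `residual_noise` is a hypothetical name.

```python
import math

def residual_noise(y, b, coefs):
    """Estimate the noise std from the residuals of a fitted linear model.

    y: n observed values; b: n rows of k basis values; coefs: k fitted
    coefficients. Uses the n - k divisor (an assumption for this sketch).
    """
    n, k = len(b), len(coefs)
    if n <= k:
        raise ValueError("need more data points than coefficients")
    ssr = sum(
        (yi - sum(c * v for c, v in zip(coefs, row))) ** 2
        for yi, row in zip(y, b)
    )
    return math.sqrt(ssr / (n - k))
```

Passing the fitted coefficients in, rather than refitting inside the function, mirrors how E_C is computed once and reused in the example above.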
If you use RegressionNoise to compute S, you should use Mid(RegressionNoise(...)) for the S parameter. However, when computing S for your prediction, don't RegressionNoise in context. A better option: if you don't know the measurement noise in advance, don't supply it as a parameter at all.
Errors That Might Result
Evaluation Error in C:
Array is not symmetric in System Function Decompose.
while evaluating function Gaussian.
Call stack:
Gaussian
RegressionDist
C
Possible causes:
- One of your independent variables might be zero for every data point. As of Analytica 4.2, RegressionDist is not robust to this singularity. The singularity is inherently problematic: the mean coefficient value for that variable is undefined, and the variance of that coefficient is infinite.
  Remedy: Eliminate independent variables that are everywhere zero from the basis before calling RegressionDist.
- Your data (most likely in the basis) contains NaN values.
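Both causes can be checked before the regression is run. The following Python sketch shows the idea on a basis stored as a list of rows; it is an illustration only, and the names `check_basis` and `drop_columns` are hypothetical, not Analytica functions.

```python
import math

def check_basis(b):
    """b: list of rows (one per data point), each a list of k basis values.

    Returns (zero_columns, nan_cells): the column positions that are zero
    for every data point, and the (row, column) positions holding NaN.
    """
    k = len(b[0]) if b else 0
    zero_columns = [j for j in range(k) if all(row[j] == 0 for row in b)]
    nan_cells = [
        (i, j)
        for i, row in enumerate(b)
        for j, v in enumerate(row)
        if isinstance(v, float) and math.isnan(v)
    ]
    return zero_columns, nan_cells

def drop_columns(b, columns):
    """Return a copy of b without the listed column positions."""
    skip = set(columns)
    return [[v for j, v in enumerate(row) if j not in skip] for row in b]
```

A typical use would be to call `check_basis` first, remove any everywhere-zero columns with `drop_columns`, and decide separately how to handle rows that contain NaN.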