RegressionDist
RegressionDist(Y,B,I,K,C,S)
RegressionDist returns linear regression coefficients as a distribution.
Suppose you have data where Y was produced as:
Y = Sum( C*B, K ) + Normal(0,S)
S is the measurement noise. You have the data (B[I,K] and Y[I]), and you may or may not know the measurement noise S. You perform a linear regression to obtain an estimate of C. Because that estimate comes from a finite amount of data, it is itself uncertain. This function returns the coefficients C as a distribution (i.e., in Sample mode, it returns a sampling of coefficients indexed by Run and K), reflecting the uncertainty in the estimate of these parameters.
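For instance, here is a minimal self-contained sketch (the synthetic data and the names True_C, B, Y are illustrative, not part of the library) that generates data from known coefficients and then recovers them as an uncertain quantity:

 Index I := 1..100                                                 { observations }
 Index K := ['k1','k2']                                            { independent variables }
 Variable True_C := Table(K)(2, -1)                                { coefficients used to generate the data }
 Variable B := Random(Normal(0,1), over: I, K)                     { one fixed synthetic data set B[I,K] }
 Variable Y := Sum(True_C*B, K) + Random(Normal(0,0.5), over: I)   { synthetic Y[I] with noise S = 0.5 }
 Chance C := RegressionDist(Y,B,I,K)                               { distribution over the estimated coefficients }

Comparing the spread of C against True_C (and repeating with a shorter I) shows how the uncertainty in the estimated coefficients narrows as more observations become available.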
Library
Multivariate Distributions.ana
Examples
If you know the noise level S in advance, then you can use historical data as a starting point for building a predictive model of Y, as follows:
 { Your model of the dependent variables: }
 Variable Y := your historical dependent data, indexed by I
 Variable B := your historical independent data, indexed by I,K
 Variable X := { indexed by K. Maybe others. Possibly uncertain }
 Variable S := { the known noise level }
 Chance C := RegressionDist(Y,B,I,K)
 Variable Predicted_Y := Sum(C*X,K) + Normal(0,S)
If you don't know the noise level, you need to estimate it: you'll need it for the Normal term of Predicted_Y anyway, and estimating it requires a regression. Since you compute the regression coefficients anyway, you can pass them to RegressionDist as its optional C parameter. The last three lines above become:
 Variable E_C := Regression(Y,B,I,K)
 Variable S := RegressionNoise(Y,B,I,K,E_C)
 Chance C := RegressionDist(Y,B,I,K,E_C)
 Variable Predicted_Y := Sum(C*X,K) + Normal(0,S)
If you use RegressionNoise to compute S, you should pass Mid(RegressionNoise(...)) as the S parameter of RegressionDist. However, the S used in the Normal(0,S) term of your prediction should not be evaluated in Mid context, since that would discard the uncertainty in the noise estimate. Better still: if you don't know the measurement noise in advance, don't supply it as a parameter at all.
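If you do choose to supply the parameter, a sketch of the two different roles S plays might look like this (the name S_param is hypothetical, used only to keep the two definitions apart):

 Variable E_C := Regression(Y,B,I,K)
 Variable S_param := Mid(RegressionNoise(Y,B,I,K,E_C))   { point estimate: suitable as the S parameter }
 Chance C := RegressionDist(Y,B,I,K,E_C,S_param)
 Variable S := RegressionNoise(Y,B,I,K,E_C)              { full distribution: used in the Normal(0,S) prediction term }
 Variable Predicted_Y := Sum(C*X,K) + Normal(0,S)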