RegressionDist(Y,B,I,K,C,S)

RegressionDist returns linear regression coefficients as a distribution.

Suppose you have data where Y was produced as:

 Y = Sum( C*B, K ) + Normal(0,S)

S is the standard deviation of the measurement noise. You have the data (B[I,K] and Y[I]), and you might or might not know S. You perform a linear regression to obtain an estimate of C. Because that estimate is obtained from a finite amount of data, it is itself uncertain. This function returns the coefficients C as a distribution (i.e., in Sample mode, it returns a sampling of coefficients indexed by Run and K), reflecting the uncertainty in the estimation of these parameters.
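
For reference, the textbook least-squares result behind this kind of coefficient distribution can be written as below. It treats B as a matrix with rows I and columns K and assumes S is known. Whether RegressionDist uses exactly this form internally is not stated on this page, so read it as background rather than as a description of the implementation.

```latex
\hat{C} = (B^{\top} B)^{-1} B^{\top} Y,
\qquad
\hat{C} \sim \mathcal{N}\!\left( C,\; S^{2}\,(B^{\top} B)^{-1} \right)
```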

Library

Multivariate Distributions.ana

Examples

If you know the noise level S in advance, then you can use historical data as a starting point for building a predictive model of Y, as follows:

 { Your model of the dependent variables: }
 Variable Y := your historical dependent data, indexed by I
 Variable B := your historical independent data, indexed by I,K
 Variable X := { indexed by K. Maybe others. Possibly uncertain }
 Variable S := { the known noise level }
 Chance C := RegressionDist(Y,B,I,K)
 Variable Predicted_Y := Sum(C*X,K) + Normal(0,S)
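
To make "coefficients as a distribution" concrete, here is a minimal Python sketch of the known-noise case, written outside Analytica. It is not Analytica's implementation: the function name `fit_and_sample`, the synthetic data, and the restriction to K = 2 (an intercept and a slope) are assumptions made for illustration. It fits ordinary least squares, then draws coefficient samples from a normal distribution centered on the fit with covariance S^2 * inverse(B'B), which is the standard sampling distribution under the model above.

```python
import math
import random


def fit_and_sample(y, b, s, n_samples, rng):
    """Least-squares fit for K = 2 plus samples of the coefficient estimate.

    y: list of n observations (the role of Y[I])
    b: list of n rows [b_i1, b_i2] (the role of B[I,K])
    s: known noise standard deviation (the role of S)
    Returns (c_hat, samples), where samples is a list of (c1, c2) tuples.
    """
    # Normal equations: A = B'B (2x2, symmetric) and g = B'y.
    a11 = sum(r[0] * r[0] for r in b)
    a12 = sum(r[0] * r[1] for r in b)
    a22 = sum(r[1] * r[1] for r in b)
    g1 = sum(r[0] * yi for r, yi in zip(b, y))
    g2 = sum(r[1] * yi for r, yi in zip(b, y))

    # Closed-form inverse of the 2x2 matrix A.
    det = a11 * a22 - a12 * a12
    i11, i12, i22 = a22 / det, -a12 / det, a11 / det

    # Point estimate: c_hat = inverse(A) * g.
    c_hat = (i11 * g1 + i12 * g2, i12 * g1 + i22 * g2)

    # Covariance of the estimate: s^2 * inverse(A), then its Cholesky factor.
    v11, v12, v22 = s * s * i11, s * s * i12, s * s * i22
    l11 = math.sqrt(v11)
    l21 = v12 / l11
    l22 = math.sqrt(v22 - l21 * l21)

    samples = []
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        samples.append((c_hat[0] + l11 * z1,
                        c_hat[1] + l21 * z1 + l22 * z2))
    return c_hat, samples


# Synthetic "historical" data (hypothetical values, for illustration only).
rng = random.Random(0)
true_c = (2.0, 0.5)
s = 1.0
b = [[1.0, float(i)] for i in range(20)]
y = [true_c[0] * r[0] + true_c[1] * r[1] + rng.gauss(0.0, s) for r in b]

c_hat, samples = fit_and_sample(y, b, s, 1000, rng)
```

Each tuple in `samples` plays the role of one Run slice of the result, and the position within the tuple plays the role of K. A prediction for a new input row x would then be computed per sample as the sum of coefficient times x, plus a fresh normal draw with standard deviation `s`.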

If you don't know the noise level, then you need to estimate it. You'll need it for the normal term of Predicted_Y anyway, and you'll need to do a regression to find it. Since you compute these results anyway, you can pass them into RegressionDist as its optional parameters. The last three lines above become:

 Variable E_C := Regression(Y,B,I,K)
 Variable S := RegressionNoise( Y,B,I,K,E_C )
 Chance C := RegressionDist(Y,B,I,K,E_C)
 Variable Predicted_Y := Sum(C*X,K) + Normal(0,S)

If you use RegressionNoise to compute S, you should use Mid(RegressionNoise(...)) for the S parameter of RegressionDist. However, when computing S for your prediction, don't wrap RegressionNoise in Mid; let it evaluate in context. Better still, if you don't know the measurement noise in advance, don't supply it as a parameter at all.
