Difference between revisions of "RegressionDist"

Revision as of 00:25, 14 January 2016

RegressionDist(Y, B, I, K, C, S)

RegressionDist returns linear regression coefficients as a distribution.

Suppose you have data where «Y» was produced as:

Y = Sum(C*B, K) + Normal(0, S)

«S» is the measurement noise. You have the data («B[I, K]» and «Y[I]»). You might or might not know the measurement noise «S». So you perform a linear regression to obtain an estimate of «C». Because your estimate is obtained from a finite amount of data, your estimate of «C» is itself uncertain. This function returns the coefficients «C» as a distribution (i.e., in Sample mode, it returns a sampling of coefficients indexed by Run and «K»), reflecting the uncertainty in the estimation of these parameters.
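
To make "coefficients as a distribution" concrete, here is a minimal Python sketch (outside Analytica) that draws coefficient samples for a two-column basis with a known noise level. It assumes the textbook least-squares result that, for known noise S, the coefficient estimate is Gaussian with mean (BᵀB)⁻¹BᵀY and covariance S²(BᵀB)⁻¹. That is an assumption about the underlying statistics, not a description of how RegressionDist is implemented. The data, the helper names `fit_two_column` and `sample_coefficients`, and the sample size are all hypothetical.

```python
import math
import random


def fit_two_column(B, Y):
    """Least-squares fit for a basis with exactly two columns.

    Returns (c_hat, inv_btb): the coefficient estimate and the
    inverse of B^T B, computed in closed form for the 2x2 case.
    """
    a = sum(row[0] * row[0] for row in B)
    b = sum(row[0] * row[1] for row in B)
    d = sum(row[1] * row[1] for row in B)
    det = a * d - b * b
    if det == 0:
        # Happens, for example, when one basis column is zero everywhere.
        raise ValueError("basis is singular")
    inv = [[d / det, -b / det], [-b / det, a / det]]
    by = [sum(row[k] * y for row, y in zip(B, Y)) for k in range(2)]
    c_hat = [inv[k][0] * by[0] + inv[k][1] * by[1] for k in range(2)]
    return c_hat, inv


def sample_coefficients(B, Y, S, n_samples, rng):
    """Draw coefficient samples from N(c_hat, S^2 * inv(B^T B))."""
    c_hat, inv = fit_two_column(B, Y)
    cov = [[S * S * v for v in row] for row in inv]
    # Cholesky factor of the 2x2 covariance matrix.
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 * l21)
    samples = []
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        samples.append([c_hat[0] + l11 * z1,
                        c_hat[1] + l21 * z1 + l22 * z2])
    return c_hat, samples


# Hypothetical data: a constant column and one regressor, Y = 2 + 3*x.
B = [[1.0, float(x)] for x in range(5)]
Y = [2.0 + 3.0 * x for x in range(5)]
c_hat, samples = sample_coefficients(B, Y, S=0.5, n_samples=2000,
                                     rng=random.Random(0))
mean_c = [sum(s[k] for s in samples) / len(samples) for k in range(2)]
```

Each entry of `samples` plays the role of one Run slice of the result, holding one value per element of «K».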

Library

Multivariate Distributions.ana

Examples

If you know the noise level «S» in advance, then you can use historical data as a starting point for building a predictive model of «Y», as follows:

{ Your model of the dependent variables: }

Variable Y := your historical dependent data, indexed by I

Variable B := your historical independent data, indexed by I, K

Variable X := { indexed by K. Maybe others. Possibly uncertain }

Variable S := { the known noise level }

Chance C := RegressionDist(Y, B, I, K, S: S)

Variable Predicted_Y := Sum(C*X, K) + Normal(0, S)

If you don't know the noise level, then you need to estimate it. You'll need it for the normal term of Predicted_Y anyway, and you'll need to do a regression to find it. Since the regression is computed anyway, you can pass its result to RegressionDist as the optional «C» parameter. The last three lines above are replaced by:

Variable E_C := Regression(Y, B, I, K)

Variable S := RegressionNoise(Y, B, I, K, E_C)

Chance C := RegressionDist(Y, B, I, K, E_C)

Variable Predicted_Y := Sum(C*X, K) + Normal(0, S)

If you use RegressionNoise to compute «S», you should use Mid(RegressionNoise(...)) for the «S» parameter. However, when computing «S» for your prediction, don't wrap it in Mid; evaluate RegressionNoise in context. Better still, if you don't know the measurement noise in advance, don't supply it as a parameter at all.
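
As a rough illustration of estimating the noise level from the data, the Python sketch below fits the coefficients and then computes the residual standard deviation sqrt(SSR / (n - k)), where n is the number of data points and k the number of coefficients. This is a common estimator and is only assumed to be comparable to what RegressionNoise reports; this page does not state the formula RegressionNoise uses. The data and the helper names `fit_two_column` and `residual_noise` are hypothetical.

```python
import math


def fit_two_column(B, Y):
    """Least-squares coefficients for a basis with exactly two columns."""
    a = sum(row[0] * row[0] for row in B)
    b = sum(row[0] * row[1] for row in B)
    d = sum(row[1] * row[1] for row in B)
    det = a * d - b * b
    if det == 0:
        raise ValueError("basis is singular")
    by0 = sum(row[0] * y for row, y in zip(B, Y))
    by1 = sum(row[1] * y for row, y in zip(B, Y))
    return [(d * by0 - b * by1) / det, (a * by1 - b * by0) / det]


def residual_noise(B, Y, c_hat):
    """Residual standard deviation sqrt(SSR / (n - k))."""
    n, k = len(B), len(c_hat)
    if n <= k:
        raise ValueError("need more data points than coefficients")
    ssr = sum((y - sum(c * v for c, v in zip(c_hat, row))) ** 2
              for row, y in zip(B, Y))
    return math.sqrt(ssr / (n - k))


# Hypothetical data: Y = 2 + 3*x plus small errors chosen to be
# orthogonal to both basis columns, so the fit recovers [2, 3] exactly.
xs = [0, 1, 2, 3, 4]
errors = [0.1, -0.2, 0.1, 0.0, 0.0]
B = [[1.0, float(x)] for x in xs]
Y = [2.0 + 3.0 * x + e for x, e in zip(xs, errors)]

c_hat = fit_two_column(B, Y)          # plays the role of E_C
s_hat = residual_noise(B, Y, c_hat)   # plays the role of S
```

Passing the already-fitted coefficients into the noise calculation mirrors the way `E_C` is reused in the Analytica example above.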

Errors That Might Result

Evaluation Error in C:

Array is not symmetric in System Function Decompose.

while evaluating function Gaussian.

Call stack:

Gaussian

RegressionDist

C

Possible causes:

One of your independent variables might be zero for every data point. As of Analytica 4.2, RegressionDist is not robust to this singularity. Note that this singularity is problematic -- the mean coefficient value for that variable is undefined and the variance on the coefficient uncertainty is infinite.

Remedy: Before calling RegressionDist, remove from the basis any independent variable that is zero everywhere.

Your data (most likely in the basis) contains NaN values.
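
Both causes can be checked before the regression is run. The Python sketch below shows the idea on a row-major basis `B[i][k]`: it reports which columns are zero for every data point and whether any value is NaN, then drops the all-zero columns. The function names `check_basis` and `drop_columns` and the sample data are hypothetical. In an Analytica model the equivalent check would be applied to «B» before calling RegressionDist.

```python
import math


def check_basis(B):
    """Return (zero_columns, has_nan) for a row-major basis B[i][k]."""
    n_cols = len(B[0])
    has_nan = any(math.isnan(v) for row in B for v in row)
    # A NaN never compares equal to 0, so NaN columns are not listed here.
    zero_columns = [k for k in range(n_cols)
                    if all(row[k] == 0 for row in B)]
    return zero_columns, has_nan


def drop_columns(B, columns):
    """Return a copy of B without the listed column positions."""
    drop = set(columns)
    return [[v for k, v in enumerate(row) if k not in drop] for row in B]


# Hypothetical basis: column 1 is zero everywhere, column 2 contains a NaN.
B = [[1.0, 0.0, 2.0],
     [1.0, 0.0, 3.5],
     [1.0, 0.0, float("nan")]]

zero_columns, has_nan = check_basis(B)
cleaned = drop_columns(B, zero_columns)
```

Dropping the all-zero columns handles the first cause. NaN values still need to be repaired or their rows excluded, since this sketch only detects them.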

@@ Line 2: / Line 2: @@
 [[category:Doc Status D]]   <!-- do not delete this line, Lumina internal use -->
-= RegressionDist(Y,B,I,K'',C,S'') =
+== RegressionDist(Y, B, I, K'', C, S'') ==
 RegressionDist returns linear regression coefficients as a distribution.
-Suppose you have data where Y was produced as:
+Suppose you have data where «Y» was produced as:
-  Y = [[Sum]]( C*B, K ) + [[Normal]](0,S)
+:<code>Y = Sum(C*B, K) + Normal(0, S)</code>
-S is the measurement noise.  You have the data (B[I,K] and Y[I]).  You might or might not know the measurement noise S.  So you perform a linear regression to obtain an estimate of C.  Because your estimate is obtained from a finite amount of data, your estimate of C is itself uncertain.  This function returns the coefficients C as a distribution (i.e., in Sample mode, it returns a sampling of coefficients indexed by Run and K), reflecting the uncertainty in the estimation of these parameters.
+«S» is the measurement noise.  You have the data («B[I, K]» and «Y[I]»).  You might or might not know the measurement noise «S».  So you perform a linear regression to obtain an estimate of «C».  Because your estimate is obtained from a finite amount of data, your estimate of «C» is itself uncertain.  This function returns the coefficients «C» as a distribution (i.e., in [[Sample]] mode, it returns a sampling of coefficients indexed by [[Run]] and «K»), reflecting the uncertainty in the estimation of these parameters.
-= Library =
+== Library ==
-Multivariate Distributions.ana
+<code>Multivariate Distributions.ana</code>
-= Examples =
+== Examples ==
-If you know the noise level S in advance, then you can use historical data as a starting point for building a predictive model of Y, as follows:
+If you know the noise level «S» in advance, then you can use historical data as a starting point for building a predictive model of «Y», as follows:
- { Your model of the dependent variables: }
+:<code>{ Your model of the dependent variables: }</code>
- Variable Y := your historical dependent data, indexed by I
+:<code>Variable Y := your historical dependent data, indexed by I</code>
- Variable B := your historical independent data, indexed by I,K
+:<code>Variable B := your historical independent data, indexed by I, K</code>
- Variable X := { indexed by K.  Maybe others.  Possibly uncertain }
+:<code>Variable X := { indexed by K.  Maybe others.  Possibly uncertain }</code>
- Variable S := { the known noise level }
+:<code>Variable S := { the known noise level }</code>
- Chance C := [[RegressionDist]](Y,B,I,K)
+:<code>Chance C := RegressionDist(Y, B, I, K)</code>
- Variable Predicted_Y := [[Sum]](C*X,K) + [[Normal]](0,S)
+:<code>Variable Predicted_Y := Sum(C*X, K) + Normal(0, S)</code>
-If you don't know the noise level, then you need to estimate it. You'll need it for the normal term of Predicted_Y anyway, and you'll need to do a regression to find it.  So you can pass these optional parameters into RegressionDist.  The last three lines above become:
+If you don't know the noise level, then you need to estimate it. You'll need it for the normal term of <code>Predicted_Y</code> anyway, and you'll need to do a regression to find it.  So you can pass these optional parameters into [[RegressionDist]].  The last three lines above become:
- Variable E_C := [[Regression]](Y,B,I,K)
+:<code>Variable E_C := Regression(Y, B, I, K)</code>
- Variable S := [[RegressionNoise]]( Y,B,I,K,E_C )
+:<code>Variable S := RegressionNoise(Y, B, I, K, E_C)</code>
- Chance C := [[RegressionDist]](Y,B,I,K,E_C)
+:<code>Chance C := RegressionDist(Y, B, I, K, E_C)</code>
- Variable Predicted_Y := [[Sum]](C*X,K) + [[Normal]](0,S)
+:<code>Variable Predicted_Y := Sum(C*X, K) + Normal(0, S)</code>
-If you use [[RegressionNoise]] to compute S, you should use [[Mid]]([[RegressionNoise]](...)) for the S parameter.  However, when computing S for your prediction, don't [[RegressionNoise]] in context.  Better is if you don't know the measurement noise in advance, don't supply it as a parameter.
+If you use [[RegressionNoise]] to compute «S», you should use <code>Mid(RegressionNoise(...))</code> for the «S» parameter.  However, when computing «S» for your prediction, don't [[RegressionNoise]] in context.  Better is if you don't know the measurement noise in advance, don't supply it as a parameter.
-= Errors That Might Result =
+== Errors That Might Result ==
- Evaluation Error in C:
+:<code>Evaluation Error in C:</code>
- Array is not symmetric in System Function Decompose.
+:<code>Array is not symmetric in System Function Decompose.</code>
- while evaluating function Gaussian.
+:<code>while evaluating function Gaussian.</code>
- Call stack:
+:<code>Call stack:</code>
-    Gaussian
+::<code>Gaussian</code>
-    RegressionDist
+::<code>RegressionDist</code>
-    C
+::<code>C</code>
 Possible causes:
 * One of your independent variables might be zero for every data point.  As of Analytica 4.2, [[RegressionDist]] is not robust to this singularity.  Note that this singularity is problematic -- the mean coefficient value for that variable is undefined and the variance on the coefficient uncertainty is infinite.
 : Remedy: Eliminate independent variables that are everywhere zero from the basis before calling.
-* Your data (most likely in the basis) contains NaN values.
+* Your data (most likely in the basis) contains [[NaN]] values.
-= See Also =
+==See Also ==
 * [[Regression]]
 * [[RegressionNoise]]
 * [[RegressionFitProb]]
 * [[Analytica_User_Group/Past_Topics#Using_Regression|Using Regression webinar]]