# RegressionDist

## RegressionDist(Y, B, I, K, *C, S*)

RegressionDist is similar to Regression(Y, B, I, K), but it returns linear regression coefficients not as mid values but as a probability distribution reflecting the uncertainty in the regression fit and measurement noise. You can use the uncertain coefficients from RegressionDist to generate a predictive probability distribution on «Y» that reflects this uncertainty.

Suppose you have data where «Y» was produced as:

`Y = Sum(C*B, K) + Normal(0, S)`

«S» is the measurement noise. You have the data («B[I, K]» and «Y[I]»). You might or might not know the measurement noise «S». So you perform a linear regression to obtain an estimate of «C». Because your estimate is obtained from a finite amount of data, your estimate of «C» is itself uncertain. This function returns the coefficients «C» as a distribution (i.e., in Sample mode, it returns a sampling of coefficients indexed by Run and «K»), reflecting the uncertainty in the estimation of these parameters.
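
As a point of reference, the standard least-squares result for this model can be written out explicitly. This is textbook theory under the assumptions above (independent Gaussian noise with a known «S»), not a description of RegressionDist's internal algorithm. The symbols $\hat{C}$ and $A$ are notation introduced here for illustration, with «B» treated as a matrix whose rows run over «I» and whose columns run over «K».

$$
A = B^{\top} B, \qquad A_{k k'} = \sum_{i} B_{i k}\, B_{i k'}, \qquad \hat{C} = A^{-1} B^{\top} Y, \qquad \hat{C} \sim \mathcal{N}\!\left(C,\; S^{2} A^{-1}\right)
$$

The covariance $S^{2} A^{-1}$ grows with the noise level and generally shrinks as more data points are added along «I». The matrix $A$ has no inverse when a column of «B» is zero for every data point, so the coefficient for that column cannot be estimated.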

## Library

Multivariate Distributions library functions (Multivariate Distributions.ana)

- Use **File → Add Library...** to add this library.

## Examples

If you know the noise level «S» in advance, then you can use historical data as a starting point for building a predictive model of «Y», as follows:

`{ Your model of the dependent variables: }`

`Variable Y := your historical dependent data, indexed by I`

`Variable B := your historical independent data, indexed by I, K`

`Variable X := { indexed by K. Maybe others. Possibly uncertain }`

`Variable S := { the known noise level }`

`Chance C := RegressionDist(Y, B, I, K)`

`Variable Predicted_Y := Sum(C*X, K) + Normal(0, S)`

If you don't know the noise level, then you need to estimate it. You'll need it for the normal term of `Predicted_Y` anyway, and you'll need to do a regression to find it. So you can pass these optional parameters into RegressionDist. Replace the last three lines above with:

`Variable E_C := Regression(Y, B, I, K)`

`Variable S := RegressionNoise(Y, B, I, K, E_C)`

`Chance C := RegressionDist(Y, B, I, K, E_C)`

`Variable Predicted_Y := Sum(C*X, K) + Normal(0, S)`

If you use RegressionNoise to compute «S», you should use `Mid(RegressionNoise(...))` for the «S» parameter. However, when computing «S» for your prediction, don't use RegressionNoise in context. A better approach: if you don't know the measurement noise in advance, don't supply it as a parameter.

## Errors That Might Result

`Evaluation Error in C:`

`Array is not symmetric in System Function Decompose.`

`while evaluating function Gaussian.`

`Call stack:`

`Gaussian`

`RegressionDist`

`C`

Possible causes:

- One of your independent variables might be zero for every data point. As of Analytica 4.2, RegressionDist is not robust to this singularity. The singularity is a genuine problem in the data, not only in the function: the mean coefficient value for that variable is undefined, and the variance of the coefficient uncertainty is infinite.

  - Remedy: Before calling RegressionDist, remove from the basis any independent variables that are zero everywhere.

- Your data (most likely in the basis) contains NaN values.
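
The remedy for an all-zero independent variable could be sketched in Analytica along the following lines. This is a minimal, untested sketch: the index name `K2` is hypothetical, and it assumes «B» contains no NaN values.

`{ Keep only the basis elements that are nonzero for at least one data point }`

`Index K2 := Subset(Max(Abs(B), I) > 0)`

`{ Fit using the reduced basis; C is then indexed by K2 rather than K }`

`Chance C := RegressionDist(Y, B[K = K2], I, K2)`

Because the coefficients are then indexed by `K2`, any prediction would need to sum over the same reduced index, for example `Sum(C*X[K = K2], K2)`.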
