Difference between revisions of "Variance"

(parameter names)
(Removed Covariance feature, replaced with builtin Covariance function)
Line 43: Line 43:
  
 
When one or more points with non-zero weight in x are INF or -INF, Variance will return INF if min(x)<max(x), or INF if min(x)=+INF or max(x)=-INF. If there are fewer than two numeric points with positive weight, Variance returns NAN. Any point with zero weight is ignored, so that INF or NAN values don't cause the result to become NAN if they are given a zero weight.  
 
 
= Computing Covariance =
 
 
(first exposed in build 4.0.0.41)
 
 
Variance( X, ''I, w,'' K'', K2'' )
 
When each data point is a vector, a generalization of the variance is the covariance, represented by a square, symmetric, positive semi-definite matrix that captures the pairwise variation of the components in the vector, with the component variances along the diagonal.  This is similar to a [[Correlation]] matrix (which can be obtained using the [[Correlation]] function).  In fact, each element of a [[Correlation]] matrix is the corresponding covariance divided by the row and column standard deviations.  The [[Gaussian]] distribution function expects a covariance matrix.
 
 
When computing a covariance matrix, your data will be at least two dimensional.  Since each data point is a vector, one index corresponds to this vector dimension, K.  The second index corresponds to the data points themselves, and is usually Run, but will be a different index when computing covariance from historical data.
 
 
The resulting covariance matrix is square and two-dimensional, and thus needs a copy of the K index.  Therefore, before computing the covariance matrix, you must create a second index as a copy of your vector dimension.  Then, provide these two indexes using the two optional parameters, K and K2 to Variance, as in this example where each vector data point is indexed by D:
 
 
Index D2 := D;
 
Variance( X, K:D, K2:D2 )
 
 
In this example, the running index is implicitly [[Run]].  If a different running index, I, is used, then the syntax is:
 
 
Index D2 := D;
 
Variance( X, I, K:D, K2:D2 )
 
 
The K2 parameter can be omitted.  When omitted, a local index is automatically created, with a name formed by appending "_2" to the first index.
 
 
== Weighted Covariance ==
 
 
The Variance function can be used to compute the covariance when each data point is non-uniformly weighted.  The weight can be specified using the optional w parameter; if it is not specified and the running index is Run, the [[SampleWeighting]] system variable provides the weighting.  A non-uniform weighting should be indexed by the running index.
 
  
 
= See Also =
 
Line 73: Line 48:
 
* [[Statistical Functions and Importance Weighting]]
 
 
* [[SDeviation]], [[Skewness]], [[Kurtosis]].
 
* [[Correlation]]
+
* [[Correlation]], [[Covariance]]

Revision as of 17:02, 2 May 2007


Computes the variance of an uncertain quantity, or the weighted sample variance of a data set.

With multivariate samples (where each data point is a vector), the Variance function can also be used to compute the covariance (or weighted covariance) matrix from a data set.

Simple Usage

If X is an uncertain quantity, dependent on Analytica distribution functions, the variance is obtained using

Variance(X)

X is evaluated in sample mode, and the variance along the Run index is computed.

Variance along Index

Given a data set indexed by I, the sample variance along I is computed using:

Variance(X,I)

When the running index, I, is the system index Run (or is not specified), X is evaluated in Sample mode and the variance among numeric values is computed. If the running index is anything other than Run, then X is evaluated in context.
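
As an illustration only (this is Python, not Analytica, and the names sample_variance and table are hypothetical), the idea of taking the variance along one index while keeping any other index can be sketched as follows. The sketch assumes that non-numeric points are represented by NaN and are skipped, and that fewer than two numeric points yields NaN, as described elsewhere on this page.

```python
import math

def sample_variance(values):
    """Unbiased sample variance over the numeric (non-NaN) entries of `values`."""
    numeric = [v for v in values if not math.isnan(v)]
    n = len(numeric)
    if n < 2:
        return math.nan  # fewer than two numeric points: no variance estimate
    mean = sum(numeric) / n
    return sum((v - mean) ** 2 for v in numeric) / (n - 1)

# Hypothetical data: each key plays the role of a second index J,
# and each list holds the values along the running index I.
table = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [10.0, 10.0, 10.0, 10.0],
}

# Taking the variance along I leaves a result that is still indexed by J.
variance_along_i = {j: sample_variance(row) for j, row in table.items()}
```

In this sketch the constant row "b" has a variance of 0, and the result keeps one entry per value of the remaining index.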

Weighted variance

The weighted variance is computed by assigning a different "weight" to each point. The weight vector, wt, should be indexed by I (or by Run if I is not specified), and the weighted variance is computed using one of these forms:

Variance(X,w:wt)
Variance(X,I,w:wt)

When the w parameter is not specified, and the running index I is either the Run index or is not specified, then the weighting defaults to the value in the system variable SampleWeighting.

The weighted variance is defined as

${\sum_i w_i (x_i-\bar{x})^2} \over { \sum_i w_i (1-w_i) }$

where the sum is taken over numeric values, $\bar{x}$ is the weighted mean, and where $\sum_i w_i = 1$. If Sum(w,i)<>1, the w's are normalized, so that the sum over numeric values is 1. This is an unbiased estimator of the weighted variance.
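
The definition above can be sketched in Python (not Analytica) as a rough illustration. The function name weighted_variance is hypothetical, and the sketch assumes non-negative weights and finite data values; it does not reproduce the INF handling of the built-in function. Points with zero weight or a NaN value are dropped before the weights are normalized.

```python
import math

def weighted_variance(xs, ws):
    """Weighted variance: sum_i w_i (x_i - mean)^2 / sum_i w_i (1 - w_i),
    with the weights normalized to sum to 1 over the numeric points."""
    # Keep only numeric points with positive weight (assumes weights >= 0).
    pairs = [(x, w) for x, w in zip(xs, ws) if w > 0 and not math.isnan(x)]
    if len(pairs) < 2:
        return math.nan  # fewer than two usable points

    total = sum(w for _, w in pairs)
    norm = [(x, w / total) for x, w in pairs]  # normalized weights sum to 1

    mean = sum(w * x for x, w in norm)  # weighted mean
    numerator = sum(w * (x - mean) ** 2 for x, w in norm)
    denominator = sum(w * (1.0 - w) for _, w in norm)
    return numerator / denominator
```

With equal weights this reduces to the ordinary sample variance, and a NaN value that carries zero weight has no effect on the result.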

When $w_i$ is constant, this simplifies to

${\sum_i (x_i-\bar{x})^2} \over {N-1}$ where $N$ is the number of points (sampleSize when I is Run).
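
The intermediate steps of this simplification, which the page does not spell out, can be reconstructed as follows. With $N$ equally weighted points and normalized weights, each weight is $w_i = 1/N$, so the denominator becomes

$\sum_i w_i (1-w_i) = N \cdot \frac{1}{N} \left(1 - \frac{1}{N}\right) = \frac{N-1}{N}$

and the numerator becomes

$\sum_i w_i (x_i-\bar{x})^2 = \frac{1}{N} \sum_i (x_i-\bar{x})^2$

Dividing the numerator by the denominator cancels the factor $1/N$ and leaves the usual sample variance with $N-1$ in the denominator.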

When one or more points with non-zero weight in x are INF or -INF, Variance will return INF if min(x)<max(x), or INF if min(x)=+INF or max(x)=-INF. If there are fewer than two numeric points with positive weight, Variance returns NAN. Any point with zero weight is ignored, so that INF or NAN values don't cause the result to become NAN if they are given a zero weight.

See Also
