Difference between revisions of "Variance"

(added info about obtaining covariance from the variance function.)
 
(3 intermediate revisions by one other user not shown)
Line 2: Line 2:
 
[[Category:Doc Status C]] <!-- For Lumina use, do not change -->
 
[[Category:Doc Status C]] <!-- For Lumina use, do not change -->
 
   
 
   
Computes variance of an uncertain quantity, or the weighted sample variance of a data set.
+
==Variance(X, ''I, w'')==
 +
Computes variance of an uncertain quantity «X», or the «w»-weighted sample variance of a data set.
  
With multivariate samples (where each data point is a vector), the Variance function can also be used to compute the covariance (or weighted covariance) matrix from a data set.
+
With multivariate samples (where each data point is a vector), the [[Variance]] function can also be used to compute the covariance (or weighted covariance) matrix from a data set.
  
= Simple Usage =
+
If «X» is an uncertain quantity, dependent on Analytica distribution functions, the variance is obtained using [[Variance]](X).
  
If ''X'' is an uncertain quantity, dependent on Analytica distribution functions, the variance is obtained using
+
«X» is evaluated in [[sample]] mode, and the variance along the [[Run]] index is computed.
  
Variance(X)
+
==Optional parameters==
 +
=== I ===
 +
Given a data set indexed by «I», the sample variance along index «I» is computed using [[Variance]](X, I).
  
X is evaluated in sample mode, and the variance along the [[Run]] index is computed.
+
When the running index «I» is the system index [[Run]] (or not specified), the value of «X» is evaluated in [[Sample]] mode and the variance over its numeric values is computed. If the running index is anything other than [[Run]], then «X» is evaluated in context.
  
= Variance along Index =
+
=== W ===
 +
The weighted variance is computed by assigning a different "weight" to each point.  The weight vector <code>wt</code> should be indexed by «I» (or by [[Run]] if «I» is not specified), and the weighted variance is computed using one of these forms:
 +
:<code>Variance(X, w: wt)</code>
 +
:<code>Variance(X, I, w:wt)</code>
  
Given a data set indexed by I, the sample variance along I is computed using:
+
When the «w» parameter is not specified, and the running index «I» is either the [[Run]] index or is not specified, then the weighting defaults to the value in the system variable [[SampleWeighting]].
 
 
Variance(X,I)
 
 
 
When the running index, I, is the system index [[Run]] (or not specified), the value of X is evaluated in Sample mode and the variance over its numeric values is computed. If the running index is anything other than Run, then X is evaluated in context.
 
 
 
= Weighted variance =
 
 
 
The weighted variance is computed by assigning a different "weight" to each point.  The weight vector, wt, should be indexed by I (or by Run if I is not specified), and the weighted variance is computed using one of these forms:
 
 
 
Variance(X,w:wt)
 
Variance(X,I,w:wt)
 
 
 
When the w parameter is not specified, and the running index I is either the Run index or is not specified, then the weighting defaults to the value in the system variable [[SampleWeighting]].
 
  
 
The weighted variance is defined as
 
The weighted variance is defined as
 +
:<math>
 +
{\sum_i w_i (x_i-\bar{x})^2} \over { \sum_i w_i (1-w_i) }
 +
</math>
  
${\sum_i w_i (x_i-\bar{x})^2} \over { \sum_i w_i (1-w_i) }$
+
where the sum is taken over numeric values, <math>\bar{x}</math> is the weighted mean, and where <math>\sum_i w_i = 1</math>. If ''Sum(w, i) <> 1'', the «w»'s are normalized, so that the sum over numeric values is 1. This is an unbiased estimator of the weighted variance.  
 
 
where the sum is taken over numeric values, $\bar{x}$ is the weighted mean, and where $\sum_i w_i = 1$. If [[Sum]](w,i)<>1, the w's are normalized, so that the sum over numeric values is 1. This is an unbiased estimator of the weighted variance.  
 
 
 
When $w_i$ is constant, this simplifies to
 
 
 
${\sum_i (x_i-\bar{x})^2} \over {N-1}$
 
where $N$ is the number of points (sampleSize when I is Run).
 
 
 
When one or more points with non-zero weight in x are INF or -INF, Variance will return INF if min(x)<max(x), or INF if min(x)=+INF or max(x)=-INF. If there are fewer than two numeric points with positive weight, Variance returns NAN. Any point with zero weight is ignored, so that INF or NAN values don't cause the result to become NAN if they are given a zero weight.
 
 
 
= Computing Covariance =
 
 
 
(first exposed in build 4.0.0.41)
 
 
 
When each data point is a vector, a generalization of the variance is the covariance, represented by a square symmetric positive semi-definite matrix, which captures the pairwise variation of the components in the vector, with the component variances along the diagonal.  This is similar to a [[Correlation]] matrix (which can be obtained using the [[Correlation]] function).  In fact, each element of a [[Correlation]] matrix is the corresponding covariance divided by the product of the standard deviations of its row and column components.  The [[Gaussian]] distribution function expects a covariance matrix.
 
 
 
When computing a covariance matrix, your data will be at least two dimensional.  Since each data point is a vector, one index corresponds to this vector dimension.  The Variance function refers to this as the CoVarDim.  The second index corresponds to the data points themselves, and is usually Run, but will be a different index when computing covariance from historical data.
 
 
 
The resulting covariance matrix is square and two-dimensional, and thus needs a copy of the CoVarDim index.  Therefore, before computing the covariance matrix, you must create a second index as a copy of your vector dimension.  Then, provide these two indexes using the two optional parameters, CoVarDim and CoVarDim2 to Variance, as in this example:
 
 
 
Index K2 := K;
 
Variance( X, CoVarDim:K, CoVarDim2:K2 )
 
 
 
In this example, the running index is implicitly [[Run]].  If a different running index, I, is used, then the syntax is:
 
 
 
Index K2 := K;
 
Variance( X,I, CoVarDim:K, CoVarDim2:K2 )
 
  
The CoVarDim2 parameter can be omitted.  When omitted, a local index is automatically created, with a name formed by appending "_2" to the first index.
+
When ''w<sub>i</sub>'' is constant, this simplifies to  
  
== Weighted Covariance ==
+
:<math>{\sum_i (x_i-\bar{x})^2} \over {N-1}</math>
  
The Variance function can be used to compute the covariance when each data point is non-uniformly weighted.  The weight can be specified using the optional w parameter; if it is not specified and the running index is Run, the [[SampleWeighting]] system variable provides the weights.  A non-uniform weighting should be indexed by the running index.
+
where ''N'' is the number of points ([[sampleSize]] when «I» is [[Run]]).  
  
= See Also =
+
When one or more points with non-zero weight in x are [[INF]] or -[[INF]], Variance will return [[INF]] if ''[[Min]](x) < [[Max]](x)'', or [[INF]] if ''[[Min]](x) = +[[INF]]'' or ''[[Max]](x) = -[[INF]]''. If there are fewer than two numeric points with positive weight, Variance returns [[NaN]]. Any point with zero weight is ignored, so that [[INF]] or [[NaN]] values don't cause the result to become [[NaN]] if they are given a zero weight.
  
 +
== See Also ==
 
* [[Statistical Functions and Importance Weighting]]
 
* [[Statistical Functions and Importance Weighting]]
* [[SDeviation]], [[Skewness]], [[Kurtosis]].
+
* [[SDeviation]]
 +
* [[Skewness]]
 +
* [[Kurtosis]]
 
* [[Correlation]]
 
* [[Correlation]]
 +
* [[Covariance]]

Latest revision as of 21:51, 18 January 2016


Variance(X, I, w)

Computes variance of an uncertain quantity «X», or the «w»-weighted sample variance of a data set.

With multivariate samples (where each data point is a vector), the Variance function can also be used to compute the covariance (or weighted covariance) matrix from a data set.

If «X» is an uncertain quantity, dependent on Analytica distribution functions, the variance is obtained using Variance(X).

«X» is evaluated in sample mode, and the variance along the Run index is computed.

Optional parameters

I

Given a data set indexed by «I», the sample variance along index «I» is computed using Variance(X, I).

When the running index «I» is the system index Run (or not specified), the value of «X» is evaluated in Sample mode and the variance over its numeric values is computed. If the running index is anything other than Run, then «X» is evaluated in context.

W

The weighted variance is computed by assigning a different "weight" to each point. The weight vector wt should be indexed by «I» (or by Run if «I» is not specified), and the weighted variance is computed using one of these forms:

Variance(X, w: wt)
Variance(X, I, w:wt)

When the «w» parameter is not specified, and the running index «I» is either the Run index or is not specified, then the weighting defaults to the value in the system variable SampleWeighting.

The weighted variance is defined as

[math]\displaystyle{ {\sum_i w_i (x_i-\bar{x})^2} \over { \sum_i w_i (1-w_i) } }[/math]

where the sum is taken over numeric values, [math]\displaystyle{ \bar{x} }[/math] is the weighted mean, and where [math]\displaystyle{ \sum_i w_i = 1 }[/math]. If Sum(w, i) <> 1, the «w»'s are normalized, so that the sum over numeric values is 1. This is an unbiased estimator of the weighted variance.
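The formula above can be sketched outside Analytica as a small Python function. This is only an illustration of the stated definition, not Analytica's implementation; the function name weighted_variance and its list arguments are hypothetical, and all inputs are assumed to be finite numbers with positive weights.

```python
def weighted_variance(xs, ws):
    """Illustrative sketch: sum_i w_i (x_i - mean)^2 / sum_i w_i (1 - w_i)."""
    total = sum(ws)
    # Normalize the weights so that they sum to 1, as the definition requires.
    w = [wi / total for wi in ws]
    # Weighted mean of the data.
    mean = sum(wi * xi for wi, xi in zip(w, xs))
    numerator = sum(wi * (xi - mean) ** 2 for wi, xi in zip(w, xs))
    denominator = sum(wi * (1 - wi) for wi in w)
    return numerator / denominator
```

Because the weights are normalized first, scaling every weight by the same positive factor leaves the result unchanged.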

When [math]\displaystyle{ w_i }[/math] is constant, this simplifies to

[math]\displaystyle{ {\sum_i (x_i-\bar{x})^2} \over {N-1} }[/math]

where N is the number of points (sampleSize when «I» is Run).
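
The simplification can be checked directly from the weighted definition. With N points and equal normalized weights, each weight is 1/N, so the numerator and denominator become

[math]\displaystyle{ \sum_i \frac{1}{N} (x_i-\bar{x})^2 = \frac{1}{N} \sum_i (x_i-\bar{x})^2, \qquad \sum_i \frac{1}{N} \left(1-\frac{1}{N}\right) = \frac{N-1}{N} }[/math]

and their ratio is

[math]\displaystyle{ \frac{\frac{1}{N} \sum_i (x_i-\bar{x})^2}{\frac{N-1}{N}} = \frac{\sum_i (x_i-\bar{x})^2}{N-1} }[/math]

which is the familiar unweighted sample variance.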

When one or more points with non-zero weight in x are INF or -INF, Variance will return INF if Min(x) < Max(x), or INF if Min(x) = +INF or Max(x) = -INF. If there are fewer than two numeric points with positive weight, Variance returns NaN. Any point with zero weight is ignored, so that INF or NaN values don't cause the result to become NaN if they are given a zero weight.
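
The zero-weight and too-few-points rules can be sketched in Python as follows. This is a hedged illustration, not Analytica's implementation: the function name variance_ignoring_zero_weights is hypothetical, the INF cases described above are deliberately not modeled, and every point with positive weight is assumed to be a finite number.

```python
import math

def variance_ignoring_zero_weights(xs, ws):
    # Drop zero-weight points before any arithmetic, so an INF or NaN
    # value carrying zero weight cannot affect the result.
    kept = [(x, w) for x, w in zip(xs, ws) if w > 0]
    # Fewer than two points with positive weight: the result is NaN.
    if len(kept) < 2:
        return math.nan
    total = sum(w for _, w in kept)
    # Weighted mean using weights normalized to sum to 1.
    mean = sum(w / total * x for x, w in kept)
    numerator = sum(w / total * (x - mean) ** 2 for x, w in kept)
    denominator = sum(w / total * (1 - w / total) for _, w in kept)
    return numerator / denominator
```

Filtering before computing the mean matters: multiplying an INF value by a zero weight would itself yield NaN in floating-point arithmetic, so the point has to be removed rather than merely weighted out.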

See Also
