# Correlation


## Correlation(X, Y, I, w, resultIfNoVariation)

Computes an estimate of the correlation, or weighted correlation, between two uncertain quantities «X» and «Y». Correlation measures the degree of linear dependence and ranges from -1 to 1. A correlation of 0 indicates that the two quantities are uncorrelated. This is consistent with independence but does not imply it, because a strong non-linear relationship can still have zero correlation. A positive value indicates that the quantities tend to increase together, while a negative value indicates that an increase in one quantity tends to be accompanied by a decrease in the other.

When «X» and «Y» are both uncertain quantities, `Correlation(X, Y)` computes their correlation across the sample points. When no running index is specified, the sample index Run is used.

If there is no variation in «X» or «Y», the result is NaN unless you specify the optional parameter «resultIfNoVariation», in which case that value is returned instead.
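As an illustration only, here is a minimal Python sketch of unweighted (Pearson) correlation with the no-variation fallback described above. It is not Analytica's implementation; `correlation` and `result_if_no_variation` are hypothetical names standing in for Correlation and «resultIfNoVariation».

```python
import math
from typing import Sequence


def correlation(x: Sequence[float], y: Sequence[float],
                result_if_no_variation: float = math.nan) -> float:
    """Unweighted Pearson correlation of two equal-length sequences.

    Hypothetical helper: returns result_if_no_variation (NaN by default)
    when x or y has no variation, mirroring the behavior described above.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each point from its mean
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        # No variation in x or y: correlation is undefined
        return result_if_no_variation
    return sxy / math.sqrt(sxx * syy)


print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))       # 1.0
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))       # -1.0
print(correlation([1, 2, 3, 4], [5, 5, 5, 5]))       # nan
print(correlation([1, 2, 3, 4], [5, 5, 5, 5], 0.0))  # 0.0
```

A single data point also has no variation, so this sketch returns the fallback (NaN by default) in that case as well.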

## Correlation of data

If you have a data set containing two variables, `A` and `B`, where data points are indexed by `J`, the correlation of `A` and `B` is computed using

`Correlation(A, B, J)`

Here `J` is referred to as the *running index*.

If you have an array and want the correlation between two of its columns, use the subscript operator to extract each column. For example, the following computes the correlation between historical revenue in 2002 and 2003, where data points are indexed by `J`:

`Correlation(HistoricalRevenue[Year = 2002], HistoricalRevenue[Year = 2003], J)`

## Weighted Correlation

Unweighted correlation treats all data or sample points as equally weighted. Weighted correlation computes the correlation when each data point may have a different weight. The optional «w» parameter may be used to specify a weight, which should be indexed by the running index (or by Run if no running index is specified). For example, the following specifies an importance weight:

`Correlation(X, Y, w: sampleImportance)`

The global sample weighting, specified by the system variable SampleWeighting, is used by default.
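A minimal Python sketch of what the «w» parameter does, assuming deviations are measured from the weighted means (the usual definition of weighted correlation). `weighted_correlation` is a hypothetical helper, not an Analytica function, and it takes one weight per data point.

```python
import math
from typing import Sequence


def weighted_correlation(x: Sequence[float], y: Sequence[float],
                         w: Sequence[float]) -> float:
    """Weighted Pearson correlation; w holds one weight per data point."""
    total_w = sum(w)
    # Weighted means of x and y
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Deviations from the weighted means
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    sxy = sum(wi * a * b for wi, a, b in zip(w, dx, dy))
    sxx = sum(wi * a * a for wi, a in zip(w, dx))
    syy = sum(wi * b * b for wi, b in zip(w, dy))
    if sxx == 0 or syy == 0:
        return math.nan  # no (weighted) variation in x or y
    return sxy / math.sqrt(sxx * syy)


# Equal weights reproduce the unweighted correlation.
print(weighted_correlation([1, 2, 3, 4], [1, 3, 2, 4], [1, 1, 1, 1]))    # 0.8
# A zero weight removes the outlying fourth point entirely.
print(weighted_correlation([1, 2, 3, 10], [2, 4, 6, 0], [1, 1, 1, 0]))  # 1.0
```

Because only relative weights matter, multiplying every weight by the same positive constant leaves the result unchanged.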

## Computing a sample correlation matrix

Suppose each sample point is a vector along index `I`. A correlation matrix is a 2-D square symmetric matrix where each element `(m, n)` indicates the correlation of column `I = m` and column `I = n`. To compute a sample correlation matrix for a given set of data, create a second index, `I2`, as a copy of `I`:

`Index I2 := CopyIndex(I)`

With this index, the correlation matrix, indexed by `I` and `I2`, is computed from data `X` using:

`Correlation(X, X[I = I2])`

Or, if data points are listed along an index other than Run, say `J`, this would be:

`Correlation(X, X[I = I2], J)`
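The correlation-matrix construction can be sketched in Python to show what the result contains. Here `data` is a hypothetical list of data points (rows, playing the role of the running index `J`) whose entries are the components along `I` (columns); the two loop variables `m` and `n` play the roles of `I` and `I2`. This illustrates the idea only and is not how Analytica evaluates `Correlation(X, X[I = I2], J)`.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Unweighted Pearson correlation; NaN when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(data: List[List[float]]) -> List[List[float]]:
    """Return R where R[m][n] is the correlation of column m with column n.

    data[j][i] is component i (index I) of data point j (running index J).
    """
    columns = [list(col) for col in zip(*data)]  # one list per value of I
    k = len(columns)
    return [[pearson(columns[m], columns[n]) for n in range(k)]
            for m in range(k)]


# Hypothetical data: 4 points (rows) with 3 components along I (columns).
data = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 3.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 1.0],
]
R = correlation_matrix(data)
for row in R:
    print([round(v, 3) for v in row])
```

The result is square and symmetric with ones on the diagonal, matching the description of a correlation matrix above.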

## Mathematical details

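Weighted correlation is given by

$$
\frac{\sum_i w_i \hat{x}_i \hat{y}_i}{\sqrt{\sum_i w_i \hat{x}_i^2 \sum_j w_j \hat{y}_j^2}}
$$

where $\hat{x}_i = x_i - \bar{x}$ and $\hat{y}_i = y_i - \bar{y}$ are deviations from the weighted means $\bar{x} = \sum_i w_i x_i / \sum_i w_i$ and $\bar{y} = \sum_i w_i y_i / \sum_i w_i$, and $i$ and $j$ range over the running index.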
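A short derivation shows why only the relative weights matter. If every weight is multiplied by the same constant $c > 0$, the weighted means, and therefore the deviations $\hat{x}_i$ and $\hat{y}_i$, are unchanged, and $c$ cancels:

$$
\frac{\sum_i c\,w_i \hat{x}_i \hat{y}_i}{\sqrt{\sum_i c\,w_i \hat{x}_i^2 \sum_j c\,w_j \hat{y}_j^2}}
= \frac{c \sum_i w_i \hat{x}_i \hat{y}_i}{c \sqrt{\sum_i w_i \hat{x}_i^2 \sum_j w_j \hat{y}_j^2}}
= \frac{\sum_i w_i \hat{x}_i \hat{y}_i}{\sqrt{\sum_i w_i \hat{x}_i^2 \sum_j w_j \hat{y}_j^2}}
$$

Taking $w_i = 1$ for every $i$ therefore gives the ordinary unweighted (Pearson) correlation, in which all points are equally weighted.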

## Null values

When either parameter contains a Null value at a data point, that data point is ignored, and the computed Correlation is based only on the points where both values are non-Null. For certain statistical significance tests, you may need to adjust the degrees of freedom to match the number of non-Null data points.

Correlation cannot be computed from a single data point. When only one data point is supplied, or when only one data point is non-Null in both «X» and «Y», the result of Correlation is NaN.

When there are zero data points, or when every data point is Null in «X», «Y», or both, the result of Correlation is Null. (*Note*: Prior to Analytica 4.4, the result in this case was NaN. The change to Null in 4.4 makes the handling of Null by Correlation() consistent with other Analytica functions.)
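The Null rules above can be sketched in Python, using `None` as a stand-in for Null. `correlation_ignoring_null` is a hypothetical name; it follows the documented behavior (drop incomplete points, NaN for a single remaining point, Null when none remain) but is not Analytica's code.

```python
import math
from typing import Optional, Sequence


def correlation_ignoring_null(x: Sequence[Optional[float]],
                              y: Sequence[Optional[float]]) -> Optional[float]:
    """Pearson correlation that skips points where x or y is None (Null)."""
    # Keep only points that are non-Null in both x and y
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if not pairs:
        return None        # zero usable points -> Null
    if len(pairs) == 1:
        return math.nan    # a single usable point -> NaN
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    n = len(pairs)         # the count to use when adjusting degrees of freedom
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return math.nan    # no variation among the remaining points
    return sxy / math.sqrt(sxx * syy)


print(correlation_ignoring_null([1, None, 2, 3], [2, 5, None, 6]))  # 1.0
print(correlation_ignoring_null([1, None], [None, 2]))              # None
print(correlation_ignoring_null([1, 2], [3, None]))                 # nan
```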

## See Also

- SampleCorrelation
- RankCorrel -- the Rank Correlation function
- Variance
- Covariance
- SampleCovariance -- multi-dimensional case, covariance matrix
- Statistical Functions and Importance Weighting

