RankCorrel


RankCorrel(x, y, I, rankType, w, resultIfNoVariation)

RankCorrel(x, y) computes the Spearman rank correlation between two uncertain quantities «x» and «y».

RankCorrel(x, y, I) computes the Spearman rank correlation between two paired sets of data points, «x» and «y», sharing a common index «I».

Definition

The Spearman rank correlation coefficient is defined as Pearson's correlation coefficient of the ranks of the data points. In other words, you find Rank(x, I) and Rank(y, I), the ranks of the data points, and then compute the standard Correlation between these ranks. As an Analytica expression, this is written:

RankCorrel(x, y, I) := Correlation(Rank(x, I), Rank(y, I), I)
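The definition can be checked outside Analytica. The following minimal Python sketch, which is not Analytica's implementation, assigns 1-based mid-ranks (tied values share the average rank) and then takes Pearson's correlation of those ranks. The helper names `mid_ranks`, `pearson`, and `rank_correl` are hypothetical, chosen for illustration.

```python
import math

def mid_ranks(values):
    """1-based ranks; tied values share the average (mid) rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def rank_correl(x, y):
    """Spearman rank correlation: Pearson's correlation of the mid-ranks."""
    return pearson(mid_ranks(x), mid_ranks(y))

print(rank_correl([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 1.0, 4.0, 3.0, 5.0]))  # 0.8
```

With no ties this agrees with the textbook shortcut 1 - 6Σd²/(n(n² - 1)): here Σd² = 4 and n = 5, giving 1 - 24/120 = 0.8.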

Interpretation

Rank correlation measures how monotonic the relationship between two quantities appears to be. It lies between -1 and 1: values near -1 indicate a strong negative monotonic tendency, values near 1 indicate a strong positive monotonic tendency, and values near zero indicate little or no monotonic relationship. For example, a large positive value (close to 1) indicates that increases in one quantity tend to be associated with increases in the other, while a large negative value (close to -1) indicates that increases in one quantity tend to be associated with decreases in the other.

The expected rank correlation between two statistically independent quantities, «x» and «y», is zero. The computed rank correlation may differ slightly from zero due to sampling error, but should be very close to zero, and approaches zero as the sample size increases. Rank correlation is often used to test for statistical dependence, but be careful with your conclusions: while a (statistically significant) non-zero rank correlation does imply statistical dependence, a zero rank correlation does not necessarily imply statistical independence. For example, if y = x² with «x» sampled uniformly from the interval [-a, a], the two quantities «x» and «y» are clearly dependent, yet the expected rank correlation is zero.
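The y = x² example can be reproduced with a short Python sketch. As a simplifying assumption it uses an evenly spaced grid that is symmetric about zero instead of a random uniform sample, so the rank correlation comes out exactly zero rather than only approximately zero. The helpers `mid_ranks` and `pearson` are hypothetical.

```python
import math

def mid_ranks(v):
    # 1-based ranks; tied values share the average of the positions they occupy
    return [1 + sum(u < a for u in v) + (sum(u == a for u in v) - 1) / 2 for a in v]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

# Symmetric, evenly spaced grid on [-a, a] with a = 5, standing in for a uniform sample
x = [float(k) for k in range(-5, 6)]
y = [v * v for v in x]  # y is a deterministic function of x, so they are dependent
rc = pearson(mid_ranks(x), mid_ranks(y))
print(rc)  # 0.0
```

The ranks of y are symmetric about x = 0 while the ranks of x increase steadily, so positive and negative contributions to the covariance cancel.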

Given a set of uncertain inputs, x₁, x₂, ..., x_N, and a computed output, y = f(x₁, ..., x_N), the absolute value of the rank correlation, Abs(RankCorrel(x_i, y)), provides a good measure of the relative degree to which the uncertainty in input x_i contributes to the uncertainty in output «y». This measure is used by Analytica's Make Importance sensitivity analysis feature.
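A minimal Python sketch of ranking inputs by Abs(RankCorrel(x_i, y)). The model `y = 3*x1 + x2` with uniform inputs is a hypothetical stand-in for f, and `importance` is an illustrative helper, not the Make Importance implementation.

```python
import math
import random

def mid_ranks(v):
    # 1-based ranks; tied values share the average of the positions they occupy
    return [1 + sum(u < a for u in v) + (sum(u == a for u in v) - 1) / 2 for a in v]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

def importance(inputs, output):
    """Abs(rank correlation) of each named input with the output, largest first."""
    ry = mid_ranks(output)
    scores = {name: abs(pearson(mid_ranks(xs), ry)) for name, xs in inputs.items()}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 500
x1 = [rng.uniform(0, 1) for _ in range(n)]
x2 = [rng.uniform(0, 1) for _ in range(n)]
y = [3 * a + b for a, b in zip(x1, x2)]  # hypothetical model f(x1, x2)
imp = importance({"x1": x1, "x2": x2}, y)
print(imp)  # x1 should come out far more important than x2
```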

Rank correlation is often described as nonparametric correlation, because it does not assume any functional form for the relationship between the quantities beyond the more general assumption that they are related by some monotone function. In contrast, Pearson's correlation assumes that «x» and «y» are related by a linear function with added Gaussian noise, a model that clearly has underlying parameters.
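To illustrate the nonparametric property, the Python sketch below compares Pearson's and Spearman's coefficients for y = exp(x), a relationship that is strictly monotone but far from linear. The helper names are hypothetical.

```python
import math

def mid_ranks(v):
    # 1-based ranks; tied values share the average of the positions they occupy
    return [1 + sum(u < a for u in v) + (sum(u == a for u in v) - 1) / 2 for a in v]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

x = [float(k) for k in range(1, 11)]
y = [math.exp(v) for v in x]  # strictly increasing, strongly nonlinear
pearson_r = pearson(x, y)
spearman_r = pearson(mid_ranks(x), mid_ranks(y))
print(pearson_r, spearman_r)
```

Spearman's coefficient is exactly 1 because the ranks of x and y coincide, whereas Pearson's coefficient is pulled well below 1 by the curvature.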

Rank Type

The optional «rankType» parameter controls how ties are treated. By default, tied points are assigned the same mid-rank (i.e., the average of the ranks the tied points occupy). «rankType» can also be set to -1 (lower ranks), 0 (mid-ranks, the default), +1 (upper ranks), or Null (unique ranks). See the Rank function for more details. Mid-ranks are almost always used with RankCorrel, so other values are unusual.
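The four tie conventions can be sketched in Python as follows. This is an illustration under assumptions, not Analytica's Rank function: `rank` is a hypothetical helper, and for unique ranks it breaks ties by order of appearance, which may differ from how Analytica orders tied points.

```python
def rank(values, rank_type=0):
    """Hypothetical helper: 1-based ranks of `values` under a tie convention.

    rank_type: -1 lower ranks, 0 mid-ranks, +1 upper ranks,
    None unique ranks (ties broken by order of appearance; an assumption).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])  # stable: ties keep input order
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run order[i..j] of positions holding tied values
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            if rank_type is None:
                r = k + 1            # unique: each position gets its own rank
            elif rank_type < 0:
                r = i + 1            # lower: lowest rank in the tie group
            elif rank_type > 0:
                r = j + 1            # upper: highest rank in the tie group
            else:
                r = (i + j) / 2 + 1  # mid: average rank of the tie group
            ranks[order[k]] = r
        i = j + 1
    return ranks

data = [30, 10, 20, 20]
for t in (-1, 0, 1, None):
    print(t, rank(data, t))
```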

Note that the Rank function, in contrast, uses lower ranks by default, although it accepts the same options. So, strictly speaking, when ties are present the exact equivalent is:

RankCorrel(x, y, I) := Correlation(Rank(x, I, type: 0), Rank(y, I, type: 0), I)

Early versions of Analytica (Analytica 3.1 and earlier) used the lower-rank for ties when computing RankCorrel.
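The choice of tie convention only matters when ties are present. This Python sketch compares mid-rank correlation with lower-rank correlation (the convention used by Analytica 3.1 and earlier) on a small data set with one tie; the helper names are hypothetical.

```python
import math

def lower_ranks(v):
    # 1 + number of strictly smaller values: tied values share the lowest rank
    return [1 + sum(u < a for u in v) for a in v]

def mid_ranks(v):
    # lower rank plus half the number of other values tied with it
    return [1 + sum(u < a for u in v) + (sum(u == a for u in v) - 1) / 2 for a in v]

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    return cov / math.sqrt(sum((p - ma) ** 2 for p in a) * sum((q - mb) ** 2 for q in b))

x = [1, 2, 2, 3]  # one tie
y = [1, 2, 3, 4]
mid = pearson(mid_ranks(x), mid_ranks(y))        # about 0.949 (current default)
lower = pearson(lower_ranks(x), lower_ranks(y))  # about 0.923 (Analytica 3.1 and earlier)
print(mid, lower)
```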

Weighted Rank Correlation

The function can also compute a weighted rank correlation, in which each data point (or uncertain sample) is assigned its own weight, so that points contribute unequally to the rank correlation coefficient.

When computing the weighted rank correlation between two uncertain quantities, you need to provide a weight for each Monte Carlo sample. The weights take the form of an array indexed by Run, where each element gives the relative importance of the corresponding sample. In Bayesian analysis, the weight is often the likelihood of the observed data given the uncertain inputs (so that computed values are posterior values). You can specify the weights explicitly using the «w» parameter, e.g.:

RankCorrel(x, y, w: likelihood)

If «w» is not specified, the weighting specified by the global system variable SampleWeighting is used.

To compute the weighted rank correlation between two paired data sets, you need to provide a weight for each pair, in the form of an array of relative weights indexed by «I». Pass this array in the «w» parameter, e.g.:

RankCorrel(x, y, I, w: my_wts)
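A minimal Python sketch of a weighted rank correlation, for intuition only. It assumes one common formulation, which may not match Analytica's internals: each point's weighted mid-rank is the total weight of strictly smaller values plus half the weight of its tie group, and the coefficient is the weighted Pearson correlation of those ranks. All names are hypothetical.

```python
import math

def weighted_mid_ranks(values, w):
    """Weight of strictly smaller values + half the weight of the tie group (assumed definition)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    below = 0.0  # total weight of values already passed
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied = sum(w[order[k]] for k in range(i, j + 1))
        for k in range(i, j + 1):
            ranks[order[k]] = below + tied / 2
        below += tied
        i = j + 1
    return ranks

def weighted_pearson(a, b, w):
    tw = sum(w)
    ma = sum(wi * ai for wi, ai in zip(w, a)) / tw
    mb = sum(wi * bi for wi, bi in zip(w, b)) / tw
    cov = sum(wi * (ai - ma) * (bi - mb) for wi, ai, bi in zip(w, a, b))
    va = sum(wi * (ai - ma) ** 2 for wi, ai in zip(w, a))
    vb = sum(wi * (bi - mb) ** 2 for wi, bi in zip(w, b))
    return cov / math.sqrt(va * vb)

def weighted_rank_correl(x, y, w):
    return weighted_pearson(weighted_mid_ranks(x, w), weighted_mid_ranks(y, w), w)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
print(weighted_rank_correl(x, y, [1, 1, 1, 1, 1]))  # 0.8, the ordinary Spearman value
print(weighted_rank_correl(x, y, [2, 1, 1, 1, 1]))  # same as counting the first pair twice
```

Under this formulation, equal weights reproduce the unweighted coefficient, an integer weight k behaves like repeating the pair k times, and zero-weight points have no effect at all.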

Result if there is no variation

If either «x» or «y» has no variation (among points with non-zero weights), the result is NaN.

If you prefer a different value for this case, such as 0 or Null, specify it in the optional parameter «resultIfNoVariation».

Null Values

Any data points having a Null value for «x» or «y» are ignored in the computation of RankCorrel.

Note that you must have at least 3 non-Null data points to get a meaningful rank correlation coefficient. When you have fewer than 3 points, the result is NaN.
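The rules for Null values, small samples, and missing variation can be mirrored in a Python sketch. Here `None` stands in for Null, and `rank_correl` with its `result_if_no_variation` argument is a hypothetical analogue of «resultIfNoVariation». Checking the fewer-than-3 rule before the variation rule is an assumption; the text above does not say which takes precedence.

```python
import math

def mid_ranks(v):
    # 1-based ranks; tied values share the average of the positions they occupy
    return [1 + sum(u < a for u in v) + (sum(u == a for u in v) - 1) / 2 for a in v]

def rank_correl(x, y, result_if_no_variation=math.nan):
    """Hypothetical analogue of RankCorrel(x, y, I, resultIfNoVariation: ...)."""
    # Drop any pair in which either value is None (standing in for Null)
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    # Fewer than 3 usable pairs: no meaningful coefficient (checked first; an assumption)
    if len(pairs) < 3:
        return math.nan
    rx = mid_ranks([a for a, _ in pairs])
    ry = mid_ranks([b for _, b in pairs])
    n = len(pairs)
    mx, my = sum(rx) / n, sum(ry) / n
    vx = sum((r - mx) ** 2 for r in rx)
    vy = sum((r - my) ** 2 for r in ry)
    # All ranks equal in x or in y means there is no variation
    if vx == 0 or vy == 0:
        return result_if_no_variation
    cov = sum((p - mx) * (q - my) for p, q in zip(rx, ry))
    return cov / math.sqrt(vx * vy)

print(rank_correl([1, 2, None, 3, 4], [1, None, 5, 2, 3]))  # 1.0: two pairs dropped
print(rank_correl([1, 2], [2, 1]))                          # nan: fewer than 3 pairs
print(rank_correl([1, 2, 3], [5, 5, 5], result_if_no_variation=0))  # 0
```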

Statistical Significance

A non-zero RankCorrel indicates that a statistical dependence exists between «x» and «y». However, with a finite sample size, a non-zero value may be nothing more than an artifact of sampling error. A p-value quantifies the probability of observing a rank correlation at least as large in magnitude as the observed value if the quantities were in fact statistically independent. The p-value is a standard measure of statistical significance, with a small value (e.g., <5%) indicating significant evidence of statistical dependence.

The p-value for statistical dependence can be computed using a formula devised by Fisher, as follows:

Var n := Sum(x <> Null and y <> Null, I);
Var rc := RankCorrel(x, y, I);
Var z := 0.4856*Ln((1 + rc)/(1 - rc)) * Sqrt(n - 3);
2*CumNormal(-Abs(z))

This is known as a two-sided p-value, meaning it measures the significance of a departure from 0 in either direction. You can also test for one-sided significance, i.e., the significance of negative dependence only or of positive dependence only, by changing the last line to:

CumNormal(z) { For negative dependence }

CumNormal(-z) { For positive dependence }

Generally you would not be justified in using a one-sided test unless you have an a priori reason to believe that the only two possibilities are statistical independence or dependence in the given direction.
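The p-value calculation translates directly into Python, assuming n > 3 and -1 < rc < 1. The constant 0.4856 is approximately 0.5/√1.06, so z equals atanh(rc)·√((n - 3)/1.06); the 1.06 factor is a variance adjustment commonly applied to Fisher's transformation for Spearman's coefficient. The function name `rank_correl_p_values` is hypothetical, and the standard normal CDF is built from `math.erfc`.

```python
import math

def cum_normal(z):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def rank_correl_p_values(rc, n):
    """Fisher-transform p-values for a sample rank correlation rc from n non-Null pairs."""
    z = 0.4856 * math.log((1 + rc) / (1 - rc)) * math.sqrt(n - 3)
    return {
        "two_sided": 2 * cum_normal(-abs(z)),  # departure from 0 in either direction
        "negative": cum_normal(z),             # one-sided test for negative dependence
        "positive": cum_normal(-z),            # one-sided test for positive dependence
    }

p = rank_correl_p_values(0.5, 30)
print(p["two_sided"])  # well below 0.05: significant positive dependence
```

Taking -Abs(z) keeps the two-sided value between 0 and 1 for either sign of rc, and it equals twice the smaller of the two one-sided values.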

Distribution of Expected Rank Correlation

Note: This topic is explained in more detail in the webinar on Rank Correlation by Lonnie Chrisman.

RankCorrel(x, y) computes the sample rank correlation, i.e., the measure that exists among the finite set of data points. This provides information about the true underlying expected rank correlation, but because of sampling error in a finite sample, the two are not equal. When computing rank correlation, we are really interested in the underlying expected rank correlation, which of course cannot be known with certainty to us mere mortals. However, we can use Monte Carlo techniques to obtain a distribution on the underlying expected rank correlation, with the distribution reflecting our degree of uncertainty. Larger sample sizes result in tighter (lower-variance) distributions.

The following function generalizes RankCorrel, so that in Mid-mode, the sample rank correlation is returned, while in probabilistic evaluation mode, a distribution for the underlying expected rank correlation is returned.

Function RankCorrel_Dist(x, y: ContextSamp[I]; I: Index = Run) :=
    Index xy := ['x', 'y'];
    Var sampleRc := RankCorrel(x, y, I);
    If IsSampleEvalMode Then (
        Var pt := BiNormal(0, 1, xy, sampleRc, over: I);
        RankCorrel(pt[xy = 'x'], pt[xy = 'y'], I)
    ) Else
        sampleRc
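For readers without Analytica, the idea behind RankCorrel_Dist can be sketched in Python. Each draw samples n pairs from a standard bivariate normal with Pearson correlation `sample_rc`, mirroring BiNormal over «I», and records their rank correlation; n plays the role of the size of «I» and each draw plays the role of one Run sample. The function names, the 300-draw default, and the fixed seed are assumptions for illustration.

```python
import math
import random
import statistics

def mid_ranks(values):
    # 1-based ranks; tied values share the average rank (sort-based, O(n log n))
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_correl(x, y):
    rx, ry = mid_ranks(x), mid_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))

def rank_correl_dist(sample_rc, n, draws=300, seed=0):
    """Hypothetical analogue of RankCorrel_Dist in sample mode: one rank correlation per draw."""
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        xs, ys = [], []
        for _ in range(n):
            u, v = rng.gauss(0, 1), rng.gauss(0, 1)
            xs.append(u)
            # y = rho*u + sqrt(1 - rho^2)*v gives corr(u, y) = rho
            ys.append(sample_rc * u + math.sqrt(1 - sample_rc ** 2) * v)
        out.append(rank_correl(xs, ys))
    return out

small = rank_correl_dist(0.5, n=20)
large = rank_correl_dist(0.5, n=200)
print(statistics.mean(large), statistics.stdev(small), statistics.stdev(large))
```

The spread shrinks as n grows, matching the observation above that larger samples give tighter distributions. The draws center slightly below `sample_rc`, because for bivariate normal data with correlation ρ the expected Spearman coefficient is (6/π)·arcsin(ρ/2), about 0.48 for ρ = 0.5.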

Library

Sensitivity Analysis library (Sensitivity Analysis Library.ana)

The library is not included with Analytica and needs to be downloaded and installed before it can be used.