# Correlate With

## Correlate_With(S, ReferenceS, rankCorr)

Specifies a distribution that has a given rank correlation with a reference distribution. «S» is the marginal distribution of the result, «ReferenceS» is the reference distribution, and «rankCorr» is the desired rank correlation.

The function reorders the sample of «S» so that the result has a rank correlation close to «rankCorr» with the reference sample.
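The reordering idea can be sketched in Python. This is only an illustration under stated assumptions, not the algorithm the library uses: the function name `correlate_with_sketch` and its helpers are hypothetical, and the method shown (scoring each point with a normal score correlated with the reference ranks, then assigning the sorted values of the sample in score order) is one common way to induce a rank correlation.

```python
import math
import random
import statistics


def ranks(values):
    """Return 0-based ranks of values (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mean = (len(x) - 1) / 2
    num = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    den = sum((a - mean) ** 2 for a in rx)
    return num / den


def correlate_with_sketch(s, reference, rank_corr, rng):
    """Reorder s so its rank correlation with reference is near rank_corr.

    Hypothetical illustration only; not the library's implementation.
    """
    n = len(s)
    # For bivariate normal scores, this Pearson correlation corresponds
    # to a Spearman correlation of rank_corr.
    r = 2 * math.sin(math.pi * rank_corr / 6)
    noise_scale = math.sqrt(max(0.0, 1 - r * r))
    normal = statistics.NormalDist()
    # Normal scores of the reference ranks.
    z_ref = [normal.inv_cdf((k + 0.5) / n) for k in ranks(reference)]
    # Scores correlated with the reference, plus independent noise.
    score = [r * z + noise_scale * rng.gauss(0, 1) for z in z_ref]
    # Hand out the sorted values of s in the order given by the scores.
    s_sorted = sorted(s)
    return [s_sorted[k] for k in ranks(score)]


rng = random.Random(0)
reference = [rng.expovariate(1.0) for _ in range(2000)]
s = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]
result = correlate_with_sketch(s, reference, 0.8, rng)
print(round(spearman(result, reference), 2))
```

The result contains exactly the values of `s`, so its marginal distribution is unchanged; only the pairing with the reference sample differs.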

## Library

Multivariate Distributions library functions (Multivariate Distributions.ana)

## Example

To generate a LogNormal distribution that is highly correlated with Ch1 (which may be any distribution), use, for example:

Correlate_With(LogNormal(2, 3), Ch1, 0.8)

## Notes

When the term "correlation" is used without qualification, it usually means Pearson correlation, which is essentially a measure of linearity. Creating a distribution with this measure of correlation makes most sense when the joint distribution is Gaussian, in which case each marginal distribution is Normal. You can then specify the mean and variance of each variable and the covariance of each pair of variables, and use the Gaussian function (found in the Multivariate Distributions library) to define the joint distribution. The covariance of two random variables is their correlation times the product of their standard deviations, so the Gaussian can be defined directly in terms of Pearson correlations. The BiNormal function may also be used to define a 2-D Gaussian.
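As a minimal sketch of the covariance relationship, the Python below converts a Pearson correlation into a covariance and draws a bivariate normal sample with it. The function names are hypothetical and chosen for illustration; they are not part of the Multivariate Distributions library.

```python
import math
import random


def covariance_from_correlation(corr, sd_x, sd_y):
    """Covariance = correlation times the product of standard deviations."""
    return corr * sd_x * sd_y


def sample_bivariate_normal(mean_x, mean_y, sd_x, sd_y, corr, n, rng):
    """Draw n points from a 2-D Gaussian with the given Pearson correlation."""
    xs, ys = [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rng.gauss(0, 1)
        xs.append(mean_x + sd_x * z1)
        # Mixing z1 and z2 gives y the requested correlation with x.
        ys.append(mean_y + sd_y * (corr * z1 + math.sqrt(1 - corr ** 2) * z2))
    return xs, ys


def sample_covariance(xs, ys):
    """Unbiased sample covariance of two equal-length samples."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (len(xs) - 1)


cov = covariance_from_correlation(0.5, 2.0, 3.0)
xs, ys = sample_bivariate_normal(10.0, 20.0, 2.0, 3.0, 0.5, 20000, random.Random(0))
print(cov, round(sample_covariance(xs, ys), 2))
```

The sample covariance should land close to the value computed from the correlation, with the gap shrinking as the sample size grows.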

For non-Gaussian distributions, it is not necessarily possible for two distributions to have a desired Pearson correlation. However, we can ensure a given Rank Correlation, also called Spearman correlation. This is what Correlate_With and Correlate_Dists use.
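A small Python illustration of why rank correlation is the more workable target for non-Gaussian marginals: applying a monotonic transform (here an exponential, chosen arbitrarily for the example) to one variable leaves the Spearman correlation unchanged but alters the Pearson correlation. The helper names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length samples."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def to_ranks(values):
    """Return 0-based ranks of values (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(to_ranks(x), to_ranks(y))


rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(500)]
# y is linearly related to x with a Pearson correlation near 0.9.
y = [0.9 * a + math.sqrt(1 - 0.81) * rng.gauss(0, 1) for a in x]
# A strongly skewed, monotonic transform of y.
skewed = [math.exp(3 * b) for b in y]

print(round(pearson(x, y), 2), round(spearman(x, y), 2))
print(round(pearson(x, skewed), 2), round(spearman(x, skewed), 2))
```

The two Spearman values are identical because the transform preserves the ordering of the points, while the Pearson value drops once the relationship is no longer linear.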

Correlate_With is the most convenient way to specify two univariate distributions with a given rank correlation. If you have three or more mutually correlated distributions, you need a symmetric matrix of rank correlations and must use the Correlate_Dists function instead.

## Precision

The rank correlation of the generated sample will differ slightly from the requested rank correlation because the sample has a finite number of points. This sampling error decreases as you increase the sample size. The standard deviation of this sampling error (i.e., of the difference between the sample rank correlation and the requested rank correlation) is

$se = \sqrt{\dfrac{1 - rc^2}{n - 2}}$

where n is the sample size and rc is the requested rank correlation. For example, with a sample size of n = 100 and rc = 0.7, we expect a sampling error in the actual rank correlation of about 0.07. There is therefore about a 68% chance that the rank correlation of the sample will be between 0.63 and 0.77 (within one standard deviation), but also about a 5% chance that it will be less than 0.56 or greater than 0.84 (more than two standard deviations away).
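The arithmetic in this example can be reproduced with a few lines of Python. The function name `rank_corr_sampling_error` is hypothetical, and the one- and two-standard-deviation intervals assume the sampling error is approximately normal.

```python
import math


def rank_corr_sampling_error(rc, n):
    """Standard deviation of the sample rank correlation around rc."""
    return math.sqrt((1 - rc ** 2) / (n - 2))


rc, n = 0.7, 100
se = rank_corr_sampling_error(rc, n)

# About 68% of samples fall within one standard deviation of rc,
# and about 95% within two, under a normal approximation.
one_sd = (rc - se, rc + se)
two_sd = (rc - 2 * se, rc + 2 * se)

print(round(se, 3))
print([round(v, 2) for v in one_sd])
print([round(v, 2) for v in two_sd])
```

Because the error scales with 1/sqrt(n - 2), quadrupling the sample size roughly halves it.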