Wilcoxon Distribution
Requires Analytica 4.5
Wilcoxon(m,n,exact)
The Wilcoxon distribution is a discrete, bell-shaped, non-negative distribution. It describes the distribution of the U-statistic in the Mann-Whitney-Wilcoxon rank-sum test when the two populations being compared have the same (arbitrary) distribution. The rank-sum test is perhaps the most commonly used non-parametric significance test in statistics. It detects when one distribution is stochastically greater than (or not equal to) another without assuming that the underlying distributions are normally distributed.
The Wilcoxon distribution function in Analytica returns a random Sample from the Wilcoxon distribution (or the Mid-value when evaluated in Mid-mode). When performing a rank-sum statistical test, the related function CumWilcoxon can be used to compute the p-value, or CumWilcoxonInv to compute the rejection threshold for a given significance level. ProbWilcoxon gives the probability analytically (i.e., without using a Monte Carlo sample). Random(Wilcoxon(m,n)) can be used to generate single random variates.
The distribution is parameterized by two non-negative numbers: «m» and «n». In a rank-sum test, these correspond to the sample sizes of the data measured from each of the two populations.
Library
Distributions Library (all Wilcoxon functions are built-in functions)
The U-Statistic
Suppose you are given m observations from one population and n observations from a second population. It is assumed that the observations are ordinal (i.e., have a natural ordering, or less-than relationship). Because they are ordered, you can determine the rank of every observation among all m+n observations. The smallest observation is assigned a rank of 1, and the largest a rank of m+n.
For example, suppose your observations consist of numeric measurements, and you have observed the following measurements:
- From Population 1:
[12.3, 2.3, 8.3]
- From Population 2:
[2.4, 18.1, 1.3, 5.5]
The ranks would be:
- Population 1 ranks:
[6,2,5]
- Population 2 ranks:
[3,7,1,4]
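As an illustration outside Analytica, the ranking step above can be sketched in Python. The helper name `pooled_ranks` is hypothetical, and the sketch assumes there are no tied observations, as in the example.

```python
def pooled_ranks(sample1, sample2):
    """Return the 1-based ranks of each sample's observations among all
    m+n pooled observations. Assumes no ties (an assumption of this sketch)."""
    pooled = sorted(list(sample1) + list(sample2))
    rank_of = {value: position + 1 for position, value in enumerate(pooled)}
    return [rank_of[x] for x in sample1], [rank_of[x] for x in sample2]


pop1 = [12.3, 2.3, 8.3]
pop2 = [2.4, 18.1, 1.3, 5.5]
ranks1, ranks2 = pooled_ranks(pop1, pop2)
print(ranks1)  # [6, 2, 5]
print(ranks2)  # [3, 7, 1, 4]
```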
The U-statistic is based entirely on the ranks, rather than on the actual observed values. This eliminates any dependence on a specific distribution type. Let [math]\displaystyle{ R_1 }[/math] be the sum of the ranks in Population 1. The U-statistic is defined as:
- [math]\displaystyle{ U=R_1 - {{m(m+1)}\over 2} }[/math]
In the example, [math]\displaystyle{ R_1=13 }[/math] and [math]\displaystyle{ U=7 }[/math].
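The definition can be checked with a short Python sketch (an illustration outside Analytica; the function names are hypothetical). It also enumerates every way of assigning m of the m+n ranks to Population 1, assuming each assignment is equally likely when the two distributions are the same and there are no ties. This yields a cumulative probability of the kind used for a one-sided p-value, and is only practical for small «m» and «n».

```python
from itertools import combinations


def u_statistic(ranks1):
    """U = R1 - m(m+1)/2, where R1 is the rank sum of Population 1."""
    m = len(ranks1)
    return sum(ranks1) - m * (m + 1) // 2


def cum_prob_by_enumeration(u, m, n):
    """P(U <= u) when every m-subset of the ranks 1..m+n is equally likely.

    Brute force: the number of subsets grows combinatorially with m and n.
    """
    total = 0
    hits = 0
    for subset in combinations(range(1, m + n + 1), m):
        total += 1
        if u_statistic(subset) <= u:
            hits += 1
    return hits / total


u = u_statistic([6, 2, 5])
print(u)  # 7
p = cum_prob_by_enumeration(u, 3, 4)
print(p)  # 24 of the 35 subsets, about 0.686
```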
Computation Time and Memory
All of the Wilcoxon functions (Wilcoxon, ProbWilcoxon, CumWilcoxon and CumWilcoxonInv) can require large amounts of time and memory to compute, especially when «m» and «n» get large. However, as «m» and «n» get large, the distribution also approaches a Normal distribution. Hence, the functions automatically switch over to a Normal approximation when the sum of «m» and «n» exceeds 100. At that point, the error in ProbWilcoxon or CumWilcoxon tends to be 0.1% or less (this is an empirical observation, not a proven bound). You can explicitly control whether the exact or approximate computation is used by specifying the boolean «exact» parameter. When it is specified as true, the exact algorithm is used, which can easily exhaust memory or take an exorbitant amount of time for very large values. You can switch over to the approximation sooner to save on time and memory by specifying an expression for the «exact» parameter, such as:
ProbWilcoxon(m,n,exact:m+n>50)
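To give a feel for the trade-off between the exact and approximate computations, the following Python sketch compares an exact cumulative probability with a Normal approximation. It is not Analytica's implementation: the counting recurrence, the continuity correction, and all function names are assumptions made for illustration. The approximation uses mean mn/2 and variance mn(m+n+1)/12, the standard moments of U when the two distributions are the same and there are no ties.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def count(u, m, n):
    """Number of equally likely rank assignments that give U == u.

    Recurrence on the largest pooled observation: if it belongs to
    Population 1 it adds n to U, otherwise it adds nothing.
    """
    if u < 0 or u > m * n:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    return count(u - n, m - 1, n) + count(u, m, n - 1)


def exact_cdf(u, m, n):
    """Exact P(U <= u); the memoized table grows quickly with m and n."""
    return sum(count(k, m, n) for k in range(u + 1)) / math.comb(m + n, m)


def normal_cdf(u, m, n):
    """Normal approximation to P(U <= u) with a continuity correction
    (the correction is an assumption of this sketch)."""
    mean = m * n / 2
    sd = math.sqrt(m * n * (m + n + 1) / 12)
    z = (u + 0.5 - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


m, n, u = 10, 10, 30
print("exact :", exact_cdf(u, m, n))
print("normal:", normal_cdf(u, m, n))
```

The exact routine stores one table entry per (u, m, n) combination it visits, which is why its cost rises steeply with the sample sizes, while the approximation takes constant time.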