KeelinCoefficients

Revision as of 20:52, 31 March 2017 by Lchrisman


KeelinCoefficients( xi, pi, I, K, lb, ub, nTerms, flags )

This fits a Keelin distribution, also known as a MetaLog distribution, to data, or to a set of («xi», «pi») fractile - percentile level pairs, and returns a vector of coefficients indexed by «K». The vector of coefficients can be a much shorter description of the distribution than the data itself. This vector of coefficients can then be passed to the functions Keelin, DensKeelin, CumKeelin and CumKeelinInv, reducing the computation time required by those functions.

The Keelin distribution is a versatile continuous distribution that can assume the shape of almost any standard unbounded, semi-bounded or bounded continuous distribution. If you have univariate continuous data and no first-principles reason to believe the data follows a particular distribution class, the Keelin distribution is likely to be a good choice. There is no need to figure out whether your data best matches a LogNormal, Gamma, Beta or some other distribution type: if it does happen to match one of those closely, the Keelin will usually find the same shape. It can also cover virtually the entire space of Skewness/Kurtosis combinations, and can sometimes even discover meaningful multimodal distinctions.

The Keelin distribution is introduced in the paper:

Parameters

  • «xi»: This can be either: (1) A representative sample of data points, with «pi» omitted, (2) a collection of fractile estimates (corresponding to the fractile levels in «pi»), or (3) a Keelin coefficient vector with the «flags»=1 bit set. In all cases, «xi» must be indexed by «I».
  • «pi»: (Optional): The fractile levels for the values in «xi», also indexed by «I». For example, when «pi» is 0.05, the corresponding value in «xi» is the 5th percentile.
  • «I»: (Optional): The index of «xi» and «pi». This can be omitted when either «xi» or «pi» is itself an index.
  • «lb», «ub»: (Optional) Lower and upper bounds. Set these if you know in advance that your quantity is bounded. When neither is specified, the distribution is unbounded (i.e., with tails going to -INF and INF). When one is set the distribution is semi-bounded, and when both are set it is fully bounded.
  • «nTerms»: (Optional) The number of basis terms used for the fit. This should be 2 or greater. See #Number of terms below.
  • «flags»: (Optional) 0 = Return coefficients. 2 = Return basis (see #Returning the basis).

To use

To use this function, you should create two indexes to pass to «I» and «K». Your «I» index indexes your data points. The «K» index is used for the result, and typically its length determines the number of basis terms used (see Number of terms). The result is indexed by «K». If you omit «K», the function will create a local index named .K.

In some cases you may want to create a "panel" of distributions, where you fit the same data but vary the number of basis terms. Since you will likely want these in a single array, they need to share the same «K» index, even though the number of terms varies. In this case, make your «K» index long enough for the largest basis, and pass the «nTerms» parameter explicitly (usually as a vector, varying «nTerms» across yet another index). For example:

Index NumTerms := [5, 10, 15, 20]
Index K := 1..20
Variable Coef := KeelinCoefficients( xi, pi, I, K, nTerms:NumTerms )

In this case, the result is null-padded.
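To make the shape of a null-padded panel concrete, here is a minimal Python sketch (not Analytica code). The coefficient values are hypothetical placeholders rather than real fits, and `pad_to_index` is an illustrative helper name; the point is only that every row spans the full shared index and carries no value beyond its own number of terms.

```python
def pad_to_index(coeffs, k_length):
    """Right-pad a coefficient list with None up to the length of the shared K index."""
    if len(coeffs) > k_length:
        raise ValueError("more coefficients than positions in K")
    return list(coeffs) + [None] * (k_length - len(coeffs))

K_LENGTH = 6  # plays the role of a shared index K := 1..6

# Hypothetical coefficient vectors standing in for fits with 2, 4 and 6 terms.
fits = {
    2: [5.0, 1.2],
    4: [5.0, 1.1, 0.3, -0.2],
    6: [5.0, 1.1, 0.3, -0.2, 0.05, 0.01],
}

# One row per number of terms, all rows sharing the same length.
panel = {n_terms: pad_to_index(a, K_LENGTH) for n_terms, a in fits.items()}
for n_terms, row in panel.items():
    print(n_terms, row)
```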

Your distribution data will be in one of two forms:

  • A representative sample of points for your quantity, «xi». In this case, omit the «pi» parameter.

or

  • A set of ( «xi», «pi» ) fractile - fractile_level pairs. This is also equivalent to specifying points on the Cumulative Probability curve.

The first case is equivalent to the second case, when the «pi»s are evenly spaced.
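The equivalence can be sketched in Python as follows: sort the sample and attach evenly spaced fractile levels. The midpoint rule `(i + 0.5) / n` used here is an assumption chosen for illustration; this page does not say which spacing convention KeelinCoefficients applies internally.

```python
def sample_to_fractile_pairs(sample):
    """Turn a raw sample into (xi, pi) pairs with evenly spaced fractile levels.

    Assumption: the level of the i-th sorted point (0-based) is (i + 0.5) / n.
    """
    xs = sorted(sample)
    n = len(xs)
    ps = [(i + 0.5) / n for i in range(n)]
    return xs, ps

xs, ps = sample_to_fractile_pairs([3.0, 1.0, 2.0, 4.0])
print(xs)  # [1.0, 2.0, 3.0, 4.0]
print(ps)  # [0.125, 0.375, 0.625, 0.875]
```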

The result

The result of the function is a coefficient vector indexed by «K». This vector can then be passed directly to any of the Keelin-distribution functions, namely Keelin, DensKeelin, CumKeelin and CumKeelinInv.

In all these cases, the vector returned from KeelinCoefficients is passed as the «xi» parameter, and your «K» index must be passed as the index parameter «I» of these functions. You must also set the 1 bit in the «flags» parameter. All of these functions name their data parameter «xi», but they also accept «ai» as an alias for it, so you can pass the coefficients with the named-parameter convention, using ai: to emphasize that these are coefficients, like this:

Keelin( ai: a, I:K, flags:1 )

We don't recommend spending much time trying to interpret the coefficients. The first coefficient is always the median of the distribution, but the others are less obvious. The second coefficient tends to track Variance, the third tends to track Skewness, and the fourth tends to track Kurtosis. They are not, however, these actual moments. It is possible to compute all moments of the distribution directly from these parameters; see the Keelin (2016) reference, cited above.
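The median property is easy to see from the form of the unbounded metalog quantile function in Keelin (2016): every term after the first contains a factor of ln(y/(1-y)) or (y - 0.5), and both vanish at y = 0.5. The Python sketch below uses the first four terms with hypothetical coefficient values; it illustrates the published formula and is not Analytica's implementation.

```python
import math

def metalog_quantile(a, y):
    """Unbounded metalog quantile at level y, using up to four terms.

    Term order follows Keelin (2016): 1, logit, (y - 0.5) * logit, (y - 0.5).
    """
    logit = math.log(y / (1.0 - y))
    c = y - 0.5
    terms = [1.0, logit, c * logit, c]
    return sum(a_k * t_k for a_k, t_k in zip(a, terms))

a = [10.0, 2.0, 0.5, -1.0]  # hypothetical coefficients

median = metalog_quantile(a, 0.5)
print(median)  # 10.0, i.e. the first coefficient

# A larger second coefficient widens the distribution around the same median.
narrow = metalog_quantile([10.0, 1.0], 0.9) - metalog_quantile([10.0, 1.0], 0.1)
wide = metalog_quantile([10.0, 3.0], 0.9) - metalog_quantile([10.0, 3.0], 0.1)
print(narrow < wide)  # True
```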

Bounds

When your quantity is unbounded, its distribution will have tails in both directions. In this case, you should omit the «lb» and «ub» parameters. If you know your quantity is bounded from below, then specify «lb», and if you know that your quantity is bounded from above, specify «ub». The distribution supports all combinations of unbounded, bounded and semi-bounded distributions in this way.

When you compute the coefficients with a particular combination of «lb» and «ub», you must specify the same «lb» and «ub» parameters when passing these coefficients to Keelin, DensKeelin, CumKeelin, or CumKeelinInv.
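The reason the bounds must match is that, in Keelin (2016), bounded variants are obtained by transforming an unbounded metalog: a log transform for one bound and a logit transform for two. The coefficients describe the transformed quantity, so they mean something different under different bounds. The Python sketch below shows those published transforms with a two-term metalog and hypothetical coefficients; that Analytica uses exactly these transforms is an assumption.

```python
import math

def unbounded_quantile(a, y):
    """Two-term unbounded metalog: a[0] + a[1] * ln(y / (1 - y))."""
    return a[0] + a[1] * math.log(y / (1.0 - y))

def keelin_quantile(a, y, lb=None, ub=None):
    """Apply the bound transform from Keelin (2016) to the unbounded quantile."""
    z = unbounded_quantile(a, y)
    if lb is None and ub is None:
        return z                                   # unbounded
    if ub is None:
        return lb + math.exp(z)                    # bounded below
    if lb is None:
        return ub - math.exp(-z)                   # bounded above
    return (lb + ub * math.exp(z)) / (1.0 + math.exp(z))  # bounded on both sides

a = [0.0, 1.0]  # hypothetical coefficients

# The same coefficients give different medians under different bounds.
print(keelin_quantile(a, 0.5))                  # 0.0
print(keelin_quantile(a, 0.5, lb=0.0))          # 1.0
print(keelin_quantile(a, 0.5, lb=0.0, ub=10.0)) # 5.0
```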

Returning the basis

Unless you are doing research on the Keelin distribution itself, you probably won't have a reason to access the basis. But if you need it, you can use this function to return the "basis" for the distribution. This is a 2-D matrix indexed by «I» and «K»; it is a function of «pi», but does not depend on «xi». For example:

Index I := 1..9
Variable Percentile := Table(I)(0.001, 0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99, 0.999)
Variable Estimate := Table(I)( ... )
Index K := 1..8 { 8 basis terms }
Variable coeffs := KeelinCoefficients( Estimate, Percentile, I, K )
Variable sampBasis := KeelinCoefficients( Sample(Uniform(0,1)), Run, K, flags:2 )

For an unbounded Keelin MetaLog, the values can be obtained from the basis and coefficients using

Variable Samp := Sum( coeffs * sampBasis, K )
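The relationship between basis, coefficients and values can be sketched outside Analytica. The Python code below builds the unbounded metalog basis in the term order given by Keelin (2016), fits coefficients by ordinary least squares, and reconstructs the values as the sum over terms of coefficient times basis, mirroring Sum( coeffs * sampBasis, K ). The function names and coefficient values are hypothetical, and whether Analytica's basis uses the same term order and fitting method is an assumption.

```python
import math

def metalog_basis_row(y, n_terms):
    """Basis values g_1(y)..g_n(y) of the unbounded metalog (Keelin 2016 order)."""
    logit = math.log(y / (1.0 - y))
    c = y - 0.5
    row = []
    for k in range(1, n_terms + 1):
        if k == 1:
            row.append(1.0)
        elif k == 2:
            row.append(logit)
        elif k == 3:
            row.append(c * logit)
        elif k == 4:
            row.append(c)
        elif k % 2 == 1:
            row.append(c ** ((k - 1) // 2))
        else:
            row.append(c ** (k // 2 - 1) * logit)
    return row

def solve_linear(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit_metalog(xs, ps, n_terms):
    """Least-squares coefficients from the normal equations (B^T B) a = B^T x."""
    B = [metalog_basis_row(p, n_terms) for p in ps]
    BtB = [[sum(r[i] * r[j] for r in B) for j in range(n_terms)] for i in range(n_terms)]
    Btx = [sum(r[i] * x for r, x in zip(B, xs)) for i in range(n_terms)]
    return solve_linear(BtB, Btx)

ps = [0.001, 0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99, 0.999]
true_coeffs = [5.0, 1.5, 0.8]  # hypothetical coefficients used to generate fractiles
basis = [metalog_basis_row(p, 3) for p in ps]

# Values are the sum over terms of coefficient * basis, one value per fractile level.
xs = [sum(a * g for a, g in zip(true_coeffs, row)) for row in basis]

# Fitting those fractiles recovers the generating coefficients.
coeffs = fit_metalog(xs, ps, 3)
values = [sum(a * g for a, g in zip(coeffs, row)) for row in basis]
print(coeffs)
```

Because the basis depends only on the fractile levels, it can be computed once and reused with any coefficient vector of the same length.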
