GetFract

Revision as of 18:52, 1 February 2018 by KMullins.


GetFract(x, p, I, w, discrete, domain)

Computes an estimate of the «p»th fractile from the sample «x». The «p»th fractile is the value x_p such that the uncertain quantity has probability «p» of being less than or equal to x_p. The median is the 50% fractile, so the median of «x» is obtained as:

GetFract(x, 0.5)

Optional parameters

I

If the index «I» is specified, GetFract computes the «p»th fractile of a data set «x» along the index «I». If «I» is not specified, the fractile is computed along the Run index.

W

Weights for the data points. GetFract can compute the fractile from an unequal weighting, in which some points carry more weight than others. If specified, «w» should be indexed by «I» (or by Run, if «I» is omitted). When «w» and «I» are both omitted, GetFract uses the global weighting specified by the system variable SampleWeighting.

Discrete

A Boolean parameter that explicitly controls whether the data is treated as discrete (True) or continuous (False). When it is omitted, GetFract decides using the heuristic rules described under Details and more examples.

Domain

This parameter is seldom specified directly by modelers, although it can be; Analytica itself uses it internally. Its main purpose is to let GetFract access the expression passed as the first parameter when it tries to determine the domain of «x». You can pass a variable whose domain attribute specifies the set of possible values. Analytica uses that domain to determine whether «x» is discrete or continuous, and in the discrete case, to determine the ordering of the possible values, which affects the resulting fractile.

Details and more examples

GetFract behaves somewhat differently depending on whether «x» is discrete or continuous. It decides which case applies as follows. First, if «x» is not entirely numeric, it is treated as discrete. If «x» is numeric and the «discrete» parameter is specified, «x» is treated as discrete when discrete = True and as continuous when discrete = False. Next, if a variable identifier is passed as the «domain» parameter, or if «domain» is omitted and a variable identifier appears as the first parameter, that variable's domain attribute is consulted: if the domain is set to continuous, «x» is treated as continuous; if the domain is discrete, an explicit list, a list of labels, or an index, «x» is treated as discrete. Otherwise, «x» is treated as continuous.
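The decision rules can be sketched in Python as follows. This is not Analytica's implementation: treated_as_discrete is a hypothetical name, and domain_kind is a hypothetical string standing in for a variable's domain attribute ("continuous", "discrete", "list", "list_of_labels", "index", or None when no variable identifier is available).

```python
from numbers import Real
from typing import Optional, Sequence

# Hypothetical labels for a domain attribute; Analytica's own representation
# of domains is not modeled here.
DISCRETE_DOMAIN_KINDS = {"discrete", "list", "list_of_labels", "index"}


def treated_as_discrete(x: Sequence[object],
                        discrete: Optional[bool] = None,
                        domain_kind: Optional[str] = None) -> bool:
    """Apply the discrete/continuous rules in the order the text lists them."""
    # Rule 1: any non-numeric value forces the discrete treatment.
    if not all(isinstance(v, Real) for v in x):
        return True
    # Rule 2: for numeric data, an explicit «discrete» flag decides.
    if discrete is not None:
        return bool(discrete)
    # Rule 3: consult the variable's domain attribute, if one is known.
    if domain_kind == "continuous":
        return False
    if domain_kind in DISCRETE_DOMAIN_KINDS:
        return True
    # Rule 4: otherwise numeric data is treated as continuous.
    return False
```

Note that the non-numeric check comes first, so text data is treated as discrete even when discrete = False is passed.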

When «x» is discrete, GetFract requires an ordering on the domain of possible values. It obtains that ordering as follows. If the «domain» parameter is a variable identifier, or if the expression passed as the first parameter to GetFract is a variable identifier, the domain attribute of that variable is consulted. If that domain is an explicit list, a list of labels, or an index, the order in which values appear in the list or evaluated index is used. Otherwise, the standard Analytica sort order is applied to the values that occur in «x», with the "smallest" value being the 0th fractile; for example, strings are generally sorted alphabetically.
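A minimal sketch of the ordering rule, where discrete_order is a hypothetical name and domain_values is a hypothetical stand-in for an explicit list, list of labels, or evaluated index taken from the domain attribute. Python's default sort replaces Analytica's sort order here, so, for example, uppercase letters sort before lowercase ones, which may differ from Analytica.

```python
from typing import Hashable, List, Optional, Sequence


def discrete_order(x: Sequence[Hashable],
                   domain_values: Optional[Sequence[Hashable]] = None) -> List[Hashable]:
    """Return the ordering used to rank discrete values, lowest first."""
    # An explicit list, list of labels, or index supplies the order as written,
    # including values that never occur in x.
    if domain_values is not None:
        return list(domain_values)
    # Otherwise sort the distinct values that occur in x. Python's ordering
    # stands in for Analytica's sort order (an assumption).
    return sorted(set(x))
```

With no domain, ["high", "low", "medium"] is ordered alphabetically, which is rarely the intended ranking; supplying the domain ["low", "medium", "high"] fixes that.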

When «x» is continuous, GetFract treats the samples as discrete points drawn from a continuous space. Sorting the N samples in increasing order, x_1 ≤ x_2 ≤ ... ≤ x_N, with weights w_i, it assigns each sample an estimated fractile level

[math]\displaystyle{ p_i = \alpha_i \hat w_i + \sum_{j\lt i} \hat w_j }[/math]

where

[math]\displaystyle{ \alpha_i=\frac{i-1}{N-1}, \quad \hat w_i = \frac{w_i}{\sum_j w_j} }[/math]

Given these (x_i, p_i) points, GetFract linearly interpolates between them to find the value of x at fractile level «p».
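A minimal Python sketch of the continuous method, assuming the samples arrive as a plain list with optional weights; continuous_fractile is a hypothetical name, not part of Analytica. It sorts the samples, computes each p_i from the formula above (using 0-based ranks, so α becomes i/(N−1)), and interpolates linearly. It also returns NaN for fewer than two samples, as GetFract does. Clamping «p» outside [0, 1] to the sample extremes is an assumption of this sketch.

```python
import bisect
import math
from typing import Optional, Sequence


def continuous_fractile(x: Sequence[float], p: float,
                        w: Optional[Sequence[float]] = None) -> float:
    """Estimate the p-th fractile by interpolating the (x_i, p_i) points."""
    n = len(x)
    if n < 2:
        return math.nan                      # GetFract also returns NaN here
    if w is None:
        w = [1.0] * n
    pairs = sorted(zip(x, w))                # x_1 <= x_2 <= ... <= x_N
    total = float(sum(wi for _, wi in pairs))
    xs, levels = [], []
    cum = 0.0                                # running sum of w_hat_j for j < i
    for i, (xi, wi) in enumerate(pairs):
        w_hat = wi / total
        alpha = i / (n - 1)                  # (i-1)/(N-1) with 1-based i
        levels.append(alpha * w_hat + cum)   # p_i
        xs.append(xi)
        cum += w_hat
    # Assumption: p at or beyond the end levels returns the sample extremes.
    if p <= levels[0]:
        return xs[0]
    if p >= levels[-1]:
        return xs[-1]
    k = bisect.bisect_right(levels, p)       # levels[k-1] <= p < levels[k]
    t = (p - levels[k - 1]) / (levels[k] - levels[k - 1])
    return xs[k - 1] + t * (xs[k] - xs[k - 1])
```

For the data [0, 1, 2], equal weights give a median of 1, while weights [2, 1, 1] pull the median down to 0.8 because the extra weight on 0 raises its share of the distribution.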

When the fractile is estimated from a sample using this method, the range of values tends to be "squeezed" relative to the theoretical inverse CDF of the distribution. This is because GetFract cannot tell from the sample whether the distribution has infinite tails, or how far it extends beyond the minimum and maximum sample values. As a result, the inverse CDF recovered by GetFract is essentially the original distribution with its tails truncated at the smallest and largest sample values. This distortion shrinks as the sample size grows. It can also be reduced by using a weighted sampling distribution that places more samples farther out on the tails, with weights that compensate accordingly.
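The truncation follows directly from the fractile levels defined above. With equal weights, ŵ_i = 1/N and the sum of the earlier weights is (i − 1)/N, so

[math]\displaystyle{ p_i = \frac{i-1}{N-1}\cdot\frac{1}{N} + \frac{i-1}{N} = \frac{i-1}{N}\left(\frac{1}{N-1} + 1\right) = \frac{i-1}{N-1} }[/math]

Hence p_1 = 0 and p_N = 1: the smallest sample is assigned the 0th fractile and the largest the 100th, so no interpolated estimate can fall outside [x_1, x_N].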

When «x» is discrete, GetFract determines the domain ordering as described above and totals the normalized weight of the samples in «x» that take each domain value. If d_1, d_2, ..., d_m are the ordered domain values and ŵ_j is the total normalized weight of the samples equal to d_j, GetFract returns the value d_i such that

[math]\displaystyle{ \sum_{j\lt i} \hat w_j \lt p \lt \sum_{j \le i} \hat w_j }[/math]
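The discrete rule can be sketched in Python as below. The names discrete_fractile, order, and w are hypothetical. The formula uses strict inequalities on both sides and does not say what happens when «p» falls exactly on a cumulative boundary; this sketch returns the lower value in that case, which is an assumption. Without an explicit order, Python's default sort stands in for Analytica's sort order.

```python
from collections import defaultdict
from typing import Hashable, Optional, Sequence


def discrete_fractile(x: Sequence[Hashable], p: float,
                      order: Optional[Sequence[Hashable]] = None,
                      w: Optional[Sequence[float]] = None) -> Hashable:
    """Return the first ordered value whose cumulative weight reaches p."""
    if w is None:
        w = [1.0] * len(x)
    if order is None:
        order = sorted(set(x))        # stand-in for Analytica's sort order
    total = float(sum(w))
    mass = defaultdict(float)         # w_hat_j: normalized weight of each value
    for xi, wi in zip(x, w):
        mass[xi] += wi / total
    cum = 0.0
    last = None
    for d in order:
        if mass[d] <= 0.0:
            continue                  # values absent from x are never returned
        cum += mass[d]
        last = d
        if p <= cum:                  # assumption: boundary ties go to the lower value
            return d
    return last                       # guards against round-off when p is 1
```

With x = ["low", "high", "medium", "low"], the 20% fractile is "low" under the domain order ["low", "medium", "high"], but "high" under alphabetical order, which shows why the domain ordering matters.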

If «x» contains fewer than two numbers, GetFract returns NaN. With exactly one number, it arguably should return that number regardless of «p», so this case may change in the future.

To reduce the bias from the "squeezing" described above, you can use this alternative function, which rescales «p» before calling GetFract:

Function GetFractile(S: ContextSamp[R]; p: scalar; R: Index = Run)
Definition:
Var n := Size(R);
Var adj_p := (n*p - 0.5)/(n - 1);
GetFract(S, adj_p, R)

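However, this variation returns meaningful results only when 0.5/n ≤ p ≤ 1 - 0.5/n, where n is the sample size (see SampleSize); that is exactly the range in which adj_p lies between 0 and 1. With equal weights, the adjustment makes the i-th smallest of n samples correspond to p = (i - 0.5)/n instead of (i - 1)/(n - 1). Outside that range, it extrapolates in a way that may or may not be appropriate for your distribution.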
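A minimal Python sketch of the same rescaling, limited to equal weights; equal_weight_fractile and adjusted_fractile are hypothetical names. It relies on the equal-weight levels p_i = (i − 1)/(n − 1) implied by the formula above, and, unlike GetFract, it raises an error rather than extrapolating when adj_p falls outside [0, 1].

```python
import math
from typing import Sequence


def equal_weight_fractile(x: Sequence[float], p: float) -> float:
    """Equal-weight continuous method: sorted sample i (0-based) sits at level i/(n-1)."""
    xs = sorted(x)
    n = len(xs)
    if n < 2:
        return math.nan
    if not -1e-12 <= p <= 1.0 + 1e-12:
        raise ValueError("this sketch does not extrapolate beyond the sample range")
    p = min(max(p, 0.0), 1.0)         # absorb round-off at the ends
    pos = p * (n - 1)
    k = min(int(pos), n - 2)          # left end of the bracketing interval
    t = pos - k
    return xs[k] + t * (xs[k + 1] - xs[k])


def adjusted_fractile(x: Sequence[float], p: float) -> float:
    """Rescale p as in GetFractile, then apply the equal-weight method."""
    n = len(x)
    if n < 2:
        return math.nan
    adj_p = (n * p - 0.5) / (n - 1)
    return equal_weight_fractile(x, adj_p)
```

With the four samples [40, 10, 30, 20], p = 0.125, 0.375, 0.625, and 0.875 return (up to rounding) the sorted samples 10, 20, 30, and 40, and p = 0.25 gives 15 instead of the unadjusted 17.5.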

The inverse of GetFract(x, p) is Probability(x <= x0).
