Keelin (MetaLog) distribution/Tail constraints
The Keelin distribution (as published) fits a curve to data points using ordinary least squares (OLS) regression to obtain the quantile function for the distribution. With an unconstrained OLS fit, a common problem occurs in which the tails of the quantile function reverse direction, with the left tail going right or the right tail going left, which is of course nonsense. The following figure shows an example, with the X & Y axes pivoted to match a conventional CDF curve.

Analytica's Keelin distribution deviates from the standard by adding constraints to the regression that force the tails to go in the correct direction (i.e., it performs a constrained regression). As a result, Keelin distributions from Analytica differ from the results of other software in the cases where the tails would otherwise go in nonsense directions. If you really want the unconstrained behavior, you can disable these constraints by setting the 32 bit in the optional «flags» parameter of each function.
This page explains the internal details of the algorithm that enforces tail constraints.
Tail constraints
The Keelin quantile function has the form:
- [math]\displaystyle{ M(p) = f(p) + g(p) logit(p) }[/math]
where f(p) and g(p) are polynomials. In order to ensure that the tails point in the correct direction, we need to enforce these constraints during the regression:
- [math]\displaystyle{ \lim_{p\rarr 0} M'(p) \gt 0 }[/math]
- [math]\displaystyle{ \lim_{p\rarr 1} M'(p) \gt 0 }[/math]
where
- [math]\displaystyle{ M'(p) = {{d M(p)}\over{d p}} = f'(p) + g'(p) logit(p) + g(p) logit'(p) }[/math]
Note that:
- [math]\displaystyle{ \lim_{p\rarr 0} logit(p) = -\infty }[/math]
- [math]\displaystyle{ \lim_{p\rarr 1} logit(p) = +\infty }[/math]
- [math]\displaystyle{ \lim_{p\rarr 0} logit'(p) = +\infty }[/math]
- [math]\displaystyle{ \lim_{p\rarr 1} logit'(p) = +\infty }[/math]
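As a quick numerical sanity check of the derivative formula above, the analytic expression for [math]\displaystyle{ M'(p) }[/math] can be compared against a central finite difference. The polynomials f and g below are arbitrary illustrative choices, not fitted to any data:

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def dlogit(p):
    # logit'(p) = 1 / (p (1 - p))
    return 1.0 / (p * (1.0 - p))

# Arbitrary illustrative polynomials f and g (not fitted to any data)
f  = lambda p: 1.0 + 0.3 * (p - 0.5)
df = lambda p: 0.3
g  = lambda p: 0.5 + 0.2 * (p - 0.5)
dg = lambda p: 0.2

def M(p):
    # M(p) = f(p) + g(p) logit(p)
    return f(p) + g(p) * logit(p)

def dM(p):
    # M'(p) = f'(p) + g'(p) logit(p) + g(p) logit'(p)
    return df(p) + dg(p) * logit(p) + g(p) * dlogit(p)

p, h = 0.3, 1e-6
fd = (M(p + h) - M(p - h)) / (2 * h)   # central difference approximation
print(abs(fd - dM(p)))                  # tiny: the two derivatives agree
```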
Considering only the left tail ([math]\displaystyle{ p\rarr 0 }[/math]), the left tail goes in the correct direction when any one of the following holds:
- [math]\displaystyle{ g(0)\gt 0 }[/math]
- [math]\displaystyle{ g(0)=0 }[/math] and [math]\displaystyle{ g'(0)\lt 0 }[/math]
- [math]\displaystyle{ g(0)=0 }[/math] and [math]\displaystyle{ g'(0)=0 }[/math] and [math]\displaystyle{ f'(0)\ge 0 }[/math]
The first bullet comes directly from [math]\displaystyle{ M(p)\rarr -\infty }[/math] as [math]\displaystyle{ p\rarr 0 }[/math]. The next two follow directly from requiring [math]\displaystyle{ \lim_{p\rarr 0} M'(p)\gt 0 }[/math]. Symmetrically, the right tail goes in the correct direction when any one of the following holds:
- [math]\displaystyle{ g(1)\gt 0 }[/math]
- [math]\displaystyle{ g(1)=0 }[/math] and [math]\displaystyle{ g'(1)\gt 0 }[/math]
- [math]\displaystyle{ g(1)=0 }[/math] and [math]\displaystyle{ g'(1)=0 }[/math] and [math]\displaystyle{ f'(1)\ge 0 }[/math]
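To make the conditions concrete, here is a small sketch for the 4-term metalog, whose standard basis gives [math]\displaystyle{ f(p) = a_1 + a_4(p - 0.5) }[/math] and [math]\displaystyle{ g(p) = a_2 + a_3(p - 0.5) }[/math]. The function names and structure are illustrative only, not Analytica's internals:

```python
def left_tail_ok(a1, a2, a3, a4):
    """Left-tail condition for a 4-term metalog, where
    f(p) = a1 + a4*(p - 0.5) and g(p) = a2 + a3*(p - 0.5)."""
    g0, dg0, df0 = a2 - 0.5 * a3, a3, a4   # g(0), g'(0), f'(0)
    return (g0 > 0
            or (g0 == 0 and dg0 < 0)
            or (g0 == 0 and dg0 == 0 and df0 >= 0))

def right_tail_ok(a1, a2, a3, a4):
    """Right-tail condition: mirror image of the left-tail test."""
    g1, dg1, df1 = a2 + 0.5 * a3, a3, a4   # g(1), g'(1), f'(1)
    return (g1 > 0
            or (g1 == 0 and dg1 > 0)
            or (g1 == 0 and dg1 == 0 and df1 >= 0))

# A pure logistic (a2 > 0, a3 = a4 = 0) satisfies both tail conditions
print(left_tail_ok(0.0, 1.0, 0.0, 0.0), right_tail_ok(0.0, 1.0, 0.0, 0.0))
# A large negative a3 flips the right tail: g(1) = a2 + 0.5*a3 < 0
print(right_tail_ok(0.0, 1.0, -3.0, 0.0))
```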
Constrained regression
These articles detail how ordinary least squares regression can be augmented to enforce a finite number of inequality constraints:
- Theil & Van de Panne (1960), "Quadratic programming as an extension of classical quadratic maximization.", Management Science, 7:1-
- Gilberto A. Paula (1999), "Leverage in inequality-constrained regression models", 48(4):529-538. (See especially the first two pages)
Single tail constraint
To understand how this works, let's start for simplicity by enforcing only one tail constraint, in this case that the left tail goes in the correct direction. The steps are:
1. Do an ordinary least squares fit. Denote the resulting coefficients as [math]\displaystyle{ \hat{a} }[/math].
2. Check whether [math]\displaystyle{ g(0)\gt 0 }[/math]. If so, we're done: [math]\displaystyle{ a = \hat{a} }[/math].
3. Adjust [math]\displaystyle{ \hat{a} }[/math] to enforce the constraint [math]\displaystyle{ g(0)=0 }[/math]. Denote these adjusted coefficients as [math]\displaystyle{ \tilde{a}_1 }[/math].
4. Check whether [math]\displaystyle{ g'(0)\lt 0 }[/math]. If so, we're done: [math]\displaystyle{ a = \tilde{a}_1 }[/math].
5. Adjust [math]\displaystyle{ \hat{a} }[/math] to enforce both constraints [math]\displaystyle{ \{ g(0)=0,\ g'(0)=0 \} }[/math]. Denote the result as [math]\displaystyle{ \tilde{a}_2 }[/math].
6. Check whether [math]\displaystyle{ f'(0)\ge 0 }[/math]. If so, we're done: [math]\displaystyle{ a = \tilde{a}_2 }[/math].
7. Adjust [math]\displaystyle{ \hat{a} }[/math] to enforce the three constraints [math]\displaystyle{ \{ g(0)=0,\ g'(0)=0,\ f'(0)=0 \} }[/math] to obtain [math]\displaystyle{ \tilde{a}_3 }[/math]. This is the solution.
The adjustments in the above steps require only a few matrix operations, not a new OLS fit. Let
- [math]\displaystyle{ \hat{a} }[/math] = Regression( x, B(y), I, K )
where [math]\displaystyle{ B }[/math] is the Keelin basis for the percentiles assigned to data x, I is the data index, and [math]\displaystyle{ K }[/math] is the basis index (with n elements for n terms). We then express each of the constraints that we want to enforce in matrix form:
- [math]\displaystyle{ C a = 0 }[/math]
where [math]\displaystyle{ C }[/math] is a qxn matrix (q=# of constraints, n=# of terms). The adjusted coefficients are
- [math]\displaystyle{ \tilde{a} = \hat{a} - (B^T B)^{-1} C^T ( C (B^T B)^{-1} C^T )^{-1} C \hat{a} }[/math]
This adjustment is the one applied in steps 3, 5 and 7 above. It is also worth noting that the intermediate matrix [math]\displaystyle{ (B^T B)^{-1} }[/math] needs to be computed only a single time, and indeed is usually computed during the initial OLS step.
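The adjustment can be sketched in a few lines of NumPy. This is an illustration under stated assumptions: B, x, and the constraint row C below are synthetic stand-ins (not Analytica's actual basis or code), and the update applied is the standard equality-constrained least-squares formula:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: B is an m x n basis matrix, x the data vector.
m, n = 50, 4
B = rng.normal(size=(m, n))
x = rng.normal(size=m)

# Ordinary least-squares fit: a_hat = (B^T B)^{-1} B^T x
BtB_inv = np.linalg.inv(B.T @ B)   # computed once, reused for every adjustment
a_hat = BtB_inv @ (B.T @ x)

# One equality constraint C a = 0 (q = 1).  An illustrative row; in the
# Keelin case each row would encode a condition such as g(0) = 0 in terms
# of the coefficients.
C = np.array([[0.0, 1.0, -0.5, 0.0]])

# Equality-constrained least-squares update:
# a_tilde = a_hat - (B^T B)^{-1} C^T (C (B^T B)^{-1} C^T)^{-1} C a_hat
a_tilde = a_hat - BtB_inv @ C.T @ np.linalg.solve(C @ BtB_inv @ C.T, C @ a_hat)

print(np.abs(C @ a_tilde).max())   # ~0: the constraint now holds
```

Since the constrained fit minimizes the same sum of squares over a smaller feasible set, its residual can never be smaller than that of the unconstrained OLS fit.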