Keelin (MetaLog) distribution/Tail constraints
The Keelin distribution (as published) fits a curve to data points using ordinary least squares (OLS) regression to obtain the quantile function for the distribution. With unconstrained OLS alone, a common problem occurs in which the tails of the quantile function reverse, with the left tail going right or the right tail going left, which is of course nonsense. The following figure shows an example, with the X & Y axes pivoted to match a conventional CDF curve.
Analytica's Keelin distribution deviates from the standard by adding constraints to the regression that force the tails to go in the correct directions (i.e., it performs a constrained regression). As a result, Analytica's Keelin distributions differ from the results in other software in the cases where the tails would otherwise go in nonsense directions. If you really want the nonsensical behavior, you can disable these constraints by setting the 32 bit in the optional «flags» parameter of each function.
This page explains the internal details of the algorithm that enforces tail constraints.
Tail constraints
The Keelin quantile function has the form:
- [math]\displaystyle{ M(p) = f(p) + g(p) logit(p) }[/math]
where f(p) and g(p) are polynomials. In order to ensure that the tails point in the correct direction, we need to enforce these constraints during the regression:
- [math]\displaystyle{ \lim_{p\rarr 0} M'(p) \gt 0 }[/math]
- [math]\displaystyle{ \lim_{p\rarr 1} M'(p) \gt 0 }[/math]
where
- [math]\displaystyle{ M'(p) = {{d M(p)}\over{d p}} = f'(p) + g'(p) logit(p) + g(p) logit'(p) }[/math]
Note that:
- [math]\displaystyle{ \lim_{p\rarr 0} logit(p) = -\infty }[/math]
- [math]\displaystyle{ \lim_{p\rarr 1} logit(p) = +\infty }[/math]
- [math]\displaystyle{ \lim_{p\rarr 0} logit'(p) = +\infty }[/math]
- [math]\displaystyle{ \lim_{p\rarr 1} logit'(p) = +\infty }[/math]
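To make the notation concrete, here is a minimal Python sketch (not Analytica code) that evaluates M(p) for hypothetical polynomial coefficients of f and g; the coefficient layout is illustrative only and does not reflect Analytica's internal term ordering.

```python
import numpy as np

def logit(p):
    """logit(p) = ln(p / (1 - p)), diverging to -inf at p=0 and +inf at p=1."""
    return np.log(p / (1 - p))

def keelin_quantile(p, f_coefs, g_coefs):
    """Evaluate M(p) = f(p) + g(p) * logit(p), where f and g are polynomials
    in p given by their coefficients (constant term first)."""
    f = np.polynomial.polynomial.polyval(p, f_coefs)
    g = np.polynomial.polynomial.polyval(p, g_coefs)
    return f + g * logit(p)

# Hypothetical coefficients: g(0) = 1.5 > 0 and g(1) = 2.0 > 0, so both tails
# run off toward -inf and +inf in the correct directions.
p = np.array([0.001, 0.25, 0.5, 0.75, 0.999])
print(keelin_quantile(p, f_coefs=[10.0, 2.0], g_coefs=[1.5, 0.5]))
```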
Considering only the left tail ([math]\displaystyle{ p\rarr 0 }[/math]), we have that the left tail goes in the correct direction when
- [math]\displaystyle{ g(0)\gt 0 }[/math]
- [math]\displaystyle{ g(0)=0 }[/math] and [math]\displaystyle{ g'(0)\lt 0 }[/math]
- [math]\displaystyle{ g(0)=0 }[/math] and [math]\displaystyle{ g'(0)=0 }[/math] and [math]\displaystyle{ f'(0)\ge 0 }[/math]
The first bullet comes directly from the requirement that [math]\displaystyle{ \lim_{p\rarr 0} M(p) = -\infty }[/math]. The next two follow directly from requiring [math]\displaystyle{ \lim_{p\rarr 0} M'(p) \gt 0 }[/math]. Symmetrically, the right tail goes in the correct direction when:
- [math]\displaystyle{ g(1)\gt 0 }[/math]
- [math]\displaystyle{ g(1)=0 }[/math] and [math]\displaystyle{ g'(1)\gt 0 }[/math]
- [math]\displaystyle{ g(1)=0 }[/math] and [math]\displaystyle{ g'(1)=0 }[/math] and [math]\displaystyle{ f'(1)\ge 0 }[/math]
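The three cases for each tail can be tested directly from the endpoint values of g, g' and f'. Here is a small Python sketch of such a check; the function and argument names are ours, not Analytica's:

```python
def tail_direction_ok(g_end, g_prime_end, f_prime_end, left_tail=True):
    """Return True when a tail points in the correct direction.

    For the left tail pass g(0), g'(0), f'(0); for the right tail pass
    g(1), g'(1), f'(1).  Only the sign required of g' differs between tails.
    """
    if g_end > 0:
        return True
    if g_end == 0 and (g_prime_end < 0 if left_tail else g_prime_end > 0):
        return True
    return g_end == 0 and g_prime_end == 0 and f_prime_end >= 0
```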
Constrained regression
These articles detail how ordinary least squares regression can be augmented to enforce a finite number of inequality constraints:
- Theil & Van de Panne (1960), "Quadratic programming as an extension of classical quadratic maximization.", Management Science, 7:1-
- Gilberto A. Paula (1999), "Leverage in inequality-constrained regression models", 48(4):529-538. (See especially the first two pages)
Our algorithm builds on the results from these papers, namely:
Theorem 1: Given the solution to the unconstrained least-squares regression and a constraint matrix C, the equality-constrained regression under C a = 0 (i.e., the coefficients with the smallest least-squares error subject to those equalities) can be obtained by adding an adjustment computed from a few matrix manipulations.
Theorem 2: Given inequality constraints C a >= 0, if the unconstrained solution does not satisfy them, then in the optimal constrained solution at least one of the row constraints holds with equality (i.e., is exactly zero).
Single tail constraint
To understand how this works, for simplicity let's start by enforcing only one tail constraint, in this case that the left tail goes in the correct direction, without enforcing the right tail constraint. Here are the steps (a Python sketch of them appears after the list):
1. Do an ordinary least squares fit. Denote the resulting coefficients as [math]\displaystyle{ \hat{a} }[/math].
2. Check whether [math]\displaystyle{ g(0)\gt 0 }[/math]. If so, we're done: [math]\displaystyle{ a = \hat{a} }[/math].
3. Adjust [math]\displaystyle{ \hat{a} }[/math] to enforce the constraint [math]\displaystyle{ g(0)=0 }[/math]. Denote the adjusted coefficients as [math]\displaystyle{ \tilde{a}_1 }[/math].
4. Check whether [math]\displaystyle{ g'(0)\lt 0 }[/math]. If so, we're done: [math]\displaystyle{ a = \tilde{a}_1 }[/math].
5. Adjust [math]\displaystyle{ \hat{a} }[/math] to enforce both constraints, [math]\displaystyle{ \{ g(0)=0, g'(0)=0 \} }[/math]. Denote the result as [math]\displaystyle{ \tilde{a}_2 }[/math].
6. Check whether [math]\displaystyle{ f'(0)\ge 0 }[/math]. If so, we're done: [math]\displaystyle{ a = \tilde{a}_2 }[/math].
7. Adjust [math]\displaystyle{ \hat{a} }[/math] to enforce all three constraints, [math]\displaystyle{ \{ g(0)=0, g'(0)=0, f'(0)=0 \} }[/math], to obtain [math]\displaystyle{ a = \tilde{a}_3 }[/math]. This is the solution.
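Here is a Python sketch of these steps. The helper constrained_fit (returning the least-squares coefficients subject to a list of equality constraints, with an empty list meaning ordinary OLS) and the evaluators g0, gp0, fp0 are hypothetical stand-ins; the matrix form of the adjustment they would use is given below.

```python
def fit_left_tail(constrained_fit, g0, gp0, fp0):
    """Stepwise enforcement of the left-tail constraint only.

    constrained_fit(constraints) -> coefficients a, where `constraints` is a
    list of equality constraints to impose (empty list = ordinary OLS fit).
    g0(a), gp0(a), fp0(a) evaluate g(0), g'(0), f'(0) for coefficients a.
    """
    a = constrained_fit([])                      # step 1: unconstrained OLS
    if g0(a) > 0:                                # step 2
        return a
    a = constrained_fit(["g(0)=0"])              # step 3
    if gp0(a) < 0:                               # step 4
        return a
    a = constrained_fit(["g(0)=0", "g'(0)=0"])   # step 5
    if fp0(a) >= 0:                              # step 6
        return a
    return constrained_fit(["g(0)=0", "g'(0)=0", "f'(0)=0"])  # step 7
```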
This algorithm can be depicted graphically with this diagram:
The adjustments in the above steps require only a few matrix operations, not a new OLS fit. Let
- [math]\displaystyle{ \hat{a} }[/math] = Regression( x, B(y), I, K )
be the unconstrained regression fit, where [math]\displaystyle{ B(y) }[/math] is the Keelin basis for the percentiles y assigned to the data x, I is the data index, and [math]\displaystyle{ K }[/math] is the basis index (with n elements for n terms). We then express each constraint that we want to enforce in matrix form:
- [math]\displaystyle{ C a = 0 }[/math]
where [math]\displaystyle{ C }[/math] is a q×n matrix (q = number of constraints, n = number of terms). The adjusted coefficients are
- [math]\displaystyle{ \tilde{a} = \hat{a} + (B^T B)^{-1} C^T \Lambda }[/math]
where the Lagrange multiplier vector is
- [math]\displaystyle{ \Lambda = -( C (B^T B)^{-1} C^T )^{-1} C \hat{a} }[/math]
This adjustment is the one applied in steps 3, 5 and 7 above. Note that the intermediate matrix [math]\displaystyle{ (B^T B)^{-1} }[/math] needs to be computed only once, and is usually already available from the initial OLS step.
The intermediate matrix [math]\displaystyle{ C (B^T B)^{-1} C^T }[/math] is sometimes singular. To handle this, the algorithm computes its inverse using singular value decomposition. The matrix is first decomposed as [math]\displaystyle{ C (B^T B)^{-1} C^T = U W V^T }[/math], where W is diagonal. Each diagonal element of W is replaced by its reciprocal, or by 0 when it is less than [math]\displaystyle{ \epsilon }[/math], to obtain [math]\displaystyle{ W^{-1} }[/math]. Then the (pseudo-)inverse is [math]\displaystyle{ (C (B^T B)^{-1} C^T)^{-1} = V W^{-1} U^T }[/math].
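Here is a numpy sketch of this adjustment. The function name, and the optional right-hand side d (useful for the ε-shifted constraints discussed under numeric stability below), are our additions rather than part of the published algorithm:

```python
import numpy as np

def constrained_adjust(a_hat, BtB_inv, C, d=None, eps=1e-12):
    """Adjust unconstrained OLS coefficients a_hat to satisfy C a = d.

    a_hat   : (n,)  unconstrained least-squares coefficients
    BtB_inv : (n,n) precomputed inverse of B^T B from the OLS fit
    C       : (q,n) constraint matrix
    d       : (q,)  constraint targets; defaults to zeros (C a = 0)
    """
    C = np.atleast_2d(C)
    if d is None:
        d = np.zeros(C.shape[0])
    S = C @ BtB_inv @ C.T                      # q x q, may be singular
    U, w, Vt = np.linalg.svd(S)                # S = U diag(w) V^T
    w_inv = np.where(w > eps, 1.0 / w, 0.0)    # reciprocal, or 0 for tiny singular values
    S_pinv = Vt.T @ np.diag(w_inv) @ U.T       # pseudo-inverse of S
    lam = S_pinv @ (d - C @ a_hat)             # Lagrange multipliers
    return a_hat + BtB_inv @ C.T @ lam         # adjusted coefficients
```

With d omitted, this reproduces the C a = 0 adjustment used in steps 3, 5 and 7.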
To enforce only the right tail constraint (without enforcing the left), the algorithm is basically the same (i.e., completely symmetrical), as depicted graphically here:
Enforcing both tail constraints simultaneously
The enforcement of the left tail constraint interacts with the enforcement of the right tail constraint, so you can't simply run the two enforcement procedures sequentially. For example, adding the left-tail constraint g(0)=0 might correct a bad right tail at the same time. Or, adding the right-tail constraint g'(1)=0 after the left-tail constraint g(0)=0 has been enforced could render the left-tail constraint unnecessary, which would result in a less-than-optimal fit.
The algorithm for enforcing both tail constraints simultaneously is depicted graphically as
If there are two nodes in this graph that both satisfy their tests (the ones listed at the far right of their row and at the bottom of their column), and one node is upstream of the other, the upstream node is a better fit to the data. Hence, we want the most upstream node that passes its tests. This means that as soon as we find a node that passes its tests, we can eliminate all nodes downstream of it from consideration.
The algorithm starts at the top-left node with no constraints (which corresponds to the unconstrained regression). Using the coefficients obtained, it checks the tests at the far right of its row as well as at the bottom of its column. If both tests pass, then that node satisfies the constraints and the optimal solution has been found.
Suppose at least one of its tests fails. Now we have to consider both its neighbors. As we proceed, we'll use the following color coding:
- red = the test failed at that node
- yellow = the constraint combination needs to be tried next
- green = satisfies both tests
- blue = not yet tested
- white = pruned from consideration
Suppose the top-left node fails one of its tests (i.e., either g(0)<=0 or g(1)<=0). We are now in this search configuration:
Note: The bottom right node has not been tested, but we know a priori that the constraints will be satisfied there. However, it will be the least optimal fit compared to any other node that passes.
Next, process one of the yellow nodes, say { g(0)=0 }. Suppose this passes its tests, i.e., g'(0)<0 and g(1)>0. We now have:
Because { g(0)=0 } passes, all downstream cases can be pruned, since they can only be an inferior fit. However, we still need to try any remaining yellow nodes. Currently there is a single yellow node; let's suppose it fails.
And then suppose the next one, { g(1)=0, g'(1)=0 }, passes.
The search is now complete. We have two solutions that pass (meaning the tails go in the correct directions in both cases), and since neither solution is upstream of the other, we have to compare their actual least-squares errors to determine which is the better fit.
At any time during the search, there will be at most one yellow or green node in any given column (never both), and at most one yellow or green node in any row (never both), so in total the number of yellow plus green nodes never exceeds 4. Thus, you can keep track of the search state simply by remembering the row position of the yellow or green node (if any) in each column (along with whether it is yellow or green), and the position of the first pruned node (if any) in that column.
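Here is a Python sketch of the selection rule this search implements. For brevity it enumerates all 16 nodes instead of pruning incrementally, but it applies the same rule: among nodes whose tests pass, discard any node downstream of another passing node, then return the surviving fit with the smallest least-squares error. The helpers fit_at, tests_pass and sse are hypothetical stand-ins:

```python
from itertools import product

def fit_both_tails(fit_at, tests_pass, sse):
    """Search the 4x4 grid of constraint combinations for both tails.

    fit_at(i, j)        -> coefficients with the first i left-tail and first j
                           right-tail equality constraints enforced
                           (i, j in 0..3; (0, 0) is the unconstrained OLS fit).
    tests_pass(a, i, j) -> True when the tails of fit a go in the correct
                           directions, given which constraints are enforced.
    sse(a)              -> sum of squared errors of the fit a.
    """
    passing = {}
    for i, j in product(range(4), range(4)):
        a = fit_at(i, j)
        if tests_pass(a, i, j):
            passing[(i, j)] = a

    def dominated(node):
        # A node is discarded if some other passing node lies upstream of it
        # (i.e., uses a subset of its constraints), since that fit is better.
        i, j = node
        return any(ii <= i and jj <= j and (ii, jj) != node for ii, jj in passing)

    candidates = {node: a for node, a in passing.items() if not dominated(node)}
    best = min(candidates, key=lambda node: sse(candidates[node]))
    return candidates[best]
```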
Numeric stability issues
The tail constraints described above enforce constraints such as
- [math]\displaystyle{ g(0) = a_1 - {1\over 2} a_2 + {1\over 4} a_5 - {1\over 8}a_7 + {1\over 16} a_9 - ... = 0 }[/math]
Later, when the resulting coefficients are used to compute the distribution, this sum should come out to zero. But because the [math]\displaystyle{ a_k }[/math] coefficients carry a tiny loss of precision from finite-precision floating point, the actual result of this sum is quite likely to be a tiny, non-zero floating point number. We could expect this round-off to be equally likely to be positive or negative, but when it is negative we end up with [math]\displaystyle{ g(0) \lt 0 }[/math], which, despite our efforts to enforce the constraints, produces a tail reversal anyway. This isn't exactly because our coefficients are wrong; it is because the desired coefficients are slightly different from the ones the computer was able to represent.
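The sign of such round-off is essentially arbitrary, as this tiny Python illustration of ordinary floating-point cancellation shows (the numbers here have nothing to do with any particular metalog fit):

```python
print(0.1 + 0.2 - 0.3)   # a tiny positive number (~5.6e-17), not 0
print(0.3 - 0.1 - 0.2)   # a tiny negative number (~-2.8e-17), not 0
```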
To avoid this instability, we need to modify the algorithm to enforce [math]\displaystyle{ g(0) - \epsilon = 0 }[/math], instead of [math]\displaystyle{ g(0) = 0 }[/math], where [math]\displaystyle{ \epsilon }[/math] is a small number near floating point precision, such as [math]\displaystyle{ \epsilon = 1e-14 }[/math].
Once you enforce [math]\displaystyle{ g(0) \gt \epsilon }[/math], you can probably get away with never considering the [math]\displaystyle{ g'(0)\lt 0 }[/math] or [math]\displaystyle{ f'(0)\ge 0 }[/math] cases, because the margin ensures that [math]\displaystyle{ g(0)\gt 0 }[/math], satisfying the first condition. This reduces the above algorithm to a 2x2 grid of nodes rather than a 4x4 grid. However, if you do use the 4x4 grid, the result may be a bit more faithful to the data.
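Here is a small self-contained numpy sketch of the ε-shifted adjustment on synthetic data; the basis matrix, data and constraint row are all made up for illustration:

```python
import numpy as np

# Fit x ~ B a by OLS, then adjust to enforce the single constraint c a = eps.
rng = np.random.default_rng(0)
B = rng.normal(size=(50, 4))                 # stand-in for the Keelin basis matrix
x = rng.normal(size=50)                      # stand-in for the data being fit
BtB_inv = np.linalg.inv(B.T @ B)
a_hat = BtB_inv @ B.T @ x                    # unconstrained OLS coefficients
c = np.array([[1.0, -0.5, 0.25, -0.125]])    # hypothetical row expressing g(0)
eps = 1e-14
lam = np.linalg.solve(c @ BtB_inv @ c.T, np.array([eps]) - c @ a_hat)
a_tilde = a_hat + BtB_inv @ c.T @ lam
print((c @ a_tilde).item())                  # ~1e-14 up to round-off, not ~0,
                                             # so the margin survives the precision loss
```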