User:Lchrisman/JQPN Cowlick
This page has notes by Lonnie on an oddity that I've named the "Cowlick" of the fully-bounded Johnson QPN distribution from the paper:
- Hadlock & Bickel, "Johnson Quantile-Parameterized Distributions", Decision Analysis 16(1), Jan 2019.
This oddity impacts the implementation of DensUncertainLMH, where I was hoping to use their algorithm but ran into this glitch. Other than this glitch, I am enthusiastic about their paper. I want a robust built-in function in Analytica, but first I need to explore whether this is something that needs to be fixed. So this page contains my private notes, just so I can understand what is going on and perhaps find a fix.
The cowlick refers to an artifact in their fully-bounded distribution in which the density goes to infinity at the upper and lower bounds. At first I thought it happened only at the upper bound when n=1 (when B < ave(L,H)), or only at the lower bound when n=-1 (when B > ave(L,H)). But the results below (in the absence of a mistake on my part) show that it is always there at both bounds; in the less extreme cases, the spike is so thin that you can only detect it by zeroing in very closely around the bound. This is an undesirable trait, and it causes what is essentially a step in the CDF at the upper bound, as if the continuous part of the CDF doesn't really reach the upper bound. It isn't a step in a strict sense (the CDF is continuous), but it flicks up so fast at the end that it acts like a step. It would be better if the CDF eased up to the bound more smoothly. There are extreme cases where the CDF would need to approach an infinite slope at the bounds (such as when the 90th percentile gets arbitrarily close to the UB), but I'm seeing the spike where that doesn't seem necessary.
The figure here shows the cowlick in the illustrative example from their paper, where you can see the density jumping to infinity at the upper bound.
This graph is the same as Figure 7b in their paper, where you can make out this phenomenon if you look carefully. Here is their graph with my arrow pointing out the start of the cowlick.
The CDF for the distribution parameterized by (-11, -10, -9.99, -9, -8) is shown next, where the cowlick causes a "step" to occur in the CDF at the upper bound. It isn't quite the smooth CDF I had hoped for.
The paper doesn't acknowledge this oddity, but it can be detected in their Figure 7b at the upper bound. The oddity does not happen with the unbounded and semi-bounded cases.
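To make the spike concrete before getting into the proof, here is a small numerical sketch of the quantile function from Eq (7) of the paper. The parameter values (ξ=0, λ=1, δ=0.5, n=1, c=1.64, bounds 0 and 1) are illustrative assumptions of mine, not values fitted via the paper's equations; Φ and Φ⁻¹ come from Python's `statistics.NormalDist`. The slope ∂Q_B/∂p collapses as p→1, so the density, which is its reciprocal, blows up at the upper bound:

```python
# Numerical sketch of the cowlick: the slope of the Eq (7) quantile function
# collapses as p -> 1, so the density (its reciprocal) blows up at u.
# Parameter values below are illustrative assumptions, not derived from
# the paper's fitting equations.
from math import sinh
from statistics import NormalDist

ND = NormalDist()  # standard normal: ND.cdf = Phi, ND.inv_cdf = Phi^{-1}

def Q_B(p, l=0.0, u=1.0, xi=0.0, lam=1.0, delta=0.5, n=1, c=1.64):
    """Eq (7) quantile function of the fully-bounded J-QPD."""
    return l + (u - l) * ND.cdf(xi + lam * sinh(delta * (ND.inv_cdf(p) + n * c)))

def slope(p, h=1e-6):
    """Central-difference estimate of dQ_B/dp."""
    return (Q_B(p + h) - Q_B(p - h)) / (2 * h)

for p in (0.9, 0.99, 0.999):
    # density at x = Q_B(p) is 1/slope(p); it grows without bound as p -> 1
    print(f"p={p}: dQ/dp = {slope(p):.3g}, density ~ {1/slope(p):.3g}")
```

Even at p=0.999 the quantile is already essentially at the upper bound while the density is in the thousands, which is the thin spike described above.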
Cowlick ubiquity
Although the cowlick was quite obvious after I had implemented the algorithm, I found it to be non-trivial to prove that it happens. This is my attempt. I focus just on the upper bound first.
- Theorem: In the fully-bounded Hadlock & Bickel JQPN distribution, when [math]\displaystyle{ B\neq(L+H)/2 }[/math] the probability density [math]\displaystyle{ f_B }[/math] approaches infinity at the upper bound [math]\displaystyle{ u }[/math] and at the lower bound [math]\displaystyle{ l }[/math] -- i.e.,
- [math]\displaystyle{ \lim_{x\rarr u} f_B(x) = \infty }[/math]
- where [math]\displaystyle{ f_B(x) = {{\partial F_B(x)}\over{\partial x}} }[/math] and [math]\displaystyle{ F_B(x) }[/math] is the cumulative probability function given in Eq (8) of the paper.
The paper uses a different equation, (7b) for the special-case [math]\displaystyle{ B=(L+H)/2 }[/math], so I've excluded that case.
Proof of the theorem
In the paper, the quantile function [math]\displaystyle{ Q_B(p) }[/math] is the inverse of the cumulative probability function [math]\displaystyle{ F_B(x) }[/math], where [math]\displaystyle{ x=l }[/math] corresponds to [math]\displaystyle{ p=0 }[/math] and [math]\displaystyle{ x=u }[/math] occurs when [math]\displaystyle{ p=1 }[/math]. Hence, by the inverse function theorem, the derivative of [math]\displaystyle{ F_B(x) }[/math] is the reciprocal of the derivative of [math]\displaystyle{ Q_B(p) }[/math] at the corresponding [math]\displaystyle{ p }[/math]. Since the equations are a little simpler for the derivative of [math]\displaystyle{ Q_B }[/math], what I'll actually be showing is that
- [math]\displaystyle{ \lim_{p\rarr 1} {{\partial Q_B(p)}\over{\partial p}} = 0 }[/math]
which, once shown, will establish that [math]\displaystyle{ f_B(x) }[/math] approaches [math]\displaystyle{ \infty }[/math] as [math]\displaystyle{ x \rarr u }[/math].
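The reciprocal relationship can be checked numerically: invert [math]\displaystyle{ Q_B }[/math] by bisection to get [math]\displaystyle{ F_B }[/math], then compare a finite-difference estimate of [math]\displaystyle{ \partial F_B/\partial x }[/math] with the reciprocal of [math]\displaystyle{ \partial Q_B/\partial p }[/math]. The parameter values here are illustrative assumptions, not fitted from the paper's equations:

```python
# Check of the reciprocal relationship f_B(x) = 1 / (dQ_B/dp) at x = Q_B(p),
# inverting Q_B into F_B by bisection (Q_B is strictly increasing in p).
# Parameter values are illustrative assumptions.
from math import sinh
from statistics import NormalDist

ND = NormalDist()
L_, U_ = 0.0, 1.0  # bounds l and u

def Q_B(p, xi=0.0, lam=1.0, delta=0.5, n=1, c=1.64):
    return L_ + (U_ - L_) * ND.cdf(xi + lam * sinh(delta * (ND.inv_cdf(p) + n * c)))

def F_B(x, lo=1e-12, hi=1 - 1e-12):
    """Numerically invert Q_B by bisection to evaluate the CDF at x."""
    for _ in range(80):
        mid = (lo + hi) / 2
        if Q_B(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p, h = 0.7, 1e-6
x = Q_B(p)
f_from_cdf = (F_B(x + h) - F_B(x - h)) / (2 * h)     # dF_B/dx at x
f_from_quantile = 2 * h / (Q_B(p + h) - Q_B(p - h))  # 1 / (dQ_B/dp) at p
print(f_from_cdf, f_from_quantile)  # the two estimates agree
```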
[math]\displaystyle{ Q_B }[/math] is defined in Eq (7) of the paper as
- [math]\displaystyle{ Q_B(p) = l + (u-l) \Phi\left( \xi+ \lambda \sinh( \delta (\Phi^{-1}(p) + n c)) \right) }[/math]
where [math]\displaystyle{ l, u }[/math] are the specified lower and upper bounds for the bounded distribution, [math]\displaystyle{ \Phi(x) }[/math] is the standard normal cumulative probability function, and [math]\displaystyle{ \xi, \lambda, \delta, n }[/math] and [math]\displaystyle{ c }[/math] are defined in the paper as a function of the distribution parameters.
From their definitions, [math]\displaystyle{ c\gt 0, \delta\gt 0, \lambda\gt 0 }[/math] always.
To simplify notation, I make this variable substitution:
- [math]\displaystyle{ z = \delta( \Phi^{-1}(p) + n c) }[/math]
yielding
- [math]\displaystyle{ Q_B(z) = l + (u-l) \Phi\left( \xi+ \lambda \sinh( z ) \right) }[/math]
and
- [math]\displaystyle{ {{\partial Q_B(p)}\over{\partial p}} = {{\partial Q_B(z)}\over{\partial z}} {{\partial z}\over{\partial p}} = {{\partial Q_B(z)}\over{\partial z}} \sqrt{2\pi} \delta e^{{1\over 2} \Phi^{-1}(p)^2 } = {{\partial Q_B(z)}\over{\partial z}} \sqrt{2\pi} \delta e^{{1\over 2}(z/\delta - n c)^2} }[/math]
where [math]\displaystyle{ x\rarr u }[/math] corresponds to [math]\displaystyle{ p\rarr 1 }[/math] and [math]\displaystyle{ z \rarr \infty }[/math], so
- [math]\displaystyle{ \lim_{p\rarr 1} {{\partial Q_B(p)}\over{\partial p}} = \lim_{z\rarr \infty} \sqrt{2\pi} \delta e^{{1\over 2} (z/\delta - n c)^2} {{\partial Q_B(z)}\over{\partial z}} }[/math]
and I need to show that this approaches 0.
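The chain-rule factor [math]\displaystyle{ \partial z/\partial p = \sqrt{2\pi}\, \delta\, e^{{1\over 2}\Phi^{-1}(p)^2} }[/math] used above is easy to sanity-check numerically. The values of δ, n, and c here are illustrative assumptions:

```python
# Numerical check of the chain-rule factor
# dz/dp = sqrt(2*pi) * delta * exp(Phi^{-1}(p)^2 / 2).
# delta, n, c values are illustrative assumptions.
from math import sqrt, pi, exp
from statistics import NormalDist

ND = NormalDist()
delta, n, c = 0.5, 1, 1.64

def z(p):
    # the variable substitution z = delta * (Phi^{-1}(p) + n*c)
    return delta * (ND.inv_cdf(p) + n * c)

p, h = 0.8, 1e-7
numeric = (z(p + h) - z(p - h)) / (2 * h)
analytic = sqrt(2 * pi) * delta * exp(0.5 * ND.inv_cdf(p) ** 2)
print(numeric, analytic)  # the two agree
```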
Take the derivative (I used Wolfram Alpha's online derivative calculator):
- [math]\displaystyle{ {{\partial Q_B(z)}\over{\partial z}} = {{(u-l) \lambda}\over{2 \sqrt{2 \pi}}} \left( e^z + e^{-z} \right) e^{-{1\over 2} \left( {1\over 2} \lambda \left(e^z - e^{-z}\right) + \xi\right)^2} }[/math]
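Since I got this closed form from a symbolic calculator, a numerical spot-check against a finite difference of [math]\displaystyle{ Q_B(z) }[/math] is reassuring. The parameter values are illustrative assumptions:

```python
# Numerical check of the closed form for dQ_B/dz above.
# Parameter values are illustrative assumptions.
from math import sinh, exp, sqrt, pi
from statistics import NormalDist

ND = NormalDist()
l, u, xi, lam = 0.0, 1.0, 0.0, 1.0

def Q_of_z(z):
    # Q_B after the variable substitution: l + (u-l)*Phi(xi + lam*sinh(z))
    return l + (u - l) * ND.cdf(xi + lam * sinh(z))

def dQ_dz(z):
    # (u-l)*lam/(2*sqrt(2*pi)) * (e^z + e^-z)
    #   * exp(-((lam/2)*(e^z - e^-z) + xi)^2 / 2)
    return ((u - l) * lam / (2 * sqrt(2 * pi)) * (exp(z) + exp(-z))
            * exp(-0.5 * (0.5 * lam * (exp(z) - exp(-z)) + xi) ** 2))

z0, h = 0.9, 1e-6
numeric = (Q_of_z(z0 + h) - Q_of_z(z0 - h)) / (2 * h)
print(numeric, dQ_dz(z0))  # the two agree
```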
Expand the limit
- [math]\displaystyle{ \begin{array}{rcl} \lim_{p\rarr 1} {{\partial Q_B(p)}\over{\partial p}} &=& \lim_{z\rarr \infty} \sqrt{2\pi} \delta e^{{1\over 2}(z/\delta- n c)^2} {{\partial Q_B(z)}\over{\partial z}} \\ &=& \lim_{z\rarr \infty} \sqrt{2\pi} \delta e^{{1\over 2}(z/\delta - n c)^2} {{(u-l) \lambda}\over{2 \sqrt{2 \pi}}} \left( e^z + e^{-z} \right) e^{-{1\over 2} \left( {1\over 2} \lambda \left(e^z - e^{-z}\right) + \xi\right)^2} \\ &=& {1\over 2} (u-l)\delta \lambda \left( \lim_{z\rarr \infty} e^z + e^{-z} \right) \left( \lim_{z\rarr \infty} e^{{1\over 2} (z/\delta- n c)^2} \right) \left( \lim_{z\rarr \infty} e^{-{1\over 2} \left( {1\over 2} \lambda \left( e^z - e^{-z}\right) + \xi\right)^2} \right) \\ &=& {1\over 2} (u-l)\delta \lambda \left( \lim_{z\rarr \infty} e^z \right) \left( \lim_{z\rarr \infty} e^{{1\over 2} \delta^{-2} z^2} \right) \left( \lim_{z\rarr \infty} e^{-{1\over 4} \lambda^2 e^{2z}} \right) \\ &=& {1\over 2} (u-l)\delta \lambda \lim_{z\rarr \infty} e^z e^{{1\over 2} \delta^{-2} z^2} e^{-{1\over 4} \lambda^2 e^{2z}} \\ &=& 0 \end{array} }[/math]
The last line follows because the [math]\displaystyle{ e^{-{1\over4} \lambda^2 e^{2z}} }[/math] term goes to zero doubly exponentially fast, which dominates both [math]\displaystyle{ e^z }[/math] and [math]\displaystyle{ e^{{1\over 2}\delta^{-2} z^2} }[/math].
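The domination argument can be seen numerically by computing the product in the last limit through its combined exponent (to avoid overflow from the [math]\displaystyle{ e^{2z} }[/math] term). The δ and λ values are illustrative assumptions:

```python
# The product e^z * e^{z^2/(2 delta^2)} * e^{-lam^2 e^{2z}/4} from the
# final limit, computed via its combined exponent so the e^{2z} term
# cannot overflow. delta and lam values are illustrative assumptions.
from math import exp

delta, lam = 0.5, 1.0

def product(z):
    exponent = z + z * z / (2 * delta * delta) - lam * lam * exp(2 * z) / 4
    return exp(exponent)  # underflows to 0.0 once the exponent is very negative

for z in (1, 2, 3, 5):
    print(f"z={z}: {product(z):.3g}")  # rapidly collapsing toward 0
```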
Note: I did not use the fact that [math]\displaystyle{ n=1 }[/math], so the same proof works for [math]\displaystyle{ z \rarr -\infty }[/math], which, if correct, would mean that the cowlick appears at both bounds always. I thought this indicated I had made a mistake, but after zooming in at extremely high resolution on the computed CDF and PDF, we found that there is an extremely thin spike even in the n=-1 cases. So I now think the spike is always there, just too thin to detect in the less extreme cases.
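The same slope collapse shows up numerically at the lower bound. Here is the earlier sketch run with n=-1 (again with illustrative, non-fitted parameter values): the slope [math]\displaystyle{ \partial Q_B/\partial p }[/math] goes to zero as p→0, so the density spikes at [math]\displaystyle{ l }[/math] as well, even though the spike is too thin to see in an ordinary plot:

```python
# Slope collapse at the LOWER bound with n = -1: dQ_B/dp -> 0 as p -> 0,
# so the density spikes at l too. Parameter values are illustrative
# assumptions, not fitted from the paper's equations.
from math import sinh
from statistics import NormalDist

ND = NormalDist()

def Q_B(p, l=0.0, u=1.0, xi=0.0, lam=1.0, delta=0.5, n=-1, c=1.64):
    return l + (u - l) * ND.cdf(xi + lam * sinh(delta * (ND.inv_cdf(p) + n * c)))

def slope(p, h=1e-6):
    return (Q_B(p + h) - Q_B(p - h)) / (2 * h)

for p in (0.1, 0.01, 0.001):
    print(f"p={p}: dQ/dp = {slope(p):.3g}")  # shrinking toward 0
```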