LChrisman/Boundary Method for Feasible MetaLog
Brainstorm page: This page is here to aid the development of an idea that might lead to an algorithm for reliably and efficiently finding the best-fit feasible MetaLog distribution.
Notation
- [math]\displaystyle{ k }[/math] = number of MetaLog terms
- [math]\displaystyle{ x_i\in \Re, y_i\in [0,1] }[/math] = data points. Denote [math]\displaystyle{ x=[x_i], y=[y_i] }[/math].
- [math]\displaystyle{ {a} }[/math] = quantile coefficients, [math]\displaystyle{ [a_1, ..., a_j, ..., a_k] }[/math]
- [math]\displaystyle{ B(y):[0,1]\rarr \Re^k }[/math] = Basis at scalar y, where [math]\displaystyle{ 0\lt y\lt 1 }[/math] is a probability
- [math]\displaystyle{ = [B_1(y), B_2(y), ..., B_j(y),...B_k(y)] }[/math]
- [math]\displaystyle{ B_j(y) = (y-0.5)^{\lfloor \frac{j-1}{2} \rfloor} (\mathbf{1}_{j \in \mu} + \ell(y) \mathbf{1}_{j \in s} ) }[/math], where [math]\displaystyle{ \ell(y) = \ln{y \over {1-y}} }[/math] is the logit, and [math]\displaystyle{ \mu }[/math] and [math]\displaystyle{ s }[/math] are the index sets of the pure-power and logit-multiplied terms, respectively.
- [math]\displaystyle{ \hat{B} = \left[ \begin{array}{c} B(y_1) \\ B(y_2) \\ \vdots \\ B(y_m) \end{array} \right] }[/math] = The basis [math]\displaystyle{ m \times k }[/math] matrix.
- [math]\displaystyle{ {b}(y) : [0,1]\rarr \Re^k }[/math] = the derivative basis, [math]\displaystyle{ [b_1(y), b_2(y), ..., b_j(y),...b_k(y)] }[/math]
- [math]\displaystyle{ b_j(y) = {{\partial B_j(y)}\over{\partial y}} = \left\lfloor \tfrac{j-1}{2} \right\rfloor (y-0.5)^{\lfloor \frac{j-1}{2} \rfloor - 1} \left(\mathbf{1}_{j \in \mu} + \ell(y) \mathbf{1}_{j \in s}\right) + {{(y-0.5)^{\lfloor \frac{j-1}{2} \rfloor}}\over{y(1-y)}} \mathbf{1}_{j \in s} }[/math]
- [math]\displaystyle{ M(y):[0,1]\rarr \Re = {a} \cdot {B}(y) }[/math] = y-th quantile
- [math]\displaystyle{ M'(y):[0,1]\rarr\Re = {a} \cdot {b}(y) }[/math] = inverse density at [math]\displaystyle{ y^{th} }[/math] quantile
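The basis and its derivative can be sketched in code as follows. This is an illustrative sketch, not part of the method itself: it assumes [math]\displaystyle{ \ell(y) = \ln{y \over {1-y}} }[/math] (the logit) and the standard MetaLog term ordering [math]\displaystyle{ 1,\ \ell(y),\ (y-0.5)\ell(y),\ (y-0.5),\ (y-0.5)^2,\ (y-0.5)^2\ell(y), \dots }[/math]; the function names are illustrative.

```python
# Sketch of B(y) and b(y) = dB/dy for the MetaLog basis, assuming
# l(y) = ln(y/(1-y)) and the standard MetaLog term ordering.
import numpy as np

def _in_s(j):
    """True if term j carries the logit factor l(y) (set s); else set mu."""
    return j in (2, 3) or (j >= 6 and j % 2 == 0)

def basis_row(y, k):
    """Return B(y) = [B_1(y), ..., B_k(y)] for a scalar y in (0, 1)."""
    L = np.log(y / (1.0 - y))          # logit l(y)
    c = y - 0.5
    B = np.zeros(k)
    for j in range(1, k + 1):
        p = (j - 1) // 2               # power of (y - 0.5)
        B[j - 1] = c**p * (L if _in_s(j) else 1.0)
    return B

def dbasis_row(y, k):
    """Return b(y) = dB/dy, used for the feasibility condition M'(y) >= 0."""
    L = np.log(y / (1.0 - y))
    dL = 1.0 / (y * (1.0 - y))         # derivative of the logit
    c = y - 0.5
    b = np.zeros(k)
    for j in range(1, k + 1):
        p = (j - 1) // 2
        if p > 0:                       # product-rule term from (y-0.5)^p
            b[j - 1] = p * c**(p - 1) * (L if _in_s(j) else 1.0)
        if _in_s(j):                    # product-rule term from l(y)
            b[j - 1] += c**p * dL
    return b
```

At [math]\displaystyle{ y=0.5 }[/math] the logit vanishes, so only the constant and linear terms survive in [math]\displaystyle{ B }[/math], which is a convenient sanity check.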
The feasible cone
At each value of [math]\displaystyle{ y }[/math], the condition [math]\displaystyle{ M'(y)\ge 0 }[/math] defines a closed half-space in [math]\displaystyle{ k }[/math] dimensions.
The cone of feasible solutions is given by the intersection of [math]\displaystyle{ {M'}(y) \ge 0 }[/math] for all [math]\displaystyle{ 0\le y \le 1 }[/math]. Each solution is a point in k-dimensional space.
- The feasible set is convex
- The feasible set is closed (i.e., solutions on its boundary are feasible).
- There is at least one [math]\displaystyle{ y }[/math] that corresponds to each point on the boundary.
- A [math]\displaystyle{ y }[/math] that corresponds to the boundary is the y-value where the density is just touching zero.
- There may be more than one value of [math]\displaystyle{ y }[/math] associated with the same boundary point. This would mean that the density touches zero at two (or more) points.
Conjecture: The same cone results using [math]\displaystyle{ 0 \lt y \lt 1 }[/math].
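A simple numerical membership test for the cone is to sample [math]\displaystyle{ M'(y) = a \cdot b(y) }[/math] on a dense grid in the open interval and require it to be nonnegative everywhere. This is only a sketch (a grid test is a necessary but not sufficient condition), and it again assumes the logit [math]\displaystyle{ \ell(y) }[/math] and the standard MetaLog term ordering; `dbasis_row` and `is_feasible` are illustrative names.

```python
# Grid-based check of cone membership: require M'(y) = a . b(y) >= 0 at
# every sample point in (0, 1). Assumes l(y) = ln(y/(1-y)) and the
# standard MetaLog term ordering.
import numpy as np

def dbasis_row(y, k):
    """b(y) = dB/dy for the MetaLog basis at scalar y in (0, 1)."""
    L = np.log(y / (1.0 - y))
    dL = 1.0 / (y * (1.0 - y))
    c = y - 0.5
    b = np.zeros(k)
    for j in range(1, k + 1):
        p = (j - 1) // 2
        in_s = j in (2, 3) or (j >= 6 and j % 2 == 0)
        if p > 0:
            b[j - 1] = p * c**(p - 1) * (L if in_s else 1.0)
        if in_s:
            b[j - 1] += c**p * dL
    return b

def is_feasible(a, n_grid=999):
    """True if M'(y) >= 0 at every grid point (a necessary condition)."""
    ys = np.linspace(0.0, 1.0, n_grid + 2)[1:-1]   # open interval (0, 1)
    return all(np.dot(a, dbasis_row(y, len(a))) >= 0.0 for y in ys)
```

For example, [math]\displaystyle{ a = [0, 1] }[/math] (a pure logistic quantile function) has [math]\displaystyle{ M'(y) = 1/(y(1-y)) \gt 0 }[/math] and passes, while negating [math]\displaystyle{ a_2 }[/math] fails everywhere.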
Loss
The quality of a function is measured by an L2 loss function:
- [math]\displaystyle{ \mathcal{L}(a) = \mathcal{L}(a ; x,y) = ||\hat{B} a - x ||_2^2 = \sum_{i=1}^m \left( B(y_i) \cdot a - x_i \right)^2 = (\hat{B} a - x )^T (\hat{B} a - x) }[/math]
Note that x,y are constant for the problem of finding the best fit.
Denote the unconstrained optimal solution as
- [math]\displaystyle{ a^* = \arg\min_a \mathcal{L}(a) }[/math]
This is the solution found using ordinary least squares regression.
The loss function, [math]\displaystyle{ \mathcal{L}(a) }[/math], is a convex quadratic (a paraboloid) in [math]\displaystyle{ k }[/math]-dimensional space, and therefore can also be represented as
- [math]\displaystyle{ \mathcal{L}(a ; a^*, \Sigma) = (a - a^*)^T \Sigma^{-1} (a - a^*) + \mathcal{L}(a^*) }[/math]
Since [math]\displaystyle{ \mathcal{L}(a^*) }[/math] is constant, it can be omitted during the optimization; likewise, rescaling the quadratic term by the positive constant [math]\displaystyle{ \sigma^2 }[/math] does not change the minimizer.
You can find the parameters of the quadratic as follows.
- [math]\displaystyle{ a^* = \left( \hat{B}^T \hat{B}\right)^{-1} \hat{B}^T x }[/math]
- [math]\displaystyle{ \Sigma = \sigma^2 \left( \hat{B}^T \hat{B}\right)^{-1} }[/math]
- [math]\displaystyle{ \sigma^2 = {1\over{m-k}} \mathcal{L}(a^*) }[/math]
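The formulas above can be sketched as follows. This is a sketch under the same assumptions as before (logit [math]\displaystyle{ \ell(y) }[/math], standard MetaLog term ordering); it uses `lstsq` rather than the explicit normal-equation inverse for numerical stability, and the function names are illustrative.

```python
# Sketch of the unconstrained OLS fit a* and the quadratic-form parameters
# Sigma and sigma^2 from the formulas above. Standard MetaLog term ordering
# and l(y) = ln(y/(1-y)) are assumed.
import numpy as np

def basis_matrix(ys, k):
    """B_hat: the m x k basis matrix, one row B(y_i) per data point."""
    ys = np.asarray(ys, dtype=float)
    L = np.log(ys / (1.0 - ys))
    c = ys - 0.5
    cols = []
    for j in range(1, k + 1):
        p = (j - 1) // 2
        in_s = j in (2, 3) or (j >= 6 and j % 2 == 0)
        cols.append(c**p * (L if in_s else 1.0))
    return np.column_stack(cols)

def unconstrained_fit(xs, ys, k):
    """Return a*, Sigma, and sigma^2 for the quadratic form of the loss."""
    B = basis_matrix(ys, k)
    # a* = (B^T B)^-1 B^T x, computed via least squares for stability
    a_star, *_ = np.linalg.lstsq(B, xs, rcond=None)
    resid = B @ a_star - xs
    loss = float(resid @ resid)                 # L(a*)
    sigma2 = loss / (len(xs) - k)               # sigma^2 = L(a*) / (m - k)
    Sigma = sigma2 * np.linalg.inv(B.T @ B)     # Sigma = sigma^2 (B^T B)^-1
    return a_star, Sigma, sigma2
```

When the data lie exactly on a MetaLog quantile function, the fit recovers the coefficients and [math]\displaystyle{ \sigma^2 \approx 0 }[/math].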