LChrisman/Dual method for feasible MetaLog
This page documents an idea for computing the best-fit guaranteed-feasible Keelin (MetaLog) distribution. This is in the "idea" stage so may contain errors and may or may not turn out to be a viable approach.
The algorithm solves the problem in the dual space.
Primal problem
- Minimize[math]\displaystyle{ _a }[/math] [math]\displaystyle{ Loss(a) }[/math]
- s.t. [math]\displaystyle{ M'( y ; a ) \ge 0 }[/math] for all [math]\displaystyle{ y∈(0,1) }[/math]
The parts of this problem are:
- [math]\displaystyle{ Loss(a) = || M(\hat y;a) - \hat x ||_2 }[/math], where [math]\displaystyle{ (\hat x, \hat y) }[/math] are the data points.
- [math]\displaystyle{ M(y;a) = a \cdot B(y) }[/math], where [math]\displaystyle{ B(y) }[/math] is the Keelin basis function.
Dual problem
- Maximize[math]\displaystyle{ _\lambda }[/math] [math]\displaystyle{ \inf_a L(a,\lambda) }[/math]
- s.t. [math]\displaystyle{ \lambda(y)\ge 0 }[/math] for all [math]\displaystyle{ y\in(0,1) }[/math]
where the Lagrangian function is
- [math]\displaystyle{ L(a,\lambda) = Loss(a) - \int_0^1 \lambda(y) M'(y;a) dy }[/math]
Since [math]\displaystyle{ \lambda(y)\ne 0 }[/math] only when [math]\displaystyle{ M'(y;a)=0 }[/math], and since [math]\displaystyle{ M'(y;a) }[/math] can have at most [math]\displaystyle{ m = 2\lfloor (k-1)/2 \rfloor }[/math] roots, where [math]\displaystyle{ k }[/math] is the number of MetaLog terms, we can re-write the dual problem using a finite number of Lagrangian multipliers, [math]\displaystyle{ \Lambda = [\lambda_1, ..., \lambda_m] }[/math], corresponding to one for each root of [math]\displaystyle{ M' }[/math].
- Maximize[math]\displaystyle{ _\lambda }[/math] [math]\displaystyle{ \inf_a L(a,\Lambda) }[/math]
- s.t. [math]\displaystyle{ \lambda_j\ge 0 }[/math] for all [math]\displaystyle{ j\in\{1,...,m\} }[/math]
where
- [math]\displaystyle{ L(a,\Lambda) = Loss(a) - \sum_{j=1}^m \lambda_j M'(y_j ; a) }[/math]
and [math]\displaystyle{ y_j }[/math] are the roots of [math]\displaystyle{ M'(y,a) }[/math].
- [math]\displaystyle{ dualityGap(a, \Lambda) = \sum_{j=1}^m \lambda_j M'(y_j ; a) }[/math]
Lagrangian is quadratic
The [math]\displaystyle{ Loss(a) }[/math] part of [math]\displaystyle{ L(a,\Lambda) }[/math] is obviously a convex parabola in [math]\displaystyle{ a }[/math], and then the duality gap term adds [math]\displaystyle{ \lambda_j a_k }[/math] terms, also quadratic (but not convex).
This opens the door to a matrix algebra solution to the dual problem (given the location of the zeros of [math]\displaystyle{ M' }[/math]). I derive that solution in this section.
- [math]\displaystyle{ \nabla_a L(a,\Lambda) = 2\sum_i \left( a^T B(\hat y_i) - \hat x_i\right) B(\hat y_i) - \sum_j \lambda_j B'(y_j) = 0 }[/math]
where [math]\displaystyle{ \hat y_i }[/math] are known data points, and not-hat [math]\displaystyle{ y_j }[/math] are carrier indices.
Let
- [math]\displaystyle{ Q = 2\sum_i B(\hat y_i) B(\hat y_i)^T }[/math], a [math]\displaystyle{ k \times k }[/math] positive semi-definite matrix.
- [math]\displaystyle{ c = 2 \sum_i \hat x_i B(\hat y_i) }[/math], a [math]\displaystyle{ k }[/math]-vector.
- [math]\displaystyle{ D = [B'(y_1),...,B'(y_m)] }[/math], a [math]\displaystyle{ k \times m }[/math] matrix.
The stationarity condition becomes:
- [math]\displaystyle{ Q a - c - D \Lambda = 0 }[/math]
so
- [math]\displaystyle{ a = Q^{-1}( c + D \Lambda) }[/math]
When the carrier indicies are roots of [math]\displaystyle{ M' }[/math] then all constraints are active. But to be a bit more general, let's first partition the constraints into the active set, [math]\displaystyle{ \Lambda_A }[/math] and [math]\displaystyle{ D_A }[/math] and the inactive set [math]\displaystyle{ \Lambda_I }[/math] and [math]\displaystyle{ D_I }[/math], where [math]\displaystyle{ A \cup I = \{1,2,...m\} }[/math]. Complementary slackness (i.e., [math]\displaystyle{ \lambda_j ( a^T B'(y_j) ) = 0 }[/math]) implies [math]\displaystyle{ \Lambda_I=\vec 0 }[/math].
Now we have
- [math]\displaystyle{ a = Q^{-1}( c + D_A \Lambda_A) }[/math]
- [math]\displaystyle{ \left( Q^{-1}( c + D_A \Lambda_A) \right)^T B'(y_j) = 0 }[/math], for [math]\displaystyle{ j\in A }[/math].
Rewrite this using
- [math]\displaystyle{ \Omega = D_A^T Q^{-1} D_A }[/math], a [math]\displaystyle{ |A|\times|A| }[/math] matrix.
- [math]\displaystyle{ s = D_A^T Q^{-1} c }[/math], a [math]\displaystyle{ |A| }[/math]-vector.
which becomes
- [math]\displaystyle{ s + \Omega \Lambda_A = 0 }[/math]
so that we have the solution via matrix algebra:
- [math]\displaystyle{ \Lambda_A = \Omega^{-1} s }[/math]
- [math]\displaystyle{ a = Q^{-1}( c + D_A \Lambda_A) }[/math]
This closed-form solution is contingent on a correct guess regarding which constraints are active.
Enable comment auto-refresher