Difference between revisions of "LChrisman/Dual method for feasible MetaLog"

Revision as of 16:03, 20 September 2024

This page documents an idea for computing the best-fit guaranteed-feasible Keelin (MetaLog) distribution. This is in the "idea" stage so may contain errors and may or may not turn out to be a viable approach.

The algorithm solves the problem in the dual space.

Primal problem

Minimize[math]\displaystyle{ _a }[/math] [math]\displaystyle{ Loss(a) }[/math]

s.t. [math]\displaystyle{ M'( y ; a ) \ge 0 }[/math] for all [math]\displaystyle{ y∈(0,1) }[/math]

The parts of this problem are:

[math]\displaystyle{ Loss(a) = || M(\hat y;a) - \hat x ||_2 }[/math], where [math]\displaystyle{ (\hat x, \hat y) }[/math] are the data points.

[math]\displaystyle{ M(y;a) = a \cdot B(y) }[/math], where [math]\displaystyle{ B(y) }[/math] is the Keelin basis function.

Dual problem

Maximize[math]\displaystyle{ _\lambda }[/math] [math]\displaystyle{ \inf_a L(a,\lambda) }[/math]

s.t. [math]\displaystyle{ \lambda(y)\ge 0 }[/math] for all [math]\displaystyle{ y\in(0,1) }[/math]

where

[math]\displaystyle{ L(a,\lambda) = Loss(a) - \int_0^1 \lambda(y) M'(y;a) dy }[/math], is the Lagrangian function.

Since [math]\displaystyle{ \lambda(y)\ne 0 }[/math] only when [math]\displaystyle{ M'(y;a)=0 }[/math], and since [math]\displaystyle{ M'(y;a) }[/math] can have at most [math]\displaystyle{ m = 2\lfloor (k-1)/2 \rfloor }[/math] roots, where [math]\displaystyle{ k }[/math] is the number of MetaLog terms, we can re-write the dual problem using a finite number of Lagrangian multipliers, [math]\displaystyle{ \Lambda = [\lambda_1, ..., \lambda_m] }[/math], corresponding to one for each root of [math]\displaystyle{ M' }[/math].

Maximize[math]\displaystyle{ _\lambda }[/math] [math]\displaystyle{ \inf_a L(a,\Lambda) }[/math]

s.t. [math]\displaystyle{ \lambda_j\ge 0 }[/math] for all [math]\displaystyle{ j\in\{1,...,m\} }[/math]

where

[math]\displaystyle{ L(a,\Lambda) = Loss(a) - \sum_{j=1}^m \lambda_j M'(y_j ; a) }[/math]

and [math]\displaystyle{ y_j }[/math] are the roots of [math]\displaystyle{ M'(y,a) }[/math].

[math]\displaystyle{ dualityGap(a, \Lambda) = \sum_{j=1}^m \lambda_j M'(y_j ; a) }[/math]

Lagrangian is quadratic

The [math]\displaystyle{ Loss(a) }[/math] part of [math]\displaystyle{ L(a,\Lambda) }[/math] is obviously a convex parabola in [math]\displaystyle{ a }[/math], and then the duality gap term adds [math]\displaystyle{ \lambda_j a_k }[/math] terms, also quadratic (but not convex).

This opens the door to a matrix algebra solution to the dual problem (given the location of the zeros of [math]\displaystyle{ M' }[/math]). I derive that solution in this section.

[math]\displaystyle{ \nabla_a L(a,\Lambda) = 2\sum_i \left( a^T B(\hat y_i) - \hat x_i\right) B(y_i) - \sum_j \lambda_j B'(y_j) = 0 }[/math]

@@ Line 20: / Line 20: @@
 :<math>L(a,\lambda) = Loss(a) - \int_0^1 \lambda(y) M'(y;a) dy</math>, is the Lagrangian function.
-Since <math>\lambda(y)\ne 0</math> only when <math>M'(y;a)=0</math>, and since <math>M'(y;a)</math> can have at most <math>m = 2\lfloor (k-1)/2 \rfloor</math> roots, where <math>k</math> is the number of MetaLog terms, we can re-write the dual problem using a finite number of Lagrangian multipliers, <math>\lambda_1, ..., \lambda_m</math>, corresponding to one for each root of <math>M'</math>.
+Since <math>\lambda(y)\ne 0</math> only when <math>M'(y;a)=0</math>, and since <math>M'(y;a)</math> can have at most <math>m = 2\lfloor (k-1)/2 \rfloor</math> roots, where <math>k</math> is the number of MetaLog terms, we can re-write the dual problem using a finite number of Lagrangian multipliers, <math>\Lambda = [\lambda_1, ..., \lambda_m]</math>, corresponding to one for each root of <math>M'</math>.
-:Maximize<math>_\lambda</math> <math>\inf_a L(a,[\lambda_1,...,\lambda_m])</math>
+:Maximize<math>_\lambda</math> <math>\inf_a L(a,\Lambda)</math>
 :s.t. <math>\lambda_j\ge 0</math> for all <math>j\in\{1,...,m\}</math>
 where
-:<math>L(a,[\lambda_1,...,\lambda_m]) = Loss(a) - dualityGap(a,[\lambda_1,...,\lambda_m])</math>
+:<math>L(a,\Lambda) = Loss(a) - \sum_{j=1}^m \lambda_j M'(y_j ; a)</math>
-:<math>dualityGap(a,[\lambda_1,...,\lambda_m]) = \sum_{j=1}^m \lambda_j M'(y_j ; a)</math>
 and <math>y_j</math> are the roots of <math>M'(y,a)</math>.
+:<math>dualityGap(a, \Lambda) = \sum_{j=1}^m \lambda_j M'(y_j ; a)</math>
+== Lagrangian is quadratic ==
+The <math>Loss(a)</math> part of <math>L(a,\Lambda)</math> is obviously a convex parabola in <math>a</math>, and then the duality gap term adds <math>\lambda_j a_k</math> terms, also quadratic (but not convex).
+This opens the door to a matrix algebra solution to the dual problem (given the location of the zeros of <math>M'</math>). I derive that solution in this section.
+:<math>\nabla_a L(a,\Lambda) = 2\sum_i \left( a^T B(\hat y_i) - \hat x_i\right) B(y_i)
+- \sum_j \lambda_j B'(y_j) = 0</math>