LChrisman/Dual method for feasible MetaLog

Revision as of 17:30, 20 September 2024

This page documents an idea for computing the best-fit guaranteed-feasible Keelin (MetaLog) distribution. This is in the "idea" stage so may contain errors and may or may not turn out to be a viable approach.

The algorithm solves the problem in the dual space.

Primal problem

Minimize over [math]\displaystyle{ a }[/math]: [math]\displaystyle{ Loss(a) }[/math]
s.t. [math]\displaystyle{ M'( y ; a ) \ge 0 }[/math] for all [math]\displaystyle{ y\in(0,1) }[/math]

where

[math]\displaystyle{ Loss(a) = || M(\hat y;a) - \hat x ||_2^2 }[/math], where [math]\displaystyle{ (\hat x, \hat y) }[/math] are the data points.
[math]\displaystyle{ M(y;a) = a \cdot B(y) }[/math], where [math]\displaystyle{ B(y) }[/math] is the Keelin basis function.
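
A minimal sketch of these pieces in code, assuming the standard MetaLog term ordering (constant, logit, then alternating powers of [math]\displaystyle{ y - 1/2 }[/math] with and without the logit factor); function names are illustrative:

```python
import numpy as np

def metalog_basis(y, k):
    """Keelin (MetaLog) basis B(y): the first k terms in the standard
    ordering (constant, logit, and alternating powers of y - 1/2 with
    and without the logit factor)."""
    L = np.log(y / (1.0 - y))            # logit term
    c = y - 0.5                          # centered probability
    terms = [1.0, L, c * L, c]           # b_1 .. b_4
    p = 2
    while len(terms) < k:
        terms.append(c ** p)             # b_5, b_7, ...
        if len(terms) < k:
            terms.append(c ** p * L)     # b_6, b_8, ...
        p += 1
    return np.array(terms[:k])

def M(y, a):
    """MetaLog quantile function M(y; a) = a . B(y)."""
    return float(np.dot(a, metalog_basis(y, len(a))))
```

At [math]\displaystyle{ y = 1/2 }[/math] every non-constant term vanishes, so [math]\displaystyle{ M(1/2; a) = a_1 }[/math], which is a convenient sanity check on any implementation.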

Dual problem

Maximize over [math]\displaystyle{ \lambda }[/math]: [math]\displaystyle{ \inf_a L(a,\lambda) }[/math]
s.t. [math]\displaystyle{ \lambda(y)\ge 0 }[/math] for all [math]\displaystyle{ y\in(0,1) }[/math]

where the Lagrangian function is

[math]\displaystyle{ L(a,\lambda) = Loss(a) - \int_0^1 \lambda(y) M'(y;a) dy }[/math]

Since [math]\displaystyle{ \lambda(y)\ne 0 }[/math] only when [math]\displaystyle{ M'(y;a)=0 }[/math], and since [math]\displaystyle{ M'(y;a) }[/math] can have at most [math]\displaystyle{ m = 2\lfloor (k-1)/2 \rfloor }[/math] roots, where [math]\displaystyle{ k }[/math] is the number of MetaLog terms, we can re-write the dual problem using a finite number of Lagrangian multipliers, [math]\displaystyle{ \Lambda = [\lambda_1, ..., \lambda_m] }[/math], corresponding to one for each root of [math]\displaystyle{ M' }[/math].

Maximize over [math]\displaystyle{ \Lambda }[/math]: [math]\displaystyle{ \inf_a L(a,\Lambda) }[/math]
s.t. [math]\displaystyle{ \lambda_j\ge 0 }[/math] for all [math]\displaystyle{ j\in\{1,...,m\} }[/math]

where

[math]\displaystyle{ L(a,\Lambda) = Loss(a) - \sum_{j=1}^m \lambda_j M'(y_j ; a) }[/math]

and [math]\displaystyle{ y_j }[/math] are the roots of [math]\displaystyle{ M'(y;a) }[/math]. The duality gap is

[math]\displaystyle{ dualityGap(a, \Lambda) = \sum_{j=1}^m \lambda_j M'(y_j ; a) }[/math]
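
With the roots in hand, the finite Lagrangian and duality gap are direct to code. A sketch, where B and dB are assumed callables returning the basis vector [math]\displaystyle{ B(y) }[/math] and its derivative [math]\displaystyle{ B'(y) }[/math]:

```python
import numpy as np

def loss(a, y_hat, x_hat, B):
    """Squared-error loss  sum_i (a . B(y_i) - x_i)^2."""
    r = np.array([a @ B(y) for y in y_hat]) - np.asarray(x_hat, float)
    return float(r @ r)

def duality_gap(a, lam, roots, dB):
    """dualityGap(a, Lambda) = sum_j lambda_j M'(y_j; a),
    using M'(y; a) = a . B'(y)."""
    return float(sum(l * (a @ dB(y)) for l, y in zip(lam, roots)))

def lagrangian(a, lam, roots, y_hat, x_hat, B, dB):
    """L(a, Lambda) = Loss(a) - sum_j lambda_j M'(y_j; a)."""
    return loss(a, y_hat, x_hat, B) - duality_gap(a, lam, roots, dB)
```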

Lagrangian is quadratic

The [math]\displaystyle{ Loss(a) }[/math] part of [math]\displaystyle{ L(a,\Lambda) }[/math] is a convex quadratic in [math]\displaystyle{ a }[/math], and the duality-gap term adds terms of the form [math]\displaystyle{ \lambda_j a^T B'(y_j) }[/math], which are bilinear in [math]\displaystyle{ (a,\Lambda) }[/math]: still quadratic, but not convex.

This opens the door to a matrix algebra solution to the dual problem (given the location of the zeros of [math]\displaystyle{ M' }[/math]).

Closed-form solution to dual

Given a carrier set, this section derives a closed-form solution to solve the dual for [math]\displaystyle{ a }[/math] and [math]\displaystyle{ \Lambda }[/math]. The solution is the correct, optimal solution in the event that the carrier set contains the actual zeros of [math]\displaystyle{ M' }[/math].

[math]\displaystyle{ \nabla_a L(a,\Lambda) = 2\sum_i \left( a^T B(\hat y_i) - \hat x_i\right) B(\hat y_i) - \sum_j \lambda_j B'(y_j) = 0 }[/math]

where [math]\displaystyle{ \hat y_i }[/math] are the known data points, and the un-hatted [math]\displaystyle{ y_j }[/math] are the carrier indices.

Let

[math]\displaystyle{ Q = 2\sum_i B(\hat y_i) B(\hat y_i)^T }[/math], a [math]\displaystyle{ k \times k }[/math] positive semi-definite matrix.
[math]\displaystyle{ c = 2 \sum_i \hat x_i B(\hat y_i) }[/math], a [math]\displaystyle{ k }[/math]-vector.
[math]\displaystyle{ D = [B'(y_1),...,B'(y_m)] }[/math], a [math]\displaystyle{ k \times m }[/math] matrix.
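
Assembling [math]\displaystyle{ Q }[/math], [math]\displaystyle{ c }[/math], and [math]\displaystyle{ D }[/math] is a few lines of matrix code. A sketch, again assuming callables B and dB for the basis and its derivative:

```python
import numpy as np

def assemble_qcd(y_hat, x_hat, roots, B, dB):
    """Build Q (k x k), c (k-vector), and D (k x m) as defined above."""
    Bs = np.array([B(y) for y in y_hat])           # n x k matrix of basis rows
    Q = 2.0 * Bs.T @ Bs                            # 2 * sum_i B(y_i) B(y_i)^T
    c = 2.0 * Bs.T @ np.asarray(x_hat, float)      # 2 * sum_i x_i B(y_i)
    D = np.column_stack([dB(y) for y in roots])    # columns B'(y_1), ..., B'(y_m)
    return Q, c, D
```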

The stationarity condition becomes:

[math]\displaystyle{ Q a - c - D \Lambda = 0 }[/math]

so

[math]\displaystyle{ a = Q^{-1}( c + D \Lambda) }[/math]

When the carrier indices are roots of [math]\displaystyle{ M' }[/math], all constraints are active. But to be a bit more general, let's first partition the constraints into the active set, [math]\displaystyle{ \Lambda_A }[/math] and [math]\displaystyle{ D_A }[/math], and the inactive set, [math]\displaystyle{ \Lambda_I }[/math] and [math]\displaystyle{ D_I }[/math], where [math]\displaystyle{ A \cup I = \{1,2,...,m\} }[/math]. Complementary slackness (i.e., [math]\displaystyle{ \lambda_j ( a^T B'(y_j) ) = 0 }[/math]) implies [math]\displaystyle{ \Lambda_I=\vec 0 }[/math].

Now we have

[math]\displaystyle{ a = Q^{-1}( c + D_A \Lambda_A) }[/math]
[math]\displaystyle{ \left( Q^{-1}( c + D_A \Lambda_A) \right)^T B'(y_j) = 0 }[/math], for [math]\displaystyle{ j\in A }[/math].

Rewrite this using

[math]\displaystyle{ \Omega = D_A^T Q^{-1} D_A }[/math], a [math]\displaystyle{ |A|\times|A| }[/math] matrix.
[math]\displaystyle{ s = D_A^T Q^{-1} c }[/math], a [math]\displaystyle{ |A| }[/math]-vector.

which becomes

[math]\displaystyle{ s + \Omega \Lambda_A = 0 }[/math]

so that we have the solution via matrix algebra:

[math]\displaystyle{ \Lambda_A = -\Omega^{-1} s }[/math]
[math]\displaystyle{ a = Q^{-1}( c + D_A \Lambda_A) }[/math]
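
As a sketch, this is a pair of linear solves (using matrix solves rather than explicit inverses, and taking the multipliers from [math]\displaystyle{ s + \Omega \Lambda_A = 0 }[/math]):

```python
import numpy as np

def solve_dual_closed_form(Q, c, D_A):
    """Solve s + Omega Lambda_A = 0 for the active multipliers, then
    recover a = Q^{-1}(c + D_A Lambda_A)."""
    Qinv_c = np.linalg.solve(Q, c)
    Qinv_DA = np.linalg.solve(Q, D_A)
    Omega = D_A.T @ Qinv_DA               # D_A^T Q^{-1} D_A
    s = D_A.T @ Qinv_c                    # D_A^T Q^{-1} c
    lam_A = np.linalg.solve(Omega, -s)    # Lambda_A from s + Omega Lambda_A = 0
    a = Qinv_c + Qinv_DA @ lam_A          # a = Q^{-1}(c + D_A Lambda_A)
    return a, lam_A
```

Note that this routine checks neither the sign of [math]\displaystyle{ \Lambda_A }[/math] nor primal feasibility; handling those is the job of the active-set iteration.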

This closed-form solution is contingent on a correct guess regarding which constraints are active. Also, the closed-form solution does not enforce the constraint of the dual problem that [math]\displaystyle{ \Lambda_A\ge 0 }[/math].

Iterate to find active set

Step 1
Initialize the active set [math]\displaystyle{ A }[/math] with an initial guess, perhaps the current carrier set.
Step 2
Solve [math]\displaystyle{ \Lambda_A = -\Omega^{-1} s }[/math]
Step 3
If [math]\displaystyle{ \Lambda_A\ge 0 }[/math], proceed to Step 4.
If any [math]\displaystyle{ \lambda_j\lt 0 }[/math], move [math]\displaystyle{ j }[/math] from [math]\displaystyle{ A }[/math] to [math]\displaystyle{ I }[/math] and go to Step 2.
Step 4
Compute [math]\displaystyle{ a = Q^{-1}( c + D_A \Lambda_A) }[/math]
Step 5
Check the primal feasibility for all inactive constraints.
If [math]\displaystyle{ M'(y_j;a)\lt 0 }[/math] for any [math]\displaystyle{ j\in I }[/math], move [math]\displaystyle{ j }[/math] from [math]\displaystyle{ I }[/math] to [math]\displaystyle{ A }[/math] and return to Step 2.
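
These steps can be sketched as a single loop, assuming [math]\displaystyle{ Q }[/math], [math]\displaystyle{ c }[/math], and the full [math]\displaystyle{ D }[/math] (one column [math]\displaystyle{ B'(y_j) }[/math] per carrier point) have been assembled; starting with all constraints active is one possible initial guess:

```python
import numpy as np

def active_set_solve(Q, c, D, max_iter=100):
    """Active-set iteration for the dual: returns (a, lam) with lam >= 0
    and M'(y_j; a) = a . D[:, j] >= 0 at every carrier point."""
    m = D.shape[1]
    active = list(range(m))                  # Step 1: initial guess: all active
    inactive = []
    for _ in range(max_iter):
        lam = np.zeros(m)
        Qinv_c = np.linalg.solve(Q, c)
        if active:
            D_A = D[:, active]
            Qinv_DA = np.linalg.solve(Q, D_A)
            Omega = D_A.T @ Qinv_DA
            s = D_A.T @ Qinv_c
            lam_A = np.linalg.solve(Omega, -s)        # Step 2
            neg = [j for j, l in zip(active, lam_A) if l < 0]
            if neg:                                   # Step 3: drop a negative multiplier
                active.remove(neg[0]); inactive.append(neg[0])
                continue
            lam[active] = lam_A
            a = Qinv_c + Qinv_DA @ lam_A              # Step 4
        else:
            a = Qinv_c                                # unconstrained solution
        violated = [j for j in inactive if a @ D[:, j] < 0]   # Step 5
        if not violated:
            return a, lam
        inactive.remove(violated[0]); active.append(violated[0])
    raise RuntimeError("active-set iteration did not terminate")
```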

Proposition: This iterative algorithm will always terminate in a finite number of steps under the standard conditions for convex quadratic programming problems with linear constraints. Proof: I think this is known as the active set method in LSIP, and is known to converge. Each adjustment to the active set moves the solution in a feasible direction that improves the objective function or maintains optimality. Since [math]\displaystyle{ m }[/math] is a small finite number, there is a finite number of possible active sets that can ever be visited (certainly bounded by [math]\displaystyle{ 2^m }[/math]). There might be a situation where a [math]\displaystyle{ j }[/math] could be in either set, and hence could be moved back and forth without improving (or degrading) the objective, or where numeric round-off causes a similar effect; this should be preventable by using a lexicographic order to break ties when multiple options exist.

Solution algorithm

The previous section finds the optimal solution (using a nearly closed-form solution) given a "guess" about the location of the zeros of [math]\displaystyle{ M' }[/math]. With a correct guess, or a superset, it finds the solution (the best-fit feasible MetaLog). Now we need to specify the algorithm to converge to the correct guess.

Step 1
Initialize the carrier set with a guess for the zeros of [math]\displaystyle{ M'(y;a^*) }[/math]. Use the zeros of [math]\displaystyle{ M'(y;a^{(0)}) }[/math], where [math]\displaystyle{ a^{(0)} }[/math] is the solution to the unconstrained MetaLog.
Step 2
Find [math]\displaystyle{ \Lambda }[/math] and [math]\displaystyle{ a }[/math] using the active set method of the previous section, given this carrier set.
Step 3
Compute the zeros of [math]\displaystyle{ M'(y;a) }[/math] with this new [math]\displaystyle{ a }[/math].
Step 4
Refine the carrier set using this new set of zeros.
First, shrink the carrier set by removing points where [math]\displaystyle{ \lambda_j=0 }[/math] and [math]\displaystyle{ M'(y_j;a)\gt 0 }[/math].
Then add the new zeros to the carrier set.
Step 5
Iterate to convergence.
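
Step 3 requires the zeros of [math]\displaystyle{ M'(y;a) }[/math] on [math]\displaystyle{ (0,1) }[/math]. A sketch that brackets sign changes on a grid and refines each by bisection (grid density and iteration counts are illustrative choices), with dB an assumed callable for [math]\displaystyle{ B'(y) }[/math]:

```python
import numpy as np

def roots_of_Mprime(a, dB, n_grid=2001):
    """Zeros of M'(y; a) = a . B'(y) on (0,1): bracket sign changes on
    a uniform grid, then refine each bracket by bisection."""
    grid = np.linspace(1e-6, 1.0 - 1e-6, n_grid)
    f = np.array([a @ dB(y) for y in grid])
    roots = []
    for i in range(n_grid - 1):
        if f[i] == 0.0:
            roots.append(grid[i])
        elif f[i] * f[i + 1] < 0:
            lo, hi = grid[i], grid[i + 1]
            for _ in range(60):              # plain bisection
                mid = 0.5 * (lo + hi)
                if (a @ dB(lo)) * (a @ dB(mid)) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots
```

One caveat: tangential zeros of [math]\displaystyle{ M' }[/math] (where the curve touches zero without a sign change) are missed by pure sign bracketing, so a practical implementation should also examine grid points where [math]\displaystyle{ |M'| }[/math] dips near zero.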

Conjectures (that I think are provable):

  • Efficiency: The carrier set size never exceeds [math]\displaystyle{ 2 m }[/math].
  • Every iteration reduces the duality gap.
  • Converges to the global optimum.

Note on proof: There is a variant of Step 4 that does not shrink the carrier set before adding the new zeros. It appears to be more direct to prove convergence with that variant, possibly using existing theorems from LSIP literature. The shrinking step doesn't change the solution since we are removing carrier indices that didn't need to be there in the previous iteration.
