LChrisman/Computing Derivatives for MetaLog
This page outlines an algorithm for computing the [math]\displaystyle{ i^{th} }[/math] derivative of a Keelin MetaLog.
From Eq(6) in the Baucells, Chrisman, Keelin paper draft:
- [math]\displaystyle{ M^{(i)}(y) = \mu^{(i)}(y) + s^{(i)}(y) \ell(y) + \sum_{j=0}^{i-1} s^{(j)}(y) \ell^{(i-j)}(y) }[/math]
thus in
- [math]\displaystyle{ y^i (1-y)^i M^{(i)}(y) = y^i (1-y)^i \mu^{(i)}(y) + y^i (1-y)^i s^{(i)}(y) \ell(y) + y^i (1-y)^i \sum_{j=0}^{i-1} s^{(j)}(y) \ell^{(i-j)}(y) }[/math]
the first and third terms are polynomials, and the middle term is a polynomial times a logit. Hence, this can always be written in the form:
- [math]\displaystyle{ y^i (1-y)^i M^{(i)}(y) = poly_1(y) + poly_2(y) \ell(y) }[/math]
where
- [math]\displaystyle{ poly_2(y) = y^i (1-y)^i s^{(i)}(y) }[/math]
- [math]\displaystyle{ poly_1(y) = y^i (1-y)^i \mu^{(i)}(y) + T(y) }[/math]
- [math]\displaystyle{ T(y) = y^i (1-y)^i \sum_{j=0}^{i-1} s^{(j)}(y) \ell^{(i-j)}(y) }[/math]
When [math]\displaystyle{ i\ge\lfloor (k+1)/2 \rfloor }[/math], then [math]\displaystyle{ poly_2(y)=0 }[/math] so that [math]\displaystyle{ y^i (1-y)^i M^{(i)}(y) }[/math] is a simple polynomial.
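As a concrete check (worked out here, not quoted from the paper), take [math]\displaystyle{ i=1 }[/math]: then [math]\displaystyle{ \ell'(y) = 1/(y(1-y)) }[/math], so [math]\displaystyle{ T(y) = y(1-y)\, s(y)\, \ell'(y) = s(y) }[/math] and
- [math]\displaystyle{ y(1-y) M'(y) = \left[ y(1-y)\mu'(y) + s(y) \right] + \left[ y(1-y) s'(y) \right] \ell(y) }[/math]
i.e., [math]\displaystyle{ poly_1(y) = y(1-y)\mu'(y) + s(y) }[/math] and [math]\displaystyle{ poly_2(y) = y(1-y) s'(y) }[/math].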
- [math]\displaystyle{ \ell^{(i)}(y) = (i-1)! {{y^i - (-1)^i(1-y)^i}\over{y^i (1-y)^i}} }[/math]
- [math]\displaystyle{ \begin{array}{rcl} y^i (1-y)^i \ell^{(i-j)}(y) &=& y^i (1-y)^i (i-j-1)! {{y^{i-j} - (-1)^{i-j}(1-y)^{i-j}}\over{y^{i-j} (1-y)^{i-j}}} \\ &=& y^j (1-y)^j (i-j-1)! \left(y^{i-j} - (-1)^{i-j}(1-y)^{i-j} \right) \end{array} }[/math]
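For reference, a minimal numerical sketch of this closed form (the function name logit_deriv is ours, not from the paper); it is handy for spot-checking the algebra against finite differences:

<syntaxhighlight lang="python">
from math import factorial, log

def logit_deriv(i, y):
    """i-th derivative of the logit l(y) = ln(y) - ln(1-y) on (0,1).

    Uses the closed form (i-1)! * (y^i - (-1)^i (1-y)^i) / (y(1-y))^i for i >= 1.
    """
    if i == 0:
        return log(y) - log(1.0 - y)
    return factorial(i - 1) * (y**i - (-1)**i * (1.0 - y)**i) / (y * (1.0 - y))**i

# e.g. logit_deriv(1, 0.25) == 1 / (0.25 * 0.75)
</syntaxhighlight>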
[math]\displaystyle{ poly_1 }[/math] is a polynomial with degree [math]\displaystyle{ \max(deg(s),deg(\mu))+i = \lfloor{{k-1}\over 2}\rfloor + i }[/math].
[math]\displaystyle{ poly_2 }[/math] is a polynomial with degree [math]\displaystyle{ deg(s)+i }[/math], except when [math]\displaystyle{ i\gt deg(s) }[/math], in which case it is identically zero.
Total number of coefficients in [math]\displaystyle{ poly_1 }[/math] and [math]\displaystyle{ poly_2 }[/math] is
- [math]\displaystyle{ \begin{array}{ll} 1+k+i & \mbox{if } k \mod 4=1 \\ 2+k+i & \mbox{if } k \mod 2=0 \\ 3+k+i & \mbox{if } k \mod 4=3 \end{array} }[/math]
Change of variable
[math]\displaystyle{ u = y-0.5 }[/math]
This is so that the coefficients of [math]\displaystyle{ \mu(u) }[/math] and [math]\displaystyle{ s(u) }[/math] are the same coefficients that appear in [math]\displaystyle{ a }[/math]. If we instead used [math]\displaystyle{ s(y) }[/math], then these would require multiplying out [math]\displaystyle{ (y - 0.5)^j }[/math] for each coefficient, and would not be the same polynomial coefficients as in [math]\displaystyle{ a }[/math].
- [math]\displaystyle{ M(u) = \mu(u) + s(u) \ell(u+0.5) }[/math]
- [math]\displaystyle{ y^i (1-y)^i = \left({1\over 4} - u^2\right)^i = R_i(u) }[/math]
- [math]\displaystyle{ \left({1\over 4} - u^2\right)^i M^{(i)}(u) = \left[ R_i(u) \mu^{(i)}(u) + T(u) \right] + \left[ R_i(u) s^{(i)}(u) \right] \ell(u+0.5) }[/math]
[math]\displaystyle{ \begin{array}{rcl} \left({1\over 4} - u^2\right)^i \ell^{(i-j)}(u) &=& R_j(u) (i-j-1)! \left( (0.5+u)^{i-j} - (-1)^{i-j}(0.5-u)^{i-j} \right) \\ &=& (i-j-1)! R_j(u) N_{i-j}(u) \end{array} }[/math]
[math]\displaystyle{ T(u) = T(y-0.5) = \sum_{j=0}^{i-1} (i-j-1)! s^{(j)}(u) R_j(u) N_{i-j}(u) }[/math]
ith derivative finding algorithm
The algorithm uses arithmetic operations on a polynomial-coefficients data type. Here [math]\displaystyle{ T(u) }[/math], [math]\displaystyle{ s^{(j)}(u) }[/math], [math]\displaystyle{ R_j(u) }[/math] and [math]\displaystyle{ N_{i-j}(u) }[/math] are all polynomials. The order of the derivative, [math]\displaystyle{ i }[/math], is given. Note: [math]\displaystyle{ deg(s) \le \lfloor(k-1)/2\rfloor }[/math].
First compute T(u):
- For each [math]\displaystyle{ j=0 \mbox{ to } i-1 }[/math]
- Compute [math]\displaystyle{ s^{(j)}(u) }[/math]
- Compute [math]\displaystyle{ R_j(u) = (0.5+u)^j (0.5-u)^j }[/math]
- Compute [math]\displaystyle{ N_{i-j}(u) = (0.5+u)^{i-j} - (-1)^{i-j} ( 0.5-u)^{i-j} }[/math]
- Compute [math]\displaystyle{ T_j(u) = (i-j-1)! \cdot s^{(j)}(u) \cdot R_j(u) \cdot N_{i-j}(u) }[/math]
- [math]\displaystyle{ T(u) = \sum_{j=0}^{i-1} T_j(u) }[/math]
Note that [math]\displaystyle{ R_j }[/math] and [math]\displaystyle{ N_j }[/math] are polynomials that only vary with [math]\displaystyle{ j }[/math], and hence can each be cached in a 1-D table for speedup.
Return these two polynomials (i.e., the coefficient vectors):
- poly coefs = [math]\displaystyle{ R_i(u) \mu^{(i)}(u) + T(u) }[/math]
- logit poly coefs = [math]\displaystyle{ R_i(u) s^{(i)}(u) }[/math]
These are the coefficients for representing [math]\displaystyle{ R_i(u) M^{(i)}(u) }[/math], which has the same interior zeros as [math]\displaystyle{ M^{(i)}(u) }[/math] (and is a plain polynomial whenever [math]\displaystyle{ poly_2 = 0 }[/math]).
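A minimal sketch of the whole coefficient computation, assuming [math]\displaystyle{ \mu(u) }[/math] and [math]\displaystyle{ s(u) }[/math] are supplied as numpy Polynomial objects in [math]\displaystyle{ u }[/math] (the caller splits the MetaLog [math]\displaystyle{ a }[/math] coefficients into them); the function name metalog_deriv_polys is hypothetical:

<syntaxhighlight lang="python">
from math import factorial
from numpy.polynomial import Polynomial as P

def metalog_deriv_polys(mu, s, i):
    """Return (poly1, poly2), polynomials in u = y - 0.5, such that
       R_i(u) * M^(i)(u) = poly1(u) + poly2(u) * logit(u + 0.5),
    where R_i(u) = (1/4 - u^2)^i.  `mu` and `s` are numpy Polynomials in u."""
    half_plus = P([0.5, 1.0])    # 0.5 + u
    half_minus = P([0.5, -1.0])  # 0.5 - u

    def R(j):                    # R_j(u) = (0.5+u)^j (0.5-u)^j = (1/4 - u^2)^j
        return (half_plus * half_minus) ** j

    def N(m):                    # N_m(u) = (0.5+u)^m - (-1)^m (0.5-u)^m
        return half_plus ** m - (-1) ** m * half_minus ** m

    # T(u) = sum_{j=0}^{i-1} (i-j-1)! s^(j)(u) R_j(u) N_{i-j}(u)
    # (R(j) and N(i-j) could be cached across recursion levels for speed)
    T = P([0.0])
    for j in range(i):
        T = T + factorial(i - j - 1) * s.deriv(j) * R(j) * N(i - j)

    poly1 = R(i) * mu.deriv(i) + T   # "poly coefs"
    poly2 = R(i) * s.deriv(i)        # "logit poly coefs"
    return poly1, poly2
</syntaxhighlight>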
Zero finding
To find the zeros of [math]\displaystyle{ M^{(i)}(y) }[/math], first find the zeros of [math]\displaystyle{ M^{(i+1)}(y) }[/math], which are the minima and maxima of [math]\displaystyle{ M^{(i)}(y) }[/math]. The endpoints [math]\displaystyle{ y=0 }[/math] and [math]\displaystyle{ y=1 }[/math] should be added to this list of extrema. Each pair of adjacent extrema with values of opposite sign brackets one zero of [math]\displaystyle{ M^{(i)}(y) }[/math]. Thus, we have a finite set of intervals, each containing a single zero.
The base case for the recursion is [math]\displaystyle{ i=\lfloor (k+1)/2 \rfloor }[/math], in which case [math]\displaystyle{ y^i (1-y)^i M^{(i)}(y) = y^i (1-y)^i \mu^{(i)}(y) + T(y) }[/math] is a polynomial, for which a standard polynomial root finder is used. This has the same (interior) roots as [math]\displaystyle{ M^{(i)}(y) }[/math].
To find the zero in each interval, conduct a binary search within that interval using [math]\displaystyle{ y^i (1-y)^i M^{(i)}(y) = y^i (1-y)^i \mu^{(i)}(y) + T(y) + y^i (1-y)^i s^{(i)}(y) \ell(y) }[/math].
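A hedged sketch of this recursion (bracketing by the zeros of the next derivative, bisection within each bracket, polynomial roots at the base case), building on metalog_deriv_polys above; the endpoint handling and tolerances are illustrative assumptions:

<syntaxhighlight lang="python">
from math import log

def eval_scaled(poly1, poly2, y):
    """Evaluate y^i (1-y)^i M^(i)(y) = poly1(u) + poly2(u) * logit(y), with u = y - 0.5."""
    u = y - 0.5
    return poly1(u) + poly2(u) * (log(y) - log(1.0 - y))

def interior_zeros(mu, s, i, k, tol=1e-12):
    """Interior zeros of M^(i) on (0, 1); mu and s are numpy Polynomials in u."""
    poly1, poly2 = metalog_deriv_polys(mu, s, i)

    # Base case: poly2 = 0, so R_i(u) M^(i)(u) is an ordinary polynomial.
    if i >= (k + 1) // 2:
        return sorted(r.real + 0.5 for r in poly1.roots()
                      if abs(r.imag) < 1e-9 and -0.5 < r.real < 0.5)

    # Recursive case: extrema of M^(i) are the zeros of M^(i+1).
    # Endpoints are nudged inward because the logit diverges at y = 0 and y = 1.
    brackets = [1e-9] + interior_zeros(mu, s, i + 1, k, tol) + [1.0 - 1e-9]
    zeros = []
    for lo, hi in zip(brackets[:-1], brackets[1:]):
        flo, fhi = eval_scaled(poly1, poly2, lo), eval_scaled(poly1, poly2, hi)
        if flo * fhi > 0.0:
            continue                 # no sign change => no zero bracketed here
        while hi - lo > tol:         # bisection (a bounded Newton step could replace this)
            mid = 0.5 * (lo + hi)
            fmid = eval_scaled(poly1, poly2, mid)
            if fmid * flo <= 0.0:
                hi = mid
            else:
                lo, flo = mid, fmid
        zeros.append(0.5 * (lo + hi))
    return zeros
</syntaxhighlight>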
It is possible to speed the search up within each interval with a bounded Newton algorithm. For this, we need the derivative
- [math]\displaystyle{ {{\partial y^i (1-y)^i M^{(i)}(y)} \over {\partial y}} = {{\partial y^i (1-y)^i}\over {\partial y}} M^{(i)}(y) + y^i (1-y)^i M^{(i+1)}(y) }[/math]
where [math]\displaystyle{ y^i (1-y)^i M^{(i+1)}(y) }[/math] is obtained from the previous recursion step (at [math]\displaystyle{ i+1 }[/math]). Also
- [math]\displaystyle{ {{\partial y^i (1-y)^i}\over {\partial y}} = i (1-y)^i y^{i-1} - i (1-y)^{i-1} y^i }[/math]
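A small sketch of this derivative evaluation for the Newton step, assuming the polynomial pairs produced by metalog_deriv_polys at levels [math]\displaystyle{ i }[/math] and [math]\displaystyle{ i+1 }[/math]; the helper name is hypothetical:

<syntaxhighlight lang="python">
from math import log

def scaled_deriv_prime(poly1, poly2, poly1_next, poly2_next, i, y):
    """d/dy of f_i(y) = y^i (1-y)^i M^(i)(y).

    (poly1, poly2) represent f_i; (poly1_next, poly2_next) represent
    f_{i+1}(y) = y^(i+1) (1-y)^(i+1) M^(i+1)(y) from the previous recursion
    step.  All four are polynomials in u = y - 0.5."""
    u = y - 0.5
    logit = log(y) - log(1.0 - y)
    f_i = poly1(u) + poly2(u) * logit
    f_ip1 = poly1_next(u) + poly2_next(u) * logit
    # d/dy [ y^i (1-y)^i ] = i (1-y)^i y^(i-1) - i (1-y)^(i-1) y^i
    dR = i * (1.0 - y) ** i * y ** (i - 1) - i * (1.0 - y) ** (i - 1) * y ** i
    # f_i'(y) = dR * M^(i)(y) + y^i (1-y)^i M^(i+1)(y)
    #         = dR * f_i / (y(1-y))^i  +  f_{i+1} / (y(1-y))
    return dR * f_i / (y * (1.0 - y)) ** i + f_ip1 / (y * (1.0 - y))
</syntaxhighlight>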
Adding [math]\displaystyle{ y=0 }[/math] or [math]\displaystyle{ y=1 }[/math] to the list of zeros when [math]\displaystyle{ M^{(i)}(0)=0 }[/math] or [math]\displaystyle{ M^{(i)}(1)=0 }[/math] is optional for our algorithm, since the end points are already dealt with (e.g., each recursion adds them as extrema automatically), and this doesn't affect the optimization. Stated more simply, without adding those, this is an interior-zero finder.
Computational complexity
The complexity of the zero-finding algorithm (to find all zeros of any, or all, of the derivatives) is
- [math]\displaystyle{ O( b k^4) }[/math]
where
- [math]\displaystyle{ k }[/math] = the number of MetaLog terms.
- [math]\displaystyle{ b }[/math] = the number of bits of precision in each zero.
- The algebraic manipulation to compute [math]\displaystyle{ T(y) }[/math] is [math]\displaystyle{ O( k^2 ) }[/math], which arises from the complexity of polynomial multiplication, where the number of terms is upper bounded by [math]\displaystyle{ k }[/math].
- This is done at (up to) [math]\displaystyle{ O(k) }[/math] levels of recursion.
- The search at each level occurs on at most [math]\displaystyle{ O(k) }[/math] intervals, where the binary search requires at most [math]\displaystyle{ b }[/math] steps to obtain [math]\displaystyle{ b }[/math] bits of precision.
- The polynomial zero finder at the base-case recursion is run only once. [math]\displaystyle{ O(k^3) }[/math] methods exist, which is lower than the derived total complexity, so it doesn't affect the bound. (In fact, I think interval-refinement methods using Bernstein polynomials can actually do it in [math]\displaystyle{ O(k b) }[/math].)