LChrisman/Bayes rule for MetaLog
An idea I have for computing the MetaLog posterior distribution from a MetaLog prior. I don't know yet if it is solid.
Incorporating data point weights
A building block is weighted-data fitting.
Let [math]\displaystyle{ [ x_1, ... x_n ] }[/math] be a set of data points with weights [math]\displaystyle{ [w_1,...,w_n] }[/math] where [math]\displaystyle{ \sum_i w_i = 1 }[/math].
To fit this data, use the points [math]\displaystyle{ \{ (x_1, y_1), ..., (x_n, y_n) \} }[/math] (with the [math]\displaystyle{ x_i }[/math] indexed in ascending order) where [math]\displaystyle{ y_i = {1 \over 2} w_i + \sum_{j=1}^{i-1} w_j }[/math].
For equally weighted points this becomes [math]\displaystyle{ y_i = (i-0.5)/n }[/math].
Find the "best weighted fit" MetaLog by solving:
- [math]\displaystyle{ \operatorname{argmin}_a \sum_i (x_i - M(y_i;a))^2 }[/math]
- s.t. [math]\displaystyle{ M'(y;a)\ge 0 }[/math] for all [math]\displaystyle{ y\in (0,1) }[/math]
Denote the solution as [math]\displaystyle{ a^*( x, w ) }[/math].
I'll use weighted fitting as a subroutine; a rough sketch in code follows.
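As a concrete illustration, here is a rough Python sketch of the weighted fit. It is only a sketch under simplifying assumptions: it uses just the first four MetaLog basis terms and plain unconstrained least squares, so the feasibility condition [math]\displaystyle{ M'(y;a)\ge 0 }[/math] is merely checked on a grid rather than enforced, and all function names are mine, not part of any MetaLog library.

```python
import numpy as np

def metalog_basis(y, k=4):
    """First k (here at most 4) basis terms of the MetaLog quantile function
    M(y; a) = a1 + a2*logit(y) + a3*(y-0.5)*logit(y) + a4*(y-0.5)."""
    y = np.asarray(y, dtype=float)
    L = np.log(y / (1.0 - y))                       # logit(y)
    cols = [np.ones_like(y), L, (y - 0.5) * L, (y - 0.5)]
    return np.column_stack(cols[:k])

def metalog_quantile(y, a):
    """Evaluate M(y; a) for coefficient vector a."""
    return metalog_basis(y, k=len(a)) @ np.asarray(a)

def weighted_metalog_fit(x, w, k=4):
    """Best weighted-fit MetaLog coefficients a*(x, w).

    Each data point is placed at cumulative probability
    y_i = 0.5*w_i + sum_{j<i} w_j, which presumes the x_i are in
    ascending order, so the data is sorted first.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                                 # make the weights sum to 1
    order = np.argsort(x)
    x, w = x[order], w[order]
    y = 0.5 * w + np.concatenate(([0.0], np.cumsum(w)[:-1]))
    A = metalog_basis(y, k)
    a, *_ = np.linalg.lstsq(A, x, rcond=None)       # unconstrained least squares
    # Feasibility check only: M(y; a) should be nondecreasing on (0, 1).
    grid = np.linspace(0.001, 0.999, 999)
    if np.any(np.diff(metalog_quantile(grid, a)) < 0.0):
        print("warning: fitted MetaLog is not monotone; a constrained fit is needed")
    return a
```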
Computing the posterior
Given:
- [math]\displaystyle{ M(y ; a_{prior}) }[/math] = the quantile function for the prior distribution
- [math]\displaystyle{ L(x | data) }[/math] = a likelihood function
We compute the posterior as follows (sketched in code after the list):
- Sample [math]\displaystyle{ \hat{x}_i = M( u_i ; a_{prior} ), i=1..m }[/math], where the [math]\displaystyle{ u_i }[/math] are drawn uniformly from [math]\displaystyle{ (0,1) }[/math], for some large [math]\displaystyle{ m }[/math].
- Set [math]\displaystyle{ \hat{w}_i = {{L(\hat{x}_i | data)}\over{\sum_j L(\hat{x}_j | data)}} }[/math]
- Compute [math]\displaystyle{ a_{posterior} = a^*( \hat x, \hat w ) }[/math]
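A minimal sketch of these three steps in Python, reusing `weighted_metalog_fit` and `metalog_quantile` from above. Here `likelihood` stands for whatever function computes [math]\displaystyle{ L(x|data) }[/math] in your problem; the names and numbers are placeholders, not an established API.

```python
def metalog_posterior(a_prior, likelihood, m=10_000, rng=None):
    """Approximate posterior MetaLog coefficients from a MetaLog prior.

    a_prior    -- coefficients of the prior quantile function M(y; a_prior)
    likelihood -- vectorized function returning L(x | data) for an array x
    """
    rng = np.random.default_rng(rng)
    u = rng.uniform(0.001, 0.999, size=m)       # clipped away from 0/1 to avoid logit overflow
    x_hat = metalog_quantile(u, a_prior)        # samples drawn from the prior
    w_hat = likelihood(x_hat)
    w_hat = w_hat / w_hat.sum()                 # normalized likelihood weights
    return weighted_metalog_fit(x_hat, w_hat, k=len(a_prior))

# Illustrative usage: a 4-term MetaLog prior fit to made-up data, updated with
# a Gaussian likelihood centered at 12 (all numbers are arbitrary).
rng = np.random.default_rng(0)
x0 = rng.normal(10.0, 2.0, size=500)
a_prior = weighted_metalog_fit(x0, np.full(len(x0), 1.0 / len(x0)))
a_post = metalog_posterior(a_prior, lambda x: np.exp(-0.5 * ((x - 12.0) / 1.0) ** 2))
```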
Why does this work?
The posterior distribution has the form [math]\displaystyle{ p(x|data) \propto p(x) L(x|data) }[/math], so this is essentially importance sampling in which the sampling distribution is the prior [math]\displaystyle{ p(x) }[/math] and the weights are [math]\displaystyle{ w_i \propto L(x_i|data) }[/math].
The fitted posterior is an approximation, but because the MetaLog has virtually unlimited shape flexibility, it can continue to match the shape of the true posterior.
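To spell out the importance-sampling step (a standard identity, not specific to the MetaLog): for any function [math]\displaystyle{ f }[/math],

[math]\displaystyle{ E_{p(x|data)}[f(X)] = {\int f(x)\, p(x)\, L(x|data)\, dx \over \int p(x)\, L(x|data)\, dx} \approx \sum_i \hat{w}_i\, f(\hat{x}_i), \qquad \hat{x}_i \sim p(x),\quad \hat{w}_i = {L(\hat{x}_i|data) \over \sum_j L(\hat{x}_j|data)} }[/math]

So the weighted sample [math]\displaystyle{ (\hat{x}, \hat{w}) }[/math] is a discrete representation of the posterior, and the weighted MetaLog fit converts that representation back into a smooth quantile function.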
Convergence
Likelihood sampling works great when the sampling distribution is close to the target distribution, which in this case would be when the posterior doesn't change much relative to the prior (i.e., [math]\displaystyle{ L(x|data) }[/math] is large). We'd expect it to be a poorer fit when [math]\displaystyle{ L(x|data) }[/math] is very small, as often happens in Bayesian inference problems.
We might be able to iterate on this to get a better fit. After doing a weighted fit, sample a new [math]\displaystyle{ [ x_1,...,x_m] }[/math] from this intermediate posterior, which now becomes the new sampling distribution. The weights would then need an extra adjustment for the density at [math]\displaystyle{ x_i }[/math] (since the sampling distribution is no longer the prior); specifically, multiply by the ratio of the prior density to the intermediate posterior's density at [math]\displaystyle{ x_i }[/math]. The new sample would be distributed closer to the posterior, leading to a more solid posterior fit.
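A sketch of one such refinement step, under the same assumptions and with the same helper functions as the earlier sketches. The only new ingredient is a density: since [math]\displaystyle{ x = M(y;a) }[/math] implies the density at that point is [math]\displaystyle{ 1/M'(y;a) }[/math], the MetaLog pdf can be approximated numerically from the quantile function, and the weights become likelihood times prior density divided by proposal density (standard importance re-weighting; the function names are again mine).

```python
def metalog_pdf(x, a, grid_size=4001):
    """Approximate MetaLog density at points x.

    Uses a dense grid of (y, M(y; a)) pairs: the density at x = M(y) is
    1 / M'(y), with M' taken numerically. Assumes the fit is monotone.
    """
    y = np.linspace(0.0005, 0.9995, grid_size)
    q = metalog_quantile(y, a)               # monotone grid of x-values
    dq_dy = np.gradient(q, y)                # numerical M'(y)
    return 1.0 / np.interp(x, q, dq_dy)      # pdf(x) = 1 / M'(M^{-1}(x))

def refine_posterior(a_prior, a_current, likelihood, m=10_000, rng=None):
    """One iteration: sample from the current intermediate posterior fit,
    re-weight by likelihood * prior_density / proposal_density, and refit."""
    rng = np.random.default_rng(rng)
    u = rng.uniform(0.001, 0.999, size=m)
    x = metalog_quantile(u, a_current)                       # proposal samples
    w = likelihood(x) * metalog_pdf(x, a_prior) / metalog_pdf(x, a_current)
    return weighted_metalog_fit(x, w / w.sum(), k=len(a_prior))
```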