How to Fit a Distribution to Data

Revision as of 22:17, 24 July 2008 by Lchrisman (talk | contribs)

This article discusses techniques for fitting a given distribution type to historical data. Determining which distribution type best reflects a data set is a separate problem and is not covered here.

Generalized regression techniques such as Logistic Regression are used to predict the probability of an outcome from many input variables. This article covers a simpler topic: estimating a marginal distribution that is not conditioned on any inputs. For example, given a set of data between 0 and 1, how would you find the parameters of the best-fit Beta distribution?

Once a distribution type has been chosen, the set of parameters to estimate is fixed. The best-fit distribution is then usually defined as the one whose parameters maximize the likelihood of the data.
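To make the idea of maximum likelihood concrete, here is a minimal Python sketch (not Analytica code). It assumes an Exponential distribution parameterized by its mean, uses made-up data, and searches a coarse grid of candidate means for the one with the highest log-likelihood. The function and variable names are hypothetical.

```python
import math

def exponential_log_likelihood(data, mean):
    """Log-likelihood of the data under an Exponential distribution with the given mean."""
    # The density is (1/mean) * exp(-x/mean), so each point contributes
    # -ln(mean) - x/mean to the log-likelihood.
    return sum(-math.log(mean) - x / mean for x in data)

# Hypothetical observations, for illustration only.
data = [0.5, 1.2, 0.3, 2.1, 0.9]

# Crude grid search over candidate means 0.01, 0.02, ..., 5.00.
candidates = [k / 100 for k in range(1, 501)]
best_mean = max(candidates, key=lambda m: exponential_log_likelihood(data, m))

sample_mean = sum(data) / len(data)
print(best_mean, sample_mean)
```

The grid search should land on the sample mean (to within the grid spacing), which is why a closed-form estimation formula, when one exists, is preferable to a numerical search.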

Specific Estimation Formulae

Many textbooks provide parameter estimation formulas or methods for most of the standard distribution types. Using these is by far the easiest and most efficient way to proceed. For example, the parameters of a best-fit Normal distribution are just the sample mean and sample standard deviation.

The book Uncertainty by Morgan and Henrion, Cambridge University Press, provides parameter estimation formulas for many common distributions (Normal, LogNormal, Exponential, Poisson, Gamma, Weibull, Uniform, Triangular, and Beta). Estimation formulas for other distribution types can often be found on Wikipedia.

Some estimation formulas are summarized here. The data is denoted by x, and the index of the data by I.

Normal(m,s)

m = Mean(x,I)
s = SDeviation(x,I)
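For readers working outside Analytica, the same Normal fit can be sketched in Python with the standard library. The function name and the data are hypothetical. Note that statistics.stdev divides by n-1; the strict maximum-likelihood estimate divides by n (statistics.pstdev), and the two differ only slightly for large samples.

```python
import statistics

def fit_normal(data):
    """Return (m, s): the sample mean and sample standard deviation."""
    m = statistics.mean(data)
    s = statistics.stdev(data)  # n-1 divisor; use pstdev for the n divisor
    return m, s

# Hypothetical measurements, for illustration only.
data = [4.8, 5.1, 5.3, 4.9, 5.4]
m, s = fit_normal(data)
print(m, s)
```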

LogNormal(med,gsdev)

med = Exp(Mean(Ln(x),I))
gsdev = Exp(SDeviation(Ln(x),I))
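The LogNormal formulas amount to fitting a Normal distribution to the logarithm of the data and exponentiating the results. A minimal Python sketch, with a hypothetical function name and made-up data that must all be strictly positive:

```python
import math
import statistics

def fit_lognormal(data):
    """Return (med, gsdev): the median and geometric standard deviation."""
    logs = [math.log(x) for x in data]  # requires every x > 0
    med = math.exp(statistics.mean(logs))
    gsdev = math.exp(statistics.stdev(logs))
    return med, gsdev

# Hypothetical data spanning two orders of magnitude.
data = [1.0, 10.0, 100.0]
med, gsdev = fit_lognormal(data)
print(med, gsdev)
```

For this data the logs are evenly spaced around Ln(10), so both the median and the geometric standard deviation come out at approximately 10.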

Exponential(m)

m = Mean(x,I)

Poisson(m)

m = Mean(x,I)
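Both the Exponential and Poisson fits above reduce to taking the sample mean. The Python sketch below illustrates this with hypothetical data and a hypothetical function name. It assumes, as in the formula above, that the Exponential parameter m is the mean; under a rate parameterization the parameter would be 1/m.

```python
import statistics

def fit_mean_parameter(data):
    """Estimate the single parameter m as the sample mean."""
    return statistics.mean(data)

# Hypothetical waiting times between events (Exponential).
waiting_times = [0.5, 1.5, 1.0, 3.0]
m_exponential = fit_mean_parameter(waiting_times)
rate = 1 / m_exponential  # equivalent rate, if that parameterization is needed

# Hypothetical event counts per period (Poisson).
counts = [2, 0, 3, 1, 4]
m_poisson = fit_mean_parameter(counts)
print(m_exponential, rate, m_poisson)
```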

Gamma(a,b)

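a = (Mean(x,I) / SDeviation(x,I))^2
b = SDeviation(x,I)^2 / Mean(x,I)

These are method-of-moments estimates, taking a as the shape and b as the scale, so that the mean is a*b.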
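A Gamma fit by matching moments can be sketched in Python as follows. This sketch assumes that a is the shape and b is the scale, so that the mean is a*b and the variance is a*b^2; if b is instead treated as a rate, use its reciprocal. This is a method-of-moments estimate rather than a maximum-likelihood one, since the Gamma maximum-likelihood shape has no closed form. The function name and data are hypothetical.

```python
import statistics

def fit_gamma_moments(data):
    """Return (a, b): shape and scale chosen to match the sample mean and variance."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)
    a = (mean / sd) ** 2  # shape
    b = sd ** 2 / mean    # scale (assumed parameterization: mean = a * b)
    return a, b

# Hypothetical positive-valued data, for illustration only.
data = [1.2, 3.4, 2.2, 5.1, 2.8, 4.0]
a, b = fit_gamma_moments(data)
print(a, b)
```

A quick sanity check on any such fit is that a*b reproduces the sample mean and a*b^2 reproduces the sample variance.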