Log-normal distribution


A log-normal distribution is a continuous distribution of a positive quantity whose logarithm is normally distributed. In other words, Ln(x) has a Normal distribution when x has a log-normal distribution.

LogNormal(median:3,stddev:2) →

Log-normal distributions are useful for many quantities that are always positive and have long upper tails, such as the concentration of a pollutant or the amount of rainfall. The distribution is semi-bounded (positive-only), unimodal, and skewed to the right, often with a long right tail.

By the central limit theorem, applied to the logarithms of the factors, the product of a long series of independent and identically distributed positive random variables approaches a log-normal distribution. This holds for any positive distribution of the factors, provided the logarithm of each factor has finite variance.
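The convergence described above can be checked numerically outside Analytica. The following Python sketch is only an illustration: the Uniform(0.5, 1.5) factor distribution, the number of factors, and the helper names are arbitrary assumptions, not part of Analytica.

```python
import math
import random
import statistics

def product_of_factors(rng, n_factors):
    """Multiply n_factors independent Uniform(0.5, 1.5) draws (an arbitrary positive distribution)."""
    product = 1.0
    for _ in range(n_factors):
        product *= rng.uniform(0.5, 1.5)
    return product

def skewness(values):
    """Sample skewness: the mean of the cubed standardized values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

rng = random.Random(0)  # fixed seed so the run is repeatable
products = [product_of_factors(rng, 100) for _ in range(2000)]
log_products = [math.log(p) for p in products]

# If the product is close to log-normal, its logarithm is close to normal,
# so the logs should show little skew while the raw products are strongly right-skewed.
log_skew = skewness(log_products)
raw_skew = skewness(products)
print(f"skewness of products: {raw_skew:.2f}, skewness of Ln(products): {log_skew:.2f}")
```

The skewness of the logarithms should be close to zero, while the products themselves have a pronounced right tail.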

Functions

You specify a log-normal distribution by providing any two of the following four parameters.

median

The median. Must be > 0.

gsdev

The geometric standard deviation. Must be >= 1.

mean

The arithmetic mean. Must be > 0.

stddev

The arithmetic standard deviation. Must be >= 0.

A named-parameter convention is recommended, such as:

LogNormal( gsdev:1.5, mean: 4 )

Since the logarithm of a log-normally distributed random variable is normally distributed, you can also specify a LogNormal distribution as Exp(Normal(μ,σ)), where μ and σ are the mean and standard deviation of Ln(x). Beware that the literature often describes a log-normal distribution by μ and σ, even though these parameters refer to the distribution of Ln(x) rather than to x itself. Exp(Normal(μ,σ)) is equivalent to LogNormal(Exp(μ),Exp(σ)), since the median of x is Exp(μ) and the logarithm of the geometric standard deviation is the arithmetic standard deviation of Ln(x).
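As a rough illustration of how the two parameterizations relate, here is a small Python sketch. The function names are hypothetical, and the mean and variance formulas are the standard log-normal identities rather than anything taken from Analytica's implementation.

```python
import math

def median_gsdev_from_log_params(mu, sigma):
    """(mu, sigma) of Ln(x) -> (median, gsdev) of x."""
    return math.exp(mu), math.exp(sigma)

def log_params_from_median_gsdev(median, gsdev):
    """(median, gsdev) of x -> (mu, sigma) of Ln(x)."""
    return math.log(median), math.log(gsdev)

def mean_stddev_from_log_params(mu, sigma):
    """Arithmetic mean and standard deviation of x, from (mu, sigma) of Ln(x)."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    stddev = mean * math.sqrt(math.exp(sigma ** 2) - 1.0)
    return mean, stddev

mu, sigma = log_params_from_median_gsdev(3.0, 2.0)
print(mu, sigma)                               # mean and standard deviation of Ln(x)
print(mean_stddev_from_log_params(mu, sigma))  # arithmetic mean and stddev of x
```

Note that the arithmetic mean is always at least as large as the median, because of the factor Exp(σ^2/2).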

LogNormal(median, gsdev, mean, stddev, over)

The distribution function. Use this to specify that a chance variable or uncertain quantity is log-normally distributed. You must specify exactly two of the core parameters.

To create independent and identically distributed log-normal distributions along one or more indexes, specify those indexes using the optional «over» parameter.

When evaluated, it generates a sample from a log-normal distribution given, for example, «median» and «gsdev» (geometric standard deviation), or «mean» and «stddev» (arithmetic standard deviation).
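For readers who want to reproduce such a sample outside Analytica, the following Python sketch draws from a log-normal given «median» and «gsdev». It relies on the standard library's random.lognormvariate, which takes the mean and standard deviation of Ln(x). The function name sample_lognormal and the fixed seed are assumptions made for illustration.

```python
import math
import random
import statistics

def sample_lognormal(median, gsdev, n, seed=0):
    """Draw n independent log-normal values with the given median and geometric sd."""
    rng = random.Random(seed)
    mu, sigma = math.log(median), math.log(gsdev)  # parameters of Ln(x)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]

draws = sample_lognormal(median=3.0, gsdev=2.0, n=20000)
sample_median = statistics.median(draws)
sample_gsdev = math.exp(statistics.stdev([math.log(d) for d in draws]))
print(sample_median, sample_gsdev)  # should be close to 3 and 2
```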

DensLogNormal( x, median, gsdev, mean, stddev )

The analytic probability density function. Returns the probability density at «x». Exactly two of the parameters «median», «gsdev», «mean», or «stddev» must be provided.
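The density has a simple closed form. As a sketch only (a Python re-implementation under the «median»/«gsdev» parameterization, not Analytica's own code, with a hypothetical function name):

```python
import math

def dens_lognormal(x, median, gsdev):
    """Log-normal probability density at x, given the median and geometric sd (gsdev > 1)."""
    if x <= 0:
        return 0.0  # the distribution has no mass at or below zero
    mu, sigma = math.log(median), math.log(gsdev)
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

print(dens_lognormal(1.0, median=1.0, gsdev=math.e))  # standard lognormal at x = 1
```

For the standard lognormal (median 1, gsdev e), the density at x = 1 equals 1/Sqrt(2*Pi), the peak height of the unit normal density.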

CumLogNormal( x, median, gsdev, mean, stddev )

The analytic cumulative distribution function. Returns the probability that the outcome is less than or equal to «x».

Exactly two of the parameters «median», «gsdev», «mean», or «stddev» must be provided.
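The cumulative probability reduces to a standard normal probability of the standardized logarithm. A minimal Python sketch, assuming the «median»/«gsdev» parameterization and using a hypothetical function name:

```python
import math
from statistics import NormalDist

def cum_lognormal(x, median, gsdev):
    """P(X <= x) for a log-normal with the given median and geometric sd (gsdev > 1)."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - math.log(median)) / math.log(gsdev)
    return NormalDist().cdf(z)  # standard normal cumulative probability

print(cum_lognormal(3.0, median=3.0, gsdev=2.0))  # half the mass lies below the median
print(cum_lognormal(6.0, median=3.0, gsdev=2.0))  # one geometric sd above the median
```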

CumLogNormalInv( p, median, gsdev, mean, stddev )

The inverse cumulative distribution function (also known as the quantile function). Returns the «p»-th fractile/quantile/percentile, where «p» is a probability between 0 and 1.

Exactly two of the parameters «median», «gsdev», «mean», or «stddev» must be provided.
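The quantile function follows from the standard normal quantile: the «p»-th quantile is the median multiplied by gsdev raised to the power of the normal quantile of «p». A Python sketch under the same «median»/«gsdev» assumption, with a hypothetical function name:

```python
from statistics import NormalDist

def cum_lognormal_inv(p, median, gsdev):
    """The p-th quantile (0 < p < 1) of a log-normal with the given median and geometric sd."""
    z = NormalDist().inv_cdf(p)  # standard normal quantile; raises an error unless 0 < p < 1
    return median * gsdev ** z

print(cum_lognormal_inv(0.5, median=3.0, gsdev=2.0))   # the median itself
print(cum_lognormal_inv(0.95, median=3.0, gsdev=2.0))  # 95th percentile
```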

Statistics

Examples

A Normal distribution is symmetric around its mean:

If x := Normal(mean, sdev), then P(x <= mean - sdev) = P(x >= mean + sdev) ≈ 0.159.

Analogously, a lognormal distribution is ratio-symmetric around its median:

If y := LogNormal(median, gsdev), then P(y <= median/gsdev) = P(y >= median*gsdev) ≈ 0.159.
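The ratio symmetry can be verified numerically. The Python sketch below is illustrative only: the helper lognormal_cdf and the example values median = 3 and gsdev = 2 are assumptions.

```python
import math
from statistics import NormalDist

def lognormal_cdf(x, median, gsdev):
    """P(Y <= x) for a log-normal with the given median and geometric sd."""
    return NormalDist().cdf((math.log(x) - math.log(median)) / math.log(gsdev))

median, gsdev = 3.0, 2.0
lower_tail = lognormal_cdf(median / gsdev, median, gsdev)        # P(y <= median/gsdev)
upper_tail = 1.0 - lognormal_cdf(median * gsdev, median, gsdev)  # P(y >= median*gsdev)
normal_tail = NormalDist().cdf(-1.0)                             # P(x <= mean - sdev)
print(lower_tail, upper_tail, normal_tail)  # all three should agree, at about 0.1587
```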

If you specify no parameters, it defaults to the standard lognormal, whose natural logarithm is a unit normal with mean 0 and standard deviation 1.

You can specify any two of the four parameters, and it computes the other two from them:

LogNormal(median: med, gsdev: gs) or just LogNormal(med, gs)

LogNormal(median: med, stddev: sd)

LogNormal(median: med, mean: mu)

LogNormal(mean: mu, stddev: sd)

LogNormal(mean: mu, gsdev: gs)

LogNormal(gsdev: gs, stddev: sd)

If you specify more than two parameters, it reports an error.
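To show how any two of the four parameters determine the other two, the Python sketch below converts each of the six pairs to μ and σ, the mean and standard deviation of Ln(x). It uses the standard log-normal identities; the function name log_params is hypothetical, and this is not a description of how Analytica performs the computation internally.

```python
import math

def log_params(median=None, gsdev=None, mean=None, stddev=None):
    """Return (mu, sigma) of Ln(x) from exactly two of the four parameters."""
    given = [p is not None for p in (median, gsdev, mean, stddev)]
    if sum(given) != 2:
        raise ValueError("specify exactly two of median, gsdev, mean, stddev")
    if median is not None and gsdev is not None:
        return math.log(median), math.log(gsdev)
    if median is not None and mean is not None:
        # mean = median * exp(sigma^2 / 2), so this pair needs mean >= median
        return math.log(median), math.sqrt(2.0 * math.log(mean / median))
    if median is not None and stddev is not None:
        # stddev^2 = median^2 * w * (w - 1) with w = exp(sigma^2); solve the quadratic for w
        w = (1.0 + math.sqrt(1.0 + 4.0 * (stddev / median) ** 2)) / 2.0
        return math.log(median), math.sqrt(math.log(w))
    if mean is not None and stddev is not None:
        sigma2 = math.log(1.0 + (stddev / mean) ** 2)
        return math.log(mean) - sigma2 / 2.0, math.sqrt(sigma2)
    if mean is not None and gsdev is not None:
        sigma = math.log(gsdev)
        return math.log(mean) - sigma ** 2 / 2.0, sigma
    # Remaining pair: gsdev and stddev (needs gsdev > 1)
    sigma = math.log(gsdev)
    w = math.exp(sigma ** 2)
    return 0.5 * math.log(stddev ** 2 / (w * (w - 1.0))), sigma

def all_four(mu, sigma):
    """Return median, gsdev, mean, stddev of x from (mu, sigma) of Ln(x)."""
    mean = math.exp(mu + sigma ** 2 / 2.0)
    return {"median": math.exp(mu), "gsdev": math.exp(sigma),
            "mean": mean, "stddev": mean * math.sqrt(math.exp(sigma ** 2) - 1.0)}

params = all_four(*log_params(median=3.0, gsdev=2.0))
print(params)
print(all_four(*log_params(mean=params["mean"], stddev=params["stddev"])))  # same four values
```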

Like other distributions, you can also give one or more «Over» indexes. These cause it to generate an array of independent lognormal distributions over the specified index(es). For example,

LogNormal(m, gsd, Over: i)
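Conceptually, the result is one independent sample per element of the index. A rough Python analogy follows, in which the dictionary keys stand in for the elements of index i; the function name, the region labels, and the sample size are made up for illustration.

```python
import math
import random

def lognormal_over(median, gsdev, index, n=1000, seed=0):
    """Return an independent sample of n log-normal draws for each element of index."""
    rng = random.Random(seed)
    mu, sigma = math.log(median), math.log(gsdev)
    # Every element gets its own fresh draws, so the samples are independent of each other.
    return {label: [rng.lognormvariate(mu, sigma) for _ in range(n)] for label in index}

samples = lognormal_over(3.0, 2.0, index=["North", "South", "East"])
for label, draws in samples.items():
    print(label, draws[:3])
```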

Syntax:

LogNormal(median, gsdev, mean, stddev: Optional Positive; over: ... Optional Atom)

Parameter Estimation

Suppose X contains sampled historical data indexed by I, and consisting solely of positive values. To estimate the parameters of the best-fit LogNormal distribution, the following parameter estimation formulae can be used:

«median» := Median(X, I) or Exp(Mean(Ln(X), I))

«gsdev» := Exp(SDeviation(Ln(X), I))
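The same estimates can be computed outside Analytica. The Python sketch below assumes the sample (n - 1) form of the standard deviation, which may or may not match SDeviation exactly; the function name fit_lognormal is hypothetical.

```python
import math
import statistics

def fit_lognormal(data):
    """Estimate (median, gsdev) of a log-normal from strictly positive data."""
    logs = [math.log(v) for v in data]         # fails if any value is <= 0
    median = math.exp(statistics.fmean(logs))  # geometric mean of the data
    gsdev = math.exp(statistics.stdev(logs))   # sample standard deviation of the logs
    return median, gsdev

# Ln of this data is 0, 1, 2: mean 1 and sample standard deviation 1.
print(fit_lognormal([1.0, math.e, math.e ** 2]))
```

Here the geometric-mean form of the median estimate is used; taking the sample median of the data directly is the alternative given above.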

A more general form, with one extra degree of freedom, is the LogNormal with an offset:

LogNormal(median, gsdev) - offset

The more general form can be adapted to data sets containing negative numbers. Every value of X + offset must be positive, so the offset is constrained by

offset > -Min(X, I)

To my knowledge, no closed-form formula for the offset exists, so finding its optimal value requires a 1-D search or optimization. However, I have found that the following heuristic estimation formulae come extremely close to the best-fit parameters with offset:

offset := -Min(X, I) + 2*(Median(X, I) - Min(X, I))/Sum(1, I)

median := Median(X + offset, I)

gsdev := Exp(SDeviation(Ln(X + offset), I))
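The heuristic can be written out step by step. The Python sketch below mirrors the three formulae above, with Sum(1, I) taken as the number of data points and the sample (n - 1) standard deviation assumed for SDeviation. The function name and the example data are made up for illustration.

```python
import math
import statistics

def fit_offset_lognormal(data):
    """Heuristic (offset, median, gsdev) such that data + offset is roughly log-normal."""
    n = len(data)
    lowest = min(data)
    # Shift so the minimum lands slightly above zero; the margin shrinks as n grows.
    # If all data values are equal the margin is zero and the logarithm below fails.
    offset = -lowest + 2.0 * (statistics.median(data) - lowest) / n
    shifted = [v + offset for v in data]
    median = statistics.median(shifted)
    gsdev = math.exp(statistics.stdev([math.log(v) for v in shifted]))
    return offset, median, gsdev

offset, median, gsdev = fit_offset_lognormal([-2.0, 0.0, 1.0, 3.0, 10.0])
print(offset, median, gsdev)  # the fitted model is LogNormal(median, gsdev) - offset
```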

From Median and Percentile

Suppose you have an estimate of the median (m) and of the 95th percentile (p95). These two values uniquely determine the log-normal distribution. In this case the «gsdev» is given by

[math]\displaystyle{ \left( {p95 \over m} \right)^{\left({1 \over {\Phi^{-1}(0.95)}}\right)} = \left( {p95 \over m} \right)^{0.6079568319149189} }[/math]

where [math]\displaystyle{ \Phi^{-1}(p) }[/math] is the CumNormalInv function. Hence the distribution expression is

LogNormal(m, (p95/m)^0.6079568319149189 )

More generally, if [math]\displaystyle{ q }[/math] is the estimate of the [math]\displaystyle{ p^{th} }[/math] percentile and [math]\displaystyle{ p\gt 0.5 }[/math], the expression is

LogNormal( m, (q/m)^(1/CumNormalInv(p)) )

For [math]\displaystyle{ p\lt 0.5 }[/math], the expression is

LogNormal( m, (m/q)^(1/CumNormalInv(1-p)) )
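These expressions can be checked numerically. In the Python sketch below, statistics.NormalDist().inv_cdf plays the role of CumNormalInv; the function name gsdev_from_percentile and the example values m = 10 and p95 = 30 are assumptions for illustration.

```python
from statistics import NormalDist

def gsdev_from_percentile(m, q, p):
    """Geometric sd of the log-normal with median m whose p-th quantile is q (p != 0.5)."""
    return (q / m) ** (1.0 / NormalDist().inv_cdf(p))

exponent = 1.0 / NormalDist().inv_cdf(0.95)
print(exponent)  # approximately 0.6079568319

m, p95 = 10.0, 30.0
gs = gsdev_from_percentile(m, p95, 0.95)
# Check: the 95th percentile of LogNormal(m, gs) should reproduce p95.
print(m * gs ** NormalDist().inv_cdf(0.95))
```

Because CumNormalInv(p) = -CumNormalInv(1-p), the expression (q/m)^(1/CumNormalInv(p)) gives the same value as the p < 0.5 form when q lies below the median.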