Monte Carlo and probabilistic simulation


Probabilistic simulation means simulating probabilistic variables by selecting a random sample from each distribution. Analytica offers four sampling methods: Monte Carlo simulation, Median Latin hypercube (the default), Random Latin hypercube, and Sobol sampling (new to Analytica 5.0). We describe each of them, and then explain how to select among them.

Monte Carlo sampling

The most widely used sampling method is known as Monte Carlo, named after the randomness in games of chance, such as at the famous casino in Monte Carlo. In this method, each of the m sample points for each uncertain quantity, X, is generated at random from X with probability proportional to the probability density (or probability mass for discrete quantities) for X. Analytica generates m uniform random values, u_i, for i = 1, 2, ..., m, between 0 and 1, using the specified random number method (see below). It then uses the inverse of the cumulative probability distribution to generate the corresponding values of X,

X_i where P(X ≤ X_i) = u_i, for i = 1, 2, ..., m

With the simple Monte Carlo method, each value of every random variable X in the model, including those computed from other random quantities, is a sample of m independent random values from the true probability distribution for X. You can therefore use standard statistical methods to estimate the accuracy of statistics, such as the estimated mean or fractile (percentile) of a distribution, as described in Selecting the Sample Size.
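The inverse-CDF step described above can be sketched in a few lines of Python. This is only an illustration of the technique, not Analytica's implementation: the function names, the Normal(10, 2) quantity, and the fixed seed are all assumptions made for the example.

```python
import random
from statistics import NormalDist


def open_uniform(rng):
    """Return a uniform draw in (0, 1), skipping the rare exact 0.0."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u


def monte_carlo_sample(inv_cdf, m, rng):
    """Draw m independent values by pushing uniform draws through an inverse CDF."""
    # Each u_i is uniform on (0, 1); X_i is the value with P(X <= X_i) = u_i.
    return [inv_cdf(open_uniform(rng)) for _ in range(m)]


rng = random.Random(42)            # fixed seed only so the sketch is repeatable
dist = NormalDist(mu=10, sigma=2)  # hypothetical uncertain quantity X
xs = monte_carlo_sample(dist.inv_cdf, 1000, rng)
sample_mean = sum(xs) / len(xs)
```

Because the m values are independent draws, the standard error of `sample_mean` can be estimated in the usual way, as the sample standard deviation divided by the square root of m.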

Median Latin hypercube

Median Latin hypercube sampling is the default method: It divides each uncertain quantity X into m equiprobable intervals, where m is the sample size. The sample points are the medians of the m intervals, that is, the fractiles:

X_i where P(X ≤ X_i) = (i - 0.5)/m, for i = 1, 2, ..., m.

These points are then randomly shuffled so that they are no longer in ascending order, to avoid nonrandom correlations among different quantities.
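A minimal Python sketch of this procedure follows, assuming a standard normal quantity and a sample size of 20. The function name and the seed are hypothetical, and the code only illustrates the idea rather than reproducing Analytica's own routine.

```python
import random
from statistics import NormalDist


def median_latin_hypercube(inv_cdf, m, rng):
    """Return the medians of m equiprobable intervals, in shuffled order."""
    # The i-th point is the fractile at cumulative probability (i - 0.5) / m.
    points = [inv_cdf((i - 0.5) / m) for i in range(1, m + 1)]
    # Shuffle so that the order carries no relationship to other quantities.
    rng.shuffle(points)
    return points


rng = random.Random(0)            # fixed seed only so the sketch is repeatable
dist = NormalDist(mu=0, sigma=1)  # hypothetical uncertain quantity X
sample = median_latin_hypercube(dist.inv_cdf, 20, rng)
```

Only the ordering is random here. The set of values is fully determined by the distribution and the sample size.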

Random Latin hypercube

The random Latin hypercube method is similar to the median Latin hypercube method except that, instead of using the median of each of the m equiprobable intervals, it samples at random from each interval. With random Latin hypercube sampling, each sample is a true random sample from the distribution, as in simple Monte Carlo. However, the samples are not totally independent because they are constrained to have one sample from each of the m intervals.
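The difference from the median variant can be sketched in Python as follows. The helper names, the exponential distribution, and the seed are assumptions chosen for illustration, and this is not Analytica's own code.

```python
import math
import random


def stratified_uniforms(m, rng):
    """Return one uniform draw from each of m equiprobable intervals, shuffled."""
    # The i-th draw lies in [i/m, (i+1)/m), so every interval is hit exactly once.
    us = [(i + rng.random()) / m for i in range(m)]
    rng.shuffle(us)
    return us


def exponential_inv_cdf(u, rate=1.0):
    """Inverse CDF of an exponential distribution with the given rate."""
    return -math.log1p(-u) / rate


rng = random.Random(1)  # fixed seed only so the sketch is repeatable
m = 10
us = stratified_uniforms(m, rng)
xs = [exponential_inv_cdf(u) for u in us]
```

The position inside each interval is random, but the constraint of one point per interval is what makes the m values dependent on one another.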

Sobol sampling

The Sobol sampling method is a quasi-Monte Carlo method that attempts to spread sample points out more evenly in probability space across multiple dimensions than Monte Carlo or Latin hypercube methods. Latin hypercube methods spread samples out uniformly for each scalar quantity separately, but since they sample from each quantity (dimension) independently, the coverage of the multidimensional space may not be very uniform. Sobol samples in a way that coordinates across multiple dimensions, creating a more even sampling than Latin hypercube. It does this by applying Sobol sequences to each scalar quantity.

Sobol sampling comes with a much stronger convergence guarantee than pure Monte Carlo sampling. For a sample of size n, Monte Carlo sampling error converges as $O(1/\sqrt{n})$, whereas Sobol converges as $O(\log(n)^d / n)$, where d is the number of uncertain scalar quantities. For a fixed d, Sobol's convergence rate at extremely large n approaches $O(1/n)$, often seen as the holy grail of simulation. However, since $\log(n)^d$ is a very large number when d is even of moderate size, the guaranteed bound is not as useful as you might think.
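To get a feel for how quickly the log(n)^d factor grows, the two rate expressions can be evaluated directly. The Python sketch below ignores the constant factors hidden by the O notation and uses the natural logarithm, so the numbers are only indicative. The function names, the sample size of one million, and the values of d are arbitrary choices for illustration.

```python
import math


def monte_carlo_rate(n):
    """The 1/sqrt(n) Monte Carlo rate, with the hidden constant taken as 1."""
    return 1 / math.sqrt(n)


def sobol_bound(n, d):
    """The log(n)^d / n Sobol bound, with the hidden constant taken as 1."""
    return math.log(n) ** d / n


n = 1_000_000
rows = [(d, monte_carlo_rate(n), sobol_bound(n, d)) for d in (1, 2, 5, 10)]
for d, mc, sobol in rows:
    print(f"d={d:2d}  Monte Carlo rate={mc:.3e}  Sobol bound={sobol:.3e}")
```

Under these assumptions the Sobol bound is smaller than the Monte Carlo rate for d = 1 and d = 2, but already larger at d = 5. That shows how quickly the guarantee becomes uninformative as d grows, although it does not mean that Sobol sampling actually performs worse in practice.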

Choosing a sampling method

The advantage of Latin hypercube methods is that they provide more uniform distributions of samples for each distribution than simple Monte Carlo sampling. Median Latin hypercube, since it uses the median of each equiprobable interval, is even more uniformly distributed than random Latin hypercube. If you display the PDF of a variable that is defined as a single continuous distribution, or is a function of just one continuous uncertain variable, the distribution usually looks fairly smooth even with a small sample size (such as 20) with median Latin hypercube sampling, whereas simple Monte Carlo results look quite noisy.

The noise-reduction advantage of Latin hypercube diminishes when the result depends on two or more uncertain quantities that have comparable effects on the result, and it shrinks further as the number of uncertain quantities grows. For more than 5 or so uncertain quantities, Latin hypercube methods might not be discernibly better than simple Monte Carlo. Since the median Latin hypercube method is sometimes much better, and almost never worse than the others, Analytica uses it as the default method.

When not to use Latin hypercube sampling

In very rare situations, median Latin hypercube can produce poor results. This happens when the model includes a periodic function (such as Sin) whose period is similar to the size of the equiprobable intervals on the uncertain parameter. For example:

X := Uniform(1, Samplesize)

Y := Sin(2*Pi*X)

In this case, median Latin hypercube sampling gives bad results, so you should use random Latin hypercube or simple Monte Carlo, which avoid this problem. However, the vast majority of models have no periodic function of this kind, so you do not need to worry about the reliability of median Latin hypercube sampling.
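One way to see the problem is to compute the median Latin hypercube points for the example by hand. The Python sketch below mirrors the X and Y definitions above for a sample size of 100, using hypothetical helper names, and leaves the points in ascending order to make the pattern visible. It is an illustration of the sampling scheme, not Analytica code, and reading the resulting aliasing as the "bad results" mentioned above is an interpretation.

```python
import math


def median_lhs_uniform(a, b, m):
    """Median Latin hypercube points for Uniform(a, b), in ascending order."""
    return [a + (b - a) * (i - 0.5) / m for i in range(1, m + 1)]


m = 100                                        # sample size
xs = median_lhs_uniform(1, m, m)               # X := Uniform(1, Samplesize)
ys = [math.sin(2 * math.pi * x) for x in xs]   # Y := Sin(2*Pi*X)

# A single slow sine cycle spread across the whole sample, for comparison.
slow_wave = [math.sin(2 * math.pi * (i - 0.5) / m) for i in range(1, m + 1)]
max_gap = max(abs(y - s) for y, s in zip(ys, slow_wave))
```

The real function oscillates through m - 1 full periods between 1 and m, yet the sampled Y values trace just one smooth cycle when ordered by X. The sample therefore suggests a relationship between X and Y that the model does not have. Drawing a random point within each interval, as random Latin hypercube does, breaks this regular spacing.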