SampleSize


SysVar SampleSize

This system variable contains the current sample size used in Monte Carlo simulations. You can use this value in expressions. You can equivalently obtain the sample size using Size(Run).
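Because SampleSize can appear in any definition, you can use it, for instance, to estimate the standard error of a sample mean. This is an illustrative sketch, not part of the standard model setup; X here is a placeholder for any uncertain variable in your own model:

Variable Mean_std_err := SDeviation(X) / Sqrt(SampleSize)

This is the usual sigma/sqrt(n) approximation for the standard error of the mean, which shrinks as SampleSize grows.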

The sample size is changed using the Uncertainty Setup dialog, which you can open from the Result menu by selecting Uncertainty Options..., or by pressing Ctrl+U. In the dialog, select the Uncertainty Sample analysis option.

Selecting a Sample Size

How do you select the appropriate sample size for your problem?

There is a basic tradeoff when selecting a sample size. Larger sample sizes require longer evaluation times and more memory, while smaller sample sizes result in greater sampling error. While perfecting your model's logic, you may want to use a very small sample size, even though the sampling error would be unacceptable for final results; then, when you are ready to compute your final results, increase the sample size accordingly.

Most people substantially over-estimate the sample size they really need for good results on their specific problem. This is partly because the appearance of PDF graphs is among the things most sensitive to sampling error, and the irregular appearance of a PDF graph often leads people to think they need a very large sample size. But even when a PDF looks choppy, the sampling error for specific fractile levels, the mean, and the standard deviation may be quite small.

However, if you require extreme fractile levels, accurate higher-order moments (e.g., skewness, kurtosis), or highly reliable sensitivity/importance estimates, very large sample sizes may indeed be needed to achieve low sampling error. For example, with a sample size of 1,000, only about one sample point falls beyond the 99.9th percentile, so an estimate of that fractile is extremely noisy.

The best way to figure out the appropriate tradeoff for your application is by experimentation designed to empirically measure your sampling error at particular sample sizes. Select a fixed sample size, and then evaluate your key results. Reset (e.g., by changing the sample size to something else, then changing it back again) and repeat. Do this several times to see how much your results vary from run to run. That variation is a direct observation of the sampling error.

The experiment can be automated (for each fixed sample size) by introducing an index and letting array abstraction repeat the experiment for you. Your uncertain results will all be indexed by this new index, and you can directly view the variation across your results. Doing this, however, requires that you add an «over» parameter to all your calls to distribution functions. For example, if you have:

Variable X := Normal(a, b)

you'll need to change it to:

Variable X := Normal(a, b, over: meta_run)

and define your index:

Index Meta_run := 1..10

Having done this, you can examine your results of interest and see how they vary over the Meta_run index. For example, this measures the observed sampling error in the fractile levels of interest:

Variable Est_samp_err := SDeviation(ProbBands(My_result), Meta_run)
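The same pattern works for other statistics. As a sketch (with My_result again standing for the uncertain result you care about), the sampling error in the mean can be observed with:

Variable Mean_samp_err := SDeviation(Mean(My_result), Meta_run)

Here Mean(My_result) computes the mean over the Run index for each value of Meta_run, and SDeviation then measures how much that mean varies across the repeated experiments.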

If you surgically introduce Meta_run into your model in this fashion, it turns out to be better to define Meta_run as a variable object, rather than as an index. If you do that, then you can easily change its definition to a scalar, e.g.:

Variable Meta_run := 0

when you aren't performing this meta-analysis of sampling error.
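As a sketch of this pattern, when you do want to run the meta-analysis you would define:

Variable Meta_run := 1..10

and when you are done, change the definition back to a scalar such as 0. Because Meta_run is a variable object rather than an index, toggling between the two forms requires only editing its definition.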

Theoretical Determination of Sample Size

In many cases, you can also compute a "theoretical" estimate of the required sample size with a little algebra. This is detailed in the Selecting the Sample Size section of the Analytica User Guide.
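As an illustration of that kind of calculation (a standard statistical approximation, not taken from the User Guide text itself), if you want the 95% confidence interval for the mean of an uncertain quantity X to have half-width w, the required sample size is roughly:

Variable Req_sample_size := Ceil((1.96 * SDeviation(X) / w)^2)

Here 1.96 is the normal deviate for 95% confidence, and X and w are placeholders for the uncertain quantity and the acceptable half-width in your own model.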

See Also
