Kernel Density Smoothing
new to Analytica 4.4
Density Estimation Methods
Analytica represents the uncertainty of a variable as a Monte Carlo sample of representative points. The various uncertainty result views, including the Probability Density view, are all derived from the underlying sample when the result window is shown. Analytica has two basic methods for obtaining the estimate of the probability density from the underlying sample: Histogramming and Kernel Density Smoothing. The method to be used can be selected via the Uncertainty Options dialog as seen in the images above. The smoothing method is new to Analytica 4.4.
Histogram derives an estimate of the probability density of a continuous distribution by dividing the range of possible values into distinct bins, and then counting how many sample points land in each bin.
Smoothing estimates the probability density by replacing each sample point with a small Gaussian curve (called the kernel) and then summing all the Gaussian curves to obtain a net smoothed curve.
Histograms
The histogram method divides the range of possible values into distinct non-overlapping bins, then counts how many samples land in each bin. The center value for each bin is then plotted, with the density estimate equal to the fraction of points that landed in that bin divided by the width of the bin.
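The calculation described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `equal_x_density`, the 20 equal-width bins, and the test sample are made up here and are not taken from Analytica's implementation.

```python
import random

def equal_x_density(sample, num_bins):
    """Return (bin_centers, densities) using equal-width bins over the sample range."""
    lo, hi = min(sample), max(sample)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in sample:
        # The largest value sits on the upper edge; clamp it into the last bin.
        i = min(int((x - lo) / width), num_bins - 1)
        counts[i] += 1
    n = len(sample)
    centers = [lo + (i + 0.5) * width for i in range(num_bins)]
    # Density = fraction of points in the bin divided by the bin width.
    densities = [c / n / width for c in counts]
    return centers, densities

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]  # illustrative sample
centers, densities = equal_x_density(sample, 20)
```

Because each density is a fraction divided by a width, the densities times the bin width sum to 1.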
There are two basic ways to divide the range of possible values into bins: Equal-X or Equal-P.
The Equal-X method divides the range from the smallest to largest occurring value into equal sized bins. The number of points landing in each bin may vary considerably from bin to bin, and some bins may have no points in them at all.
The Equal-P method divides the cumulative probability axis (from 0.0 to 1.0) into equal sized intervals, causing the bins to be sized so that an approximately equal number of points land in each bin. The width of each bin varies.
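A rough Python sketch of Equal-P binning follows. It assumes equally weighted sample points with distinct values, and it places bin edges at sorted sample points; the function name `equal_p_density` and the edge-placement rule are illustrative choices, not a description of how Analytica picks its edges.

```python
import random

def equal_p_density(sample, num_bins):
    """Return (centers, widths, densities) for bins holding nearly equal counts."""
    xs = sorted(sample)
    n = len(xs)
    # Indices of the sorted points used as bin edges, at equal cumulative-probability steps.
    idx = [k * (n - 1) // num_bins for k in range(num_bins + 1)]
    centers, widths, densities = [], [], []
    for k in range(num_bins):
        lo, hi = xs[idx[k]], xs[idx[k + 1]]
        count = idx[k + 1] - idx[k]
        if k == num_bins - 1:
            count += 1  # the last bin also includes the largest point
        width = hi - lo  # assumes distinct values, so width > 0
        centers.append((lo + hi) / 2)
        widths.append(width)
        densities.append(count / n / width)
    return centers, widths, densities

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]  # illustrative sample
centers, widths, densities = equal_p_density(sample, 10)
```

For a bell-shaped sample like this one, the bins in the tails come out much wider than the bins near the center, which is the varying width mentioned above.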
In normal Monte Carlo sampling, every sample point is weighted equally. Some techniques such as rare-event modeling, importance sampling, and Bayesian likelihood posterior computations make use of non-equally weighted sampling. When these techniques are employed, two methodologies for Equal-P sampling are possible: Equal weighted steps and Equal sample steps. With equally weighted probability steps, bins are sized so that each bin ends up with approximately the same total probability weight. With equal sample steps, bins are sized so that a nearly equal number of points land in each bin. In the latter case, the weight of points may vary substantially from bin to bin (in fact, some bins may end up with zero weight when points with zero weight exist).
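The difference between the two options can be seen in a small Python sketch. Everything here is an assumption made for illustration: the function names, the choice of 5 bins, and the hypothetical weights (each point's weight equals its value) are not taken from Analytica.

```python
import random

def equal_sample_boundaries(n, num_bins):
    """Cut the sorted sample so each bin holds about n/num_bins points."""
    return [k * n // num_bins for k in range(num_bins + 1)]

def equal_weight_boundaries(pairs, num_bins):
    """Cut the sorted sample so each bin holds about 1/num_bins of the total weight."""
    total = sum(w for _, w in pairs)
    boundaries, cum, k = [0], 0.0, 1
    for i, (_, w) in enumerate(pairs):
        cum += w
        # Close a bin each time the running weight crosses the next equal step.
        while k < num_bins and cum >= k * total / num_bins:
            boundaries.append(i + 1)
            k += 1
    boundaries.append(len(pairs))
    return boundaries

def weight_fractions(pairs, boundaries):
    """Fraction of the total weight that falls in each bin."""
    total = sum(w for _, w in pairs)
    return [sum(w for _, w in pairs[a:b]) / total
            for a, b in zip(boundaries, boundaries[1:])]

random.seed(0)
xs = [random.random() for _ in range(1000)]
pairs = sorted((x, x) for x in xs)  # (value, weight) pairs with hypothetical weights w = x
by_sample = weight_fractions(pairs, equal_sample_boundaries(len(pairs), 5))
by_weight = weight_fractions(pairs, equal_weight_boundaries(pairs, 5))
```

With equal sample steps, `by_sample` shows the weight per bin rising steeply from the first bin to the last; with equal weighted steps, every entry of `by_weight` is close to one fifth.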
When using the histogram method, you have control over the average number of samples per bin. This parameter controls both how smooth the resulting PDF estimate is and how many points are plotted. As you increase sample size, you will often want to increase this value as well, since a larger sample can support a curve that is smoother yet still meaningful.
The PDF result for a histogrammed PDF uses the Step-line style by default, to emphasize the bins involved. From Graph Setup... you can change this to a normal line style for a different (smoother) effect.
Kernel Density Smoothing
If you want to use the random sample to get an idea of the underlying PDF, you can improve on the histogram with a smoothing technique. There are various smoothing techniques you might try. One that produces very good results in some cases is Kernel Density Smoothing, based on a technique called Kernel Density Estimation (KDE). Selecting the "Smoothing" radio button activates this technique.
Kernel Density Estimation is a general approach to the smoothing problem. Analytica 4.4 uses one variation, based on what is called the Fast Gaussian Transform (usually written Fast Gauss Transform in the literature). In essence, we replace each sample point x in your random sample with a smear over a normal distribution of values that the sample point might have had. This normal distribution is centered on x and has a standard deviation, called the bandwidth, which we denote h. To get the KDE curve, we add up these normal distributions for all the sample points and scale the sum so that the total area is 1. So, if you select "Smoothing," the curve you see plotted is this KDE curve.
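The sum of Gaussians can be written directly in Python. Note that this sketch is the naive summation, not the Fast Gaussian Transform; the function name `gaussian_kde`, the hand-picked bandwidth `h = 0.3`, and the evaluation grid are assumptions for illustration only.

```python
import math
import random

def gaussian_kde(sample, h, x):
    """Density estimate at x: the average of normal densities centered on each sample point."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]  # illustrative sample
h = 0.3      # bandwidth chosen by hand for illustration
step = 0.05
grid = [-6.0 + i * step for i in range(241)]  # evaluation points from -6 to 6
curve = [gaussian_kde(sample, h, x) for x in grid]
area = sum(curve) * step  # Riemann-sum check that the curve integrates to about 1
```

Each evaluation point requires a pass over the whole sample, so the cost grows with the sample size times the number of plotted points.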
The transform is called fast because summing all these Gaussians in a naive fashion takes a very long time when the sample size is large (e.g., 1,000,000). By approximating the sum with Hermite series and Taylor series expansions, computation time can be reduced significantly, and that is the secret of the Fast Gaussian Transform.
You will notice that the panel changes when you click on "Smoothing." The "Samples per PDF Step interval" text field disappears, as do the radio buttons for "Equal X axis steps" and so on. These no longer apply.
Degree of Smoothing
Analytica analyzes the underlying sampling data and the sample size to arrive at a suggested degree of smoothing (the optimal bandwidth). You can, however, override this to obtain greater detail or greater smoothness as you see appropriate for particular cases. The smoothness factor gives you these options:
- maximum detail: minimum h value
- medium detail: medium low h value
- default: system determined h value
- medium smoothing: medium high h value
- maximum smoothing: maximum h value
As the bandwidth h decreases, the KDE smoothed PDF becomes more sensitive to the random variation in your sample and can get quite wavy. As the bandwidth h increases, the KDE smoothed PDF becomes less sensitive to random variation and gets much smoother, but the match to the true underlying PDF may not be as good. Greater degrees of smoothing artificially increase the apparent variance of the KDE curve and lower its peak.
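The variance inflation can be checked numerically. For a Gaussian kernel, the variance of the KDE curve equals the sample variance plus h squared, because the curve is the sample convolved with a normal distribution of standard deviation h. The Python sketch below uses naive summation and assumed values (the sample, the three bandwidths, and the grid are illustrative only).

```python
import math
import random
import statistics

def kde_curve(sample, h, grid):
    """Evaluate a Gaussian KDE with bandwidth h at every grid point (naive summation)."""
    norm = 1.0 / (len(sample) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample)
            for x in grid]

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(300)]  # illustrative sample
sample_var = statistics.pvariance(sample)
step = 0.02
grid = [-10.0 + i * step for i in range(1001)]

results = {}
for h in (0.2, 0.5, 1.0):
    f = kde_curve(sample, h, grid)
    # Mean and variance of the KDE curve by numerical integration over the grid.
    mean = sum(x * fx for x, fx in zip(grid, f)) * step
    var = sum((x - mean) ** 2 * fx for x, fx in zip(grid, f)) * step
    results[h] = (var, max(f))  # (variance of the curve, peak height)
```

In `results`, the variance grows by h squared over `sample_var` while the peak height falls as h increases.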
Determining the optimal value of the bandwidth h is a difficult problem in general. The smoothing factor is offered as a way to try different h values and judge, by eyeballing the resulting graphs, what looks best.
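For a sense of how a starting bandwidth might be computed from the data and the sample size, here is Silverman's rule of thumb in Python. This is a widely used textbook rule offered only as an illustration; it is an assumption here and not necessarily the rule Analytica uses for its default.

```python
import random
import statistics

def silverman_bandwidth(sample):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    n = len(sample)
    sd = statistics.stdev(sample)
    q1, _, q3 = statistics.quantiles(sample, n=4)  # quartiles
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-0.2)

random.seed(0)
h_small = silverman_bandwidth([random.gauss(0.0, 1.0) for _ in range(100)])
h_large = silverman_bandwidth([random.gauss(0.0, 1.0) for _ in range(1000)])
```

The rule shrinks the bandwidth as the sample grows, so `h_large` comes out smaller than `h_small`: a larger sample supports finer detail.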
One clue here: compare the KDE smoothed graph with the histogram to determine which smoothing factor seems to smooth the original histogram best. In this process, also try different "Samples per PDF step interval" values for the histogram, since the histogram's random variation is sensitive to this setting. The combination of smoothing factor and samples per PDF step interval that gives the best agreement between the two may well be your best estimate of the underlying PDF.
See Also
- Pdf(..) function



