Kernel Density Smoothing

Revision as of 21:39, 29 June 2011 by Lchrisman


New to Analytica 4.4

[Figure: Density Estimation Methods, showing the Histogram options and the Kernel Density Smoothing options]

In Analytica 4.4, if you open the Uncertainty Options dialog (an option under the Results drop-down menu) and select Probability Density from the Analysis option drop-down menu, you will see that the Probability Density panel has changed: it has two new radio buttons, "Histogram" and "Smoothing."

"Histogram" estimates the probability density of a continuous distribution by dividing the range of possible values into distinct bins, counting how many sample points land in each bin, and scaling each count by the sample size and the bin width.

"Smoothing" estimates the probability density by replacing each sample point with a small Gaussian curve (called the kernel) and then summing all the Gaussian curves to obtain a single smoothed curve. This feature is new in 4.4.
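The two estimators can be sketched in a few lines of Python with NumPy. This is an illustration of the general techniques, not Analytica's code: the function names are hypothetical, the histogram uses equal-width bins, and the bandwidth h = 0.25 is an arbitrary assumed value. The KDE here is computed by direct summation, the slow approach that the Fast Gaussian Transform described below is designed to avoid.

```python
import numpy as np

def histogram_density(samples, num_bins):
    """Equal-width histogram estimate of a PDF.

    Each bar's height is (count in bin) / (number of samples * bin width),
    so the bars have total area 1.
    """
    samples = np.asarray(samples, dtype=float)
    counts, edges = np.histogram(samples, bins=num_bins)
    return edges, counts / (samples.size * np.diff(edges))

def kde_naive(samples, grid, h):
    """Gaussian kernel density estimate at each point of `grid`.

    Every sample contributes a normal curve centred on it with standard
    deviation h; the curves are averaged so the result integrates to 1.
    Direct summation costs O(len(samples) * len(grid)).
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    z = (grid[:, None] - samples[None, :]) / h          # shape (m, n)
    kernels = np.exp(-0.5 * z**2) / (h * np.sqrt(2.0 * np.pi))
    return kernels.mean(axis=1)

rng = np.random.default_rng(1)
x = rng.normal(size=2_000)                  # stand-in for a Monte Carlo sample
edges, hist = histogram_density(x, 40)
grid = np.linspace(-6.0, 6.0, 1201)
smooth = kde_naive(x, grid, h=0.25)         # h = 0.25 is an assumed bandwidth
dx = grid[1] - grid[0]
print("histogram area:", np.sum(hist * np.diff(edges)))
print("KDE area:      ", np.sum(smooth) * dx)
```

Both areas should come out very close to 1, as a probability density requires.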

[Figure: PDF result via histogramming, and PDF result via kernel density estimation]

When you generate a random variable, either from a built-in distribution or from a sequence of calculations based on random distributions, there is an underlying theoretical PDF. Before 4.4, you could graph this PDF only as a histogram, and you could use this histogram in further calculations. The histogram gives you an indication of what the underlying PDF is, but it can be quite sensitive to your random sampling methodology.

Kernel Density Smoothing

If you want to use the random sample to get an idea of the underlying PDF, you can improve on the histogram with some kind of smoothing technique. There are various smoothing techniques you might try, but one that in some cases produces remarkably good results is Kernel Density Smoothing, based on a technique called Kernel Density Estimation (KDE). Selecting the "Smoothing" radio button activates this technique.

Kernel Density Estimation is a general approach to the smoothing problem. In 4.4 we use one variation, based on what is called the Fast Gaussian Transform. In essence, we replace each sample point x in your random sample by a smear: a normal distribution, centered on x, of values that sample point x might have had. This normal distribution has a standard deviation, or bandwidth, which we'll call h. To get the KDE curve, we add up these normal distributions for all the sample points and divide by the number of points, so that the curve integrates to 1. If you select "Smoothing," the curve you see plotted is this KDE curve.
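Written out, with n sample points x_1 through x_n and bandwidth h, the KDE curve is the average of n normal densities, each centered on one sample point (the 1/n factor is the standard normalization that makes the curve integrate to 1):

```latex
\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h\sqrt{2\pi}}
  \exp\!\left( -\frac{(x - x_i)^2}{2h^2} \right)
```

Each term is the normal density with mean x_i and standard deviation h, so a smaller h gives narrower, taller bumps around each sample point.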

It is called a Fast Gaussian Transform because summing all these Gaussians naively is slow: evaluating the curve at m points from n sample points takes about n × m kernel evaluations, so when your sample size is large (e.g., 1,000,000) the computation time can be huge. Through a trick involving such esoteric techniques as Hermite series and Taylor series expansions, the cost can be reduced to roughly n + m operations for a fixed accuracy, and that is the secret of the Fast Gaussian Transform.
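To make the idea concrete, here is a simplified one-dimensional sketch, assuming a Gaussian kernel of bandwidth h. It is not Analytica's implementation: it uses only the Hermite-series part of the fast Gauss transform (no Taylor series or expansion translations), and the names gauss_sum_direct and gauss_sum_hermite are hypothetical. Sample points are grouped into boxes of width √2·h, each box is summarized by a fixed number of Hermite coefficients, and each evaluation point then visits only nearby boxes, so its cost no longer grows with the sample size.

```python
import math
import numpy as np

def gauss_sum_direct(sources, weights, targets, h):
    """G(y_j) = sum_i w_i * exp(-(y_j - x_i)^2 / (2 h^2)); O(n * m) work."""
    d = targets[:, None] - sources[None, :]
    return (weights[None, :] * np.exp(-d**2 / (2.0 * h * h))).sum(axis=1)

def gauss_sum_hermite(sources, weights, targets, h, order=20, cutoff=6.0):
    """The same sum via truncated Hermite expansions about box centres.

    Simplified 1-D sketch of the fast Gauss transform idea (Hermite part
    only; the full method also uses Taylor series and translations).
    """
    sources = np.asarray(sources, dtype=float)
    weights = np.asarray(weights, dtype=float)
    targets = np.asarray(targets, dtype=float)
    s = math.sqrt(2.0) * h                 # kernel becomes exp(-((y - x) / s)^2)
    lo = sources.min()
    box = np.floor((sources - lo) / s).astype(int)
    nboxes = int(box.max()) + 1
    centres = lo + (np.arange(nboxes) + 0.5) * s
    # Coefficients A[b, k] = sum over sources in box b of w * t^k / k!,
    # with t = (x - centre) / s, so |t| <= 0.5 and the series converges fast.
    t = (sources - centres[box]) / s
    A = np.zeros((nboxes, order))
    term = weights.copy()
    for k in range(order):
        A[:, k] = np.bincount(box, weights=term, minlength=nboxes)
        term = term * t / (k + 1)
    reach = int(math.ceil(cutoff)) + 1
    result = np.zeros(targets.size)
    for j, y in enumerate(targets):
        b0 = int(math.floor((y - lo) / s))
        # Only boxes near y matter; farther ones contribute < exp(-(cutoff - 0.5)^2).
        for b in range(max(b0 - reach, 0), min(b0 + reach, nboxes - 1) + 1):
            u = (y - centres[b]) / s
            if abs(u) > cutoff:
                continue
            # Physicists' Hermite polynomials: H_{k+1}(u) = 2u H_k(u) - 2k H_{k-1}(u)
            h_prev, h_cur, acc = 0.0, 1.0, 0.0
            for k in range(order):
                acc += A[b, k] * h_cur
                h_prev, h_cur = h_cur, 2.0 * u * h_cur - 2.0 * k * h_prev
            result[j] += acc * math.exp(-u * u)
    return result

rng = np.random.default_rng(2)
x = rng.normal(size=5_000)
w = np.full(x.size, 1.0 / x.size)          # equal weights 1/n, as in a KDE
y = np.linspace(-4.0, 4.0, 200)
h = 0.2                                    # assumed bandwidth
direct = gauss_sum_direct(x, w, y, h)
fast = gauss_sum_hermite(x, w, y, h)
print("max difference:", np.max(np.abs(direct - fast)))
```

With weights 1/n, multiplying either result by 1/(h√(2π)) gives the KDE curve. The per-point loop is written in plain Python for clarity, so it is not fast in absolute terms; the point is that each evaluation touches a bounded number of boxes regardless of how many samples there are.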

You will notice that if you click on "Smoothing," the panel changes: the "Samples per PDF Step interval" text field disappears, as do the radio buttons for "Equal X axis steps" and so on, since these no longer apply.

Degree of Smoothing

Analytica analyzes the underlying sample data and the sample size to arrive at a suggested degree of smoothing (the optimal bandwidth). You can, however, override this to obtain greater detail or greater smoothness as you see fit for particular cases. The smoothing factor gives you these options:

  • maximum detail: minimum h value
  • medium detail: medium-low h value
  • default: system-determined h value
  • medium smoothing: medium-high h value
  • maximum smoothing: maximum h value
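This page doesn't say how Analytica computes its default h. As an illustration only, the sketch below uses Silverman's rule of thumb, a common textbook choice for Gaussian kernels, and scales it by hypothetical multipliers to mimic the five settings. Both the rule and the multiplier values are assumptions, not Analytica's.

```python
import numpy as np

def silverman_bandwidth(samples):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel (an assumed stand-in)."""
    x = np.asarray(samples, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * spread * x.size ** (-0.2)

# Hypothetical multipliers for the five smoothing settings; illustrative only,
# NOT the values Analytica uses.
SMOOTHING_MULTIPLIERS = {
    "maximum detail": 0.25,
    "medium detail": 0.5,
    "default": 1.0,
    "medium smoothing": 2.0,
    "maximum smoothing": 4.0,
}

def bandwidth_for(samples, setting="default"):
    """Scale the rule-of-thumb bandwidth by the chosen (hypothetical) multiplier."""
    return SMOOTHING_MULTIPLIERS[setting] * silverman_bandwidth(samples)

rng = np.random.default_rng(5)
x = rng.normal(size=10_000)
for setting in SMOOTHING_MULTIPLIERS:
    print(f"{setting:>18}: h = {bandwidth_for(x, setting):.4f}")
```

Note that rules like this shrink h as the sample size grows (here in proportion to n to the power -1/5), which is why the suggested bandwidth depends on the sample size as well as on the data.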

As the bandwidth h decreases, the KDE-smoothed PDF becomes more sensitive to the random variation in your sample and can get quite wavy. As h increases, the KDE-smoothed PDF becomes less sensitive to random variation and gets much smoother; however, it may match the true underlying PDF less well. Greater degrees of smoothing artificially increase the apparent variance of the KDE curve and lower its peak.
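The variance effect can be checked in closed form: a Gaussian KDE is an equal-weight mixture of normal curves N(x_i, h²), so its mean is the sample mean and its variance is the sample variance (with divisor n) plus h². The sketch below, with hypothetical helper names and a standard normal sample as assumed input, prints that variance and the curve's peak height for a few bandwidths.

```python
import numpy as np

def kde_moments(samples, h):
    """Mean and variance of a Gaussian KDE, in closed form.

    The KDE is an equal-weight mixture of N(x_i, h^2) curves, so its mean is
    the sample mean and its variance is the (divide-by-n) sample variance + h^2.
    """
    x = np.asarray(samples, dtype=float)
    return x.mean(), x.var() + h**2

def kde_peak(samples, h, grid):
    """Largest value of the Gaussian KDE over `grid` (direct summation)."""
    x = np.asarray(samples, dtype=float)
    z = (np.asarray(grid, dtype=float)[:, None] - x[None, :]) / h
    pdf = np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))
    return pdf.max()

rng = np.random.default_rng(3)
x = rng.normal(size=2_000)
grid = np.linspace(-5.0, 5.0, 501)
for h in (0.1, 0.3, 1.0):
    _, var = kde_moments(x, h)
    print(f"h={h}: variance {var:.3f}, peak {kde_peak(x, h, grid):.3f}")
```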

Determining the optimal bandwidth h is a difficult problem in general. The smoothing factor is offered as a way to try different h values and judge, by eyeballing the resulting graphs, what looks best.

One tip: compare the KDE-smoothed graph with the histogram to see which smoothing factor best smooths the original histogram. In this process, also try different "Samples per PDF step interval" values for the histogram, since the histogram's random variation is sensitive to this setting. The combination of smoothing factor and samples per PDF step interval that fits best may well be your best estimate of the underlying PDF.
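One way to make this comparison less subjective is to score each combination numerically. The sketch below is an illustration under stated assumptions, not an Analytica feature: it assumes "samples per PDF step interval" means bins that each hold roughly that many sorted samples, scores each pairing by the mean squared difference between histogram and KDE at the bin midpoints, and uses hypothetical function names. Because the histogram is itself noisy, the lowest score is a hint to look at, not a verdict.

```python
import numpy as np

def equal_count_histogram(samples, per_step):
    """Histogram whose bins each contain about `per_step` sorted samples."""
    x = np.sort(np.asarray(samples, dtype=float))
    # Bin edges at every per_step-th sorted sample, plus the maximum.
    edges = np.unique(np.append(x[::per_step], x[-1]))
    counts, edges = np.histogram(x, bins=edges)
    return edges, counts / (x.size * np.diff(edges))

def kde_at(samples, points, h):
    """Gaussian KDE with bandwidth h evaluated at `points` (direct sum)."""
    x = np.asarray(samples, dtype=float)
    z = (np.asarray(points, dtype=float)[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z**2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

def mismatch(samples, per_step, h):
    """Mean squared gap between the histogram and the KDE at bin midpoints."""
    edges, dens = equal_count_histogram(samples, per_step)
    mids = 0.5 * (edges[:-1] + edges[1:])
    return float(np.mean((kde_at(samples, mids, h) - dens) ** 2))

rng = np.random.default_rng(4)
x = rng.normal(size=5_000)
for per_step in (50, 200, 500):
    scores = {h: mismatch(x, per_step, h) for h in (0.05, 0.15, 0.5)}
    best = min(scores, key=scores.get)
    print(per_step, "best h:", best, scores)
```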

See Also

  • Pdf(..) function