Kernel Density Smoothing
New to Analytica 4.4
This page describes the new "Smoothing" option on the Uncertainty Settings→Probability Density pane: what it is, how it smooths, why it may be preferable to the histogram method, a caution that smoothing causes an increase in the apparent variance, and how the PDF function can be used in unusual cases where you want more control -- a custom smoothing factor, fewer or more plotted points, etc.

It is written assuming the reader understands Monte Carlo samples and the basics of continuous probability distributions, but doesn't know what KDE is.
In Analytica 4.4, if you go to the Uncertainty Options dialog (an option under the Results drop-down menu) and select Probability Density from the Analysis option drop-down menu, you will see that the Probability Density panel has changed. There are two new radio buttons: "Histogram" and "Smoothing."
"Histogram" will give you a Probability Density Function (PDF) graph which is a histogram, or step function, as before.
However, "Smoothing" will give you a smoothed, continuous curve for your PDF. This is a new feature in 4.4.
When you generate a random variable, either from a built-in distribution or from a sequence of calculations based on random distributions, there is an underlying theoretical PDF. Before 4.4, you could graph an estimate of this PDF as a histogram, and use that histogram for further calculations. The histogram gives you an indication of what the underlying PDF is, but it can be quite sensitive to your random sampling methodology.
If you want to use the random sample to get an idea of the underlying PDF, you can improve on the histogram with some kind of smoothing technique. There are various smoothing techniques you might try, but one that can produce remarkably good results is called Kernel Density Smoothing, which is based on a technique called Kernel Density Estimation (KDE). Clicking the "Smoothing" radio button activates this technique.
Kernel Density Estimation is a general approach to the smoothing problem. In 4.4 we use one variation, based on what is called the Fast Gaussian Transform. In essence, we replace each point x in your random sample by a small normal distribution (a "smear") centered at x, representing the values that sample point might have had. This normal distribution has a standard deviation, or bandwidth, let's call it h. To get the KDE curve, we simply add up these normal distributions over all the sample points, scaled so the total area under the curve is 1. So, if you select "Smoothing," the curve you see plotted is this KDE curve.
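To make this concrete, here is a minimal sketch of Gaussian kernel density estimation in Python. It is for illustration only -- it is not Analytica's implementation, and the bandwidth rule used below (Silverman's rule of thumb) is an assumption, not necessarily the bandwidth 4.4 chooses.

import numpy as np

def gaussian_kde(sample, grid, h=None):
    """Evaluate a Gaussian KDE of `sample` at the points in `grid`."""
    sample = np.asarray(sample, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = sample.size
    if h is None:
        # Assumed bandwidth: Silverman's rule of thumb (not Analytica's rule).
        h = 1.06 * sample.std(ddof=1) * n ** (-0.2)
    # One normal distribution of width h centered on each sample point,
    # summed over all points and scaled so the curve integrates to 1.
    z = (grid[:, None] - sample[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))

# Example: smooth a Monte Carlo sample from a lognormal distribution.
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)
grid = np.linspace(x.min(), x.max(), 200)
pdf_estimate = gaussian_kde(x, grid)   # the smoothed PDF curve

Note that this naive version evaluates every Gaussian at every plot point, which is exactly the cost that the Fast Gaussian Transform, discussed next, is designed to avoid.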
It is called a Fast Gaussian Transform because if you compute the sum of all these Gaussians in a naive fashion and your sample size is large (e.g., 1,000,000), the computation time can be huge. Through a trick involving such esoteric techniques as Hermite series and Taylor series, the computation time can be reduced dramatically, and that is the secret of the Fast Gaussian Transform.
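As a sketch of the idea (following the standard fast Gauss transform of Greengard and Strain, which may differ in detail from the variant 4.4 uses): write the Gaussian width as $\delta = 2h^2$ and expand each kernel about a common center $x_*$ using the Hermite generating function,

$$e^{-(y - x_i)^2/\delta} \;=\; \sum_{n \ge 0} \frac{1}{n!}\left(\frac{x_i - x_*}{\sqrt{\delta}}\right)^{\!n} h_n\!\left(\frac{y - x_*}{\sqrt{\delta}}\right), \qquad h_n(t) = (-1)^n \frac{d^n}{dt^n}\, e^{-t^2}.$$

Truncating after $p$ terms and swapping the order of summation gives

$$\sum_{i=1}^{N} e^{-(y - x_i)^2/\delta} \;\approx\; \sum_{n=0}^{p-1} A_n\, h_n\!\left(\frac{y - x_*}{\sqrt{\delta}}\right), \qquad A_n = \frac{1}{n!}\sum_{i=1}^{N}\left(\frac{x_i - x_*}{\sqrt{\delta}}\right)^{\!n}.$$

The coefficients $A_n$ depend only on the sample, so they are computed once in roughly $O(Np)$ work, and each of the $M$ plot points then costs only $O(p)$, giving about $O(p(N+M))$ total instead of the naive $O(NM)$. (In practice the samples are grouped into intervals, each with its own expansion center, so a small $p$ keeps the truncated series accurate.)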