CDF and PDF Functions
PDF(x) and CDF(x)
These functions generate histograms from a sample of a quantity X. They are the same as the functions behind the PDF and CDF uncertainty views of a graph or table, but they return their results as arrays available for further processing or export. PDF generates a probability mass function or a probability density function according to whether X is discrete or continuous. CDF generates a cumulative mass function or a cumulative distribution function.
Parameters
X is required; all other parameters are optional:
- X: The sample data points, indexed by I
- I: The index over which they generate the histogram. By default this is Run -- i.e. a Monte Carlo sample -- but you can also specify another index to generate a histogram over another dimension.
- w: The sample weights. Can be used to weight each sample point differently.
- discrete: Set true or false to force discrete or continuous treatment (see below).
- method: Selects the histogramming method used (equal-X, equal-sample-P, equal-weighted-P). If omitted, the system default set in the Uncertainty setup dialog from the Result menu is used.
- samplesPerStep: Controls bin size. If omitted, the system default set in the Uncertainty setup dialog from the Result menu is used.
- domain: A variable containing the domain information.
Both functions can be applied to a weighted sample.
Syntax
PDF(x: [I]; I: IndexType=Run; w: NonNegative[I]=SampleWeighting; discrete: Optional Boolean; method, samplesPerStep: Optional Positive; domain: Unevaluated = x)
CDF(x: [I]; I: IndexType=Run; w: NonNegative[I]=SampleWeighting; discrete: Optional Boolean; method, samplesPerStep: Optional Positive; domain: Unevaluated = x)
Simple Usage
The simplest and most typical usage returns the PDF or CDF table that you would see in a result view when viewing the result in PDF or CDF mode. To get this result, the functions are called with a single parameter, e.g.:
PDF(Ch1)
CDF(Ch1)
Here Ch1 contains uncertainty and therefore has a sample indexed by Run.
Histogram of Data
PDF and CDF can be applied to arrays of data indexed by something other than Run to generate a histogram. For example, to histogram a quantity X along index J, use:
PDF( X, J )
CDF( X, J )
Is the distribution discrete or continuous?
PDF(X) generates a probability mass function or a density function according to whether it judges X to be discrete or continuous. CDF(X) does the same, generating a cumulative mass function or a cumulative probability function. If X contains text values, it knows X must be discrete. If X contains numbers with few or no identical values, it guesses continuous. If X contains numbers with many identical values, it guesses discrete.
Usually they guess correctly, but sometimes, such as with discrete distributions over a wide range of integers, the choice is ambiguous. In such cases, there are two ways to make sure they do what you want:
- If X is a variable, you can specify its Domain attribute as:
- Continuous
- Discrete Numeric, Categorical, List of Numbers, List of Labels, or Index -- all of which it treats as discrete.
- Automatic is the default, meaning Analytica guesses.
- If X is an expression, specify the optional parameter Discrete to PDF or CDF as True or False.
If X contains text values, i.e. categorical data, you may want to control the order of the categories, e.g. ["Low", "Medium", "High"]. You can do this by specifying its Domain as a List of Labels with these values, or as an Index, referring to an Index that contains them. Alternatively, you can provide a list of labels to the optional Domain parameter of PDF or CDF. If X is an expression rather than a variable, this is your only choice.
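The decision order described in this section can be sketched in Python. This is only an illustration, not Analytica's actual heuristic: the function names and the `max_unique_fraction` threshold are assumptions made up for this sketch.

```python
def guess_discrete(values, max_unique_fraction=0.5):
    """Guess whether a sample looks discrete.

    max_unique_fraction is a hypothetical threshold chosen for this sketch;
    the real heuristic is not documented here.
    """
    if not values:
        return False
    # Text values can only be categorical, hence discrete.
    if any(isinstance(v, str) for v in values):
        return True
    # Many identical values means few distinct values relative to sample size.
    distinct = len(set(values))
    return distinct / len(values) <= max_unique_fraction


def resolve_discrete(values, discrete=None):
    """Text forces discrete; otherwise an explicit True/False wins; else guess."""
    if any(isinstance(v, str) for v in values):
        return True
    if discrete is not None:
        return bool(discrete)
    return guess_discrete(values)
```

With this sketch, `resolve_discrete(list(range(100)))` returns `False`, which mirrors the ambiguous case of a discrete distribution over a wide range of integers; passing `discrete=True` overrides the guess.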
Weighted Data
Normally, each point in a data set or sample carries equal weight. However, in some situations data or sample points may have unequal weights. When the running index is Run (i.e., the case of variables with uncertainty), the SampleWeighting system variable provides the default weighting (which itself defaults to equally weighted points). A different weighting can be provided explicitly using the w parameter, for example:
PDF( Total_revenue, w: (SalesByRegion<ProjectedSales)[Region='East Coast'] )
This expression computes the posterior probability of total revenue given that the east coast sales are less than projected, which is accomplished by providing a zero weight for all points not consistent with the assumption.
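The effect of zero weights can be sketched in Python. This is a minimal illustration of conditioning by weighting, not Analytica code; the function name, the fixed bin edges, and the sample numbers are all made up for the example.

```python
from bisect import bisect_right


def weighted_bin_probabilities(values, weights, edges):
    """Share of the total weight that falls in each bin defined by edges.

    Bins are half-open [edges[i], edges[i+1]), except the last, which also
    includes its right edge. Values outside the edges are ignored.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    probs = [0.0] * (len(edges) - 1)
    for v, w in zip(values, weights):
        if v < edges[0] or v > edges[-1]:
            continue
        i = min(bisect_right(edges, v) - 1, len(probs) - 1)
        probs[i] += w / total
    return probs


# Hypothetical sample: total revenue and East Coast sales for four runs.
revenue = [10, 20, 30, 40]
east_sales = [1, 5, 2, 7]
projected = 4
# Weight 1 where the condition holds, 0 where it does not.
weights = [1.0 if s < projected else 0.0 for s in east_sales]
posterior = weighted_bin_probabilities(revenue, weights, [0, 25, 50])
```

Only the runs with East Coast sales below the projection contribute, so `posterior` describes revenue conditional on that assumption.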
Detailed Description
PDF and CDF behave differently depending on whether the domain of x is discrete or continuous. They determine this as follows. If x contains non-numeric values, the domain is discrete. Otherwise, if the optional discrete parameter is specified, its value (true = discrete, false = continuous) is used. Otherwise, if the domain parameter contains a variable identifier, the domain attribute of that variable is consulted. (You would seldom, if ever, specify the domain parameter explicitly, but if the first parameter to PDF/CDF is a variable identifier, the domain parameter picks that up.) If the domain attribute is set to Continuous, a continuous domain is used. If it is set to Discrete (numeric or categorical), to an explicit list of numbers or list of labels, or to an Index, a discrete domain is used. Otherwise (i.e., the domain attribute is "automatic", or the domain parameter is not a variable identifier), PDF uses heuristics to "guess" whether x is discrete numeric or continuous. The heuristics judge such things as whether the values in x appear to be regular integer multiples (as would occur with a discrete distribution such as Poisson or Binomial).
When PDF/CDF uses a discrete domain, the domain parameter contains a variable identifier, and the domain attribute of that variable contains an explicit list of values or an index with explicit values, those values are used, in that order, as the domain of PDF/CDF. If no such domain declaration is available, the set of unique values in x is used as the domain. If a variable with an explicit domain was found, that variable serves as the index of possible values. If no such domain variable was used, a "magic" local index named "PossibleValues" is used. The result is therefore indexed either by the domain index or by the local "PossibleValues" index. The value in each cell of the array is the relative frequency of occurrence of that value.
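The discrete case can be sketched in Python as a relative-frequency table over either an explicit, ordered domain or the unique values found in the data. The function name is hypothetical, and sorting the unique values when no domain is given is an assumption of this sketch; the text above does not specify that ordering.

```python
def discrete_pmf(values, weights=None, domain=None):
    """Relative frequency of each possible value, as (value, frequency) pairs."""
    if weights is None:
        weights = [1.0] * len(values)  # equally weighted points
    if domain is None:
        # Assumption: fall back to the unique values in sorted order.
        domain = sorted(set(values))
    total = sum(weights)
    mass = {d: 0.0 for d in domain}
    for v, w in zip(values, weights):
        if v in mass:
            mass[v] += w
    # Keep the order given by the domain, as an explicit domain would.
    return [(d, mass[d] / total) for d in domain]
```

For example, `discrete_pmf(["Low", "High", "Low", "Medium"], domain=["Low", "Medium", "High"])` reports the categories in the declared order rather than alphabetically.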
When PDF uses a continuous domain, the result will be indexed by "Step" and "DensityIndex" (plus any abstracted indexes in the parameters). Step is a "magic" local index with the name "Step". DensityIndex is a system variable index containing two elements, ["X", "Y"]. The "X" column of the result contains the centroid for each "bin" of the histogram, while the "Y" column contains the density estimate for that bin.
To construct a continuous PDF/CDF, the algorithm must partition the set of reals into bins. The key operation is determining where to place these bins (or, more accurately, the boundaries between them). Three algorithms may be employed for this: Equal X (method=0), Equal Weighted P (method=1) and Equal Sample P (method=2). Equal X divides the range of values occurring in X into equal-width intervals. Equal Weighted P selects bins of variable width so that each bin contains the same amount of weighted probability mass. Equal Sample P selects bins of variable width so that roughly the same number of points fall into each bin. (With a constant weighting, Equal Weighted P and Equal Sample P should be identical, up to numeric round-off effects.) The samplesPerStep parameter controls how finely partitioned the histogram is, specifying how many points, on average, should land in each bin. These controls can be supplied explicitly via optional parameters to PDF or CDF. If they aren't specified, PDF obtains them from the settings specified in the Uncertainty Settings dialog. If x (i.e., domain) is a variable identifier, the local settings for that variable are used; otherwise, the local settings for the variable whose definition contains the call to PDF are used. If those are not set, the global settings are used. Analytica defaults to an Equal X method for PDFs and an Equal P method for CDFs.
Once the bins are selected, the density estimate for each bin is the proportion of points in the bin divided by the bin's width.
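The difference between the binning methods can be sketched in Python. This is an illustration under stated assumptions, not Analytica's algorithm: the function names are hypothetical, the bin count is taken as the sample size divided by samplesPerStep, and the equal-probability edges are placed at sample values rather than interpolated between them.

```python
def bin_count(n_samples, samples_per_step):
    """Assumed relation: about samples_per_step points per bin on average."""
    return max(1, n_samples // samples_per_step)


def equal_x_edges(values, n_bins):
    """Equal-width bin boundaries spanning the range of the data."""
    lo, hi = min(values), max(values)
    return [lo + (hi - lo) * k / n_bins for k in range(n_bins + 1)]


def equal_p_edges(values, n_bins, weights=None):
    """Boundaries chosen so each bin holds about the same weight.

    With weights=None every point counts equally (equal sample probability);
    with weights given, bins hold equal weighted probability mass.
    Repeated values or very heavy points can yield zero-width bins, which a
    real implementation would have to merge.
    """
    if weights is None:
        weights = [1.0] * len(values)
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    edges = [pairs[0][0]]
    cumulative = 0.0
    k = 1
    for v, w in pairs:
        cumulative += w
        # Emit a boundary each time the running weight passes k/n_bins.
        while k < n_bins and cumulative >= total * k / n_bins:
            edges.append(v)
            k += 1
    edges.append(pairs[-1][0])
    return edges
```

With constant weights the two equal-probability variants give the same boundaries, which matches the remark above that they should be identical up to round-off.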
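As a worked illustration of this ratio, the following Python sketch turns a set of bin boundaries into (x, density) pairs. The function name is hypothetical, equal weights are assumed, and the bin midpoint is used for the x value as a simplification of the centroid mentioned earlier.

```python
from bisect import bisect_right


def density_from_edges(values, edges):
    """Return one (bin midpoint, density) pair per bin.

    Density is (fraction of points in the bin) / (bin width). Points outside
    the edges are not counted in any bin.
    """
    n = len(values)
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        # Half-open bins, with the last bin including its right edge.
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    result = []
    for i, c in enumerate(counts):
        width = edges[i + 1] - edges[i]
        midpoint = (edges[i] + edges[i + 1]) / 2
        result.append((midpoint, (c / n) / width))
    return result


pairs = density_from_edges([0.5, 1.5, 1.6, 3.0], [0, 1, 2, 4])
```

Multiplying each density by its bin width and summing recovers 1 when all points lie inside the edges, which is a quick sanity check on any binning choice.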