Statistical functions

Revision as of 04:57, 17 December 2015

Statistical functions compute a statistic from a probability distribution. More precisely, they estimate the statistic from a random sample of values representing a probabilistic value. Common examples are Mean, Variance, Correlation, and Getfract (which returns a fractile or percentile). The uncertainty view options (page 29) available in the Result window use these functions.

Statistical functions force prob mode evaluation: Unlike other functions, statistical functions usually force their main parameter(s) to be evaluated in prob mode (probabilistically), and they return a nonprobabilistic value whether they are evaluated in mid mode or prob mode. For example:

Chance X := Normal(0, 1)
Variable X90 := Getfract(X, .9)
X90 → 1.259

Evaluating variable X90 causes variable X to be evaluated in prob mode, so that Getfract(X, 90%) can estimate the 90th percentile (0.9 fractile) of the distribution for X. X90 itself has only a mid value, and no probabilistic value. The exception is the Mid(x) function, which forces x to be evaluated in mid mode, no matter the evaluation context.

Statistics from nonprobabilistic arrays: By default, statistical functions operate over a probability distribution, represented as a random sample indexed by Run. You can also use these functions to compute statistics over an array with a different index by specifying that index explicitly. This is often useful for computing statistics from data tables, for example when you want to fit a probability distribution to a set of data. Suppose Data is an array of imported measurements:

Index K := 1..1000
Variable Data:= Table(K)(123.4, 252.9, 221.4, ...)
Variable Xfitted := Normal(Mean(Data, K), Sdeviation(Data, K))

Xfitted is a normal distribution fitted to Data with the same mean and standard deviation.
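
The same fitting idea can be sketched outside Analytica. The Python below is only an illustration, not Analytica code: the helper name fit_normal is hypothetical, and the last two data values are made-up stand-ins for the measurements elided above.

```python
import random
import statistics


def fit_normal(data):
    """Return (mean, sample standard deviation) of a list of measurements."""
    return statistics.mean(data), statistics.stdev(data)


# First three values come from the example; the last two are hypothetical
# placeholders for the elided measurements.
data = [123.4, 252.9, 221.4, 180.0, 199.5]
mu, sd = fit_normal(data)

# Draw a sample from the fitted normal distribution (analogous to Xfitted).
rng = random.Random(0)
x_fitted = [rng.gauss(mu, sd) for _ in range(1000)]
```

The statistics are taken over the data index (the list positions here, K in the Analytica example) rather than over a random sample.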

Tip

All statistical functions produce estimates from the underlying random sample for each probabilistic quantity. These estimates are not exact, but vary from one evaluation to the next due to the variability inherent in random sampling. Hence, your results might not exactly match the results shown in the examples here. For greater precision, use a larger sample size (see “Appendix A: Selecting the Sample Size” on page 424 on how to select a sample size).

Notation in formulas: The formulas used to define statistics use this notation:

x_i The ith sample value of probabilistic variable x
x̄ The mean of probabilistic variable x (see “Mean(x)” on page 297)
s Standard deviation, written σ in the formulas (see “Sdeviation(x)” on page 297)
m Sample size (see “Appendix A: Selecting the Sample Size” on page 424)

Statistics and text-valued distributions: Most statistical functions require their parameters to be numerical. A few statistical functions, those that require only ordinal (ordered) values, also work on distributions with text values (whose domain is a list of labels), namely Frequency (use Frequency(X, X)), Mid, Min, Max, ProbBands, and Sample. These functions assume the values are ordered as specified in the domain list of labels, e.g., Low, Mid, High.

Example model: The examples in this section use the following variables:

Variable Alt_fuel_price := Normal(1.25, 0.1)
Variable Fuel_price := Normal(1.19, 0.1)
Variable Skfuel_price := Beta(4, 2, 1, 1.5)

Mean(x)

Returns an estimate of the mean of x if x is probabilistic. Otherwise, returns x. Mean(x) uses this formula.

[math]\displaystyle{ \frac{1}{m} \sum_{i=1}^{m} x_i = \bar x }[/math]

Library: Statistical

Examples:

Mean(Fuel_price) → 1.19

Mean(Skfuel_price) → 1.33

Median(x)

Returns an estimate of the median of x from its sample if x is probabilistic. When x is non-probabilistic, returns x. Equivalent to GetFract(x,0.5).

Library: Statistical

Examples: Median(Fuel_price) → 1.19

Sdeviation(x)

Returns an estimate of the standard deviation of x from its sample if x is probabilistic. If x is nonprobabilistic, returns 0.

Sdeviation(x) uses this formula.

[math]\displaystyle{ \sqrt{\frac{1}{m - 1} \sum_{i=1}^{m} (x_i - \bar x)^2} = \sigma }[/math]

Library: Statistical

Example: Sdeviation(Fuel_price) → 0.10

Variance(x)

Returns an estimate of the variance of x if x is probabilistic. If x is non-probabilistic, returns 0. Variance() uses this formula.

[math]\displaystyle{ \frac{1}{m - 1} \sum_{i=1}^{m} (x_i - \bar x)^2 = \sigma^2 }[/math]

Library: Statistical

Example: Variance(Fuel_price) → 0.01
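
As a rough illustration of how the mean, variance, and standard deviation estimates relate, here is a minimal Python sketch (not Analytica code). It assumes the sample is a plain list of numbers and uses the m - 1 denominator from the variance formula; the standard deviation is the square root of the variance.

```python
import math


def mean(xs):
    """Sample mean: (1/m) * sum of x_i."""
    return sum(xs) / len(xs)


def variance(xs):
    """Sample variance with the m - 1 denominator."""
    m = len(xs)
    xbar = mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (m - 1)


def sdeviation(xs):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(variance(xs))


sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(mean(sample), variance(sample), sdeviation(sample))
```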

Skewness(x)

Returns an estimate of the skewness of x. x must be probabilistic.

Skewness is a measure of the asymmetry of the distribution. A positively skewed distribution has a thicker upper tail than lower tail, while a negatively skewed distribution has a thicker lower tail than upper tail. A normal distribution has a skewness of zero.

Skewness() uses this formula.

[math]\displaystyle{ \frac{1}{m} \sum_{i=1}^{m} [\frac {x_{i} - \bar x}{\sigma}]^3 }[/math]

Library: Statistical

Example: Skewness(Skfuel_price) → -0.45

Kurtosis(x)

Returns an estimate of the kurtosis of x. x must be probabilistic.

Kurtosis is a measure of the peakedness of a distribution. A distribution with long thin tails has a positive kurtosis. A distribution with short tails and high shoulders, such as the uniform distribution, has a negative kurtosis. A normal distribution has zero kurtosis.

Kurtosis(x) uses this formula.

[math]\displaystyle{ (\frac{1}{m} \sum_{i=1}^{m} [\frac {x_{i} - \bar x}{\sigma}]^4) - 3 }[/math]

Library: Statistical

Example: Kurtosis(Skfuel_price) → -0.48
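
The skewness and kurtosis formulas can be sketched in Python as follows. This is an illustration only, not Analytica code. It assumes σ is the sample standard deviation computed with the m - 1 denominator, which the source does not state explicitly; the helper name _standardized is hypothetical.

```python
import math


def _standardized(xs):
    """Return (x_i - mean) / sigma for each sample value."""
    m = len(xs)
    xbar = sum(xs) / m
    # Assumption: sigma uses the m - 1 denominator.
    sigma = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (m - 1))
    return [(x - xbar) / sigma for x in xs]


def skewness(xs):
    """Mean of the cubed standardized values."""
    z = _standardized(xs)
    return sum(v ** 3 for v in z) / len(z)


def kurtosis(xs):
    """Mean of the fourth power of the standardized values, minus 3."""
    z = _standardized(xs)
    return sum(v ** 4 for v in z) / len(z) - 3
```

A symmetric sample such as [1, 2, 3, 4, 5] gives a skewness of essentially zero, while a sample with a long upper tail such as [1, 1, 1, 10] gives a positive skewness.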

Probability(b)

Returns an estimate of the probability or array of probabilities that the Boolean value b is True.

Library: Statistical

Example: Probability(Fuel_price < 1.19) → 0.5
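
One way to picture this estimate is as the fraction of sample values for which the condition holds. The Python sketch below is an illustration under that assumption, not Analytica's implementation. It uses simple Monte Carlo draws and a fixed seed chosen only for repeatability.

```python
import random


def probability(bools):
    """Fraction of sample points at which the Boolean value is True."""
    return sum(1 for b in bools if b) / len(bools)


rng = random.Random(1)
# Stand-in for a sample of Fuel_price := Normal(1.19, 0.1).
fuel_price = [rng.gauss(1.19, 0.1) for _ in range(10000)]
p = probability([x < 1.19 for x in fuel_price])
print(p)  # close to 0.5, varying with the random sample
```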


GetFract(x, p, I)

Returns an estimate of the pth fractile (also known as quantile or percentile) of x over index I. The index I is optional. If it is omitted, the function operates over the Run index and returns probability fractiles. The result is the value such that x has probability p of being less than that value. If x is constant over index I (for example, a non-probabilistic variable using Run as the fractile index), all fractiles are equal to x.

The value of p must be a number or array of numbers between 0 and 1, inclusive.

Library: Statistical

Examples:

Getfract(x, 0.5) returns an estimate of the median of x.

Getfract(Fuel_price, 0.5) → 1.19

The following returns a table containing estimates of the 10th and 90th percentile values, that is, an 80% confidence interval.

Index Fract := [0.1, 0.9]
Getfract(Fuel_price, Fract) →

Fract ▶
0.10 0.90
1.06 1.32
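
A fractile estimate from a sample can be sketched in Python as below. This is an illustration only: it sorts the sample and interpolates linearly between neighboring values, which is one common estimator and is not necessarily the interpolation rule Analytica uses.

```python
def getfract(xs, p):
    """Estimate the p-th fractile of a sample by linear interpolation."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1, inclusive")
    s = sorted(xs)
    pos = p * (len(s) - 1)          # fractional position in the sorted sample
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] + frac * (s[hi] - s[lo])


def getfracts(xs, ps):
    """Apply getfract for each probability level, like an index of fractiles."""
    return [getfract(xs, p) for p in ps]
```

For a sample that is constant, every fractile equals that constant, matching the behavior described above.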

ProbBands(x)

Returns an estimate of probability or “confidence” bands for x if x is probabilistic. Otherwise returns x for every band. The probabilities are specified in the Uncertainty Setup dialog (page 257), Probability Bands option.

Library: Statistical

Example: Probbands(Fuel_price) →

Probability ▶
0.05 0.25 0.5 0.75 0.95
1.025 1.123 1.19 1.257 1.355

Covariance(x, y)

Returns an estimate of the covariance of uncertain variables x and y. If x or y is non-probabilistic, it returns 0. The covariance is a measure of the degree to which x and y both tend to be in the upper (or lower) end of their ranges at the same time. Specifically, it is defined as:

[math]\displaystyle{ \frac{1}{m - 1} \sum_{i=1}^{m} (x_i - \bar x)(y_i - \bar y) }[/math]

Library: Statistical

Suppose you have an array x of uncertain quantities indexed by i:

Index i := 1..5
Variable x := Array(i, […])

You can compute the covariance matrix of each element of x against every other element (over i), thus:

Index j := CopyIndex(i)
Covariance(x, x[i=j])

We create index j as a copy of index i and then create a copy of x that replaces i by j so that the covariance is computed for each slice of x over i against each slice over j. The result is the covariance matrix indexed by i and j. Each diagonal element contains the variance of the variable, since Variance(x) = Covariance(x, x). You can use this same method to generate a correlation matrix using the Correlation() or Rankcorrel() functions described below.
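
The covariance matrix construction can be sketched in Python as follows. This is an illustration, not Analytica code. It assumes each uncertain quantity is held as a list of sample values of equal length, and it assumes the 1/(m - 1) normalization that makes the covariance of a quantity with itself equal its sample variance.

```python
def covariance(xs, ys):
    """Sample covariance of two equally long samples (m - 1 denominator)."""
    m = len(xs)
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (m - 1)


def covariance_matrix(columns):
    """Covariance of every sample against every other, as a nested list."""
    return [[covariance(a, b) for b in columns] for a in columns]


# Two small hypothetical samples standing in for slices of x.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]
matrix = covariance_matrix([a, b])
```

The outer and inner loops play the roles of the indexes i and j: the diagonal holds the variances and the matrix is symmetric.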

Correlation(x, y)

Returns an estimate of the correlation between the probabilistic expressions x and y, where -1 means perfectly negatively correlated, 0 means no correlation, and 1 means perfectly positively correlated.

Correlation(x, y), a measure of probabilistic dependency between uncertain variables, is sometimes known as the Pearson product moment coefficient of correlation, r. It measures the strength of the linear relationship between x and y, using the formula:

[math]\displaystyle{ \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sqrt{\sum_i (x_i - \bar x)^2 \times \sum_i (y_i - \bar y)^2}} }[/math]

Library: Statistical

Example: With sampleSize set to 100 and number format set to two decimal digits:

Correlation(Alt_fuel_price + Fuel_price, Fuel_price) → 0.71

Correlation of two independent, uncorrelated distributions approaches 0 as the sample size approaches infinity.

Example: With sampleSize = 20:

Correlation(Normal(1.19, 0.1), Normal(1.19, 0.1)) → -0.28

With sampleSize = 1000:

Correlation(Normal(1.19, 0.1), Normal(1.19, 0.1)) → 0.03
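
The Pearson correlation estimate and its dependence on sample size can be sketched in Python as follows. This is an illustration, not Analytica code: it uses simple Monte Carlo draws with an arbitrary seed, so the numbers differ from the examples above.

```python
import math
import random


def correlation(xs, ys):
    """Pearson product-moment correlation of two equally long samples."""
    m = len(xs)
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
n = 20000
fuel = [rng.gauss(1.19, 0.1) for _ in range(n)]   # like Fuel_price
alt = [rng.gauss(1.25, 0.1) for _ in range(n)]    # like Alt_fuel_price

# The sum shares Fuel_price as a component, so the correlation is near 0.71.
r_sum = correlation([a + f for a, f in zip(alt, fuel)], fuel)
# Independent samples: the estimate is near 0 for a large sample.
r_indep = correlation(alt, fuel)
```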

Rankcorrel(x, y)

Returns an estimate of the rank-order correlation coefficient between the distributions x and y. x and y must be probabilistic.

Rankcorrel(x, y), a measure of the dependence between x and y, is sometimes known as Spearman’s rank correlation coefficient, r_s.

Rank-order correlation is measured by computing the ranks of the probability samples, and then computing their correlation. By using the rank order of the samples, the measure of correlation is not affected by skewed distributions or extreme values, and is, therefore, more robust than simple correlation. Rank-order correlation is used for importance analysis (page 303).

Library: Statistical

Example:

With sampleSize = 100:

Rankcorrel(Fuel_price, Alt_fuel_price) → 0.02
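
The two-step procedure described above (rank the samples, then correlate the ranks) can be sketched in Python as follows. This is an illustration, not Analytica code. Giving tied values the average of their ranks is an assumption, since the source does not say how ties are handled.

```python
import math


def ranks(xs):
    """Rank values from 1 upward; tied values share their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def correlation(xs, ys):
    """Pearson correlation of two equally long samples."""
    m = len(xs)
    xbar = sum(xs) / m
    ybar = sum(ys) / m
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rankcorrel(xs, ys):
    """Rank-order correlation: Pearson correlation of the ranks."""
    return correlation(ranks(xs), ranks(ys))
```

Because only the ordering matters, a monotonic but nonlinear relationship such as y = x cubed still gives a rank correlation of 1.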

Frequency(x, i)

If x is a discrete uncertain variable, returns an array indexed by i, giving the frequency, or number of occurrences of discrete values i. i must contain unique values; if numeric, the values must be increasing.

If x is a continuous uncertain variable and i is an index of numbers in increasing order, it returns an array indexed by i, with the count of values in the sample x that are equal to or less than each value of i and greater than the previous value of i.

If x is non-probabilistic, Frequency() returns sampleSize for each value of i equal to x. Since Frequency() is computed by counting occurrences in the probabilistic sample, it is a function of sampleSize (see “Uncertainty Setup dialog” on page 257). If you want the relative frequency rather than the count of each value, divide the result by sampleSize.

Library: Statistical

Example (continuous):

Index Index_a := [1.2,1.25]

Frequency(Fuel_price, Index_a) →

Index_a ▶
1.2 1.25
54 19

Example (discrete):

Index Bern_out := [0, 1]

(Possible outcomes of the Bernoulli Distribution.)

With Samplesize = 100: Frequency(Bernoulli (0.3), Bern_out) →

Bern_out ▶
0 1
70 30

With Samplesize = 25:

Frequency(Bernoulli (0.3), Bern_out) →

The two counts now sum to 25, split between 0 and 1 in roughly the same 70:30 proportion.

(Compare to the Bernoulli example on page 267.)
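
The two counting rules can be sketched in Python as below. This is an illustration, not Analytica code. The function names are hypothetical, and the Bernoulli sample is drawn with simple Monte Carlo and an arbitrary seed.

```python
import random


def frequency_continuous(xs, edges):
    """Count values in (previous edge, edge] for each increasing edge."""
    counts = []
    prev = float("-inf")
    for e in edges:
        counts.append(sum(1 for x in xs if prev < x <= e))
        prev = e
    return counts


def frequency_discrete(xs, values):
    """Count occurrences of each discrete value."""
    return [sum(1 for x in xs if x == v) for v in values]


rng = random.Random(7)
sample_size = 25
bern = [1 if rng.random() < 0.3 else 0 for _ in range(sample_size)]
counts = frequency_discrete(bern, [0, 1])
# Relative frequency: divide each count by the sample size.
relative = [c / sample_size for c in counts]
```

Whatever the random draw, the discrete counts add up to the sample size, which is why the result depends on sampleSize.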

Mid(x)

Returns the mid value of x. Unlike other statistical functions, Mid() forces deterministic evaluation in contexts where x would otherwise be evaluated probabilistically. The mid value is calculated by substituting the median for most full probability distributions in the definition of a variable or expression, and using the mid value of any inputs. The mid value of a variable or expression is not necessarily equal to its true median, but is usually close to it.

Library: Statistical

Example: Mid(Fuel_price) → 1.19

Sample(x)

Forces x to be evaluated probabilistically and returns a sample of values from the distribution of x in an array indexed by the system variable Run. If x is not probabilistic, it just returns its mid value. The system variable sampleSize specifies the size of this sample. You can set sampleSize in the Uncertainty Setup dialog (page 257).

Library: Statistical

When to use: Use when you want to force probabilistic evaluation.

Example: Here are the first six values of a sample:

Sample(Fuel_price) →

Iteration(Run) ▶
1 2 3 4 5 6
1.191 1.32 1.19 1.164 1.191 0.962