Rank

Revision as of 04:41, 1 November 2015 by JHernandez2


Returns an array of the rank values of X across index I.

Rank(x, i)

The lowest value in x has a rank of 1, the next-lowest has a rank of 2, and so on. i is optional if x is one-dimensional. If i is omitted when x has more than one dimension, the innermost dimension is ranked. Since you, as a modeler, have little control over which dimension is the innermost, you should always specify i unless you can guarantee that x will always be one-dimensional.

Advanced Variations

Rank(X : Array[I,keyIndex] ; I : Index ; rankType : optional ; keyIndex : Index ; descending, caseInsensitive : optional boolean[KeyIndex],
     passNaNs : optional boolean )

Case Sensitivity

When ranking text values, Rank treats text values as being case-sensitive with capital letters preceding lower case letters. So, for example, "Zebra" gets a lower rank than "apply". In Analytica 4.2 or later, you can supply the optional parameter caseInsensitive:true:

Rank(D,I,caseInsensitive:true)

Case sensitivity only impacts text values.
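
The effect of case sensitivity on rank order can be modeled outside Analytica. The following is a minimal Python sketch, not Analytica code: the helper rank_text and its case_insensitive argument are hypothetical names, and Python's code-point ordering is assumed to stand in for the "capitals precede lower case" rule described above.

```python
def rank_text(values, case_insensitive=False):
    """Return 1-based ranks; tied values share the lowest rank."""
    # Fold case before comparing when requested (illustrative only).
    keys = [v.lower() if case_insensitive else v for v in values]
    # Rank = 1 + number of elements that sort strictly before this one.
    return [1 + sum(other < k for other in keys) for k in keys]


words = ["apply", "Zebra", "mango"]
print(rank_text(words))                         # [2, 1, 3]
print(rank_text(words, case_insensitive=True))  # [1, 3, 2]
```

In the case-sensitive call, "Zebra" receives rank 1 because its capital letter sorts before any lower-case letter; with case folded, it moves to the end.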

Descending rank

To rank a numeric array in descending order, so that the largest number receives rank 1, negate the array:

Rank(-X,I)

For arrays containing text values, in Analytica 4.2 and later you can specify the optional descending:true parameter:

Rank(X,I,descending:true)

which uses the reverse rank order for text as well as numbers.
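
As an illustration of why negation works for numbers but a dedicated option is needed for text, here is a small Python sketch of the semantics. It is not Analytica code; the function rank and its descending argument are hypothetical stand-ins for the behavior described above.

```python
def rank(values, descending=False):
    """Return 1-based ranks; tied values share the lowest rank."""
    if descending:
        # Count elements strictly greater than each value.
        return [1 + sum(o > v for o in values) for v in values]
    # Count elements strictly less than each value.
    return [1 + sum(o < v for o in values) for v in values]


x = [30, 10, 20]
print(rank([-v for v in x]))      # [1, 3, 2]
print(rank(x, descending=True))   # [1, 3, 2]

# Text cannot be negated, so only the descending option applies.
print(rank(["b", "c", "a"], descending=True))  # [2, 1, 3]
```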

Multi-key ranking

When two values are tied for the same rank, a second array can be used to break the tie. This is referred to as a multi-key sort (or multi-key rank). The first array is the primary key, the next is the secondary key, and so on. Analytica 4.2's Rank function supports multi-key ranking. To use it, your keys must all share a common index, I. You must then introduce a new index, keyIndex, to dimension your keys (any number of keys may be used), and bundle your keys into a 2-D array indexed by I and keyIndex. For example, the following ranks by age, then for ties by gender:

Index keyIndex := ['age','gender'];
Rank( Array(keyIndex,[age,gender]), I, keyIndex )
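
The tie-breaking logic of a multi-key rank can be sketched in Python by comparing tuples of keys, which compare element by element from left to right. This is a model of the semantics only, not Analytica code; multi_key_rank and the sample age and gender data are hypothetical.

```python
def multi_key_rank(*keys):
    """keys: equal-length lists; the first is the primary key,
    the second breaks ties in the first, and so on."""
    rows = list(zip(*keys))  # one tuple of key values per element
    # Tuples compare lexicographically, which is exactly a multi-key order.
    return [1 + sum(other < row for other in rows) for row in rows]


age = [30, 25, 30, 25]
gender = ["M", "F", "F", "M"]
print(multi_key_rank(age, gender))  # [4, 1, 3, 2]
```

Ranking by age alone would leave two ties (ranks 3, 1, 3, 1); the second key separates each tied pair.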

Treatment of NaN and Null values

When a NaN value occurs in your data, Rank can either pass it through as a NaN or assign it a numeric rank. The optional passNaNs parameter controls this behavior. The special value NaN indicates an indeterminate real number, which in theory cannot be compared to other numbers, so the ordering of NaN by Rank doesn't actually make logical sense. By passing NaN values through without assigning an actual rank, you may catch errors in your model that lead to the introduction of the NaN value in the first place, since these errors continue to be propagated to your results. This is often a desirable property.

By default, Rank assigns an arbitrary ranking to NaN values -- placing them between -INF and any finite numeric value. To pass NaNs, include the optional parameter: Rank(X,I,passNaNs:true)

Null values are sometimes used for missing data, and in these cases can also be passed using the passNulls:true parameter:

Rank(X,I,passNulls:true)

When this is not specified, Null values are assigned a rank (with Null coming after all numeric values).
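
The two NaN behaviors can be contrasted with a short Python sketch. It is not Analytica code and covers NaN only, not Null. The names rank and pass_nans are hypothetical, and one point is an assumption not stated above: when NaNs are passed through, the sketch ranks the remaining values among themselves.

```python
import math


def is_nan(v):
    return isinstance(v, float) and math.isnan(v)


def sort_key(v):
    # Default placement described above: NaN sits between -INF
    # and every finite value.
    if v == -math.inf:
        return (0, 0.0)
    if is_nan(v):
        return (1, 0.0)
    return (2, v)


def rank(values, pass_nans=False):
    """Return 1-based ranks; tied values share the lowest rank."""
    if pass_nans:
        # ASSUMPTION: NaNs do not occupy a rank position when passed through.
        kept = [sort_key(v) for v in values if not is_nan(v)]
        return [
            math.nan if is_nan(v) else 1 + sum(k < sort_key(v) for k in kept)
            for v in values
        ]
    keys = [sort_key(v) for v in values]
    return [1 + sum(o < k for o in keys) for k in keys]


data = [3.0, math.nan, -math.inf, 1.0]
print(rank(data))                  # [4, 2, 1, 3]
print(rank(data, pass_nans=True))  # [3, nan, 1, 2]
```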

Examples

Basic Example

Rank(Years) →

Variable Rate_of_inflation :=

Years ▶
2005 2006 2007 2008 2009
1 2 3 4 5

Rank(Car_prices, Car_type) → Car_type ↓ , Years →

        2005  2006  2007  2008  2009
VW      1     1     1     1     1
Honda   2     2     2     2     2
BMW     3     3     3     3     3

Optional Type Parameter Example:

Index RankType := [-1,0,1, Null]

Rank(NumRepairs,CarNum,Type:RankType) → Rank_type ↓ , CarNum →

      1  2    3  4    5    6  7
-1    7  2    6  2    2    1  2
0     7  3.5  6  3.5  3.5  1  3.5
1     7  5    6  5    5    1  5
Null  7  2    6  3    4    1  5

Multi-Key Example:

Rank(NumMaintEvents,CarNum,KeyIndex:MaintType) → CarNum →

1 2 3 4 5 6 7
7 4 6 2 3 1 5

See Example variables for example array variables used here and below.

Optional Parameters

Type

If two (or N) values are equal, they receive the same rank, and the next higher value receives a rank 2 (or N) higher. You can use an optional parameter, Type, to control which rank is assigned to equal values. By default, the lowest rank is used, equivalent to Rank(x,i,Type:-1). Alternatively, Rank(x,i,Type:0) uses the mid-rank and Rank(x,i,Type:1) uses the upper-rank. Rank(x,i,Type:Null) assigns a unique rank to every element (the numbers 1 through N), in which tied elements may have different ranks.
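
The four rank types can be modeled in a few lines of Python. This is a sketch of the semantics, not Analytica's implementation: the function rank and its rank_type argument are hypothetical, Python's None stands in for Null, and the repairs list is made-up data chosen to contain four tied values.

```python
def rank(values, rank_type=-1):
    """rank_type: -1 lower, 0 mid, 1 upper, None unique."""
    n_less = [sum(o < v for o in values) for v in values]
    n_equal = [sum(o == v for o in values) for v in values]
    lower = [less + 1 for less in n_less]
    upper = [less + eq for less, eq in zip(n_less, n_equal)]
    if rank_type is None:
        # sorted() is stable, so ties keep their original relative order.
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for position, i in enumerate(order, start=1):
            result[i] = position
        return result
    if rank_type == -1:
        return lower
    if rank_type == 1:
        return upper
    if rank_type == 0:
        return [(lo + up) / 2 for lo, up in zip(lower, upper)]
    raise ValueError("rank_type must be -1, 0, 1 or None")


repairs = [10, 1, 5, 1, 1, 0, 1]  # hypothetical data
print(rank(repairs, -1))    # [7, 2, 6, 2, 2, 1, 2]
print(rank(repairs, 0))     # [7.0, 3.5, 6.0, 3.5, 3.5, 1.0, 3.5]
print(rank(repairs, 1))     # [7, 5, 6, 5, 5, 1, 5]
print(rank(repairs, None))  # [7, 2, 6, 3, 4, 1, 5]
```

The four tied values occupy sort positions 2 through 5, so they receive 2 (lower), 5 (upper), 3.5 (mid), or the distinct ranks 2, 3, 4, 5 in original order (unique).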

keyIndex

A multi-key rank can be processed by indexing each key with a new index, and specifying this index for the optional keyIndex parameter. In a multi-key rank, x[@KeyIndex=1] determines the rank order, except that ties are then resolved using x[@KeyIndex=2], any ties there are resolved using x[@KeyIndex=3], and so on.

descending

Rank(x,i,descending:true) assigns the largest value a rank 1, the second largest a rank 2, and so on.

caseInsensitive

When x contains textual values, the optional boolean parameter caseInsensitive:true ignores upper-lower case differences during the comparisons. The parameters descending and caseInsensitive may also be indexed by the keyIndex when they vary by key.

passNaNs, passNulls

By default, Rank assigns an arbitrary ranking to NaN or Null values. Alternatively, you can pass these through to the result as NaN or Null using Rank(x,i,passNaNs:true, passNulls:true).

History

New to 4.0

The rank type determines how values in X that occur multiple times are ranked. For example, if the value x=5 occurs 6 times, and there are 3 other values in X that are less than 5, then x=5 appears in the sort order at positions 4, 5, 6, 7, 8, and 9. The lower-rank of x=5 is 4, the upper-rank of x=5 is 9, and the mid-rank is 6.5. Note that a mid-rank is not necessarily a valid index position (since it may be fractional), so if you intend to use the result of Rank in a slice function, you should use either the lower-rank (the default) or the upper-rank.

New to 4.2

When you want every element to be assigned a unique rank, you can specify rankType:null. Tied elements will be given unique ranks (with their position in the original array being used to break the tie).

The RankCorrel function also allows you to select which rank type to use, but it uses mid-rank by default in Analytica 4.0.
