Set Functions

Revision as of 22:43, 22 September 2010 by Lchrisman

New in Analytica 4.3

Sets

In mathematics, a set is a collection of distinct (non-repeated) elements. The functions described on this page operate on sets that Analytica represents as a reference to a list or 1-D array. With this representation, Analytica's array abstraction treats each set as an atomic element. All of these functions therefore fully array-abstract, even when the collections comprising the sets have different lengths or different indexes.

The following demonstrates this representation:

Var A_list := ['a','b','c','d'];
Var A_set := \A_list;
...

The backslash (the reference operator) in front of A_list turns the list into a set of the form expected by the functions on this page.

Suppose you have a 4-dimensional array, A, indexed by In1, In2, In3 and In4. The expression \[In4]A returns a 3-D array of sets, each set being one of the vectors along In4. As this shows, when you use the reference operator \, you can specify in brackets which index is swallowed into each set.
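The following Python sketch models what \[In4]A does: it folds one index into atomic "reference" cells and leaves an array with one fewer dimension. It is not Analytica's implementation. The Ref class, the swallow_index helper and the dict-of-coordinates array layout are hypothetical, chosen only to make the index-swallowing visible.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for an Analytica reference: wraps a whole vector
    so that array operations treat it as a single atomic cell."""
    items: tuple


def swallow_index(array, dims, index):
    """Model of \\[index]A for an array stored as {coordinate tuple: value}.

    `dims` names the indexes in key order; `index` is the one folded into each
    Ref. Returns (remaining_dims, array_of_refs), one dimension smaller.
    """
    pos = dims.index(index)
    remaining = dims[:pos] + dims[pos + 1:]
    groups = {}
    # Keys in the same group differ only at position `pos`, so sorting the keys
    # keeps each swallowed vector in index order.
    for key in sorted(array):
        rest = key[:pos] + key[pos + 1:]
        groups.setdefault(rest, []).append(array[key])
    return remaining, {rest: Ref(tuple(vals)) for rest, vals in groups.items()}


# A 4-D array indexed by In1..In4 with sizes 2, 3, 2 and 4.
dims = ["In1", "In2", "In3", "In4"]
sizes = (2, 3, 2, 4)
A = {k: sum(k) for k in product(*(range(n) for n in sizes))}

dims3, sets = swallow_index(A, dims, "In4")
print(dims3)            # ['In1', 'In2', 'In3']
print(len(sets))        # 12 cells, each holding one In4-vector
print(sets[(0, 0, 0)])  # Ref(items=(0, 1, 2, 3))
```

Because each cell holds a whole vector as one value, the sets may differ in length without breaking the surrounding array. That is the property the set functions rely on.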

Literal Sets

To create a set from a literal list, you must either specify Null as the index in the reference operator's brackets, or wrap the list in parentheses. You cannot simply place a backslash in front of a literal list. The backslash operator sees the brackets and assumes they specify the indexes to swallow. Here are two ways to express a literal set:

\[ Null ][1,2,3]
\([1,2,3])

but

\[1,2,3]    { **** Does not work **** }

The Empty Set

When using the set functions described below, Null does not mean the empty set. Other array functions (e.g., Sum and Max) ignore Null elements, and so do SetIntersection, SetUnion and the other set functions. The intersection of a set with Null is therefore the set itself, not the empty set. The empty set is specified as:

\([])      { The empty set }
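To make the Null versus empty-set distinction concrete, here is a small Python model. None stands in for Null and frozenset() for \([]). The helper name intersect_ignoring_null is hypothetical. The behaviour when every argument is Null is not described on this page, so the sketch's choice there (an empty set) is an assumption.

```python
def intersect_ignoring_null(sets):
    """Model of the documented rule: a Null (None) argument is skipped,
    so intersecting a set with Null leaves the set unchanged."""
    real = [s for s in sets if s is not None]  # Null arguments are ignored
    if not real:
        return set()  # assumption: all-Null input is not specified on this page
    result = set(real[0])
    for s in real[1:]:
        result &= set(s)
    result.discard(None)  # Null elements are also dropped by default
    return result


s = {1, 2, 3}
print(intersect_ignoring_null([s, None]))         # {1, 2, 3}: Null is skipped
print(intersect_ignoring_null([s, frozenset()]))  # set(): the empty set is really empty
```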

Converting a Set to a List

The dereference operator, #, converts a set back into a list. This operation does not array-abstract: you can apply it to a single set, but not to an array of sets.

Function SetContains

Function SetContains( set, element )

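Returns true (1) if «element» is contained in «set», and false (0) otherwise. When «element» is an array, the test is applied to each of its elements.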

SetContains( \Sequence(7,1000,7), [770,775,777] ) → [1,0,1]
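As a cross-check of the example, this Python sketch tests membership for each element. 770 = 7·110 and 777 = 7·111 are in the set of multiples of 7, and 775 is not. The function name set_contains is hypothetical, and a Python list stands in for an Analytica array of elements.

```python
def set_contains(s, element):
    """Model of SetContains: 1 if element is in s, else 0.
    A list of elements is tested element by element, mimicking array abstraction."""
    members = set(s)  # build once so each lookup is a hash probe
    if isinstance(element, list):
        return [1 if e in members else 0 for e in element]
    return 1 if element in members else 0


multiples_of_7 = range(7, 1001, 7)  # same values as Sequence(7, 1000, 7)
print(set_contains(multiples_of_7, [770, 775, 777]))  # [1, 0, 1]
```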

Function SetsAreEqual

Function SetsAreEqual( sets, I, ignoreNull )

Returns true when all the sets passed into the first parameter have exactly the same elements, without regard to duplicates or ordering, and ignoring Null values (unless «ignoreNull» is explicitly specified to be false).

Var L1 := [1,1,1,2,3];
Var L2 := [3,2,2,1];
Var L3 := [2,3,1,Null];
SetsAreEqual( [\L1,\L2,\L3] ) → 1

In this example, all three sets are treated as the set {1,2,3}. But:

SetsAreEqual( [\L1,\L2,\L3], ignoreNull:false ) → 0

With «ignoreNull» set to false, \L3 is understood to include the Null value, so it no longer equals the other two sets.
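The comparison rule (ignore order and duplicates, and drop Null unless told otherwise) can be modeled in a few lines of Python. None stands in for Null, and sets_are_equal is a hypothetical name. It returns Python booleans where Analytica shows 1 and 0.

```python
def sets_are_equal(sets, ignore_null=True):
    """Model of SetsAreEqual: True when every collection holds the same
    distinct elements, regardless of order or duplicates. None (Null) is
    dropped unless ignore_null is False."""
    def normalize(s):
        items = set(s)
        if ignore_null:
            items.discard(None)
        return items

    first = normalize(sets[0])
    return all(normalize(s) == first for s in sets[1:])


L1 = [1, 1, 1, 2, 3]
L2 = [3, 2, 2, 1]
L3 = [2, 3, 1, None]
print(sets_are_equal([L1, L2, L3]))                     # True
print(sets_are_equal([L1, L2, L3], ignore_null=False))  # False
```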

The following tests whether every row of a table contains the same set of items (ignoring ordering), where T is indexed by Row and Col:

SetsAreEqual( \[Col]T, Row )

The first parameter specifies that each Col-vector (i.e., each row) is taken as a set. The index parameter, Row, specifies that the comparison takes place along the Row index of T.

Function SetIntersection

Function SetIntersection( sets, I, resultIndex, keepNulls )

Returns the set of elements in common to all the sets specified in the first parameter, «sets».

The first parameter is a list or array of sets. When it is, or might be, a multi-dimensional array, the second parameter, «I», specifies the index to operate over.

Consider the following array:

A :=         J →
     I ↓     3   6   9   2   5   8
             7   4   1   8   5   2
             4   8   2   6   0   3

To intersect the rows of A and find the elements common to all rows, treat each J-vector as a set (\[J]A) and operate over the I index:

#SetIntersection( \[J]A, I ) → [2,8]

Optionally, you can map the result onto a pre-existing index. When «resultIndex» is provided, the function returns an array on that index containing the resulting elements. When «resultIndex» is not provided, the result is a set (a reference to a list). When «resultIndex» is too short to accommodate all elements of the result, only the first Size(resultIndex) elements are returned. When it is too long, the final cells are padded with Null.

SetIntersection( \[J]A, I, resultIndex: J ) →

   J →   2   8   «null»   «null»   «null»   «null»
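The Python sketch below models intersecting the rows of A and then mapping the result onto an index of a given length with truncation or Null padding. Here result_len is a hypothetical stand-in for Size(resultIndex). Ordering the result by first appearance in the first set is an assumption that happens to reproduce the [2,8] shown above.

```python
def set_intersection(sets, result_len=None, keep_nulls=False):
    """Model of SetIntersection over a list of collections.

    Elements keep the order in which they first appear in sets[0] (assumed).
    If result_len is given, the result is truncated or padded with None (Null)
    to that length, mimicking «resultIndex»."""
    common = []
    for x in sets[0]:
        if x is None and not keep_nulls:
            continue
        if x not in common and all(x in s for s in sets[1:]):
            common.append(x)
    if result_len is None:
        return common
    return (common + [None] * result_len)[:result_len]


# Each inner list is one row of A (a J-vector); the outer list runs along I.
A_rows = [
    [3, 6, 9, 2, 5, 8],
    [7, 4, 1, 8, 5, 2],
    [4, 8, 2, 6, 0, 3],
]
print(set_intersection(A_rows))                # [2, 8]
print(set_intersection(A_rows, result_len=6))  # [2, 8, None, None, None, None]
```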

To find the set of elements that two indexes have in common, use:

#SetIntersection( [\I,\J] )

Function SetUnion

Function SetUnion( sets, I, resultIndex, keepNulls )

Returns the set of all unique non-Null elements occurring in any of the sets passed in the first parameter, «sets».

The first parameter is an array of sets. To find the union of the elements in a collection of lists or indexes, use:

#SetUnion( [\L1,\L2,\L3,\L4] )

Since the result is a set (a reference to a list), the dereference operator, #, is applied to it. The dereferenced result can then be used to define a new index.

To find the union of all unique elements occurring along the rows of a 2-dimensional array indexed by I and J, use:

#SetUnion( \[J]A, I )

This turns each row (the J-vector at one value of I) into a set, producing a 1-D array of sets indexed by I, and then applies the union operation along the I dimension.

To include Null values in the result, specify the optional parameter «keepNulls» as true.

Var L1 := [1,3,Null,5];
Var L2 := [1,2,4,5];
SetUnion( [\L1,\L2] ) → \([1,3,5,2,4])
SetUnion( [\L1,\L2], keepNulls:true ) → \([1,3,Null,5,2,4])
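A Python model of the same example follows. Distinct elements are collected in order of first appearance, and Null (None) is skipped unless keep_nulls is true. The name set_union is hypothetical, and the ordering rule is inferred from the example results rather than documented.

```python
def set_union(sets, keep_nulls=False):
    """Model of SetUnion: distinct elements from all collections, in order of
    first appearance. None (Null) is dropped unless keep_nulls is True."""
    seen = set()
    result = []
    for s in sets:
        for x in s:
            if x is None and not keep_nulls:
                continue
            if x not in seen:
                seen.add(x)
                result.append(x)
    return result


L1 = [1, 3, None, 5]
L2 = [1, 2, 4, 5]
print(set_union([L1, L2]))                   # [1, 3, 5, 2, 4]
print(set_union([L1, L2], keep_nulls=True))  # [1, 3, None, 5, 2, 4]
```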

If you specify the optional «resultIndex», the result is returned as an array (rather than as a reference) along the indicated index. The number of elements in the result is truncated to the index's length, or padded with Null values if the index is longer than necessary.


Function SetDifference

Function SetDifference( originalSet, remove, remove2, remove3, ..., resultIndex, keepNull)

Returns the set of distinct, non-Null elements of «originalSet» that do not appear in any of the sets to remove («remove», «remove2», ...). The result is a set (a reference to a list) unless «resultIndex» is specified. In that case the function returns an array indexed by «resultIndex», truncated to the length of «resultIndex» or padded with Null values.

SetDifference(\Sequence(1,10),\Sequence(2,10,2),\Sequence(3,10,3)) → \([1,5,7])

You can also remove individual elements:

SetDifference(\Sequence(1,4),3) → \([1,2,4])

Null values are not included in the result unless «keepNull» is specified as true.
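The following Python sketch models both examples. Any number of removal arguments are accepted, and a bare value is treated as a single element to remove. The name set_difference is hypothetical, and Python ranges stand in for the Sequence calls.

```python
def set_difference(original, *removes, keep_null=False):
    """Model of SetDifference: distinct elements of `original` that occur in
    none of `removes`. A non-collection argument is treated as one element,
    as in SetDifference(\\Sequence(1,4), 3)."""
    excluded = set()
    for r in removes:
        if isinstance(r, (list, tuple, set, frozenset, range)):
            excluded.update(r)
        else:
            excluded.add(r)
    result = []
    for x in original:
        if x is None and not keep_null:
            continue
        if x not in excluded and x not in result:
            result.append(x)
    return result


print(set_difference(range(1, 11), range(2, 11, 2), range(3, 11, 3)))  # [1, 5, 7]
print(set_difference(range(1, 5), 3))                                   # [1, 2, 4]
```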

Miscellaneous Usage Notes

  • When given only a single set, SetUnion, SetIntersection and SetDifference simply remove duplicate (and Null) values. Hence, when A is a 1-D array indexed by I that contains no Null values, these all yield the same elements:
      A[I=Unique(A,I)]
      SetUnion([\A])
      SetIntersection([\A])
      SetDifference(\A)
  • For an index I, SetContains(\I,x) and @[I=x]>0 are essentially equivalent. They are exactly equivalent when IndexValue(I) contains no handles, or when I is a meta-only index (or a MetaIndex).
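The first note above says that, given a single set, all three set functions reduce to removing duplicates and Nulls. This Python sketch illustrates that claim with simplified models of the three operations. The function names are hypothetical, None stands in for Null, and first-appearance ordering is an assumption of the sketch.

```python
def dedup(values):
    """Distinct non-None values in first-appearance order."""
    out = []
    for v in values:
        if v is not None and v not in out:
            out.append(v)
    return out


def union(sets):
    return dedup(x for s in sets for x in s)


def intersection(sets):
    return [x for x in dedup(sets[0]) if all(x in s for s in sets[1:])]


def difference(original, *removes):
    return [x for x in dedup(original) if not any(x in r for r in removes)]


A = [3, 1, 3, 2, 1]
# With a single set, nothing is intersected with or removed, so only deduplication remains.
print(union([A]), intersection([A]), difference(A))  # [3, 1, 2] [3, 1, 2] [3, 1, 2]
```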
