Set Functions

Revision as of 01:39, 20 May 2015 by Max


Sets

In mathematics, a set is a collection of unique elements. Analytica represents a set as a reference to a list or 1-D array, for example:

Var A_list := ['a','b','c','d'];
Var A_set := \ A_list;

The backslash, "\", in front of A_list returns a Reference to the list. The Set Functions described below work on Sets represented like this.

Because a reference is an atomic element, this representation lets the Set functions array-abstract fully. That would not be possible if sets were plain lists, for example when operating on multiple lists with different lengths or different indexes.

The following demonstrates this representation. Suppose you have a 4-dimensional array, A, indexed by In1, In2, In3 and In4. The expression \[In4]A returns a 3-D array of sets, each set being one of the vectors indexed by In4. As this shows, the reference operator, \, lets you specify in brackets which index becomes the set dimension.
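The idea of a set as a single atomic cell can be modeled outside Analytica. The Python sketch below is only an analogy, not Analytica code: the names SetRef and reference_rows are hypothetical, and a 2-D table stands in for the 4-D array to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetRef:
    """Hypothetical stand-in for an Analytica reference to a list."""
    items: tuple


def reference_rows(table):
    # Wrap each row (the vector along the inner index) in one SetRef cell,
    # reducing a 2-D table to a 1-D list of atomic cells.
    return [SetRef(tuple(row)) for row in table]


table = [[3, 6, 9], [7, 4, 1]]
cells = reference_rows(table)

# Because each cell is atomic, cells holding lists of different lengths
# can sit side by side in the same array.
ragged = [SetRef((1, 2)), SetRef((1, 2, 3, 4))]
```

Each wrapped row occupies exactly one position in the outer list, which is the property that lets a function operate over the outer dimension without caring how long each set is.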

Literal Sets

To create a set from a literal list, enclose the list in parentheses preceded by a backslash, "\":

\([1,2,3])

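This does not work if you omit the parentheses, because of a syntactic ambiguity: brackets directly after the backslash are read as the index specification of the reference operator, as in \[In4]A.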

\[1,2,3]    { **** Does not work **** }

A more obscure approach to create a Set from a literal list is to specify a Null index after the backslash "\":

\[Null][1,2,3]


The Empty Set

You can specify the Empty set (that contains nothing) simply as:

\([])      { The empty set }

The expression Null does not have the same meaning as the empty set. Just as the array functions Sum, Max, etc. ignore Null elements, so do the set functions SetIntersection, SetUnion, etc. Thus, the intersection of a set S with Null is the set S, not the empty set.
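The difference between Null and the empty set can be illustrated with a small Python model. This is a sketch of the behavior described above, not Analytica code: None plays the role of Null, plain lists play the role of sets, and the function name set_intersection is hypothetical.

```python
def set_intersection(sets):
    # Skip Null (None) entries in the collection of sets.
    present = [s for s in sets if s is not None]
    if not present:
        return None  # assumption: with nothing to intersect, return Null
    first, rest = present[0], present[1:]
    # Keep the unique elements of the first set that occur in every other set.
    return [x for x in dict.fromkeys(first) if all(x in s for s in rest)]


S = [1, 2, 3]
with_null = set_intersection([S, None])  # Null is ignored, so S comes back
with_empty = set_intersection([S, []])   # the empty set shares no elements
```

Here with_null is [1, 2, 3] while with_empty is [], which mirrors the distinction drawn in the text.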

Converting a Set to a List

You can obtain a list from a Set using the dereference operator, #.

#A_set → ['a','b','c','d']


This operation does not array-abstract: You can apply it to a single set, but not to an array of sets.

Function SetContains

Function SetContains(s, element)

Returns true if element is contained in the set s.

SetContains(\(1 .. 10), [9, 10, 11] ) → [1, 1, 0]

Note that [9, 10, 11] is a list of potential set elements (not itself a set).
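The element-by-element behavior in this example can be modeled in Python. The sketch below is an analogy under stated assumptions, not Analytica code: set_contains is a hypothetical name, and a list argument is treated as a list of candidate elements to mimic array abstraction over the second parameter.

```python
def set_contains(s, element):
    # A list of candidates is tested one element at a time.
    if isinstance(element, list):
        return [set_contains(s, e) for e in element]
    # Analytica shows true and false as 1 and 0.
    return 1 if element in s else 0


result = set_contains(list(range(1, 11)), [9, 10, 11])
```

The result is [1, 1, 0], matching the example above: 9 and 10 are in the set 1..10, and 11 is not.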

Function SetsAreEqual

Function SetsAreEqual(sets, I, ignoreNull)

Returns true if all the sets in the first parameter, «sets», have exactly the same elements, ignoring duplicates and ordering. Null values are also ignored unless the optional parameter «ignoreNull» is specified as False.

Var L1 := [1, 1, 1, 2, 3];
Var L2 := [3, 2, 2, 1];
Var L3 := [2, 3, 1, Null];
SetsAreEqual([\L1, \L2, \L3]) → 1

In this example, all three sets are treated as the set {1, 2, 3}. But:

SetsAreEqual([\L1, \L2, \L3], IgnoreNull: False) → 0

With IgnoreNull set to False, the set \L3 includes the Null value, and so is not identical to sets \L1 and \L2.

If Table T is indexed by Row and Col, this expression tests if each Row contains the same items (ignoring ordering or repeated items):

SetsAreEqual(\[Col]T, Row)

The first parameter specifies that each Col-vector (i.e., each row) is taken as a set. The index parameter, Row, specifies that the comparison takes place along the Row index of T.
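The comparison rule (duplicates and ordering ignored, Null optionally ignored) can be sketched in Python. This models the L1, L2, L3 example above and is not Analytica code: sets_are_equal and ignore_null are hypothetical names, and None stands in for Null.

```python
def sets_are_equal(sets, ignore_null=True):
    def normalize(s):
        # Converting to a Python set drops duplicates and ordering.
        items = set(s)
        if ignore_null:
            items.discard(None)
        return items

    normalized = [normalize(s) for s in sets]
    return 1 if all(n == normalized[0] for n in normalized) else 0


L1 = [1, 1, 1, 2, 3]
L2 = [3, 2, 2, 1]
L3 = [2, 3, 1, None]

default_result = sets_are_equal([L1, L2, L3])
strict_result = sets_are_equal([L1, L2, L3], ignore_null=False)
```

With Null ignored, all three lists reduce to {1, 2, 3} and the result is 1. With ignore_null=False, L3 keeps its None and the result is 0.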

Function SetIntersection

Function SetIntersection(sets, I, resultIndex, keepNulls)

Returns the set of elements in common to all the sets specified in the first parameter, «sets».

The first parameter is a list or array of sets. When this is or might be a multi-dimensional array, the optional second parameter, «I» specifies the index to operate over.

Consider the following array:

A :=
      J →
I ↓   3  6  9  2  5  8
      7  4  1  8  5  2
      4  8  2  6  0  3

To intersect the rows of A, finding the elements common to all rows, treat each J-vector as a set (\[J]A) and operate over the I index as follows:

#SetIntersection( \[J]A, I ) → [2,8]

Optionally, you can map the result onto a pre-existing index. When a «resultIndex» is provided, an array is returned on that index containing the resulting elements. When «resultIndex» is not provided, the result is a set (a reference to a list). When «resultIndex» is too short to accommodate all elements in the result, only the first Size(resultIndex) elements of the result are returned. When it is too long, the final cells are padded with Null.

SetIntersection( \[J]A, I, resultIndex: J )
J →
2 8 «null» «null» «null» «null»

To find the set of elements that two indexes have in common, use:

#SetIntersection( [\I,\J] )
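The row intersection and the truncate-or-pad rule for «resultIndex» can be modeled in Python. This is a sketch, not Analytica code: set_intersection and onto_index are hypothetical names, None stands in for Null, and the assumption that results follow the order of the first set is chosen to match the [2,8] shown above.

```python
def set_intersection(sets):
    first, rest = sets[0], sets[1:]
    # Unique elements of the first set that appear in every other set.
    return [x for x in dict.fromkeys(first) if all(x in s for s in rest)]


def onto_index(elements, index_length):
    # Truncate to the index length, or pad the tail with None (Null).
    return (elements + [None] * index_length)[:index_length]


# The rows of the array A from the example above.
A = [
    [3, 6, 9, 2, 5, 8],
    [7, 4, 1, 8, 5, 2],
    [4, 8, 2, 6, 0, 3],
]

common = set_intersection(A)
on_J = onto_index(common, 6)        # index longer than the result: padded
truncated = onto_index(common, 1)   # index shorter than the result: cut off
```

Here common is [2, 8], on_J is [2, 8, None, None, None, None], and truncated is [2].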

Function SetUnion

Function SetUnion(sets, I, resultIndex, keepNulls)

Returns the set of all unique non-Null elements occurring in any of the sets passed in the first parameter, «sets».

The first parameter is an array of sets. To find the union of the elements in a set of lists or indexes, use:

#SetUnion( [\L1,\L2,\L3,\L4] )

Since the result is a set, i.e., a reference to a list, the dereference operator, # is applied to the result. That de-referenced result can then be used to define a new index.

To find the union of all unique elements occurring along the rows of a 2-dimensional array indexed by I and J, use:

#SetUnion( \[J]A, I )

This turns each row (each row being a slice along I indexed by J) into a set, resulting in a 1-D array of sets indexed by I, and then applies the union operation along the I dimension.

To include Null values in the result, specify the optional parameter «keepNulls» as true.

Var L1 := [1,3,Null,5];
Var L2 := [1,2,4,5];
SetUnion( [\L1,\L2] ) → \[1,3,5,2,4]
SetUnion( [\L1,\L2], keepNulls:true ) → \[1,3,Null,5,2,4]

If you specify the optional «resultIndex», the result is returned as an array (rather than as a reference) along the indicated index. The number of elements in the result is truncated to the index's length, or padded with Null values if the index is longer than necessary.
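The union semantics, including «keepNulls», can be sketched in Python. This models the L1, L2 example above and is not Analytica code: set_union and keep_nulls are hypothetical names, None stands in for Null, and first-occurrence ordering is an assumption chosen to match the results shown.

```python
def set_union(sets, keep_nulls=False):
    seen = {}  # dict keys preserve first-occurrence order
    for s in sets:
        for x in s:
            if x is None and not keep_nulls:
                continue
            seen.setdefault(x, True)
    return list(seen)


L1 = [1, 3, None, 5]
L2 = [1, 2, 4, 5]

without_nulls = set_union([L1, L2])
with_nulls = set_union([L1, L2], keep_nulls=True)
```

Here without_nulls is [1, 3, 5, 2, 4] and with_nulls is [1, 3, None, 5, 2, 4], matching the two results listed above.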

Function SetDifference

Function SetDifference( originalSet, remove, remove2, remove3, ..., resultIndex, keepNull)

Returns the unique set of non-Null elements in «originalSet» that do not appear in any of the other sets, «remove».... The result is a set (reference to a list) unless «resultIndex» is specified. If «resultIndex» is specified, then an array indexed by the result index is returned, truncated to the length of «resultIndex» or padded with Null values.

SetDifference(\Sequence(1,10),\Sequence(2,10,2),\Sequence(3,10,3)) → \[1,5,7]

You can also remove individual elements:

SetDifference(\Sequence(1,4),3) → \[1,2,4]

Null values are not included in the result unless «keepNull» is specified as true.
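The two SetDifference examples above can be reproduced with a Python model. This is a sketch under stated assumptions, not Analytica code: set_difference and keep_null are hypothetical names, Python ranges stand in for Sequence, and None stands in for Null.

```python
def set_difference(original, *remove, keep_null=False):
    excluded = set()
    for r in remove:
        if isinstance(r, (list, tuple, set, range)):
            excluded.update(r)  # a whole set of elements to remove
        else:
            excluded.add(r)     # an individual element to remove

    result = []
    for x in dict.fromkeys(original):  # unique elements, original order
        if x is None and not keep_null:
            continue
        if x not in excluded:
            result.append(x)
    return result


# Sequence(1,10) minus Sequence(2,10,2) minus Sequence(3,10,3)
remaining = set_difference(range(1, 11), range(2, 11, 2), range(3, 11, 3))

# Removing an individual element from Sequence(1,4)
without_three = set_difference(range(1, 5), 3)
```

Here remaining is [1, 5, 7] and without_three is [1, 2, 4].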

Miscellaneous Usage Notes

  • When provided with only a single set, SetUnion(s), SetIntersection(s) and SetDifference(s) have the effect of removing duplicate (and Null) values. Hence, when A is an array without any Null values, these are all equivalent:
A[I=Unique(A,I)]
#SetUnion([\A])
#SetIntersection([\A])
#SetDifference(\A)
  • When using an index, I, SetContains(\I,x) and @[I=x]>0 are essentially equivalent. They are exactly equivalent when IndexValue(I) does not contain any handles, or when I is a meta-only index (MetaIndex).

History

These set functions were introduced in Analytica 4.3.
