Difference between revisions of "Set Functions"

Latest revision as of 18:22, 31 March 2021

Analytica represents a set as a reference to a list or 1-D array. It offers all the usual set operations, including SetUnion, SetIntersection, SetDifference, and SetContains.

Introducing sets

In mathematics, a set is an unordered collection of unique elements. Analytica represents a set as a reference to a list or 1-D array, for example:

Local A_list := ['a', 'b', 'c', 'd'];

Local A_set := \ A_list;

The backslash, \, in front of A_list returns a Reference to the list. All the Set Functions described below work on Sets represented like this. Using a Reference to a list allows Set functions to fully array abstract. If they operated directly on lists rather than references to lists, they wouldn't work properly when operating on multiple lists with different lengths or different indexes.

Literal Sets

To create a set from a literal list, precede the list with a backslash, \, to create a reference, and enclose the list in parentheses:

\([1, 2, 3])

If you omit the parentheses, it gives an error due to a syntactic ambiguity:

\[1, 2, 3] { **** Does not work **** }

Although a list is ordered and may contain duplicates, the set operations ignore order and duplicates, so any Set result contains only unique elements. The set operations also ignore Null elements, unless you specify the optional parameter keepNulls: True.
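
The duplicate and Null handling described above can be modeled outside Analytica. The following Python sketch is an illustration only, not Analytica code: the helper name unique_elements is hypothetical, Python's None stands in for Null, and keeping elements in first-occurrence order is an assumption based on the examples on this page.

```python
def unique_elements(items, keep_nulls=False):
    """Return the distinct elements of items in first-occurrence order.

    None stands in for Analytica's Null and is dropped unless keep_nulls is true.
    """
    seen = set()
    result = []
    for item in items:
        if item is None and not keep_nulls:
            continue  # Null elements are ignored by default
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(unique_elements([3, 1, 3, None, 2, 1]))                   # [3, 1, 2]
print(unique_elements([3, 1, 3, None, 2, 1], keep_nulls=True))  # [3, 1, None, 2]
```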

The Empty Set

You can specify the Empty set (that contains nothing) simply as:

\([]) { The empty set }

The expression Null is not the same as the empty set. Set functions SetIntersection, SetUnion, etc. ignore Null elements, just as array functions Sum, Max, etc. ignore them. Thus, the intersection of a set S with Null is the set S, not the empty set.

Convert a Set to a List

You can obtain a list from a Set using the dereference operator, #.

# A_set → ['a', 'b', 'c', 'd']

This operation does not array-abstract: You can apply it to a single set, but not to an array of sets.

Summary of Set functions

Most of the set functions take a list or array of one or more sets as their primary parameter.

SetUnion(sets): Returns a set that is the union of the sets -- that is, all elements that occur in one or more of the sets.

SetIntersection(sets): Returns a set that is the intersection of the sets -- that is, all elements that occur in every one of the sets.

The sections below describe SetIntersection, SetUnion, SetDifference, SetContains, and SetsAreEqual in detail.

Function SetIntersection

SetIntersection(sets, I, resultIndex, keepNulls)

Returns the set of elements in common to all the sets specified in the first parameter, «sets».

The first parameter is a list or array of sets. If this is or might be a multi-dimensional array, you should specify the optional second parameter, «I» as the index to operate over.

Consider the following array:

A :=

	J ▶
I ▼	3	6	9	2	5	8
	7	4	1	8	5	2
	4	8	2	6	0	3

To intersect the rows of A -- i.e. find the elements common to all rows -- treat each J-vector as a set (\[J]A) and operate over the I index:

#SetIntersection(\[J]A, I) → [2, 8]
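
As a cross-check on this result, the same intersection can be worked through in a short Python sketch. This is a model of the documented behavior, not Analytica code: the function name set_intersection is hypothetical, each row of A is written as a plain Python list, and the first-occurrence ordering of the result is an assumption.

```python
def set_intersection(sets):
    """Return the elements common to every list in sets, in first-occurrence order."""
    if not sets:
        return []
    first, *rest = sets
    result = []
    for item in first:
        if item is None or item in result:
            continue  # skip Null (None) and duplicates
        if all(item in other for other in rest):
            result.append(item)
    return result


# Rows of the array A from the example above, one list per position of I.
A = [
    [3, 6, 9, 2, 5, 8],
    [7, 4, 1, 8, 5, 2],
    [4, 8, 2, 6, 0, 3],
]
print(set_intersection(A))  # [2, 8]
```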

Optionally, you can map the result onto a pre-existing index. When you provide «resultIndex», the function returns an array over that index containing the resulting elements. When «resultIndex» is not provided, the result is a set (a reference to a list). If «resultIndex» is too short to accommodate all elements in the result, the array includes only the first Size(resultIndex) elements of the result. If it is too long, the final cells are padded with Null.

SetIntersection(\[J]A, I, resultIndex: J) →

J →
2	8	«null»	«null»	«null»	«null»
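
The way a result is fitted onto «resultIndex» (truncated when the index is shorter than the result, padded with Null when it is longer) can be sketched in Python. The helper name fit_to_index is hypothetical, None stands in for Null, and the sketch models only the length handling, not Analytica's indexing.

```python
def fit_to_index(elements, index_length):
    """Truncate elements, or pad them with None, to exactly index_length cells."""
    kept = list(elements[:index_length])
    return kept + [None] * (index_length - len(kept))


print(fit_to_index([2, 8], 6))  # [2, 8, None, None, None, None]
print(fit_to_index([2, 8], 1))  # [2]
```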

To find the set of elements that two indexes have in common, use:

#SetIntersection([\I, \J])

Function SetUnion

SetUnion(sets, I, resultIndex, keepNulls)

Returns the set of all unique non-Null elements occurring in any of the sets passed in the first parameter, «sets».

The first parameter is an array of sets. To find the union of the elements in a set of lists or indexes, use:

#SetUnion([\L1, \L2, \L3, \L4])

Since the result is a set, i.e., a reference to a list, the dereference operator, #, is applied to the result. That dereferenced result can then be used to define a new index.

To find the union of all unique elements occurring along the rows of a 2-dimensional array indexed by I and J, use:

#SetUnion(\[J]A, I)

This turns each row (each row being a slice along I indexed by J) into a set, resulting in a 1-D array of sets indexed by I, and then applies the union operation along the I dimension.

To include Null values in the result, specify the optional parameter «keepNulls» as true.

Local L1 := [1, 3, Null, 5];

Local L2 := [1, 2, 4, 5];

SetUnion( [\L1, \L2] ) → \[1, 3, 5, 2, 4]

SetUnion( [\L1, \L2], keepNulls: true) → \[1, 3, Null, 5, 2, 4]
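
The two results above can be reproduced with a small Python model of the union. This is a sketch under stated assumptions, not Analytica code: set_union and keep_nulls are hypothetical names mirroring SetUnion and «keepNulls», None stands in for Null, and elements are assumed to appear in first-occurrence order as in the examples.

```python
def set_union(sets, keep_nulls=False):
    """Return every distinct element found in any list in sets."""
    result = []
    for members in sets:
        for item in members:
            if item is None and not keep_nulls:
                continue  # Null is dropped unless keep_nulls is true
            if item not in result:
                result.append(item)
    return result


L1 = [1, 3, None, 5]
L2 = [1, 2, 4, 5]
print(set_union([L1, L2]))                   # [1, 3, 5, 2, 4]
print(set_union([L1, L2], keep_nulls=True))  # [1, 3, None, 5, 2, 4]
```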

If you specify the optional «resultIndex», the result is returned as an array (rather than as a reference) along the indicated index. The number of elements in the result is truncated to the index's length, or padded with Null values if the index is longer than necessary.

Function SetDifference

SetDifference(originalSet, remove, remove2, remove3, ..., resultIndex, keepNull)

Returns the unique set of non-Null elements in «originalSet» that do not appear in any of the other sets, «remove».... The result is a set (reference to a list) unless «resultIndex» is specified. If «resultIndex» is specified, then an array indexed by the result index is returned, truncated to the length of «resultIndex» or padded with Null values.

SetDifference(\Sequence(1, 10), \Sequence(2, 10, 2), \Sequence(3, 10, 3)) → \[1, 5, 7]

You can also remove individual elements:

SetDifference(\Sequence(1, 4), 3) → \[1, 2, 4]
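
Both SetDifference examples can be checked with a Python sketch. It is a model only: set_difference is a hypothetical name, Python ranges stand in for the Sequence calls, and treating a non-collection argument as a single element to remove is an assumption drawn from the second example.

```python
def set_difference(original, *remove):
    """Return the distinct non-None elements of original not named in remove."""
    excluded = set()
    for entry in remove:
        # A removal may be a whole collection or one individual element.
        if isinstance(entry, (list, tuple, set, range)):
            excluded.update(entry)
        else:
            excluded.add(entry)
    result = []
    for item in original:
        if item is None or item in excluded or item in result:
            continue
        result.append(item)
    return result


# Sequence(1, 10) minus Sequence(2, 10, 2) and Sequence(3, 10, 3)
print(set_difference(range(1, 11), range(2, 11, 2), range(3, 11, 3)))  # [1, 5, 7]
# Sequence(1, 4) minus the single element 3
print(set_difference(range(1, 5), 3))  # [1, 2, 4]
```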

Null values are not included in the result unless «keepNull» is specified as true.

Usage

When provided with only a single set, SetUnion(s), SetIntersection(s) and SetDifference(s) have the effect of removing duplicate (and Null) values. Hence, when A is an array without any Null values, these are all equivalent:

A[I = Unique(A, I)]

#SetUnion([\A])

#SetIntersection([\A])

#SetDifference(\A)

For an index I, SetContains(\I, x) and @[I = x] > 0 are essentially equivalent. They are exactly equivalent when IndexValue(I) does not contain any handles, or when I is a MetaOnly index (or MetaIndex).

Function SetContains

SetContains(s, element)

Returns true if element is contained in the set «s». Unlike other Set functions, the result is not a Set. It is a simple Truth value -- or an array of Booleans if the second parameter, element, is an array, for example:

SetContains(\(1 .. 10), [9, 10, 11]) → [1, 1, 0]

Note that the second parameter [9, 10, 11] is a list of potential set elements, not itself a set.

Function SetsAreEqual

SetsAreEqual(sets, I, ignoreNull)

Returns true if the first parameter is a list of sets that have exactly the same elements, ignoring duplicates or ordering (and ignoring Null values, unless optional parameter «ignoreNull» is specified as False).

Local L1 := [1, 1, 1, 2, 3];

Local L2 := [3, 2, 2, 1];

Local L3 := [2, 3, 1, Null];

SetsAreEqual([\L1, \L2, \L3]) → 1

In this example, all three sets are treated as the set {1, 2, 3}. But:

SetsAreEqual([\L1, \L2, \L3], IgnoreNull: False) → 0

With «ignoreNull» set to False, the set \L3 includes the Null value, and so is not identical to sets \L1 and \L2.
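
The effect of «ignoreNull» on this comparison can be modeled in Python. The sketch below is not Analytica code: sets_are_equal and ignore_null are hypothetical names mirroring the function and its parameter, None stands in for Null, and Python's True and False correspond to the 1 and 0 shown above.

```python
def sets_are_equal(sets, ignore_null=True):
    """Return True when every list in sets has the same distinct elements."""
    def as_set(items):
        # Drop None (Null) only when ignore_null is true.
        return {x for x in items if not (ignore_null and x is None)}

    distinct = [as_set(members) for members in sets]
    return all(d == distinct[0] for d in distinct)


L1 = [1, 1, 1, 2, 3]
L2 = [3, 2, 2, 1]
L3 = [2, 3, 1, None]
print(sets_are_equal([L1, L2, L3]))                     # True
print(sets_are_equal([L1, L2, L3], ignore_null=False))  # False
```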

If Table T is indexed by Row and Col, this expression tests if each Row contains the same items (ignoring ordering or repeated items):

SetsAreEqual(\[Col]T, Row)

The first parameter specifies that each Col-vector (i.e., each row) is taken as a set. The index parameter, Row, specifies that the comparison takes place along the Row index of T.

History

The set functions were introduced in Analytica 4.3.

@@ Line 1: / Line 1: @@
 [[Category:Set Functions]]
-''new to Analytica 4.3''
+Analytica represents a set as a [[Using References|reference]] to a list or 1-D array.  It offers all the usual set operations, including [[#Setunion]], [[#SetIntersection]], [[#SetDifference]], and [[#SetContains]].
-= Sets =
+__TOC__
-In mathematics, a ''set'' is a collection of non-repeated elements.  The functions described on this page operate on sets that are represented in Analytica as a [[Using References|reference]] to list or 1-D array.  With this representation, a set is seen as an atomic element by Analytica's array abstraction, thus allowing all these functions to fully array abstract even when the collections comprising the sets are of different lengths or have different indexes.
+== Introducing sets  ==
-The following demonstrates this representation:
+In mathematics, a set is an unordered collection of unique elements.  Analytica represents a set as a [[Using References|reference]] to a list or 1-D array, for example:
+:<code>[[Local]] A_list := ['a', 'b', 'c', 'd'];</code>
+:<code>[[Local]] A_set := \ A_list;</code>
- [[Var]] A_list := ['a','b','c','d'];
+The backslash, <code>\</code>, in front of <code>A_list</code> returns a [[Using References|Reference]] to the list.  All the Set Functions described below work on Sets represented like this.  Using a [[Reference]] to a list allows Set functions to fully [[Array Abstraction|array abstract]]. If they operated directly on lists rather than references to lists, they wouldn't work properly when operating on multiple lists with different lengths or different indexes.
- [[Var]] A_set := [[Using References|\]]A_list;
- ...
-The backslash in front of ''A_list'' turns the list into a ''set'' in the manner expected by functions here.
+=== Literal Sets ===
+To create a set from a literal list, you must precede it by backslash, <code>\</code>, to create a reference, and enclose it in parentheses :
+:<code>\([1, 2, 3])</code>
-Suppose you have a 4-dimensional array, A, indexed by In1, In2, In3 and In4.  The expression <code>\[In4]A</code> returns a 3-D array of sets, each set being one of the vectors indexed by In4.  As seen, when using the [[Using References|reference operator, \]] you can specify in brackets which index becomes the ''set dimension''.
+If you omit the parentheses, it gives an error due to a syntactic ambiguity:
+:<code>\[1, 2, 3]    { **** Does not work **** }</code>
-== Literal Sets ==
+Although a list is ordered and may contain duplicates, the set operations ignore the order and any duplicates. Any Set result contains only unique elements. They also ignore [[Null]] elements, unless you specify optional parameter <code>keepNulls: True</code>.
-To create a set from a literal list, you must either specify the ''Null dimension'' to the [[Using References|reference operator]], or you must surround the brackets by parentheses.  You cannot simply place a backslash in front of a literal list, since the [[Using References|backslash operator]] sees brackets and assumes that the brackets are specifying the indexes to swallow.  Here are two examples of how to express a literal set:
+=== The Empty Set ===
+You can specify the Empty set (that contains nothing) simply as:
+:<code>\([])      { The empty set }</code>
- [[Using References|\]][ [[Null]] ][1,2,3]
+The expression [[Null]] is not the same as the empty set.  Set functions '''SetIntersection''', '''SetUnion''', etc. ignore [[Null]] elements, just as array functions [[Sum]], [[Max]], etc. ignore them. Thus, the intersection of a set <code>S</code> with [[Null]] is therefore the set <code>S</code>, not the empty set.
- [[Using References|\]]([1,2,3])
-but
- [[Using References|\]][1,2,3]    { **** Does not work **** }
-== Converting a Set to a List ==
+=== Convert a Set to a List ===
-The [[Using References|dereference operator, #]] is used to convert a set back into a list.  This operation does not array-abstract, so you can apply it to a single set, but not to an array of sets.
+You can obtain a list from a Set using the [[Using References|dereference operator, #]].
+:<code># A_set  &rarr; ['a', 'b', 'c', 'd']</code>
-= SetContains =
+This operation does not [[Array Abstraction|array-abstract]]: You can apply it to a single set, but not to an array of sets.
- Function SetContains( set, element )
+=== Summary of Set functions ===
-Returns true if element is contained in the set.
+Most of the set functions take a list or array of one or more sets as their primary function.
- ''SetContains''( \[[Sequence]](7,1000,7), [770,775,777] ) &rarr; [1,0,7]
+SetUnion(sets):  Returns a set that is the union of the sets -- that is, all elements that occur in one or more of the sets.
-= SetsAreEqual =
+SetIntersection(sets):  Returns a set that is the intersection of the sets -- that is, all elements that occur in one or more of the sets.
-  Function SetsAreEqual( sets'', I, ignoreNulls'' )
+SetIntersection(sets):  Returns a set that is the intersection of the sets -- that is, all elements that occur in one or more of the sets.
-Returns true when all the sets passed into the first parameter have exactly the same elements, without regard to duplicates or ordering, and ignoring [[Null]] values (unless «ignoreNulls» is explicitly specified to be false).
+[[#SetIntersection]], [[#SetDifference]], and [[#SetContains]].
- Var L1 := [1,1,1,2,3];
+== Function SetIntersection ==
- Var L2 := [3,2,2,1];
- Var L3 := [2,3,1,Null];
- '''SetsAreEqual'''( [\L1,\L2,\L3] ) &rarr; 1
-In this example, all three sets are treated as the set {1,2,3}.  But:
+:'''SetIntersection'''(sets'', I, resultIndex, keepNulls'')
- '''SetsAreEqual'''( [\L1,\L2,\L3], ignoreNulls:false ) &rarr; 0
+Returns the set of elements in common to all the sets specified in the first parameter, «sets».
-With the optional parameter, the set \L3 is then understood to include the [[Null]] value.
+The first parameter is a list or array of sets.  If this is or might be a multi-dimensional array, you should specify the optional second parameter, «I» as the index to operate over.
-The following tests whether every row of a table contains the same set of items (ignoring ordering), where T is indexed by Row and Col:
+Consider the following array:
+:<code>A := </code>
+:{| class="wikitable"
+! !! colspan="6" | J &#9654;
+|-
+! rowspan=3 | I &#9660;
+| 3 || 6 || 9 || 2 || 5 || 8
+|-
+| 7 || 4 || 1 || 8 || 5 || 2
+|-
+| 4 || 8 || 2 || 6 || 0 || 3
+|}
- SetsAreEqual( \[Col]T, Row )
+To intersect the rows of <code>A</code> -- i.e. find the elements common to all rows -- treat each <code>J</code>-vector as a set <code>(\[J]A)</code> and operate over the <code>I</code> index:
-The first parameter specifies that each ''Col''-vector (i.e., each row) is taken as a set.  The index parameter, ''Row'', specifie that the comparison takes place along the ''Row'' index of ''T''.
+:<code>#SetIntersection(\[J]A, I) &rarr; [2, 8]</code>
-= SetIntersection =
+Optionally you can map the result onto a pre-existing index.  When you provide a «resultIndex», it returns an array over that index containing the resulting elements.  When «resultIndex» is not provided, the result is a set (a reference to a list).  If «resultIndex» is too short to accommodate all elements in the result, it includes only the first <code>Size(resultIndex)</code> elements of the result.  If it's too short, it pads the final cells with [[Null]].
- Function SetUnion( sets'', I, resultIndex, keepNulls'' )
+:<code>SetIntersection(\[J]A, I, resultIndex: J) &rarr;</code>
+:{|class="wikitable"
+! colspan="6" | J &rarr;
+|-
+| 2 || 8 || «null» || «null» || «null» || «null»
+|}
-= SetUnion =
+To find the set of elements that two indexes have in common, use:
- Function SetUnion( sets'', I, resultIndex, keepNulls'' )
+:<code>#SetIntersection([\I, \J])</code>
-= SetDifference =
+== Function SetUnion ==
-  Function SetDifference( originalSet, remove'', remove2, remove3, ..., resultIndex, keepNull'')
+:'''SetUnion'''(sets'', I, resultIndex, keepNulls'')
+Returns the set of all unique non-[[Null]] elements occurring in any of the sets passed in the first parameter, «sets».
+The first parameter is an array of sets.  To find the union of the elements in set of lists or indexes, use:
+:<code>#SetUnion([\L1, \L2, \L3, \L4])</code>
+Since the result is a ''set'', i.e., a [[Using References|reference]] to a list, the [[Using References|dereference operator, #]] is applied to the result.  That de-referenced result can then be used to define a new index.
+To find the union of all unique elements occurring along the rows of a 2-dimensional array indexed by <code>I</code> and <code>J</code>, use:
+:<code>#SetUnion(\A[J], I)</code>
+This turns each row (each row being a slice along <code>I</code> indexed by <code>J</code>) into a set, resulting in a 1-D array of sets indexed by <code>I</code>, and then applies the union operation along the <code>I</code> dimension.
+To include [[Null]] values in the result, specify the optional parameter «keepNulls» as true.
+:<code>[[Local]] L1 := [1, 3, Null, 5];</code>
+:<code>[[Local]] L2 := [1, 2, 4, 5];</code>
+:<code>SetUnion( [\L1, \L2] ) &rarr; \[1, 3, 5, 2, 4]</code>
+:<code>SetUnion( [\L1, \L2], keepNulls: true) &rarr; \[1, 3, Null, 5, 2, 4]</code>
+If you specify the optional «resultIndex», the result is returned as an array (rather than as a reference) along the indicated index.  The number of elements in the result is truncated to the index's length, or padded with [[Null]] values if the index is longer than necessary.
+== Function SetDifference ==
+:'''SetDifference'''(originalSet, remove'', remove2, remove3, ..., resultIndex, keepNull'')
+Returns the unique set of non-[[Null]] elements in «originalSet» that do not appear in any of the other sets, «remove»....  The result is a set (reference to a list) unless «resultIndex» is specified.  If «resultIndex» is specified, then an array indexed by the result index is returned, truncated to the length of «resultIndex» or padded with [[Null]] values.
+:<code>SetDifference(\Sequence(1, 10), \Sequence(2, 10, 2), \Sequence(3, 10, 3)) &rarr; \[1, 5, 7]</code>
+You can also remove individual elements:
+:<code>SetDifference(\Sequence(1, 4), 3) &rarr; \[1, 2, 4]</code>
+[[Null]] values are not included in the result unless «keepNull» is specified as true.
+== Usage==
+When provided with only a single set, '''SetUnion'''(s), '''SetIntersection'''(s) and '''SetDifference'''(s) have the effect of removing duplicate (and [[Null]]) values.  Hence, when <code>A</code> is an array without any [[Null]] values, these are all equivalent:
+:<code>A[I = Unique(A, I)]</code>
+:<code>#SetUnion([\A])</code>
+:<code>#SetIntersection([\A])</code>
+:<code>#SetDifference(\A)</code>
+When using an index, <code>I</code>, then <code>SetContains(\I, x)</code> and <code>@[I = x] > 0</code> are essentially equivalent.  They are exactly equivalent when the [[IndexValue]](I) does not contain any [[handle]]s, or is a [[MetaOnly]] index (or [[MetaIndex]]).
+== Function SetContains ==
+:'''SetContains'''(s, element)
+Returns true if element is contained in the set «s». Unlike other Set functions, the result is not a Set. It is a simple Truth value -- or an array of Booleans if the second parameter, element, is an array, for example:
+:<code>SetContains(\(1 .. 10), [9, 10, 11]) &rarr; [1, 1, 0]</code>
+Note that the second parameter <code>[9, 10, 11]</code> is a list of potential set elements, not itself a set.
+== Function SetsAreEqual ==
+:'''SetsAreEqual'''(sets'', I, ignoreNull'')
+Returns true if the first parameter is a list of sets that have exactly the same elements, ignoring duplicates or ordering (and ignoring [[Null]] values, unless optional parameter «ignoreNull» is specified as <code>False</code>).
+:<code>[[Local]] L1 := [1, 1, 1, 2, 3];</code>
+:<code>[[Local]] L2 := [3, 2, 2, 1];</code>
+:<code>[[Local]] L3 := [2, 3, 1, Null];</code>
+:<code>SetsAreEqual([\L1, \L2, \L3]) &rarr; 1</code>
+In this example, all three sets are treated as the set <code>{1, 2, 3}</code>.  But:
+:<code>SetsAreEqual([\L1, \L2, \L3], IgnoreNull: False) &rarr; 0</code>
+With «ignoreNull» set to <code>False</code>, the set <code>\L3</code> includes the [[Null]] value, and so is not identical to sets <code>\L1</code> and <code>\L2</code>.
+If Table <code>T</code> is indexed by <code>Row</code> and <code>Col</code>, this expression tests if each <code>Row</code> contains the same items (ignoring ordering or repeated items):
+:<code>SetsAreEqual(\[Col]T, Row)</code>
+The first parameter specifies that each <code>Col</code>-vector (i.e., each row) is taken as a set.  The index parameter, <code>Row</code>, specifies that the comparison takes place along the <code>Row</code> index of <code>T</code>.
+== History ==
+The set functions were introduced in [[Analytica 4.3]].
+== See Also ==
+* [[Using References]]
+* [[Index_Position_Operator::@|The @[I = n] operator]]
+* [[Unique]]
+* [[Sort]]
+* [[SortIndex]]
+* [[Sets - collections of unique elements]]