Difference between revisions of "TableJoin"

(Created page with "category:Functions that create arrays ''Requires Analytica 5.1 or later.'' ''This function is currently experimental and not exposed. It could be subject to change i...")
 
 
(15 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
[[category:Functions that create arrays]]
 
[[category:Functions that create arrays]]
 +
[[category:Array Functions]]
  
 
''Requires [[Analytica 5.1]] or later.''
 
''Requires [[Analytica 5.1]] or later.''
  
''This function is currently experimental and not exposed. It could be subject to change in a future release.''
+
== TableJoin( x1, row1, col1, x2, row2, col2'', sharedCol..., joinType, position'' ) ==
  
== TableJoin( x1, row1, col1, x2, row2, col2'', sharedCol..., joinType, position'' ) ==
+
Assists in efficient computation of the join of two relational tables. it can perform the Inner (default), Outer, Left, and Right joins.  It applies to two relational tables, «x1», indexed by «row1» and  «col1» and «x2» indexed by «row2» and «col2». Optional parameter «sharedCol» is a column (or list of columns) which have the same name in «col1» and «col2» over which it performs the join. (If you omit «sharedCol», it joins over all columns with the same name in «col1» and «col2».)
  
Computes the join of two relational tables, «x1» and «x2».   
+
The result is an array with local indexes <code>Pair</code>, with values [1, 2], corresponding to the tables «x1»and «x2»,  and <code>Row</code>, which ranges from 1 to the number of rows in the joined table. Each row in the result, call it <code>r</code>, contains two values that refer to corresponding rows in tables «x1» and «x2». I.e.  <code>r[.Pair=1]</code> contains values from «row1», and <code>r[.Pair=2]</code> contains values from «row2».  If you specify «position» to true, it returns the positions in «row1» and «row2» instead of their labels.   
  
The result is indexed by two local indexes, <code>Pair</code> and <code>Row</code>. The result index <code>Row</code> ranges from 1 to the number of rows in the joined table. Let ''result'' denote the result, then <code>result[.Pair=1]</code> is a map from <code>Result.Row</code> to «row1», and <code>result[.Pair=2]</code> is a map from <code>Result.Row</code> to «row2».
+
To complete the join, you use the result of [[TableJoin]] to create a new relational table from the two tables. This involves:
 +
# Defining a result index containing the desired columns in the desired order.
 +
# Selecting the values from each of the source tables, using the result of [[TableJoin]] for the row of the source table.
  
 
=== Parameters ===
 
=== Parameters ===
 
* «x1»: The first relational table, indexed by «row1» and  «col1».
 
* «x1»: The first relational table, indexed by «row1» and  «col1».
 
* «x2»: The second relation table, indexed by «row2» and «col2».
 
* «x2»: The second relation table, indexed by «row2» and «col2».
* «sharedCol»: (Optional) A list of the columns appearing in both «col1» and «col2» that are to be matched during the join. A shared column must have the same label in «col1» and «col2», and it is the label or labels from «col1» and «col2» that are passed in this parameter. When this parameter is not specified, all columns that have the same labels in the «col1» and «col2» indexes, ignoring the final column in each, is used. If you specify <code>[]</code> for this parameter, a cross-join is performed.
+
* «sharedCol»: (Optional) A list of the columns appearing in both «col1» and «col2» that are to be matched during the join. A shared column must have the same label in «col1» and «col2». The parameter is passed this label or these labels from «col1» and «col2». If you don't specify this parameter, it assumes you mean all columns that have the same labels in the «col1» and «col2» indexes, ignoring the final column in each. If you specify an empty list, <code>[]</code>, for this parameter, it performs a cross-join -- i.e. it combines every row from «x1» with every row from «x2», resulting in <code>Size(«row1») * Size(«row2»)</code> rows.
* «joinType»: (Optional) Can be <code>'Inner'</code>, <code>'Outer'</code>, <code>'Left'</code> or <code>'Right'</code>. Specifies the join type.  
+
* «joinType»: (Optional) Can be <code>'Inner'</code>, <code>'Outer'</code>, <code>'Left'</code> or <code>'Right'</code>. Specifies the join type. See an explanation of SQL for details.
 
* «position»: (Optional) When true, the result contains the positions in «row1» and «row2». When false, contains the labels from «row1» and «row2».
 
* «position»: (Optional) When true, the result contains the positions in «row1» and «row2». When false, contains the labels from «row1» and «row2».
  
 
== Examples ==
 
== Examples ==
TBD
+
The examples here will use these source relational tables, each indexed by their own local indexes <code>.Row</code> and <code>.Col</code>.
 +
 
 +
{| style="width:100%;"
 +
| style="vertical-align:top; width:50%;" |
 +
 
 +
'''Variable Data1 ::='''
 +
 
 +
{| class="wikitable"
 +
! <code>.Row</code> &darr; !! Fruit !! Country !! 'Value'
 +
|-
 +
! 1
 +
| Grape || France || 6.998
 +
|-
 +
! 2
 +
| Honeydew || Hungary || 5.322
 +
|-
 +
! 3
 +
| Honeydew || Jamaica || 3.935
 +
|-
 +
! 4
 +
| Kiwi || Germany || 5.862
 +
|-
 +
! 5
 +
| Orange || Germany || 7.533
 +
|-
 +
! 6
 +
| Orange || Hungary || 3.877
 +
|-
 +
! 7
 +
| Plum || Canada || 10.44
 +
|-
 +
! 8
 +
| Plum || Germany || 6.19
 +
|-
 +
! 9
 +
| Watermelon || Denmark || 7.611
 +
|-
 +
! 10
 +
| Watermelon || France || 5.878
 +
|}
 +
 
 +
| style="vertical-align:top; width:50%;" |
 +
 
 +
'''Variable Data2 ::='''
 +
 
 +
{| class="wikitable"
 +
! <code>.Row</code> &darr; !! Country !! Edition !! 'Value'
 +
|-
 +
!| 1
 +
| England || Professional || 7.578
 +
|-
 +
! 2
 +
| France || Professional || 11.27
 +
|-
 +
! 3
 +
| Hungary || Free 101 || 9.441
 +
|-
 +
! 4
 +
| Jamaica || Cubeplan || 8.054
 +
|-
 +
! 5
 +
| Jamaica || Optimizer || 9.717
 +
|}
 +
 
 +
|}
 +
 
 +
=== Inner join example ===
 +
:Variable InnerJoinRow ::=
 +
::<code>[[TableJoin]](data1, data1.row, data1.col, data2, data2.row, data2.col, jointype: 'Inner')</code>
 +
:&rarr;
 +
<center>
 +
{| class="wikitable"
 +
| &nbsp;
 +
! colspan=2 | <code>.Pair &rarr;</code>
 +
|-
 +
! <code>.Row</code>&darr; !! 1 !! 2
 +
|-
 +
! 1
 +
| 1 || 2
 +
|-
 +
! 2
 +
| 10 || 2
 +
|-
 +
! 3
 +
| 2 || 3
 +
|-
 +
! 4
 +
| 6 || 3
 +
|-
 +
! 5
 +
| 3 || 4
 +
|-
 +
! 6
 +
| 3 || 5
 +
|}
 +
</center>
 +
 
 +
This result from [[TableJoin]] gives you the rows labels from each source table that is joined. There are many ways in which you might decide the join them. For example, you might want to keep only the Fruit and Edition columns and the Value column from Data1.
 +
 
 +
:Index InnerJoinCol ::= [ 'Fruit', 'Edition', 'Value' ]
 +
 
 +
:Variable Inner_Join_Result ::=
 +
[[Local]] tmp1 := Data1[.Row=InnerJoinRow[.Pair=1]] [.Col=InnerJoinCol, defVal:null];
 +
[[Local]] tmp2 := Data2[.Row=InnerJoinRow[.Pair=2]] [.Col=InnerJoinCol, defVal:null];
 +
[[If]] tmp1=Null [[Then]] tmp2 [[Else]] tmp1
 +
 
 +
<code>InnerJoinResult</code> &rarr;
 +
{| class="wikitable"
 +
| &nbsp;
 +
! colspan=3 | <code>InnerJoinCol &rarr;</code>
 +
|-
 +
! <code>.Row</code> &darr;
 +
! Fruit !! Edition !! 'Value'
 +
|-
 +
! 1
 +
| Grape || Professional || 6.998
 +
|-
 +
! 2
 +
| Watermelon || Professional || 5.878
 +
|-
 +
! 3
 +
| Honeydew || Free 101 || 5.322
 +
|-
 +
! 4
 +
| Orange || Free 101 || 3.877
 +
|-
 +
! 5
 +
| Honeydew || Cubeplan || 3.935
 +
|-
 +
! 6
 +
| Honeydew || Optimizer || 3.935
 +
|}
 +
 
 +
<code>Inner_Join_Result2</code> ::=
 +
{| class="wikitable"
 +
| &nbsp;
 +
! colspan=3 | <code>InnerJoinCol &rarr;</code>
 +
|-
 +
! <code>.Row</code> &darr;
 +
! Fruit !! Edition !! 'Value'
 +
|-
 +
! 1
 +
| Grape || Professional || 78.89
 +
|-
 +
! 2
 +
| Watermelon || Professional || 66.27
 +
|-
 +
! 3
 +
| Honeydew || Free 101 || 50.24
 +
|-
 +
! 4
 +
| Orange || Free 101 || 36.6
 +
|-
 +
! 5
 +
| Honeydew || Cubeplan || 31.69
 +
|-
 +
! 6
 +
| Honeydew || Optimizer || 38.23
 +
|}
 +
 
 +
Or perhaps you want the product of the two original <code>'Value'</code> columns:
 +
 
 +
:Variable Inner_Join_Result2 ::=
 +
[[Local]] tmp1 := Data1[.Row=InnerJoinRow[.Pair=1]] [.Col=InnerJoinCol, defVal:null];
 +
[[Local]] tmp2 := Data2[.Row=InnerJoinRow[.Pair=2]] [.Col=InnerJoinCol, defVal:null];
 +
[[If]] JCol='Value' [[Then]]
 +
tmp1[InnerJoinCol='Value'] * tmp2[InnerJoinCol='Value']
 +
[[Else]] [[If]] tmp1=Null [[Then]]
 +
tmp2
 +
[[Else]] tmp1
 +
 
 +
=== Outer join example ===
 +
:Variable Outer_Join_Row ::=
 +
::<code>[[TableJoin]](data1, data1.row, data1.col, data2, data2.row, data2.col, jointype: 'Outer')</code>
 +
:&rarr;
 +
<center>
 +
{| class="wikitable"
 +
| &nbsp;
 +
! colspan=2 | <code>.Pair &rarr;</code>
 +
|-
 +
! <code>.Row</code> &darr;
 +
! 1 !! 2
 +
|-
 +
! 1
 +
| 1 || 2
 +
|-
 +
! 2
 +
| 10 || 2
 +
|-
 +
! 3
 +
| 7 || 1
 +
|-
 +
! 4
 +
| 4 || style="color:lightgray" | Null
 +
|-
 +
! 5
 +
| 5 || style="color:lightgray" | Null
 +
|-
 +
! 6
 +
| 8 || style="color:lightgray" | Null
 +
|-
 +
! 7
 +
| 2 || 3
 +
|-
 +
! 8
 +
| 6 || 3
 +
|-
 +
! 9
 +
| 3 || 4
 +
|-
 +
! 10
 +
| style="color:lightgray" | Null || 5
 +
|-
 +
! 11
 +
| 9 || style="color:lightgray" | Null
 +
|-
 +
! 12
 +
| style="color:lightgray" | Null || 1
 +
|}
 +
</center>
 +
 
 +
This time we'll include all index columns with the shared column first.
 +
:Index Outer_Join_Col ::= <code>[ 'Country', 'Fruit', 'Edition', 'Value' ]</code>
 +
 
 +
And for the value, we'll take the larger of the two.
 +
:Variable Outer_Join_Result ::=
 +
[[Local]] tmp1 := Data1[.Row=InnerJoinRow[.Pair=1]] [.Col=Outer_join_col, defVal:null];
 +
[[Local]] tmp2 := Data2[.Row=InnerJoinRow[.Pair=2]] [.Col=Outer_join_col, defVal:null];
 +
[[Max]]([tmp1,tmp2])
 +
:&rarr;
 +
{| class="wikitable"
 +
| &nbsp;
 +
! colspan=4 | <code>Outer_Join_Col &rarr;</code>
 +
|-
 +
! <code>.Row</code> &darr;
 +
! Country !! Fruit !! Edition !! 'Value'
 +
|-
 +
! 1
 +
| France || Grape || Professional || 11.27
 +
|-
 +
! 2
 +
| France || Watermelon || Professional || 11.27
 +
|-
 +
! 3
 +
| Canada || Plum || style="color:lightgray" | Null || 10.44
 +
|-
 +
! 4
 +
| Germany || Kiwi || style="color:lightgray" | Null || 5.862
 +
|-
 +
! 5
 +
| Germany || Orange || style="color:lightgray" | Null || 7.533
 +
|-
 +
! 6
 +
| Germany || Plum || style="color:lightgray" | Null || 6.19
 +
|-
 +
! 7
 +
| Hungary || Honeydew || Free 101 || 9.441
 +
|-
 +
! 8
 +
| Hungary || Orange || Free 101 || 9.441
 +
|-
 +
! 9
 +
| Jamaica || Honeydew || Cubeplan || 8.054
 +
|-
 +
! 10
 +
| Jamaica || Honeydew || Optimizer || 9.717
 +
|-
 +
! 11
 +
| Denmark || Watermelon || style="color:lightgray" | Null || 7.611
 +
|-
 +
! 12
 +
| England || style="color:lightgray" | Null || Professional || 7.578
 +
|}
 +
 
 +
== Other ways to do a join ==
  
== Related ways of doing joins ==
+
The most common type of join is a Left join over a single column (key). You can do this with the  [[Slice]] or [[Subscript]] operator. For example, suppose <code>Zip_of_person</code> is indexed by <code>Person</code> and contains the zip code of each person, and <code>County_by_zip</code> is indexed by <code>Zip</code>.  Then <code>County_by_zip[Zip=Zip_of_person]</code> returns the county for each person. This is logically a left join, but a far simpler and more natural method.
In general, there are alternative ways in Analytica for computing joins when a single column is shared. When multiple columns are shared, then [[TableJoin]] is really the only easy way to do it.
 
  
Left joins with a single key (column) occur frequently in practice, but are usually done the [[Slice]] or [[Subscript]] operator. For example, suppose <code>Zip_of_person</code> is indexed by <code>Person</code> and contains the zip code of each person, and <code>County_by_zip</code> is indexed by <code>Zip</code>.  Then <code>County_by_zip[Zip=Zip_of_person]]</code> returns the county for each person, which is logically a left join, but a far simpler and more natural method.
+
When you want to join over multiple columns, [[TableJoin]] is really the only easy way to do it.
  
 
== See Also ==
 
== See Also ==

Latest revision as of 17:37, 26 May 2025


Requires Analytica 5.1 or later.

TableJoin( x1, row1, col1, x2, row2, col2, sharedCol..., joinType, position )

Assists in efficient computation of the join of two relational tables. it can perform the Inner (default), Outer, Left, and Right joins. It applies to two relational tables, «x1», indexed by «row1» and «col1» and «x2» indexed by «row2» and «col2». Optional parameter «sharedCol» is a column (or list of columns) which have the same name in «col1» and «col2» over which it performs the join. (If you omit «sharedCol», it joins over all columns with the same name in «col1» and «col2».)

The result is an array with local indexes Pair, with values [1, 2], corresponding to the tables «x1»and «x2», and Row, which ranges from 1 to the number of rows in the joined table. Each row in the result, call it r, contains two values that refer to corresponding rows in tables «x1» and «x2». I.e. r[.Pair=1] contains values from «row1», and r[.Pair=2] contains values from «row2». If you specify «position» to true, it returns the positions in «row1» and «row2» instead of their labels.

To complete the join, you use the result of TableJoin to create a new relational table from the two tables. This involves:

  1. Defining a result index containing the desired columns in the desired order.
  2. Selecting the values from each of the source tables, using the result of TableJoin for the row of the source table.

Parameters

  • «x1»: The first relational table, indexed by «row1» and «col1».
  • «x2»: The second relation table, indexed by «row2» and «col2».
  • «sharedCol»: (Optional) A list of the columns appearing in both «col1» and «col2» that are to be matched during the join. A shared column must have the same label in «col1» and «col2». The parameter is passed this label or these labels from «col1» and «col2». If you don't specify this parameter, it assumes you mean all columns that have the same labels in the «col1» and «col2» indexes, ignoring the final column in each. If you specify an empty list, [], for this parameter, it performs a cross-join -- i.e. it combines every row from «x1» with every row from «x2», resulting in Size(«row1») * Size(«row2») rows.
  • «joinType»: (Optional) Can be 'Inner', 'Outer', 'Left' or 'Right'. Specifies the join type. See an explanation of SQL for details.
  • «position»: (Optional) When true, the result contains the positions in «row1» and «row2». When false, contains the labels from «row1» and «row2».

Examples

The examples here will use these source relational tables, each indexed by their own local indexes .Row and .Col.

Variable Data1 ::=

.Row Fruit Country 'Value'
1 Grape France 6.998
2 Honeydew Hungary 5.322
3 Honeydew Jamaica 3.935
4 Kiwi Germany 5.862
5 Orange Germany 7.533
6 Orange Hungary 3.877
7 Plum Canada 10.44
8 Plum Germany 6.19
9 Watermelon Denmark 7.611
10 Watermelon France 5.878

Variable Data2 ::=

.Row Country Edition 'Value'
1 England Professional 7.578
2 France Professional 11.27
3 Hungary Free 101 9.441
4 Jamaica Cubeplan 8.054
5 Jamaica Optimizer 9.717

Inner join example

Variable InnerJoinRow ::=
TableJoin(data1, data1.row, data1.col, data2, data2.row, data2.col, jointype: 'Inner')
  .Pair →
.Row 1 2
1 1 2
2 10 2
3 2 3
4 6 3
5 3 4
6 3 5

This result from TableJoin gives you the rows labels from each source table that is joined. There are many ways in which you might decide the join them. For example, you might want to keep only the Fruit and Edition columns and the Value column from Data1.

Index InnerJoinCol ::= [ 'Fruit', 'Edition', 'Value' ]
Variable Inner_Join_Result ::=
Local tmp1 := Data1[.Row=InnerJoinRow[.Pair=1]] [.Col=InnerJoinCol, defVal:null];
Local tmp2 := Data2[.Row=InnerJoinRow[.Pair=2]] [.Col=InnerJoinCol, defVal:null];

If tmp1=Null Then tmp2 Else tmp1

InnerJoinResult

  InnerJoinCol →
.Row Fruit Edition 'Value'
1 Grape Professional 6.998
2 Watermelon Professional 5.878
3 Honeydew Free 101 5.322
4 Orange Free 101 3.877
5 Honeydew Cubeplan 3.935
6 Honeydew Optimizer 3.935

Inner_Join_Result2 ::=

  InnerJoinCol →
.Row Fruit Edition 'Value'
1 Grape Professional 78.89
2 Watermelon Professional 66.27
3 Honeydew Free 101 50.24
4 Orange Free 101 36.6
5 Honeydew Cubeplan 31.69
6 Honeydew Optimizer 38.23

Or perhaps you want the product of the two original 'Value' columns:

Variable Inner_Join_Result2 ::=
Local tmp1 := Data1[.Row=InnerJoinRow[.Pair=1]] [.Col=InnerJoinCol, defVal:null];
Local tmp2 := Data2[.Row=InnerJoinRow[.Pair=2]] [.Col=InnerJoinCol, defVal:null];
If JCol='Value' Then
	tmp1[InnerJoinCol='Value'] * tmp2[InnerJoinCol='Value']
Else If tmp1=Null Then
	tmp2
Else tmp1

Outer join example

Variable Outer_Join_Row ::=
TableJoin(data1, data1.row, data1.col, data2, data2.row, data2.col, jointype: 'Outer')
  .Pair →
.Row 1 2
1 1 2
2 10 2
3 7 1
4 4 Null
5 5 Null
6 8 Null
7 2 3
8 6 3
9 3 4
10 Null 5
11 9 Null
12 Null 1

This time we'll include all index columns with the shared column first.

Index Outer_Join_Col ::= [ 'Country', 'Fruit', 'Edition', 'Value' ]

And for the value, we'll take the larger of the two.

Variable Outer_Join_Result ::=
Local tmp1 := Data1[.Row=InnerJoinRow[.Pair=1]] [.Col=Outer_join_col, defVal:null];
Local tmp2 := Data2[.Row=InnerJoinRow[.Pair=2]] [.Col=Outer_join_col, defVal:null];
Max([tmp1,tmp2])
  Outer_Join_Col →
.Row Country Fruit Edition 'Value'
1 France Grape Professional 11.27
2 France Watermelon Professional 11.27
3 Canada Plum Null 10.44
4 Germany Kiwi Null 5.862
5 Germany Orange Null 7.533
6 Germany Plum Null 6.19
7 Hungary Honeydew Free 101 9.441
8 Hungary Orange Free 101 9.441
9 Jamaica Honeydew Cubeplan 8.054
10 Jamaica Honeydew Optimizer 9.717
11 Denmark Watermelon Null 7.611
12 England Null Professional 7.578

Other ways to do a join

The most common type of join is a Left join over a single column (key). You can do this with the Slice or Subscript operator. For example, suppose Zip_of_person is indexed by Person and contains the zip code of each person, and County_by_zip is indexed by Zip. Then County_by_zip[Zip=Zip_of_person] returns the county for each person. This is logically a left join, but a far simpler and more natural method.

When you want to join over multiple columns, TableJoin is really the only easy way to do it.

See Also

Comments


You are not allowed to post comments.