Difference between revisions of "TableJoin"
(Created page with "category:Functions that create arrays ''Requires Analytica 5.1 or later.'' ''This function is currently experimental and not exposed. It could be subject to change i...") |
|||
(15 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
[[category:Functions that create arrays]] | [[category:Functions that create arrays]] | ||
+ | [[category:Array Functions]] | ||
''Requires [[Analytica 5.1]] or later.'' | ''Requires [[Analytica 5.1]] or later.'' | ||
− | '' | + | == TableJoin( x1, row1, col1, x2, row2, col2'', sharedCol..., joinType, position'' ) == |
− | + | Assists in efficient computation of the join of two relational tables. it can perform the Inner (default), Outer, Left, and Right joins. It applies to two relational tables, «x1», indexed by «row1» and «col1» and «x2» indexed by «row2» and «col2». Optional parameter «sharedCol» is a column (or list of columns) which have the same name in «col1» and «col2» over which it performs the join. (If you omit «sharedCol», it joins over all columns with the same name in «col1» and «col2».) | |
− | + | The result is an array with local indexes <code>Pair</code>, with values [1, 2], corresponding to the tables «x1»and «x2», and <code>Row</code>, which ranges from 1 to the number of rows in the joined table. Each row in the result, call it <code>r</code>, contains two values that refer to corresponding rows in tables «x1» and «x2». I.e. <code>r[.Pair=1]</code> contains values from «row1», and <code>r[.Pair=2]</code> contains values from «row2». If you specify «position» to true, it returns the positions in «row1» and «row2» instead of their labels. | |
− | + | To complete the join, you use the result of [[TableJoin]] to create a new relational table from the two tables. This involves: | |
+ | # Defining a result index containing the desired columns in the desired order. | ||
+ | # Selecting the values from each of the source tables, using the result of [[TableJoin]] for the row of the source table. | ||
=== Parameters === | === Parameters === | ||
* «x1»: The first relational table, indexed by «row1» and «col1». | * «x1»: The first relational table, indexed by «row1» and «col1». | ||
* «x2»: The second relation table, indexed by «row2» and «col2». | * «x2»: The second relation table, indexed by «row2» and «col2». | ||
− | * «sharedCol»: (Optional) A list of the columns appearing in both «col1» and «col2» that are to be matched during the join. A shared column must have the same label in «col1» and «col2» | + | * «sharedCol»: (Optional) A list of the columns appearing in both «col1» and «col2» that are to be matched during the join. A shared column must have the same label in «col1» and «col2». The parameter is passed this label or these labels from «col1» and «col2». If you don't specify this parameter, it assumes you mean all columns that have the same labels in the «col1» and «col2» indexes, ignoring the final column in each. If you specify an empty list, <code>[]</code>, for this parameter, it performs a cross-join -- i.e. it combines every row from «x1» with every row from «x2», resulting in <code>Size(«row1») * Size(«row2»)</code> rows. |
− | * «joinType»: (Optional) Can be <code>'Inner'</code>, <code>'Outer'</code>, <code>'Left'</code> or <code>'Right'</code>. Specifies the join type. | + | * «joinType»: (Optional) Can be <code>'Inner'</code>, <code>'Outer'</code>, <code>'Left'</code> or <code>'Right'</code>. Specifies the join type. See an explanation of SQL for details. |
* «position»: (Optional) When true, the result contains the positions in «row1» and «row2». When false, contains the labels from «row1» and «row2». | * «position»: (Optional) When true, the result contains the positions in «row1» and «row2». When false, contains the labels from «row1» and «row2». | ||
== Examples == | == Examples == | ||
− | + | The examples here will use these source relational tables, each indexed by their own local indexes <code>.Row</code> and <code>.Col</code>. | |
+ | |||
+ | {| style="width:100%;" | ||
+ | | style="vertical-align:top; width:50%;" | | ||
+ | |||
+ | '''Variable Data1 ::=''' | ||
+ | |||
+ | {| class="wikitable" | ||
+ | ! <code>.Row</code> ↓ !! Fruit !! Country !! 'Value' | ||
+ | |- | ||
+ | ! 1 | ||
+ | | Grape || France || 6.998 | ||
+ | |- | ||
+ | ! 2 | ||
+ | | Honeydew || Hungary || 5.322 | ||
+ | |- | ||
+ | ! 3 | ||
+ | | Honeydew || Jamaica || 3.935 | ||
+ | |- | ||
+ | ! 4 | ||
+ | | Kiwi || Germany || 5.862 | ||
+ | |- | ||
+ | ! 5 | ||
+ | | Orange || Germany || 7.533 | ||
+ | |- | ||
+ | ! 6 | ||
+ | | Orange || Hungary || 3.877 | ||
+ | |- | ||
+ | ! 7 | ||
+ | | Plum || Canada || 10.44 | ||
+ | |- | ||
+ | ! 8 | ||
+ | | Plum || Germany || 6.19 | ||
+ | |- | ||
+ | ! 9 | ||
+ | | Watermelon || Denmark || 7.611 | ||
+ | |- | ||
+ | ! 10 | ||
+ | | Watermelon || France || 5.878 | ||
+ | |} | ||
+ | |||
+ | | style="vertical-align:top; width:50%;" | | ||
+ | |||
+ | '''Variable Data2 ::=''' | ||
+ | |||
+ | {| class="wikitable" | ||
+ | ! <code>.Row</code> ↓ !! Country !! Edition !! 'Value' | ||
+ | |- | ||
+ | !| 1 | ||
+ | | England || Professional || 7.578 | ||
+ | |- | ||
+ | ! 2 | ||
+ | | France || Professional || 11.27 | ||
+ | |- | ||
+ | ! 3 | ||
+ | | Hungary || Free 101 || 9.441 | ||
+ | |- | ||
+ | ! 4 | ||
+ | | Jamaica || Cubeplan || 8.054 | ||
+ | |- | ||
+ | ! 5 | ||
+ | | Jamaica || Optimizer || 9.717 | ||
+ | |} | ||
+ | |||
+ | |} | ||
+ | |||
+ | === Inner join example === | ||
+ | :Variable InnerJoinRow ::= | ||
+ | ::<code>[[TableJoin]](data1, data1.row, data1.col, data2, data2.row, data2.col, jointype: 'Inner')</code> | ||
+ | :→ | ||
+ | <center> | ||
+ | {| class="wikitable" | ||
+ | | | ||
+ | ! colspan=2 | <code>.Pair →</code> | ||
+ | |- | ||
+ | ! <code>.Row</code>↓ !! 1 !! 2 | ||
+ | |- | ||
+ | ! 1 | ||
+ | | 1 || 2 | ||
+ | |- | ||
+ | ! 2 | ||
+ | | 10 || 2 | ||
+ | |- | ||
+ | ! 3 | ||
+ | | 2 || 3 | ||
+ | |- | ||
+ | ! 4 | ||
+ | | 6 || 3 | ||
+ | |- | ||
+ | ! 5 | ||
+ | | 3 || 4 | ||
+ | |- | ||
+ | ! 6 | ||
+ | | 3 || 5 | ||
+ | |} | ||
+ | </center> | ||
+ | |||
+ | This result from [[TableJoin]] gives you the rows labels from each source table that is joined. There are many ways in which you might decide the join them. For example, you might want to keep only the Fruit and Edition columns and the Value column from Data1. | ||
+ | |||
+ | :Index InnerJoinCol ::= [ 'Fruit', 'Edition', 'Value' ] | ||
+ | |||
+ | :Variable Inner_Join_Result ::= | ||
+ | [[Local]] tmp1 := Data1[.Row=InnerJoinRow[.Pair=1]] [.Col=InnerJoinCol, defVal:null]; | ||
+ | [[Local]] tmp2 := Data2[.Row=InnerJoinRow[.Pair=2]] [.Col=InnerJoinCol, defVal:null]; | ||
+ | [[If]] tmp1=Null [[Then]] tmp2 [[Else]] tmp1 | ||
+ | |||
+ | <code>InnerJoinResult</code> → | ||
+ | {| class="wikitable" | ||
+ | | | ||
+ | ! colspan=3 | <code>InnerJoinCol →</code> | ||
+ | |- | ||
+ | ! <code>.Row</code> ↓ | ||
+ | ! Fruit !! Edition !! 'Value' | ||
+ | |- | ||
+ | ! 1 | ||
+ | | Grape || Professional || 6.998 | ||
+ | |- | ||
+ | ! 2 | ||
+ | | Watermelon || Professional || 5.878 | ||
+ | |- | ||
+ | ! 3 | ||
+ | | Honeydew || Free 101 || 5.322 | ||
+ | |- | ||
+ | ! 4 | ||
+ | | Orange || Free 101 || 3.877 | ||
+ | |- | ||
+ | ! 5 | ||
+ | | Honeydew || Cubeplan || 3.935 | ||
+ | |- | ||
+ | ! 6 | ||
+ | | Honeydew || Optimizer || 3.935 | ||
+ | |} | ||
+ | |||
+ | <code>Inner_Join_Result2</code> ::= | ||
+ | {| class="wikitable" | ||
+ | | | ||
+ | ! colspan=3 | <code>InnerJoinCol →</code> | ||
+ | |- | ||
+ | ! <code>.Row</code> ↓ | ||
+ | ! Fruit !! Edition !! 'Value' | ||
+ | |- | ||
+ | ! 1 | ||
+ | | Grape || Professional || 78.89 | ||
+ | |- | ||
+ | ! 2 | ||
+ | | Watermelon || Professional || 66.27 | ||
+ | |- | ||
+ | ! 3 | ||
+ | | Honeydew || Free 101 || 50.24 | ||
+ | |- | ||
+ | ! 4 | ||
+ | | Orange || Free 101 || 36.6 | ||
+ | |- | ||
+ | ! 5 | ||
+ | | Honeydew || Cubeplan || 31.69 | ||
+ | |- | ||
+ | ! 6 | ||
+ | | Honeydew || Optimizer || 38.23 | ||
+ | |} | ||
+ | |||
+ | Or perhaps you want the product of the two original <code>'Value'</code> columns: | ||
+ | |||
+ | :Variable Inner_Join_Result2 ::= | ||
+ | [[Local]] tmp1 := Data1[.Row=InnerJoinRow[.Pair=1]] [.Col=InnerJoinCol, defVal:null]; | ||
+ | [[Local]] tmp2 := Data2[.Row=InnerJoinRow[.Pair=2]] [.Col=InnerJoinCol, defVal:null]; | ||
+ | [[If]] JCol='Value' [[Then]] | ||
+ | tmp1[InnerJoinCol='Value'] * tmp2[InnerJoinCol='Value'] | ||
+ | [[Else]] [[If]] tmp1=Null [[Then]] | ||
+ | tmp2 | ||
+ | [[Else]] tmp1 | ||
+ | |||
+ | === Outer join example === | ||
+ | :Variable Outer_Join_Row ::= | ||
+ | ::<code>[[TableJoin]](data1, data1.row, data1.col, data2, data2.row, data2.col, jointype: 'Outer')</code> | ||
+ | :→ | ||
+ | <center> | ||
+ | {| class="wikitable" | ||
+ | | | ||
+ | ! colspan=2 | <code>.Pair →</code> | ||
+ | |- | ||
+ | ! <code>.Row</code> ↓ | ||
+ | ! 1 !! 2 | ||
+ | |- | ||
+ | ! 1 | ||
+ | | 1 || 2 | ||
+ | |- | ||
+ | ! 2 | ||
+ | | 10 || 2 | ||
+ | |- | ||
+ | ! 3 | ||
+ | | 7 || 1 | ||
+ | |- | ||
+ | ! 4 | ||
+ | | 4 || style="color:lightgray" | Null | ||
+ | |- | ||
+ | ! 5 | ||
+ | | 5 || style="color:lightgray" | Null | ||
+ | |- | ||
+ | ! 6 | ||
+ | | 8 || style="color:lightgray" | Null | ||
+ | |- | ||
+ | ! 7 | ||
+ | | 2 || 3 | ||
+ | |- | ||
+ | ! 8 | ||
+ | | 6 || 3 | ||
+ | |- | ||
+ | ! 9 | ||
+ | | 3 || 4 | ||
+ | |- | ||
+ | ! 10 | ||
+ | | style="color:lightgray" | Null || 5 | ||
+ | |- | ||
+ | ! 11 | ||
+ | | 9 || style="color:lightgray" | Null | ||
+ | |- | ||
+ | ! 12 | ||
+ | | style="color:lightgray" | Null || 1 | ||
+ | |} | ||
+ | </center> | ||
+ | |||
+ | This time we'll include all index columns with the shared column first. | ||
+ | :Index Outer_Join_Col ::= <code>[ 'Country', 'Fruit', 'Edition', 'Value' ]</code> | ||
+ | |||
+ | And for the value, we'll take the larger of the two. | ||
+ | :Variable Outer_Join_Result ::= | ||
+ | [[Local]] tmp1 := Data1[.Row=InnerJoinRow[.Pair=1]] [.Col=Outer_join_col, defVal:null]; | ||
+ | [[Local]] tmp2 := Data2[.Row=InnerJoinRow[.Pair=2]] [.Col=Outer_join_col, defVal:null]; | ||
+ | [[Max]]([tmp1,tmp2]) | ||
+ | :→ | ||
+ | {| class="wikitable" | ||
+ | | | ||
+ | ! colspan=4 | <code>Outer_Join_Col →</code> | ||
+ | |- | ||
+ | ! <code>.Row</code> ↓ | ||
+ | ! Country !! Fruit !! Edition !! 'Value' | ||
+ | |- | ||
+ | ! 1 | ||
+ | | France || Grape || Professional || 11.27 | ||
+ | |- | ||
+ | ! 2 | ||
+ | | France || Watermelon || Professional || 11.27 | ||
+ | |- | ||
+ | ! 3 | ||
+ | | Canada || Plum || style="color:lightgray" | Null || 10.44 | ||
+ | |- | ||
+ | ! 4 | ||
+ | | Germany || Kiwi || style="color:lightgray" | Null || 5.862 | ||
+ | |- | ||
+ | ! 5 | ||
+ | | Germany || Orange || style="color:lightgray" | Null || 7.533 | ||
+ | |- | ||
+ | ! 6 | ||
+ | | Germany || Plum || style="color:lightgray" | Null || 6.19 | ||
+ | |- | ||
+ | ! 7 | ||
+ | | Hungary || Honeydew || Free 101 || 9.441 | ||
+ | |- | ||
+ | ! 8 | ||
+ | | Hungary || Orange || Free 101 || 9.441 | ||
+ | |- | ||
+ | ! 9 | ||
+ | | Jamaica || Honeydew || Cubeplan || 8.054 | ||
+ | |- | ||
+ | ! 10 | ||
+ | | Jamaica || Honeydew || Optimizer || 9.717 | ||
+ | |- | ||
+ | ! 11 | ||
+ | | Denmark || Watermelon || style="color:lightgray" | Null || 7.611 | ||
+ | |- | ||
+ | ! 12 | ||
+ | | England || style="color:lightgray" | Null || Professional || 7.578 | ||
+ | |} | ||
+ | |||
+ | == Other ways to do a join == | ||
− | + | The most common type of join is a Left join over a single column (key). You can do this with the [[Slice]] or [[Subscript]] operator. For example, suppose <code>Zip_of_person</code> is indexed by <code>Person</code> and contains the zip code of each person, and <code>County_by_zip</code> is indexed by <code>Zip</code>. Then <code>County_by_zip[Zip=Zip_of_person]</code> returns the county for each person. This is logically a left join, but a far simpler and more natural method. | |
− | |||
− | + | When you want to join over multiple columns, [[TableJoin]] is really the only easy way to do it. | |
== See Also == | == See Also == |
Latest revision as of 17:37, 26 May 2025
Requires Analytica 5.1 or later.
Assists in efficient computation of the join of two relational tables. it can perform the Inner (default), Outer, Left, and Right joins. It applies to two relational tables, «x1», indexed by «row1» and «col1» and «x2» indexed by «row2» and «col2». Optional parameter «sharedCol» is a column (or list of columns) which have the same name in «col1» and «col2» over which it performs the join. (If you omit «sharedCol», it joins over all columns with the same name in «col1» and «col2».)
The result is an array with local indexes Pair
, with values [1, 2], corresponding to the tables «x1»and «x2», and Row
, which ranges from 1 to the number of rows in the joined table. Each row in the result, call it r
, contains two values that refer to corresponding rows in tables «x1» and «x2». I.e. r[.Pair=1]
contains values from «row1», and r[.Pair=2]
contains values from «row2». If you specify «position» to true, it returns the positions in «row1» and «row2» instead of their labels.
To complete the join, you use the result of TableJoin to create a new relational table from the two tables. This involves:
- Defining a result index containing the desired columns in the desired order.
- Selecting the values from each of the source tables, using the result of TableJoin for the row of the source table.
Parameters
- «x1»: The first relational table, indexed by «row1» and «col1».
- «x2»: The second relation table, indexed by «row2» and «col2».
- «sharedCol»: (Optional) A list of the columns appearing in both «col1» and «col2» that are to be matched during the join. A shared column must have the same label in «col1» and «col2». The parameter is passed this label or these labels from «col1» and «col2». If you don't specify this parameter, it assumes you mean all columns that have the same labels in the «col1» and «col2» indexes, ignoring the final column in each. If you specify an empty list,
[]
, for this parameter, it performs a cross-join -- i.e. it combines every row from «x1» with every row from «x2», resulting inSize(«row1») * Size(«row2»)
rows. - «joinType»: (Optional) Can be
'Inner'
,'Outer'
,'Left'
or'Right'
. Specifies the join type. See an explanation of SQL for details. - «position»: (Optional) When true, the result contains the positions in «row1» and «row2». When false, contains the labels from «row1» and «row2».
Examples
The examples here will use these source relational tables, each indexed by their own local indexes .Row
and .Col
.
Variable Data1 ::=
|
Variable Data2 ::=
|
Inner join example
- Variable InnerJoinRow ::=
TableJoin(data1, data1.row, data1.col, data2, data2.row, data2.col, jointype: 'Inner')
- →
.Pair →
| ||
---|---|---|
.Row ↓ |
1 | 2 |
1 | 1 | 2 |
2 | 10 | 2 |
3 | 2 | 3 |
4 | 6 | 3 |
5 | 3 | 4 |
6 | 3 | 5 |
This result from TableJoin gives you the rows labels from each source table that is joined. There are many ways in which you might decide the join them. For example, you might want to keep only the Fruit and Edition columns and the Value column from Data1.
- Index InnerJoinCol ::= [ 'Fruit', 'Edition', 'Value' ]
- Variable Inner_Join_Result ::=
Local tmp1 := Data1[.Row=InnerJoinRow[.Pair=1]] [.Col=InnerJoinCol, defVal:null]; Local tmp2 := Data2[.Row=InnerJoinRow[.Pair=2]] [.Col=InnerJoinCol, defVal:null];
If tmp1=Null Then tmp2 Else tmp1
InnerJoinResult
→
InnerJoinCol →
| |||
---|---|---|---|
.Row ↓
|
Fruit | Edition | 'Value' |
1 | Grape | Professional | 6.998 |
2 | Watermelon | Professional | 5.878 |
3 | Honeydew | Free 101 | 5.322 |
4 | Orange | Free 101 | 3.877 |
5 | Honeydew | Cubeplan | 3.935 |
6 | Honeydew | Optimizer | 3.935 |
Inner_Join_Result2
::=
InnerJoinCol →
| |||
---|---|---|---|
.Row ↓
|
Fruit | Edition | 'Value' |
1 | Grape | Professional | 78.89 |
2 | Watermelon | Professional | 66.27 |
3 | Honeydew | Free 101 | 50.24 |
4 | Orange | Free 101 | 36.6 |
5 | Honeydew | Cubeplan | 31.69 |
6 | Honeydew | Optimizer | 38.23 |
Or perhaps you want the product of the two original 'Value'
columns:
- Variable Inner_Join_Result2 ::=
Local tmp1 := Data1[.Row=InnerJoinRow[.Pair=1]] [.Col=InnerJoinCol, defVal:null]; Local tmp2 := Data2[.Row=InnerJoinRow[.Pair=2]] [.Col=InnerJoinCol, defVal:null]; If JCol='Value' Then tmp1[InnerJoinCol='Value'] * tmp2[InnerJoinCol='Value'] Else If tmp1=Null Then tmp2 Else tmp1
Outer join example
- Variable Outer_Join_Row ::=
TableJoin(data1, data1.row, data1.col, data2, data2.row, data2.col, jointype: 'Outer')
- →
.Pair →
| ||
---|---|---|
.Row ↓
|
1 | 2 |
1 | 1 | 2 |
2 | 10 | 2 |
3 | 7 | 1 |
4 | 4 | Null |
5 | 5 | Null |
6 | 8 | Null |
7 | 2 | 3 |
8 | 6 | 3 |
9 | 3 | 4 |
10 | Null | 5 |
11 | 9 | Null |
12 | Null | 1 |
This time we'll include all index columns with the shared column first.
- Index Outer_Join_Col ::=
[ 'Country', 'Fruit', 'Edition', 'Value' ]
And for the value, we'll take the larger of the two.
- Variable Outer_Join_Result ::=
Local tmp1 := Data1[.Row=InnerJoinRow[.Pair=1]] [.Col=Outer_join_col, defVal:null]; Local tmp2 := Data2[.Row=InnerJoinRow[.Pair=2]] [.Col=Outer_join_col, defVal:null]; Max([tmp1,tmp2])
- →
Outer_Join_Col →
| ||||
---|---|---|---|---|
.Row ↓
|
Country | Fruit | Edition | 'Value' |
1 | France | Grape | Professional | 11.27 |
2 | France | Watermelon | Professional | 11.27 |
3 | Canada | Plum | Null | 10.44 |
4 | Germany | Kiwi | Null | 5.862 |
5 | Germany | Orange | Null | 7.533 |
6 | Germany | Plum | Null | 6.19 |
7 | Hungary | Honeydew | Free 101 | 9.441 |
8 | Hungary | Orange | Free 101 | 9.441 |
9 | Jamaica | Honeydew | Cubeplan | 8.054 |
10 | Jamaica | Honeydew | Optimizer | 9.717 |
11 | Denmark | Watermelon | Null | 7.611 |
12 | England | Null | Professional | 7.578 |
Other ways to do a join
The most common type of join is a Left join over a single column (key). You can do this with the Slice or Subscript operator. For example, suppose Zip_of_person
is indexed by Person
and contains the zip code of each person, and County_by_zip
is indexed by Zip
. Then County_by_zip[Zip=Zip_of_person]
returns the county for each person. This is logically a left join, but a far simpler and more natural method.
When you want to join over multiple columns, TableJoin is really the only easy way to do it.
Enable comment auto-refresher