Difference between revisions of "MdTable"

(Changed an instance in an example of "conglomeration" to "type" to match renaming.)
 
(31 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
[[category:Array Flattening Functions]]
 
[[category:Array Flattening Functions]]
 
[[Category:Doc Status C]] <!-- For Lumina use, do not change -->
 
[[Category:Doc Status C]] <!-- For Lumina use, do not change -->
+
{{ReleaseBar}}
[[What's new in Analytica 4.0?]] >
 
  
= MdTable(T, Rows, Cols,''Vars, conglomerationFn, defaultValue, valueColumn'') =
+
== MdTable(t, rows, cols,''vars, {{Release||6.2|conglomerationFn}}{{Release|6.3||type}}, defaultValue, valueColumn'') ==
  
In the default case, it converts a 2D relational table, T, indexed by Rows and Cols, into an N-dimensional array that sums the values in the last column of T for all rows whose coordinates correspond to the index values of the result. All except the last column in Cols of T are interpreted as coordinates. Vars should be a list of identifiers or handles of the Indexes to be used in the result. MDTable is analogous to PivotTable in Excel. It is the inverse of MDArrayToTable().
+
[[MdTable]] converts a 2D relational table, «t», indexed by «rows» and «cols», into an ''N''-dimensional array result. It is analogous to PivotTable in Excel. It is the inverse of [[MdArrayToTable]]().
  
MdTable interprets the first ''N'' columns (along the Cols index) as containing a coordinate in the final multi-dimensional array, and last remaining ''M'' columns as containing values (in OLAP terminology, ''measures''), where ''N+M = Size(Cols)''.  In the most common usage, ''M=1'' and ''N=Size(Cols-1)'', meaning that all columns except the last specify the coordinate, and the last column contains the cell value. In this case, the ''Vars'' and ''valueColumn'' parameters are optional. If the ''Vars'' parameter is specified, its length determines how many columns are used for coordinates, and if its length is less than Size(Cols)-1, then the valueColumn parameter must be specified to indicate which column contains the cell value.
+
Suppose «cols» has '''N''' elements. In the standard case, the first '''N-1''' elements contain the identifier of an index (or a [[Handle]] to an index), and the last column contains a numerical value  (a "measure" in database terminology). It returns an array with '''N-1''' dimensions, corresponding to the '''N-1''' indexes. Each cell contains the sum  (or other «{{Release||6.2|conglomerationFn}}{{Release|6.3||type}}») over the numerical values for all rows of «t» whose values match the index values in first '''N-1''' «cols». If the index values of a cell in the result array match no row in «t», the result cell contains «defaultValue» (default NULL).
 +
<center>
 +
{| border="0"
 +
|[[image:MdTable coordinate columns.png]]
 +
| <div style="font-size:40pt;padding-top:50px">&#x21d2;</div> ||
 +
{| class="wikitable" style="margin-top:100px"
 +
! !! colspan="3" | Mpg &#9654;
 +
|-
 +
! Car_type &#9660; !! 26 !! 30 !! 35
 +
|-
 +
!VW
 +
|2185 ||1705 ||<span style="color:lightgray">Null</span>
 +
|-
 +
!Honda
 +
|2330 ||<span style="color:lightgray">Null</span> ||2210
 +
|-
 +
!BMW
 +
| <span style="color:lightgray">Null</span> ||2955 || 2835
 +
|}
 +
|-
 +
| colspan="3" | A relational table on the left is the input to [[MdTable]]. The array on the right is the result.
 +
|}
 +
</center>
  
Optional parameters:
 
  
  
 +
If the values in the first '''N-1''' «cols» are identifiers of or handles to the desired indexes, you can specify the list of identifers or [[handle]]s to indexes in the optional parameters «vars».  You may {{Release||6.2|need}}{{Release|6.3||want}} to create indexes for some or all of them. You can use the [[Unique]] function to define each index as the unique set of values from the corresponding column.{{Release||6.2|  Or you can save that effort by using [[Smart_MdTable]] from the [[Smart_MdTable library]], which automatically converting each of the first '''N-1''' «cols» into a local index with the unique values from the corresponding column.}} {{Release|6.3|| Or, you can let [[MdTable]] automatically create [[Local Indexes]] for any or all of these columns by specifying a textual name for the new index that starts with a dot (for example, <code>'.I'</code>). The dot prefix avoids possibly identifier collisions with existing or future global objects. The local index is set to the sorted unique values that appear in the corresponding column. Yet another option is to allow [[MdTable]] to define a global index for you corresponding to a given column. To do this, create the global index and set its definition to <code>ComputedBy(X}</code>, where <code>X</code> is the name of the variable containing the call to [[MdTable]]. If the call to [[MdTable]] is within a [[User-Defined Function]], then <code>X</code> would be the variable that calls the UDF. When using this configuration, the «cols» element cannot be a [[Handle]] to the index (which introduces a disallowed dependency loop). When evaluated, [[MdTable]] will set the global index's [[IndexValue]] to the sorted unique values that occur in the corresponding column.}}
  
  parameter, and thus the ability to have ''M>1'', is new to 4.0.
+
[[MdTable]] can also handle a relational table that has multiple values ("measure" columns) -- a "fact table" in database terminology. In this case, last  '''M''' elements of the «cols» index contain these measures. And the first '''N-M''' columns contain the coordinates of the indexes.  You specify «valueColumn» parameter as the index over  last  ''M'' value columns. Then the resulting array has  '''N-M+1''' indexes -- corresponding to the first '''N-M''' columns plus the «valueColumn» index.
  
The dimensions of the final result are given by the optional parameter ''Vars''. ''Vars'' must be a list of index identifiers or index names.  (Either identifiers, [[handle]]s, or strings).  If an element contains an index name as a string, the index must be a global index (i.e,. not a [[Local Indexes|local index]]).
 
  
As an alternative to using the ''Vars'' parameter, you can specify the index names or identifiers for the final array as the first ''Size(Col)-1'' elements of the ''Col' index.  In this case, the ''Vars'' parameter does not need to be specified.
+
==Optional parameters==
 +
=== vars===
  
In OLAP terminology, a fact table is a table in which the first N columns specify coordinates in a multi-dimensional cube, and the last M columns specify ''measures'' along a ''measure dimension'', so that the final multi-dimensional array has ''N+1'' dimensions, the extra dimension being the measure dimension. This transformation is accomplished by providing the measure dimension in the valueColumn parameter. The valueColumn parameter should then be a 1-D array, indexed by your measure dimension, where the array elements identify columns in the Cols index.
+
If «Cols» already contains identifiers of (or handles to) the Index variables containing the unique values in each column, you can omit «vars». Otherwise, you should specify «vars» as a list of the identifiers or [[Handle]]s to these indexes. You don't need to include the last column(s) in «vars», which it assumes contains the values.  Each index in «vars» becomes an index of the result.  If you want to refer to an existing local index, you must use a [[Handle]], because text identifiers can only refer to a global variable. It's safer to use [[Handle]]s for all the Indexes, so that the model won't break if someone changes the identifier of one of the Indexes. {{Release|6.3||If you provide a textual index name, but the index does not exist, [[MdTable]] will create a local index with that name automatically.
  
Before using ''MDTable'', you must define all of the indexes for the result. Each index must include all values that occur in the corresponding column of ''T'' or an error will result. The [[Unique]]() function is useful for defining the necessary indexes.
+
{{Release|6.3||Instead a List, you can also pass a 1-D array indexed by «Cols» to the «vars» parameter. When «vars» is a List, it contains only the non-value columns. When «vars» is indexed by «Cols», it contains an element for every column, even the value column(s). The element(s) of «vars» that correspond to the value column are ignored.}}
  
It is possible that two or more rows of ''T'' specify identical coordinates. In this case, a conglomeration function is used combine the values for the given cell. The conglomFn parameter is a text value specifying which conglomeration function is to be used. Some possible values are: "sum" (default), "min", "max", "average", or "product". You can also create your own custom conglomeration function.
+
Each element of «vars» identifies the corresponding index in one of the following ways:
 +
* A [[Handle]] to a global index which has been suitably defined.
 +
* A textual name of a global index which has been suitably defined.
 +
* A [[Handle]] or textual name of a global index that is defined as <code>[[ComputedBy]](X)</code>, where <code>X</code> is the name of the variable that calls [[MdTable]]. In this case, [[MdTable]] sets the index values to the sorted unique values that appear in the corresponding column.
 +
* A textual name that starts with a dot, such as <code>'.Year_acquired'</code>. [[MdTable]] will create a new local index with that name containing the sorted unique values that appear in the corresponding column. If the text includes spaces or other non-identifier characters, a legal identifier is generated (e.g., by replacing disallowed characters with <code>'_'</code>) and the title is set to the text (without the initial dot).
 +
* [[Null]] -- which ignores the corresponding column during the transformation.
 +
}}.
  
conglomerationFn: (Default: Sum) Specifies the function used to aggregate over values from the rows in T that map to each cell in the result. Common options are Max, Min, Average, Count. But you can specify the identifier (as text) of any reducing function that operates over an index, with [[Function Parameter Qualifiers|parameters]] of the form:  (A: Array[I]; I: Index).  It is OK if it has other parameters as long as they are optional.  You can use a user-defined function of this form.
+
{{Release|6.3||When use an existing global index, it should }}{{Release||6.2|You should make sure that all these Indexes}} have as values all the unique values from the corresponding column in «T», for example:
  
defaultVal: (Default: NULL) The value of a result cell that has no corresponding rows in T. It is often a good idea to set the default to 0.
+
:<code>Index Cols := ['A', 'B', 'Value']</code>
 +
:<code>Index Rows := 1..100</code>
 +
:<code>Variable T := [[Table]](Cols, Rows)(.....)</code>
 +
:<code>Index A := [[Unique]](T[Cols = 'A'], Rows)</code>
 +
:<code>Index B := [[Unique]](T[Cols = 'B'], Rows)</code>
 +
:<code>Variable Result := [[MdTable]](T, Rows, Cols)</code>
  
valueColumn: An index with n values equal to the last n values of Cols. In this case, Vars should identify as indexes only the first m-n elements of Cols -- where m=Size(Cols). The remaining Cols are mapped into valueColumn. The result will also be indexed by valueColumn if specified, and each slice in the result contains the values aggregated over the corresponding Col in T. In this case, we can call in the parameter T a "Fact table", which shows more than one value for each set of Vars.
+
It's safer to use «vars» using handles rather than the text identifiers of each index:
 +
:<code>INDEX Indexes := ListOfHandles(A, B) </code>
 +
:<code>Variable Result := MDArray(T, Rows, Cols, Indexes)</code>
 +
In this case, the model will not break if someone changes the Identifier of <code>A</code> or <code>B</code>, since these automatically propagate through <code>Indexes</code>.
 +
{{Release|6.3||
  
= Library =
+
If you set an element in «vars» to [[Null]], then the column corresponding to that position will not be mapped to an index. This is equivalent to first re-indexing your data to a new Column index without the column and then applying [[MdTable]] to that.}}
 +
==={{Release||6.2|conglomerationFn}}{{Release|6.3||type}}===
  
Array
+
The «{{Release||6.2|conglomerationFn}}{{Release|6.3||type}}» function combines the values when two or more rows of «T» have the same coordinates (values in their index columns). The default is  "[[Sum|sum]]". You can specify other options as a text identifier or handle, including:  <code>"[[Min|min]]"</code>, <code>"[[Max|max]]"</code>, <code>"[[Average|average]]" </code>, <code>"count"</code>, <code>"[[Product|product]]"</code>, <code>"First"</code>, <code>"Last"</code> or any [[Array-reducing functions]] that operates over an index, with [[Function Parameter Qualifiers|parameters]] of the form:  <code>(A: Array[I]; I: Index)</code>.  It is OK if it has other parameters as long as they are optional. 
  
= Examples =
+
You can also create your own custom {{Release||6.2|conglomerationFn}}{{Release|6.3||type}} function as a [[UDF]].
  
[[Media:MdTable_Example.ana|Download model with examples]] (requires Analytica 4.0)
+
The <code>"count"</code> method ignores the contents of the value column(s), so is equivalent to <code>"Sum"</code> when the value column is one everywhere. {{Release|6.3||To apply <code>"count"</code> without any value column, specify the parameter <code>valueColumn:Null</code>.}}  Unless you explicitly specify the default value, <code>"count"</code> uses 0, whereas other methods default to Null.
  
Suppose ''T'', ''Rows'', and ''Cols'' are defined as indicated by the following
+
===defaultVal===
table:
+
(Default: NULL) The value of a result cell that has no corresponding rows in «T». It is often a good idea to set the default to 0.
  
{| border="1"
+
===valueColumn===
! !! colspan="3" | Cols &rarr;
+
 
 +
In OLAP terminology, a '''fact table''' is a table in which the first '''N-M''' columns specify coordinates in a multi-dimensional cube, and the last '''M''' columns specify measures along a ''measure dimension''. Each row has multiple values across this measure dimension. In this case,  [[MdTable]]() generates a multi-dimensional array with '''N-M+1''' dimensions, the extra dimension being the measure dimension.  You specify the measure dimension in the «valueColumn» parameter, usually as an Index, but it can be a 1-D array, indexed by your measure index.  The  size of the «valueColumn» index must be ''M''. In this case, «vars» identifies as indexes only the first '''N-M''' elements of «Cols».  It maps the remaining '''M''' «Cols» into «valueColumn», which is also an index of the result.
 +
 
 +
== Library ==
 +
Array / Dimensional transformations
 +
 
 +
== Examples ==
 +
[[Media:MdTable_Example.ana|Download model with examples]]
 +
 
 +
Suppose <code>T</code>, <code>Rows</code>, and <code>Cols</code> are defined like this:
 +
 
 +
:{| class="wikitable"
 +
! !! colspan="3" | Cols &#9654;
 
|-
 
|-
! Rows &darr; !! Car_type !! Mpg !! X
+
! Rows &#9660; !! Car_type !! Mpg !! X
 
|-
 
|-
 
! 1  
 
! 1  
Line 70: Line 119:
 
|}
 
|}
  
MDTable(T,Rows,Cols,[Car_type,Mpg],'average','n/a') -->
+
The first example {{Release||6.2|requires [[Analytica 6.3]] or greater. It}} automatically creates local indexes for all of the result indexes for you.
 +
:<code>[[MdTable]](T, Rows, Cols, '.' & Cols, 'average')&rarr;</code>
 +
:{| class="wikitable"
 +
! !! colspan="3" | .Mpg &#9654;
 +
|-
 +
!  .Car_type &#9660; !! 26 !! 30 !! 35
 +
|-
 +
!BMW
 +
| <span style="color:lightgray">Null</span> ||2955 || 2835
 +
|-
 +
!Honda
 +
|2330 ||<span style="color:lightgray">Null</span> ||2210
 +
|-
 +
!VW
 +
|2185 ||1705 || <span style="color:lightgray">Null</span>
 +
|}
  
{| border="1"
+
This next example assumes you've created indexes <code>Car_type</code> and <code>Mpg</code> in advance. Since you created them yourself, you have control over the ordering of the index elements.
! !! colspan="3" | Mpg &rarr;
+
:<code>[[MdTable]](T, Rows, Cols, [Car_type, Mpg], 'average', 'n/a') &rarr;</code>
 +
:{| class="wikitable"
 +
! !! colspan="3" | Mpg &#9654;
 
|-
 
|-
! Car_type &darr; !! 26 !! 30 !! 35
+
! Car_type &#9660; !! 26 !! 30 !! 35
 
|-
 
|-
 
!VW  
 
!VW  
Line 87: Line 153:
 
|}
 
|}
  
Notice that in the example, Rows 6 and 7 both specified values
+
Cells with no corresponding rows in <code>T</code> containing <code>n/a</code>.  Rows 6 and 7 in <code>T</code> both specify values for <code>Car_type = BMW, Mpg = 35</code>,  which are combined by the "average" «{{Release||6.2|conglomerationFn}}{{Release|6.3||type}}» function.
for ''Car_type=BMW, Mpg=35''. The "average" conglomeration function
 
was used to combine these.
 
  
Now suppose we have a User-Defined Function, First:
+
You can also create global indexes for the result, but let it set the index values for you (requires [[Analytica 6.3]] or later).
  Function First( A : Array[I] ; I : Index ) := A[@I=1]
+
:<code>Index Mpg := ComputedBy(T)</code>
 +
:<code>Index Car_type := ComputedBy(T)</code>
 +
:<code>[[MdTable]](T, Rows, Cols, [Car_type, Mpg], 'average') &rarr;</code>
 +
:{| class="wikitable"
 +
! !! colspan="3" |.Mpg &#9654;
 +
|-
 +
!  Car_type &#9660; !! 26 !! 30 !! 35
 +
|-
 +
!BMW
 +
| <span style="color:lightgray">Null</span> ||2955 || 2835
 +
|-
 +
!Honda
 +
|2330 ||<span style="color:lightgray">Null</span> ||2210
 +
|-
 +
!VW
 +
|2185 ||1705 || <span style="color:lightgray">Null</span>
 +
|}
  
Then
+
[[MdTable]] can also work with a User-Defined function for «{{Release||6.2|conglomerationFn}}{{Release|6.3||type}}», provided it is an [[Array-reducing functions]] that operates over an index. Suppose we define this Function <code>First</code> that returns the first element of an array over an index:
MdTable(T,Rows,Cols,[Car_type,Mpg],'First','n/a') -->
+
:<code>Function First( A : Array[I] ; I : Index ) := A[@I = 1]</code>
  
{| border="1"
+
We can then use it as the {{Release||6.2|conglomerationFn}}{{Release|6.3||type}} function:
! !! colspan="3" | Mpg &rarr;
+
:<code>MdTable(T, Rows, Cols, [Car_type, Mpg], 'First', 'n/a') &rarr;</code>
 +
:{| class="wikitable"
 +
! !! colspan="3" | Mpg &#9654;
 
|-
 
|-
! Car_type &darr; !! 26 !! 30 !! 35
+
! Car_type &#9660; !! 26 !! 30 !! 35
 
|-
 
|-
 
!VW  
 
!VW  
Line 113: Line 195:
 
|}
 
|}
  
To aggregate X over Car_type, we can use the valueColumn.  Here we will aggregate using Sum.
+
To aggregate <code>X</code> over <code>Car_type</code>, we can use the «valueColumn».  Here we will aggregate using [[Sum]].
MdTable(T,Rows,Cols,[Car_type],valueColumn:'X') &rarr;
+
:<code>MdTable(T, Rows, Cols, [Car_type], valueColumn: 'X') &rarr;</code>
 
+
:{| class="wikitable"
{| border="1"
+
! Car_Type &#9660; ||
! Car_Type &darr;
 
 
|-
 
|-
 
! VW  
 
! VW  
Line 128: Line 209:
 
| 8625
 
| 8625
 
|}
 
|}
 +
{{Release|6.3||An equivalent way to do the same is <code>MdTable(T, Rows, Cols, [Car_type,Null] )</code>}}
  
In the previous example, ''Car_type'' is the first column.  When you aggregate in this fashion, the target aggregation index must be the first column.  If you wanted to aggregate onto ''Mpg'' (summing all records with the same ''Mpg'') then you would need to re-index first to make ''Mpg'' the first column like this:
+
{{Release||6.2|In the previous example, <code>Car_type</code> is the first column.  When you aggregate in this fashion, the target aggregation index must be the first column.  If you wanted to aggregate onto <code>Mpg</code> (summing all records with the same <code>Mpg</code>) then you would need to re-index first to make <code>Mpg</code> the first column like this:
Index L := ['Mpg','X'];
+
:<code>Index L :{{=}} ['Mpg', 'X'];</code>
MdTable(T[Cols=L],Rows,L,[mpg],valueColumn:'X') &rarr;
+
:<code>MdTable(T[Cols {{=}} L], Rows, L, [mpg], valueColumn: 'X') &rarr;</code>}}
{| border="1"
+
{{Release|6.3||To aggregate over a different index or indexes, use Null in the «vars» parameter to indicate which indexes to not include in the result. Here we sum out the first column, <code>Car_type</code>.
! Mpg &darr;
+
:<code>MdTable(T, Rows, Cols, [Null, Mpg] ) &rarr;</code>}}
 +
:{| class="wikitable"
 +
! Mpg &#9660; ||
 
|-
 
|-
 
! 26
 
! 26
Line 145: Line 229:
 
|}
 
|}
  
If ''T'' had 6 coordinate columns and you wanted to aggregate onto 3 dimensions only, then you'd need to make sure that the three final dimensions were in the first three columns.  If they were not there initially, then you'd reindex as in the previous example.  If you are aggregating only a single target dimension, the [[Aggregate]] function can also be used and may be more intuitive.  [[MdTable]] is actually more general since you can aggregate onto a multi-dimensional table.
+
If <code>T</code> had 6 coordinate columns and you wanted to aggregate onto 3 dimensions only, then you'd need to make sure that the three final dimensions were in the first three columns.  If they were not there initially, then you'd reindex as in the previous example.  If you are aggregating only a single target dimension, the [[Aggregate]] function can also be used and may be more intuitive.  [[MdTable]] is actually more general since you can aggregate onto a multi-dimensional table.
  
To use both Mpg and X as value columns, we can define a ''measure'' dimension:
+
To use both <code>Mpg</code> and <code>X</code> as value columns, we can define a ''measure'' dimension:
Index Measure_Index := ['Mpg','X']
+
:<code>Index Measure_Index := ['Mpg', 'X']</code>
  
 
Then
 
Then
MdTable(T,Rows,Cols,[Car_type],valueColumn:Measure_Index) &rarr;
+
:<code>MdTable(T, Rows, Cols, [Car_type], valueColumn: Measure_Index) &rarr;</code>
 
+
:{| class="wikitable"
{| border="1"
+
! !! colspan="2" | MeasureIndex &#9654;
! !! colspan="2" | MeasureIndex &rarr;
 
 
|-
 
|-
! Car_Type &darr; !! Mpg !! X
+
! Car_Type &#9660; !! Mpg !! X
 
|-
 
|-
 
! VW  
 
! VW  
Line 168: Line 251:
 
|}
 
|}
  
Notice here that both Mpg and X have been summed -- both values used the same conglomeration function.  However, suppose we want the average value for Mpg, but the maximum value for X, i.e., each "measure" having its own conglomeration.  We can accomplish this using:
+
Notice here that both <code>Mpg</code> and <code>X</code> have been summed -- both values used the same {{Release|6.3||«type»}}{{Release||6.2| conglomeration function}}.  However, suppose we want the average value for <code>Mpg</code>, but the maximum value for <code>X</code>, i.e., each "measure" having its own {{Release|6.3||«type»}}{{Release||6.2| conglomeration function}}.  We can accomplish this using:
  
MdTable(T,Rows,Cols,[Car_type],valueColumn:Measure_Index,
+
:<code>MdTable(T, Rows, Cols, [Car_type], valueColumn: Measure_Index,</code>
        conglomerationFn: Array(Measure_Index,["average","max"]) ) &rarr;
+
::<code>{{Release||6.2|conglomerationFn}}{{Release|6.3||type}}: Array(Measure_Index, ["average", "max"])) &rarr;</code>
 
+
:{| class="wikitable"
{| border="1"
+
! !! colspan="2" | MeasureIndex &#9654;
! !! colspan="2" | MeasureIndex &rarr;
 
 
|-
 
|-
! Car_Type &darr; !! Mpg !! X
+
! Car_Type &#9660; !! Mpg !! X
 
|-
 
|-
 
! VW  
 
! VW  
Line 188: Line 270:
 
|}
 
|}
  
= What's new in 4.0 =
+
==Fact Table==
 +
In order to convert a 2D relational table with more than one value per combination of indexes, you would use the parameter «valueColumn» to create a Fact Table. For example, suppose <code>T</code>, <code>Rows</code>, and <code>Cols</code> are defined as indicated by the following table:
 +
 
 +
:{| class="wikitable"
 +
! !! colspan="4" | Cols &#9654;
 +
|-
 +
! Rows &#9660; !! Car_type !! Mpg !! X !! Y
 +
|-
 +
! 1
 +
| VW || 26 || 2185 || 1
 +
|-
 +
! 2
 +
| VW || 30 || 1705 || 2
 +
|-
 +
!3
 +
|Honda ||26 ||2330 || 3
 +
|-
 +
!4
 +
|Honda ||35 ||2210 || 3
 +
|-
 +
!5
 +
|BMW ||30 ||2955 || 4
 +
|-
 +
!6
 +
|BMW ||35 ||2800 || 5
 +
|-
 +
!7
 +
|BMW ||35 ||2870 || 5
 +
|}
 +
 
 +
And suppose <code>Fact</code> is an index defined as <code>['X', 'Y']</code>. Therefore:
 +
 
 +
:<code>MDTable(T, Rows, Cols, [Car_type, Mpg], valueColumn: Fact, defaultValue: "n/a") &rarr;</code>
 +
:{| style="width:60%"
 +
! style="width: 20%;"|<center>MPG = 26</center>
 +
! style="width: 20%;"|<center>MPG = 30</center>
 +
! style="width: 20%;"|<center>MPG = 35</center>
 +
|-
 +
| style="padding: 5px;"|
 +
{| class="wikitable"
 +
! !! colspan="2" | Fact &#9654;
 +
|-
 +
! Car_type &#9660; !! X !! Y
 +
|-
 +
!VW
 +
|2185 ||1
 +
|-
 +
!Honda
 +
|2330 ||3
 +
|-
 +
!BMW
 +
|n/a || n/a
 +
|}
 +
 
 +
| style="padding: 5px;"|
 +
{| class="wikitable"
 +
! !! colspan="2" | Fact &#9654;
 +
|-
 +
! Car_type &#9660; !! X !! Y
 +
|-
 +
!VW
 +
|1705 ||2
 +
|-
 +
!Honda
 +
|n/a || n/a
 +
|-
 +
!BMW
 +
|1955 || 4
 +
|}
 +
 
 +
| style="padding: 5px;"|
 +
{| class="wikitable"
 +
! !! colspan="2" | Fact &#9654;
 +
|-
 +
! Car_type &#9660; !! X !! Y
 +
|-
 +
!VW
 +
|n/a || n/a
 +
|-
 +
!Honda
 +
|2210|| 3
 +
|-
 +
!BMW
 +
|5670 || 10
 +
|}
 +
 
 +
|}
 +
 
 +
Notice that in the example, Rows 6 and 7 both specified values for <code>Car_type = BMW, Mpg=35</code>. By default the [[Sum]] {{Release||6.2|conglomerationFn}}{{Release|6.3||type}} function was used to combine these.
  
MdTable contains several enhancements that are new to Analytica 4.0:
+
==History==
* The valueColumn parameter.  Previously, all ''Size(Cols)-1'' columns has to be used as a coordinate, and the value had to be in the last column.
+
Introduced in [[What's new in Analytica 4.0?|Analytica 4.0]].
* Better support for [[VarTerm]]s in the Vars parameter. This is particular useful when you want the result to use local indexes.
 
* Support for named parameters, some changes to the names (ConglomFn is now ConglomerationFn, missingVal is now defaultValue).
 
* Support for custom aggregation functions.
 
* DefaultValue for missing cells is [[Null]] when not specified explicitly (formerly it was [[Undefined]]).
 
  
= See Also =
+
Some conveniences added in [[Analytica 6.3]]:
 +
* Automatic setting of global result indexes when they are defined with [[ComputedBy]].
 +
* Auto-creation of local result indexes then given a textual index name starting with a dot (<code>'.'</code>)
 +
* Ignore columns when the corresponding entry in «vars» is Null.
 +
* Ability to use a 'Count' agglomeration without any value column. Done by setting «valueColumn» to Null, or by listing all indexes in a list to «vars».
 +
* «vars» can be indexed by «Cols». In this case, «valueColumn» can be any column or 1-D array of columns (i.e., the index columns don't have to come first).
 +
* The preferred name for the 5th parameter was changed from «conglomerationFn» to «type».
  
 +
== See Also ==
 +
* [[MdArrayToTable]] inverse of MdTable
 +
* [[Smart_MdTable library]]
 +
* [[Relational tables and multiD arrays]]
 +
* [[Unflatten]]
 
* [[Unique]]
 
* [[Unique]]
* [[MdArrayToTable]]
+
* [[Aggregate]]

Latest revision as of 17:19, 10 March 2023



Release:

4.6  •  5.0  •  5.1  •  5.2  •  5.3  •  5.4  •  6.0  •  6.1  •  6.2  •  6.3  •  6.4  •  6.5


MdTable(t, rows, cols,vars, type, defaultValue, valueColumn)

MdTable converts a 2D relational table, «t», indexed by «rows» and «cols», into an N-dimensional array result. It is analogous to PivotTable in Excel. It is the inverse of MdArrayToTable().

Suppose «cols» has N elements. In the standard case, the first N-1 elements contain the identifier of an index (or a Handle to an index), and the last column contains a numerical value (a "measure" in database terminology). It returns an array with N-1 dimensions, corresponding to the N-1 indexes. Each cell contains the sum (or other «type») over the numerical values for all rows of «t» whose values match the index values in first N-1 «cols». If the index values of a cell in the result array match no row in «t», the result cell contains «defaultValue» (default NULL).

MdTable coordinate columns.png
Mpg ▶
Car_type ▼ 26 30 35
VW 2185 1705 Null
Honda 2330 Null 2210
BMW Null 2955 2835
A relational table on the left is the input to MdTable. The array on the right is the result.


If the values in the first N-1 «cols» are identifiers of or handles to the desired indexes, you can specify the list of identifers or handles to indexes in the optional parameters «vars». You may want to create indexes for some or all of them. You can use the Unique function to define each index as the unique set of values from the corresponding column. Or, you can let MdTable automatically create Local Indexes for any or all of these columns by specifying a textual name for the new index that starts with a dot (for example, '.I'). The dot prefix avoids possibly identifier collisions with existing or future global objects. The local index is set to the sorted unique values that appear in the corresponding column. Yet another option is to allow MdTable to define a global index for you corresponding to a given column. To do this, create the global index and set its definition to ComputedBy(X}, where X is the name of the variable containing the call to MdTable. If the call to MdTable is within a User-Defined Function, then X would be the variable that calls the UDF. When using this configuration, the «cols» element cannot be a Handle to the index (which introduces a disallowed dependency loop). When evaluated, MdTable will set the global index's IndexValue to the sorted unique values that occur in the corresponding column.

MdTable can also handle a relational table that has multiple values ("measure" columns) -- a "fact table" in database terminology. In this case, last M elements of the «cols» index contain these measures. And the first N-M columns contain the coordinates of the indexes. You specify «valueColumn» parameter as the index over last M value columns. Then the resulting array has N-M+1 indexes -- corresponding to the first N-M columns plus the «valueColumn» index.


Optional parameters

vars

If «Cols» already contains identifiers of (or handles to) the Index variables containing the unique values in each column, you can omit «vars». Otherwise, you should specify «vars» as a list of the identifiers or Handles to these indexes. You don't need to include the last column(s) in «vars», which it assumes contains the values. Each index in «vars» becomes an index of the result. If you want to refer to an existing local index, you must use a Handle, because text identifiers can only refer to a global variable. It's safer to use Handles for all the Indexes, so that the model won't break if someone changes the identifier of one of the Indexes. If you provide a textual index name, but the index does not exist, MdTable will create a local index with that name automatically.

Instead a List, you can also pass a 1-D array indexed by «Cols» to the «vars» parameter. When «vars» is a List, it contains only the non-value columns. When «vars» is indexed by «Cols», it contains an element for every column, even the value column(s). The element(s) of «vars» that correspond to the value column are ignored.

Each element of «vars» identifies the corresponding index in one of the following ways:

  • A Handle to a global index which has been suitably defined.
  • A textual name of a global index which has been suitably defined.
  • A Handle or textual name of a global index that is defined as ComputedBy(X), where X is the name of the variable that calls MdTable. In this case, MdTable sets the index values to the sorted unique values that appear in the corresponding column.
  • A textual name that starts with a dot, such as '.Year_acquired'. MdTable will create a new local index with that name containing the sorted unique values that appear in the corresponding column. If the text includes spaces or other non-identifier characters, a legal identifier is generated (e.g., by replacing disallowed characters with '_') and the title is set to the text (without the initial dot).
  • Null -- which ignores the corresponding column during the transformation..

When use an existing global index, it should have as values all the unique values from the corresponding column in «T», for example:

Index Cols := ['A', 'B', 'Value']
Index Rows := 1..100
Variable T := Table(Cols, Rows)(.....)
Index A := Unique(T[Cols = 'A'], Rows)
Index B := Unique(T[Cols = 'B'], Rows)
Variable Result := MdTable(T, Rows, Cols)

It's safer to use «vars» using handles rather than the text identifiers of each index:

INDEX Indexes := ListOfHandles(A, B)
Variable Result := MDArray(T, Rows, Cols, Indexes)

In this case, the model will not break if someone changes the Identifier of A or B, since these automatically propagate through Indexes. If you set an element in «vars» to Null, then the column corresponding to that position will not be mapped to an index. This is equivalent to first re-indexing your data to a new Column index without the column and then applying MdTable to that.

type

The «type» function combines the values when two or more rows of «T» have the same coordinates (values in their index columns). The default is "sum". You can specify other options as a text identifier or handle, including: "min", "max", "average" , "count", "product", "First", "Last" or any Array-reducing functions that operates over an index, with parameters of the form: (A: Array[I]; I: Index). It is OK if it has other parameters as long as they are optional.

You can also create your own custom type function as a UDF.

The "count" method ignores the contents of the value column(s), so is equivalent to "Sum" when the value column is one everywhere. To apply "count" without any value column, specify the parameter valueColumn:Null. Unless you explicitly specify the default value, "count" uses 0, whereas other methods default to Null.

defaultVal

(Default: NULL) The value of a result cell that has no corresponding rows in «T». It is often a good idea to set the default to 0.

valueColumn

In OLAP terminology, a fact table is a table in which the first N-M columns specify coordinates in a multi-dimensional cube, and the last M columns specify measures along a measure dimension. Each row has multiple values across this measure dimension. In this case, MdTable() generates a multi-dimensional array with N-M+1 dimensions, the extra dimension being the measure dimension. You specify the measure dimension in the «valueColumn» parameter, usually as an Index, but it can be a 1-D array, indexed by your measure index. The size of the «valueColumn» index must be M. In this case, «vars» identifies as indexes only the first N-M elements of «Cols». It maps the remaining M «Cols» into «valueColumn», which is also an index of the result.

Library

Array / Dimensional transformations

Examples

Download model with examples

Suppose T, Rows, and Cols are defined like this:

Cols ▶
Rows ▼ Car_type Mpg X
1 VW 26 2185
2 VW 30 1705
3 Honda 26 2330
4 Honda 35 2210
5 BMW 30 2955
6 BMW 35 2800
7 BMW 35 2870

The first example automatically creates local indexes for all of the result indexes for you.

MdTable(T, Rows, Cols, '.' & Cols, 'average')→
.Mpg ▶
.Car_type ▼ 26 30 35
BMW Null 2955 2835
Honda 2330 Null 2210
VW 2185 1705 Null

This next example assumes you've created indexes Car_type and Mpg in advance. Since you created them yourself, you have control over the ordering of the index elements.

MdTable(T, Rows, Cols, [Car_type, Mpg], 'average', 'n/a') →
Mpg ▶
Car_type ▼ 26 30 35
VW 2185 1705 n/a
Honda 2330 n/a 2210
BMW n/a 2955 2835

Cells with no corresponding rows in T containing n/a. Rows 6 and 7 in T both specify values for Car_type = BMW, Mpg = 35, which are combined by the "average" «type» function.

You can also create global indexes for the result, but let it set the index values for you (requires Analytica 6.3 or later).

Index Mpg := ComputedBy(T)
Index Car_type := ComputedBy(T)
MdTable(T, Rows, Cols, [Car_type, Mpg], 'average') →
.Mpg ▶
Car_type ▼ 26 30 35
BMW Null 2955 2835
Honda 2330 Null 2210
VW 2185 1705 Null

MdTable can also work with a User-Defined function for «type», provided it is an Array-reducing functions that operates over an index. Suppose we define this Function First that returns the first element of an array over an index:

Function First( A : Array[I] ; I : Index ) := A[@I = 1]

We can then use it as the type function:

MdTable(T, Rows, Cols, [Car_type, Mpg], 'First', 'n/a') →
Mpg ▶
Car_type ▼ 26 30 35
VW 2185 1705 n/a
Honda 2330 n/a 2210
BMW n/a 2955 2800

To aggregate X over Car_type, we can use the «valueColumn». Here we will aggregate using Sum.

MdTable(T, Rows, Cols, [Car_type], valueColumn: 'X') →
Car_Type ▼
VW 3890
Honda 4540
BMW 8625

An equivalent way to do the same is MdTable(T, Rows, Cols, [Car_type,Null] )


To aggregate over a different index or indexes, use Null in the «vars» parameter to indicate which indexes to not include in the result. Here we sum out the first column, Car_type.

MdTable(T, Rows, Cols, [Null, Mpg] ) →
Mpg ▼
26 4515
30 4660
35 7880

If T had 6 coordinate columns and you wanted to aggregate onto 3 dimensions only, then you'd need to make sure that the three final dimensions were in the first three columns. If they were not there initially, then you'd reindex as in the previous example. If you are aggregating only a single target dimension, the Aggregate function can also be used and may be more intuitive. MdTable is actually more general since you can aggregate onto a multi-dimensional table.

To use both Mpg and X as value columns, we can define a measure dimension:

Index Measure_Index := ['Mpg', 'X']

Then

MdTable(T, Rows, Cols, [Car_type], valueColumn: Measure_Index) →
MeasureIndex ▶
Car_Type ▼ Mpg X
VW 56 3890
Honda 61 4540
BMW 100 8625

Notice here that both Mpg and X have been summed -- both values used the same «type». However, suppose we want the average value for Mpg, but the maximum value for X, i.e., each "measure" having its own «type». We can accomplish this using:

MdTable(T, Rows, Cols, [Car_type], valueColumn: Measure_Index,
type: Array(Measure_Index, ["average", "max"])) →
MeasureIndex ▶
Car_Type ▼ Mpg X
VW 28 2185
Honda 30.5 2330
BMW 33.3 2955

Fact Table

In order to convert a 2D relational table with more than one value per combination of indexes, you would use the parameter «valueColumn» to create a Fact Table. For example, suppose T, Rows, and Cols are defined as indicated by the following table:

Cols ▶
Rows ▼ Car_type Mpg X Y
1 VW 26 2185 1
2 VW 30 1705 2
3 Honda 26 2330 3
4 Honda 35 2210 3
5 BMW 30 2955 4
6 BMW 35 2800 5
7 BMW 35 2870 5

And suppose Fact is an index defined as ['X', 'Y']. Therefore:

MDTable(T, Rows, Cols, [Car_type, Mpg], valueColumn: Fact, defaultValue: "n/a") →
MPG = 26
MPG = 30
MPG = 35
Fact ▶
Car_type ▼ X Y
VW 2185 1
Honda 2330 3
BMW n/a n/a
Fact ▶
Car_type ▼ X Y
VW 1705 2
Honda n/a n/a
BMW 1955 4
Fact ▶
Car_type ▼ X Y
VW n/a n/a
Honda 2210 3
BMW 5670 10

Notice that in the example, Rows 6 and 7 both specified values for Car_type = BMW, Mpg=35. By default the Sum type function was used to combine these.

History

Introduced in Analytica 4.0.

Some conveniences added in Analytica 6.3:

  • Automatic setting of global result indexes when they are defined with ComputedBy.
  • Auto-creation of local result indexes then given a textual index name starting with a dot ('.')
  • Ignore columns when the corresponding entry in «vars» is Null.
  • Ability to use a 'Count' agglomeration without any value column. Done by setting «valueColumn» to Null, or by listing all indexes in a list to «vars».
  • «vars» can be indexed by «Cols». In this case, «valueColumn» can be any column or 1-D array of columns (i.e., the index columns don't have to come first).
  • The preferred name for the 5th parameter was changed from «conglomerationFn» to «type».

See Also

Comments


You are not allowed to post comments.