Difference between revisions of "MdTable"

Revision as of 19:31, 10 July 2018

MdTable(t, rows, cols,vars, conglomerationFn, defaultValue, valueColumn)

MdTable converts a 2D relational table, «t», indexed by «rows» and «cols», into an N-dimensional array result. It is analogous to PivotTable in Excel. It is the inverse of MdArrayToTable().

Suppose «cols» has N elements. In the simple case, the first N+1 contain the name (identifier) of an index (or a Handle to an index), and the last column contains a numerical value (a "measure" in database terminology). It then returns an array with N+1 dimensions, corresponding to the N+1 indexes. Each cell contains the sum (or other «conglomerationFn») over the numerical values for all rows of «t» with the corresponding index values in first N+1 «cols». If «t» has no row corresponding to a cell in the result array, it returns «defaultValue» (default NULL) in that cell.

If «cols» does't contain identifiers of or handles to the desired indexes, you can specify the list of identifers or handles to indexes in the optional parameters «vars».

MdTable can also handle a relational table that has multiple values ("measure" columns) -- a "fact table" in database terminology. In this case, last M elements of the «cols» index contain these measures. And the first N-M columns contain the coordinates of the indexes. You specify «valueColumn» parameter as the index over last M value columns. Then the resulting array has N-M+1 indexes -- corresponding to the first N-M columns plus the «valueColumn» index.

Before using MdTable, you must define all of the indexes needed for the result, each containing a list of the all the unique values that occur in the corresponding column of «T». The Unique() function is useful for defining the necessary indexes.

Optional parameters

vars

If «Cols» already contains identifiers of (or handles to) the Index variables containing the unique values in each column, you can omit «vars». Otherwise, you should specify «vars» as a list of the identifiers or Handles to these indexes. You don't need to include the last column(s) in «vars», which it assumes contains the values. Each index in «vars» becomes an index of the result. If you want to refer to a local index, you must use a Handle, because text identifiers can only refer to a global variable. It's safer to use Handles for all the Indexes, so that the model won't break if someone changes the identifier of one of the Indexes.

You must make sure that all these Indexes have as values all the unique values from the corresponding column in «T», for example:

Index Cols := ['A', 'B', 'Value']

Index Rows := 1..100

Variable T := Table(Cols, Rows)(.....)

Index A := Unique(T[Cols = 'A'], Rows)

Index B := Unique(T[Cols = 'B'], Rows)

Variable Result := MDArray(T, Rows, Cols)

It's safer to use «vars» using handles rather than the text identifiers of each index:

INDEX Indexes := ListOfHandles(A, B)

Variable Result := MDArray(T, Rows, Cols, Indexes)

In this case, the model will not break if someone changes the Identifier of A or B, since these automatically propagate through Indexes.

conglomerationFn

The conglomeration function combines the values when two or more rows of «T» have the same coordinates (values in their index columns). The default is "sum". You can specify other options as a text identifier or handle, including: min", "max", "average", or "count", "product", or any Array-reducing functions that operates over an index, with parameters of the form: (A: Array[I]; I: Index). It is OK if it has other parameters as long as they are optional.

You can also create your own custom conglomeration function as a UDF.

defaultVal

(Default: NULL) The value of a result cell that has no corresponding rows in «T». It is often a good idea to set the default to 0.

valueColumn

In OLAP terminology, a fact table is a table in which the first N-M columns specify coordinates in a multi-dimensional cube, and the last M columns specify measures along a measure dimension. Each row has multiple values across this measure dimension. In this case, MdTable() generates a multi-dimensional array with N-M+1 dimensions, the extra dimension being the measure dimension. You specify the measure dimension in the «valueColumn» parameter, usually as an Index, but it can be a 1-D array, indexed by your measure index. The size of the «valueColumn» index must be M. In this case, «vars» identifies as indexes only the first N-M elements of «Cols». It maps the remaining M «Cols» into «valueColumn», which is also an index of the result.

Library

Array

Examples

Download model with examples

Suppose T, Rows, and Cols are defined like this:

	Cols ▶
Rows ▼	Car_type	Mpg	X
1	VW	26	2185
2	VW	30	1705
3	Honda	26	2330
4	Honda	35	2210
5	BMW	30	2955
6	BMW	35	2800
7	BMW	35	2870

MDTable(T, Rows, Cols, [Car_type, Mpg], 'average', 'n/a') →

	Mpg ▶
Car_type ▼	26	30	35
VW	2185	1705	n/a
Honda	2330	n/a	2210
BMW	n/a	2955	2835

Cells with no corresponding rows in T containing n/a. Rows 6 and 7 in T both specify values for Car_type = BMW, Mpg = 35, which are combined by the "average" conglomeration function.

MDTable can also work with a User-Defined function for conglomeration, provided it is an Array-reducing functions that operates over an index. Suppose we define this Function First that returns the first element of an array over an index:

Function First( A : Array[I] ; I : Index ) := A[@I = 1]

We can then use it as the conglomeration function:

MdTable(T, Rows, Cols, [Car_type, Mpg], 'First', 'n/a') →

	Mpg ▶
Car_type ▼	26	30	35
VW	2185	1705	n/a
Honda	2330	n/a	2210
BMW	n/a	2955	2800

To aggregate X over Car_type, we can use the «valueColumn». Here we will aggregate using Sum.

MdTable(T, Rows, Cols, [Car_type], valueColumn: 'X') →

Car_Type ▼
VW	3890
Honda	4540
BMW	8625

In the previous example, Car_type is the first column. When you aggregate in this fashion, the target aggregation index must be the first column. If you wanted to aggregate onto Mpg (summing all records with the same Mpg) then you would need to re-index first to make Mpg the first column like this:

Index L := ['Mpg', 'X'];

MdTable(T[Cols = L], Rows, L, [mpg], valueColumn: 'X') →

Mpg ▼
26	4515
30	4660
35	7880

If T had 6 coordinate columns and you wanted to aggregate onto 3 dimensions only, then you'd need to make sure that the three final dimensions were in the first three columns. If they were not there initially, then you'd reindex as in the previous example. If you are aggregating only a single target dimension, the Aggregate function can also be used and may be more intuitive. MdTable is actually more general since you can aggregate onto a multi-dimensional table.

To use both Mpg and X as value columns, we can define a measure dimension:

Index Measure_Index := ['Mpg', 'X']

Then

MdTable(T, Rows, Cols, [Car_type], valueColumn: Measure_Index) →

	MeasureIndex ▶
Car_Type ▼	Mpg	X
VW	56	3890
Honda	61	4540
BMW	100	8625

Notice here that both Mpg and X have been summed -- both values used the same conglomeration function. However, suppose we want the average value for Mpg, but the maximum value for X, i.e., each "measure" having its own conglomeration. We can accomplish this using:

MdTable(T, Rows, Cols, [Car_type], valueColumn: Measure_Index,

conglomerationFn: Array(Measure_Index, ["average", "max"])) →

	MeasureIndex ▶
Car_Type ▼	Mpg	X
VW	28	2185
Honda	30.5	2330
BMW	33.3	2955

Fact Table

In order to convert a 2D relational table with more than one value per combination of indexes, you would use the parameter «valueColumn» to create a Fact Table. For example, suppose T, Rows, and Cols are defined as indicated by the following table:

	Cols ▶
Rows ▼	Car_type	Mpg	X	Y
1	VW	26	2185	1
2	VW	30	1705	2
3	Honda	26	2330	3
4	Honda	35	2210	3
5	BMW	30	2955	4
6	BMW	35	2800	5
7	BMW	35	2870	5

And suppose Fact is an index defined as ['X', 'Y']. Therefore:

MDTable(T, Rows, Cols, [Car_type, Mpg], valueColumn: Fact, defaultValue: "n/a") →

MPG = 26

MPG = 30

MPG = 35

	Fact ▶
Car_type ▼	X	Y
VW	2185	1
Honda	2330	3
BMW	n/a	n/a

	Fact ▶
Car_type ▼	X	Y
VW	1705	2
Honda	n/a	n/a
BMW	1955	4

	Fact ▶
Car_type ▼	X	Y
VW	n/a	n/a
Honda	2210	3
BMW	5670	10

Notice that in the example, Rows 6 and 7 both specified values for Car_type = BMW, Mpg=35. By default the Sum conglomeration function was used to combine these.

History

Introduced in Analytica 4.0.

@@ Line 6: / Line 6: @@
 [[MdTable]] converts a 2D relational table, «t», indexed by «rows» and «cols», into an ''N''-dimensional array result. It is analogous to PivotTable in Excel. It is the inverse of [[MdArrayToTable]]().
-It returns an ''N''-dimensional result when «cols» has ''N+1'' elements. The first ''N'' columns  correspond to values of an index in the result. Usually, the last column contains a numerical value (a "measure" in database terminology). Each cell in the resulting array is the sum (or other «conglomerationFn») over the values in the last column for all  rows in «t» with corresponding index values in the preceding columns.  The index «cols» may identify the corresponding index by its text identifier or a handle to the variable. Alternatively, you may specify the name or handle to each index in the optional parameter «vars».
+Suppose «cols» has '''N''' elements. In the simple case, the first '''N+1''' contain the name (identifier) of an index (or a [[Handle]] to an index), and the last column contains a numerical value  (a "measure" in database terminology). It then returns an array with '''N+1''' dimensions, corresponding to the '''N+1''' indexes. Each cell contains the sum  (or other «conglomerationFn») over the numerical values for all rows of «t» with the corresponding index values in first '''N+1''' «cols».  If «t» has no row corresponding to a cell in the result array, it returns «defaultValue» (default NULL) in that cell.
-[[MdTable]] can also handle a 2D relational table with multiple values ("measure" columns) termed a "fact table" in database terminology. In this case, last  ''M'' columns (along the «cols» index) contain these measures. As before the first ''N'' columns contain the coordinates of the indexes.  So  «cols» contains ''N + M'' elements. You specify «valueColumn» parameter as the index over  last  ''M'' value columns. Then the resulting array has  ''N+1'' indexes -- resulting from the first ''N'' columns plus the «valueColumn» index.
+If «cols» does't contain identifiers of or handles to the desired indexes, you can specify the list of identifers or handles to indexes in the optional parameters «vars».
+[[MdTable]] can also handle a relational table that has multiple values ("measure" columns) --  a "fact table" in database terminology. In this case, last  '''M''' elements of the «cols» index contain these measures. And the first '''N-M''' columns contain the coordinates of the indexes.  You specify «valueColumn» parameter as the index over  last  ''M'' value columns. Then the resulting array has  '''N-M+1''' indexes -- corresponding to the first '''N-M''' columns plus the «valueColumn» index.
 Before using [[MdTable]], you must define all of the indexes needed for the result, each containing a list of the all the unique values that occur in the corresponding column of «T». The [[Unique]]() function is useful for defining the necessary indexes.
@@ Line 15: / Line 17: @@
 === vars===
-«vars» is a list of indexes corresponding to the columns «Cols» of the table, except the last column which contains the values.  Each index in «vars» becomes an index of the result.  Each element of «vars» may be the identifier of an index as text or a  [[Handle]] to an index.  [[MdTable]]() assumes that the last column in «Cols» contains the values, there is no need to mention in it the «vars».
+If «Cols» already contains identifiers of (or handles to) the Index variables containing the unique values in each column, you can omit «vars». Otherwise, you should specify «vars» as a list of the identifiers or [[Handle]]s to these indexes. You don't need to include the last column(s) in «vars», which it assumes contains the values.  Each index in «vars» becomes an index of the result.  If you want to refer to a local index, you must use a [[Handle]], because text identifiers can only refer to a global variable. It's safer to use [[Handle]]s for all the Indexes, so that the model won't break if someone changes the identifier of one of the Indexes.
-If «Cols» already contains the names or handles for the corresponding indexes in each column, there is no need to include the «vars» parameter.  If you use a text identifier for the index, it must be a global Index variable.  If you want to refer to a local index, you must use a [[Handle]].
 You must make sure that all these Indexes have as values all the unique values from the corresponding column in «T», for example:
@@ Line 33: / Line 34: @@
 ===conglomerationFn===
-It is possible that two or more rows of «T» specify identical coordinates. In this case, a conglomeration function is used combine the values for the given cell. The «conglomerationFn» parameter is a text value specifying which conglomeration function is to be used. Some possible values are: "[[Sum|sum]]" (default), "[[Min|min]]", "[[Max|max]]", "[[Average|average]]", or "count" or "[[Product|product]]".
-But you can specify the identifier (as text) of any reducing function that operates over an index, with [[Function Parameter Qualifiers|parameters]] of the form:  <code>(A: Array[I]; I: Index)</code>.  It is OK if it has other parameters as long as they are optional.
+The conglomeration function combines the values when two or more rows of «T» have the same coordinates (values in their index columns). The default is  "[[Sum|sum]]". You can specify other options as a text identifier or handle, including:  [[Min|min]]", "[[Max|max]]", "[[Average|average]]", or "count", "[[Product|product]]", or any [[Array-reducing functions]] that operates over an index, with [[Function Parameter Qualifiers|parameters]] of the form:  <code>(A: Array[I]; I: Index)</code>.  It is OK if it has other parameters as long as they are optional.
 You can also create your own custom conglomeration function as a [[UDF]].
@@ Line 44: / Line 44: @@
 ===valueColumn===
-In OLAP terminology, a '''fact table''' is a table in which the first ''N'' columns specify coordinates in a multi-dimensional cube, and the last ''M'' columns specify measures along a ''measure dimension''. Each row has multiple values across this measure dimension. In this case,  [[MdTable]]() generates a multi-dimensional array with ''N + 1'' dimensions, the extra dimension being the measure dimension.  You specify the measure dimension in the «valueColumn» parameter, usually as an Index, but it can be a 1-D array, indexed by your measure index.  The  size of the «valueColumn» index must be ''M''. In this case, «vars» identifies as indexes only the first ''N'' elements of «Cols».  It maps the remaining ''M'' «Cols» into «valueColumn». Hence, «valueColumn» index is also an index of the result.
+In OLAP terminology, a '''fact table''' is a table in which the first '''N-M''' columns specify coordinates in a multi-dimensional cube, and the last '''M''' columns specify measures along a ''measure dimension''. Each row has multiple values across this measure dimension. In this case,  [[MdTable]]() generates a multi-dimensional array with '''N-M+1''' dimensions, the extra dimension being the measure dimension.  You specify the measure dimension in the «valueColumn» parameter, usually as an Index, but it can be a 1-D array, indexed by your measure index.  The  size of the «valueColumn» index must be ''M''. In this case, «vars» identifies as indexes only the first '''N-M''' elements of «Cols».  It maps the remaining '''M''' «Cols» into «valueColumn», which is also an index of the result.
 == Library ==