Difference between revisions of "MdTable"

(adorned with links, italics)
(Updated to reflect some new 4.0 features: user-defined conglomeration function, valueColumn)
Line 1: Line 1:
 
[[category:Array Flattening Functions]]
 
[[category:Array Flattening Functions]]
  
= MdTable(T,Rows,Cols,''Vars,conglomFn,missingval'') =
+
= MdTable(T,Rows,Cols,''Vars,conglomFn,defaultValue,valueColumn'') =
  
Returns a multi-dimensional array from a two-dimensional table
+
Returns a multi-dimensional array from a two-dimensional table of values. T is a two-dimensional array (i.e., a table) indexed by ''Rows'' and ''Cols''. Each row of ''T'' specifies a data point containing the coordinates of a cell in a multi-dimensional array, along with the value for that cell.
of values. T is a two-dimensional array (i.e., a table) indexed by
 
''Rows'' and ''Cols''. Each row of ''T'' specifies the coordinates of a cell
 
in a multi-dimensional array, along with the value for that cell.
 
  
 +
MdTable interprets the first ''N'' columns (along the Cols index) as containing a coordinate in the final multi-dimensional array, and last remaining ''M'' columns as containing values (in OLAP terminology, ''measures''), where ''N+M = Size(Cols)''.  In the most common usage, ''M=1'' and ''N=Size(Cols-1)'', meaning that all columns except the last specify the coordinate, and the last column contains the cell value.  In this case, the ''Vars'' and ''valueColumn'' parameters are optional.  If the ''Vars'' parameter is specified, its length determines how many columns are used for coordinates, and if its length is less than Size(Cols)-1, then the valueColumn parameter must be specified to indicate which column contains the cell value.
  
The dimensions of the final result are given by the optional
+
Note: the optional valueColumn parameter, and thus the ability to have ''M>1'', is new to 4.0.
parameter Vars. Vars must be a list of index identifiers or index
 
names. The length of Cols must be one greater than the length of
 
Vars.
 
  
If Vars is omitted, the dimensions of the final result are specified
+
The dimensions of the final result are given by the optional parameter ''Vars''. ''Vars'' must be a list of index identifiers or index names.  (Either identifiers, [[VarTerm]]s, or strings). If an element contains an index name as a string, the index must be a global index (i.e,. not a [[Local Indexes|local index]].
by the first ''n-1'' elements of ''Cols'' (''n=size(Cols)''). In this case, the
 
elements of Cols must be index identifiers or index names.
 
  
The first ''n-1'' columns of ''T'' specify the coordinates of a cell in the
+
As an alternative to using the ''Vars'' parameter, you can specify the index names or identifiers for the final array as the first ''Size(Col)-1'' elements of the ''Col' index.  In this case, the ''Vars'' parameter does not need to be specified.
result. The final column of ''T'' specifies the value for the indicated
 
cell.
 
  
Before using ''MDTable'', you must define all of the indexes for the
+
In OLAP terminology, a fact table is a table in which the first N columns specify coordinates in a multi-dimensional cube, and the last M columns specify ''measures'' along a ''measure dimension'', so that the final multi-dimensional array has ''N+1'' dimensions, the extra dimension being the measure dimension. This transformation is accomplished by providing the measure dimension in the valueColumn parameter.  The valueColumn parameter should then be a 1-D array, indexed by your measure dimension, where the array elements identify columns in the Cols index.  (TBD: Insert example model)
result. Each index must include all values that occur in the corresponding
 
column of ''T'' or an error will result. The [[Unique]]() function
 
is useful for defining the necessary indexes.
 
  
It is possible that two or more rows of ''T'' specify identical coordinates.
+
Before using ''MDTable'', you must define all of the indexes for the result. Each index must include all values that occur in the corresponding column of ''T'' or an error will result. The [[Unique]]() function is useful for defining the necessary indexes.
In this case, a conglomeration function is used combine
 
the values for the given cell. The conglomFn parameter is a text
 
value specifying which conglomeration function is to be used.
 
Possible values are: "sum" (default), "min", "max", "average", or
 
"product".
 
  
(new to 4.0 -- to fill in): Using your own UDF-defined conglomeration functions.
+
It is possible that two or more rows of ''T'' specify identical coordinates. In this case, a conglomeration function is used combine the values for the given cell. The conglomFn parameter is a text value specifying which conglomeration function is to be used. Some possible values are: "sum" (default), "min", "max", "average", or "product".  You can also create your own custom conglomeration function.
  
It is also possible that no row in T corresponds to a particular cell.
+
A congomeration function has a parameter declaration of:  ( A : Array[I] ; I : IndexType ).  Additional parameters are okay as long as they are optional.  You can define a user-defined function of this form to perform the conglomeration for a cell when multiple coordinates map to the same cell.  You then specify the function name (as a string) to the ''congomerationFn'' parameter.
In this case, the cell value is set to missingval, or if the missingval
+
 
parameter is omitted, the cell value is set to undefined. Undefined
+
It is also possible that no row in T corresponds to a particular cell. In this case, the cell value is set to missingval, or if the ''defaultVal'' parameter is omitted, the cell value is set to [[Null]].
values can be detected using the IsUndef() function.
 
  
 
= Library =
 
= Library =

Revision as of 19:12, 8 March 2007


MdTable(T,Rows,Cols,Vars,conglomFn,defaultValue,valueColumn)

Returns a multi-dimensional array from a two-dimensional table of values. T is a two-dimensional array (i.e., a table) indexed by Rows and Cols. Each row of T specifies a data point containing the coordinates of a cell in a multi-dimensional array, along with the value for that cell.

MdTable interprets the first N columns (along the Cols index) as containing a coordinate in the final multi-dimensional array, and last remaining M columns as containing values (in OLAP terminology, measures), where N+M = Size(Cols). In the most common usage, M=1 and N=Size(Cols-1), meaning that all columns except the last specify the coordinate, and the last column contains the cell value. In this case, the Vars and valueColumn parameters are optional. If the Vars parameter is specified, its length determines how many columns are used for coordinates, and if its length is less than Size(Cols)-1, then the valueColumn parameter must be specified to indicate which column contains the cell value.

Note: the optional valueColumn parameter, and thus the ability to have M>1, is new to 4.0.

The dimensions of the final result are given by the optional parameter Vars. Vars must be a list of index identifiers or index names. (Either identifiers, VarTerms, or strings). If an element contains an index name as a string, the index must be a global index (i.e,. not a local index.

As an alternative to using the Vars parameter, you can specify the index names or identifiers for the final array as the first Size(Col)-1 elements of the Col' index. In this case, the Vars parameter does not need to be specified.

In OLAP terminology, a fact table is a table in which the first N columns specify coordinates in a multi-dimensional cube, and the last M columns specify measures along a measure dimension, so that the final multi-dimensional array has N+1 dimensions, the extra dimension being the measure dimension. This transformation is accomplished by providing the measure dimension in the valueColumn parameter. The valueColumn parameter should then be a 1-D array, indexed by your measure dimension, where the array elements identify columns in the Cols index. (TBD: Insert example model)

Before using MDTable, you must define all of the indexes for the result. Each index must include all values that occur in the corresponding column of T or an error will result. The Unique() function is useful for defining the necessary indexes.

It is possible that two or more rows of T specify identical coordinates. In this case, a conglomeration function is used combine the values for the given cell. The conglomFn parameter is a text value specifying which conglomeration function is to be used. Some possible values are: "sum" (default), "min", "max", "average", or "product". You can also create your own custom conglomeration function.

A congomeration function has a parameter declaration of: ( A : Array[I] ; I : IndexType ). Additional parameters are okay as long as they are optional. You can define a user-defined function of this form to perform the conglomeration for a cell when multiple coordinates map to the same cell. You then specify the function name (as a string) to the congomerationFn parameter.

It is also possible that no row in T corresponds to a particular cell. In this case, the cell value is set to missingval, or if the defaultVal parameter is omitted, the cell value is set to Null.

Library

Array

Example

Suppose T, Rows, and Cols are defined as indicated by the following table:

Rows v, Cols >>
Car_type Mpg X
1 VW 26 2185
2 VW 30 1705
3 Honda 26 2330
4 Honda 35 2210
5 BMW 30 2955
6 BMW 35 2800
7 BMW 35 2870
MDTable(T,Rows,Cols,[Car_type,Mpg],'average','n/a') -->
Rows v, Cols >>
Car_type Mpg X
VW 2185 1705 n/a
Honda 2330 n/a 2210
BMW n/a 2955 2835

Notice that in the example, Rows 6 and 7 both specified values for Car_type=BMW, Mpg=35. The "average" conglomeration function was used to combine these.

See Also

Comments


You are not allowed to post comments.