MdArrayToTable

Revision as of 00:48, 29 January 2016 by Bbecane (talk | contribs)


MdArrayToTable(a, row, col, valueIndex, positional, omitZero, omitNull)

Transforms a multi-dimensional array, «a», into a two-dimensional relational table. A relational table contains a row corresponding to each cell of «a» and a column for each index of «a». Each row shows the values of each index identifying that cell, plus a final column 'Value' with the cell value. By default, it generates local indexes .Row and .Col, unless you specify global indexes as «row» or «col» parameters.

By default, it omits rows for any cells of «a» that are empty, i.e. contain 0 or Null. Hence, a relational table is a more compact representation for sparse arrays in which a large fraction of cells are empty. On the other hand, the standard Analytica array representation is usually more compact and easier to handle for arrays with few or no empty cells.

All parameters except the first are optional.

Use the optional «row» and «col» parameters to provide the indexes for the final result explicitly. When these are omitted, the function does not abide by the law of array-abstraction (see Preserving Array-abstraction).

Making a fact table

MdArrayToTable() can also produce a fact table -- an extension of a relational table, in which each row shows the the values of all the cells over one index of «a», specified in the «valueIndex» parameter, instead of just a single cell. A fact table is 2D, like a relational table, but it is more compact, reducing the number of rows by a factor equal to the size of «valueIndex».

You must specify the «valueIndex» -- one index of «a» -- to obtain a fact table. If you specify the «col» parameter, it must contain all (or some) of the indexes of «a» excluding the «valueIndex» concatenated to the values of the «valueIndex», which will form the headers of the corresponding cell values in each row. If «a» has n indexes, including the «valueIndex», «col» contains up to n-1 + k elements, where k is the number of elements in «valueIndex». There is one row for each combination of the index values of «a» excluding the «valueIndex» -- and excluding any rows when all of its original cell values are empty, 0 or NULL.

When you specify «col», each of the first n elements of «col» identifies an index of «a», and to do that, it must be either a handle to the index or the textual identifier of the index. The last element of «col» in a relational table, or the last k elements in a fact table, corresponds to the value column(s), and you may use any label you wish. When you define the index you will use for «col», the Concat function is often used, e.g., Concat(IndexesOf(A), 'A') or for a fact table, Concat(#SetDifference(\IndexesOf(A), \Handle(valueIndex)), valueIndex). Notice that for a fact table, «col» should not include the «valueIndex».

You usually create an index, which you pass to the «col» parameter, but you also have the option of passing a one-dimensional array for «col». This can be useful if you want the index values to be something other than the handles or identifiers of the indexes of «a». When «col» in a 1-D array, it is the cells of the array (rather than the index elements) that must contain the index handle or identifier.

You are not required to include all indexes of «a» in «col». In this case, you obtain a partial flattening, where «a» is flattened only over the indexes that appear in «col». All other indexes are array abstracted, and appear as indexes of the result.

Creating Sparse Relational Tables

You can elect to include only the non-null / non-zero elements of «a» in the resulting table. Normally, when you want all elements, you provide an index «row» with a length equal to the total number of cells in the array. To include only the non-zero / non-null elements, provide an index with a length equal to the number of non-null / non-zero elements. If you omit «row», only the non-empty elements are included, although the «omitZero» and «omitNull» flags alter this.

If you want to include zeroes, but not Nulls, then size «row» for the number of non-Null values and specify:

MdArrayToTable(..., omitZero: false)

Likewise, if you want to include Null cells as rows, but not zeros, specify:

MdArrayToTable(..., omitNull: false)

Fact tables and valueIndex

MdArrayToTable(A, Row, Col, valueIndex)

A fact table is a generalization of a simple relational table. In a fact table, the first N columns identify cell coordinates, and the last K columns hold measured values (often called measures, facts, or just values). This generalized the simple relational table where only the single final row holds a value.

The optional parameter, «valueIndex», specifies an index of «a» that is used for the value columns in the result. The first Size(Col) - Size(«valueIndex») columns of «col» are then the coordinates, and the final Size(«valueIndex») columns of «col» hold the values.

When omitting cells having no data, all values in «a» along the «valueIndex» must be zero or Null. If any of the values have data, the corresponding cell (i.e., row in the result) is considered to have data.

Positional Coordinates

MdArrayToTable(..., positional: true)

Normally, the first N columns of the result specify coordinates using index labels. When the optional parameter «positional» is specified as true, index positions are returned rather than labels.

Customizing Column Names

Normally, the column labels in «col» are just the index identifiers. However, you may need to use different labels, such as labels to match column names in an external database, or more human-readable names. To do this, define an index, say Column, with the desired labels. Then create a Table indexed by Column, call it L, and fill in the index names in the table. Pass L as the third parameter to MdArrayToTable

Examples

Define these global objects (as indexes or variables):

Rows := sequence(1, size(Cost_in_time))
Cols := ["Mpg", "Time", "Car_type", "Cost"]
MDArrayToTable(Cost_in_time, Rows, Cols) →
Cols ▶
Rows ▼ Mpg Time Car_type Cost
1 26 0 VW 2185
2 26 0 Honda 2385
3 26 0 BMW 3185
4 26 1 VW 2294
5 26 1 Honda 2314
6 26 1 BMW 3294
7 26 2 VW 2409
...
45 35 4 BMW 5175

Note: The expression for doing the same transformation using local indexes looks like

Index Rows := Sequence(1, Size(Cost_in_time));
Index Cols := ["Mpg", "Time", "Car_type", "Cost"];
MdArrayToTable(Cost_in_time, Rows, Cols)

Positional example

Changing the previous example to:

MdArrayToTable(Cost_in_time, Rows, Cols, positional: true)

causes the coordinates in the first 3 columns to be expressed as index positions, rather than index labels.

Cols ▶
Rows ▼ Mpg Time Car_type Cost
1 1 1 1 2185
2 1 1 2 2385
3 1 1 3 3185
4 1 2 1 2294
5 1 2 2 2314
6 1 2 3 3294
7 1 3 1 2409
...
45 3 5 3 5175

Creating a Fact Table

In this example, we will provide a separate column for each car-type, rather than including that as a relational dimension. The numbers in the columns are the cost.

Index Row := 1..Sum(1, Mpg, Time)
Index Label := [Mpg, Time, 'VW', 'Honda', 'BMW'];
Variable Cost := MdArrayToTable(Cost_in_time, Row, Label, valueIndex: Car_type)
Cost →
Cols ▶
Rows ▼ Mpg Time VW Honda BMW
1 26 0 2185 2385 3185
2 26 1 2294 2314 3294
3 26 2 2409 2512 3359
...
15 35 4 3829 4230 5175

Details

Textual Index Names vs. Handles

You can specify indexes for the first N columns of «col» using text index names, or using handles. Which is better? The advantage of using handles is that, if you rename one of the indexes, it will automatically update the name in the definition, as it does in other definitions referring to that identifier. If you use text names, it will cause an "Unidentified identifier" error unless or until you edit the name in «col».

When defining «col» using handles, you should define an index as a List (not a list-of-labels), and then enter the index identifiers into the first N cells. For the final cell, you can enter anything, but I often just enter the identifier for the array being transformed. When using a list-of-identifiers like this, it is essential to set the MetaOnly attribute to 1. With the Index node selected in the Diagram,

  1. open the Attribute panel,
  2. click the Attribute pulldown,
  3. select metaOnly and
  4. enter 1 into the panel.

This prevents it from evaluating the variables in the list, and makes sure that the identifiers (handles actually)rather than their values that get passed as the value of «col».

Preserving Array-abstraction

Most Analytica functions are array-abstractable, meaning that they adhere to the law of array-abstraction, a very powerful principle that, when composed properly, results in models that are array-abstractable. The law of array abstraction states that for any unit of computation, f(x, y,..), that computes a result based on inputs x, y,..., and which does not operate over index I, it holds that:

f(x[I = v], y[I = v], ..) = f(x, y, ...)[I = v]

The law ensures that we can add a dimension to the inputs of computational unit, while preserving a consistency with the results before the dimension was present: For any input slice having the original values, the corresponding output slice will compute the original value.

This principle becomes particular powerful from the fact that the composition of array-abstractable expressions is array-abstractable. Thus, as long as you emply only array-abstractable constructs, your entire will be fully array-abstractable. This enable you to rely on power analyses, including parametric, sensitivity, uncertainty, and scenario analyses.

MdArrayToTable is an example of an array-abstractable function when you specify «row» and «col» parameters explicitly with a fixed number of indexes in «col». For example, you might list two indexes to be transformed (so that the index of «col» has length three, the last column to hold the value). If we later supply a three dimensional array, two of the dimensions are transformed to a relational table, the extra third dimension is array-abstracted, resulting in an array of relational tables. This transformation takes a while to get your head around -- but it is an exact application of the law of array-abstraction above.

A common practice is to use MdArrayToTable() to transform all dimensions of an array into the relational table. This is done by omitting the «row» and «col» parameters, or by creating general purpose code that collects all existing dimensions. This application of MdArrayToTable() creates a violation of the law of array-abstraction. The violation implies that you can no longer assume that any model that does this internally is consistent with the law of array-abstraction. This generalized transformation of arbitrary dimensions does require some special meta-constructs around MdArrayToTable() As long as you understand the implications of doing this, there are certainly legitimate uses of a full transformation, especially in points that interface with databases, spreadsheets, or programs external to Analytica. When it is used as part of an algorithm within your model, you should use it in an array-abstractable manner otherwise parametric, sensitivity and scenario analyses over your model may produce incorrect results.

Transforming all dimensions

It's often useful to convert all dimensions of an array into a 2D table -- even if it does mean the code is not fully array abstractable. In this case, any dimension that you add (remove) to «a» in the future will add (remove) a column from the 2-D result, whereas in a purely array-abstractable function the introduction of a new dimension to «a» would become a new dimension of the result.

You can easily full flatten an array fully by using MdArrayToTable(A) -- it adds local indexes .Row and .Col, if you omit parameters «row» and «col».

Otherwise, you need to create and define «row» and «col» yourself. The index «col» must have an element for each dimension of x, plus one extra element for the array's value:

Index Col := Concat(IndexesOf(A), 'Value')

The row index to have either the total number of elements in A, as in:

Index Row := 1..Size(A)

Or if you only want to exclude rows that contain Null or zero, use:

Index Row := 1..SumAll(A <> Null and A <> 0)

where SumAll is a user-defined function that sums x over all its indexes:

Function SumAll(x) :=
LocalAlias I:= IndexesOf(x) Do x := Sum(x, I);
x

Now you can call the function as usual:

MdArrayToTable(A, Row, Col)

History

The optional parameters «Row» and «Col» were required before Analytica 4.5 and were previously named «I» and «L».

See Also

Comments


You are not allowed to post comments.