Difference between revisions of "MdArrayToTable"

m (Reorged a bit for clarity and removed some redundant material.)
(23 intermediate revisions by 4 users not shown)
Line 2: Line 2:
 
[[Category:Doc Status C]] <!-- For Lumina use, do not change -->
 
[[Category:Doc Status C]] <!-- For Lumina use, do not change -->
 
   
 
   
= MdArrayToTable(A,I,L) =
+
=== MdArrayToTable(a'', row, col, valueIndex, positional, omitZero, omitNull'') ===
  
Transforms a multi-dimensional array, ''A'', into a two-dimensional
+
Transforms a multi-dimensional array, «a», into a two-dimensional relational table. A '''''relational table''''' contains a row corresponding to each cell of «a» and a column for each index of «a». Each row shows the values of each index identifying that cell, plus a final column <code>'Value'</code> with the cell value. By default, it generates local indexes <code>.Row</code> and <code>.Col</code>, unless you specify global indexes as «row» or  «col» parameters.
array (i.e., a relational table) indexed by ''I'' and ''L''. The result contains one
 
row along ''I'' for each element of ''A''. ''L'' must contain a list of names
 
of the indexes of ''A'', followed by one final element. All elements of
 
L must be text values. The column corresponding to the final element
 
of ''L'' contains the cell value. If L does not contain all the
 
indexes of ''A'', array abstraction will create a set of tables indexed
 
by the dimensions not listed in ''L''.
 
  
== Creating Sparse Relational Tables ==
+
By default, it omits rows for any cells of «a» that are empty, i.e. contain 0 or [[Null]].  Hence, a relational table is a more compact representation for sparse arrays in which a large fraction of cells are empty. On the other hand, the standard Analytica array representation is usually more compact and easier to handle for arrays with few or no empty cells.
  
You can elect to include only the non-null / non-zero elements of «A» in the resulting table.  Normally, when you want all elements, you provide an index «I» with a length equal to the total number of cells in the array.  To include only the non-zero / non-null elements, provide an index with a length equal to the number of non-null / non-zero elements.
+
All parameters except the first are optional.
  
''New to [[What's new in Analytica 4.2?|Analytica 4.2]]'': Prior to 4.2, only zero elements were excluded.  The ability to exclude [[Null]] elements is new.  The «omitZero» and «omitNull» parameters are new to 4.2.
+
Use the optional «row» and «col» parameters to provide the indexes for the final result explicitly. When you omit these, the function does not abide by the law of array-abstraction (see [[#Preserving Array-abstraction|Preserving Array-abstraction]]).  
  
If you want to include zeroes, but not [[Null]]s, then size «I» for the number of non-Null values and specify:
+
===The «col» parameter===
::[[MdArrayToTable]](...,omitZero:false)
 
Likewise, if you want to include [[Null]] cells as rows, but not zeros, specify:
 
::[[MdArrayToTable]](...,omitNull:false)
 
  
== Creating a fact table ==
+
The «col» parameter is an index specifying the columns of the result. If you omit «col», it automatically creates a local index, called <code>.Col</code>, which includes all the indexes of the array, plus 'Value' that contains the measure of each cell of the array.  When omitted, it is similar to you having specified the parameter as a index defined as <code>Index Col := Concat(IndexesOf(A), 'Value')</code>.
  
''New to [[What's new in Analytica 4.2?|Analytica 4.2]]''
+
Reasons to specify «col» are when you want to control the sequence of the columns that contain the first indexes (other than the «valueIndex») or when you want to use only a subset of the indexes.  When you omit one or more indexes, it flattens array «a» only over the indexes that appear in «col». The other indexes are [[Array Abstraction|array abstracted]], and are retained as indexes in the result.
  
::[[MdArrayToTable]](A,I,L'',valueIndex'')
+
When you specify «col», each of the first n elements of «col» identifies an index of «a», as a [[handle]] to the index or as the textual identifier of the index. It's safer to use handles in case the identifier of an index gets changed. If you want to have text values rather than handles or identifiers in the «col» index of the result, you can provide «col» as a 1-D array, whose index contains the desired text values (with 'Value' or any text value in the last item), and whose cells contain the index handles (or identifiers).
  
A fact table is a generalization of a simple relational table.  In a fact table, the first ''N'' columns identify cell coordinates, and the last ''K'' columns hold measured values (often called ''measures'', ''facts'', or just ''values'').  This generalized the simple relational table where only the single final row holds a value.
+
=== The «row» parameter ===
  
The optional parameter, «valueColumn», specifies an index of «A» that is used for the value columns in the result. The first [[Size]](L) - [[Size]](valueIndex) columns of «L» are then the coordinates, and the final [[Size]](valueIndex]] columns of «L» hold the values.
+
The «row» parameter is the index for rows in the result. If you omit the parameter, the result has a local index .Row, with an element for each combination of the index values of «a».  
  
When omitting cells having no data, all values in «A» along the valueIndex must be zero or null.  If any of the values have data, the corresponding cell (i.e., row in the result) is considered to have data.
+
===  Sparse Tables with «omitZero» or «omitNull» ===
  
== Positional Coordinates ==
+
The result normally excludes any rows whose original cell values are 0 or NULL , unless you specify «omitZero» or «omitNull» as False, respectively.  It will also include all the rows, including those with 0 or Null values, if you specify «row» as an index whose length is equal to Size(«a»).
 +
 
 +
If you want to include zeroes, but not [[Null]]s, then size «row» for the number of non-Null values and specify:
 +
:[[MdArrayToTable]](..., omitZero: false)
 +
 
 +
Or, if you want to include [[Null]] cells as rows, but not zeros, specify:
 +
:[[MdArrayToTable]](..., omitNull: false)
  
''New to [[What's new in Analytica 4.2?|Analytica 4.2]]''
+
===Fact tables and «valueIndex» ===
  
::[[MdArrayToTable]](...,positional:true)  
+
[[MdArrayToTable]]() can also produce a '''fact table''' -- an extension of a relational table, in which each row, shows the the values of all the cells over one index of «a» instead of a single 'Value' cell. You specify the index to use as the «valueIndex» parameter. A fact table is 2D, like a relational table, but it is more compact, reducing the number of rows by a factor equal to the size of «valueIndex». The first [[Size]](Col) - [[Size]](«valueIndex») columns of «col» are then the coordinates, and the final [[Size]](«valueIndex») columns of «col» hold the values.
  
Normally, the first ''N'' columns of the result specify coordinates using index ''labels''. When the optional parameter «positional» is specified as true, index positions are returned rather than labels.
+
If you specify the «col» parameter, it must contain all (or some) of the indexes of  «a» excluding the «valueIndex», concatenated with the values of the «valueIndex».  If «a» has n indexes, including the  «valueIndex», «col» contains up to n-1 + k elements, where k is the number of elements in «valueIndex».  You could generate this index <code>Concat(#SetDifference(\IndexesOf(A), \Handle(valueIndex)), valueIndex)</code>. If you omit the «col» parameter, it does this automatically.
  
== Customizing Column Names ==
+
A fact table normally omits rows whose values along the «valueIndex» are all zero or [[Null]], unless you specify «omitZero» or «omitNull» as False. 
  
Normally, the column labels in «L» are just the index identifiers.  However, you may need to use different labels, such as labels to match column names in an external database, or more human-readable names.  To do this, define an index, say ''Column'', with the desired labels.  Then create a [[Table]] indexed by ''Column'', call it ''L'', and fill in the index names in the table. Pass ''L'' as the third parameter to [[MdArrayToTable]].
+
== Positional Coordinates ==
 +
:[[MdArrayToTable]](..., positional: true)
  
= Library =
+
Normally, the first ''N'' columns of the result contain coordinates using index ''labels''.  When you specify parameter «positional» as true, it returns the positions of each index value rather than its label.
  
Array
+
== Customizing Column Names ==
  
= Examples =
+
Normally, the column labels in «col» are just the index identifiers.  However, you may need to use different labels, such as labels to match column names in an external database, or more human-readable names.  To do this, define an index, say <code>Column</code>, with the desired labels.  Then create a [[Table]] indexed by <code>Column</code>, call it <code>L</code>, and fill in the index names in the table.  Pass <code>L</code> as the third parameter to [[MdArrayToTable]]
  
 +
== Examples ==
 
Define these global objects (as indexes or variables):
 
Define these global objects (as indexes or variables):
Rows := sequence(1,size(Cost_in_time))
 
Cols := [‘Mpg’,’Time’,’Car_type’,’Cost’]
 
  
MDArrayToTable(Cost_in_time,Rows,Cols)
+
:<code>Rows := sequence(1, size(Cost_in_time))</code>
{| border="1"
+
:<code>Cols := ["Mpg", "Time", "Car_type", "Cost"]</code>
|+ Rows v, Cols >>
+
:<code>MDArrayToTable(Cost_in_time, Rows, Cols) &rarr;</code>
!
+
:{| class="wikitable"
!Mpg !!Time !!Car_type !!Cost
+
! !! colspan=4| Cols &#9654;
 +
|-
 +
!Rows &#9660; !!Mpg !!Time !!Car_type !!Cost
 
|-
 
|-
 
!1  
 
!1  
Line 92: Line 90:
  
 
Note: The expression for doing the same transformation using local indexes looks like
 
Note: The expression for doing the same transformation using local indexes looks like
[[Index]] Rows := [[Sequence]](1,[[Size]](Cost_in_time))
+
:<code>Index Rows := Sequence(1, Size(Cost_in_time));</code>
[[Index]] Cols := [‘Mpg’,’Time’,’Car_type’,’Cost’]
+
:<code>Index Cols := ["Mpg", "Time", "Car_type", "Cost"];</code>
[[MdArrayToTable]](Cost_in_time,Rows,Cols)
+
:<code>MdArrayToTable(Cost_in_time, Rows, Cols)</code>
  
 
== Positional example ==
 
== Positional example ==
 +
Changing the previous example to:
 +
:<code>MdArrayToTable(Cost_in_time, Rows, Cols, positional: true)</code>
  
Changing the previous example to:
 
[[MdArrayToTable]](Cost_in_time,Rows,Cols,positional:true)
 
 
causes the coordinates in the first 3 columns to be expressed as index positions, rather than index labels.
 
causes the coordinates in the first 3 columns to be expressed as index positions, rather than index labels.
{| border="1"
+
:{|class="wikitable"
|+ Rows v, Cols >>
+
! !! colspan=5| Cols &#9654;
!
+
|-
!Mpg !!Time !!Car_type !!Cost
+
!Rows &#9660; !!Mpg !!Time !!Car_type !!Cost
 
|-
 
|-
 
!1  
 
!1  
Line 134: Line 132:
  
 
== Creating a Fact Table ==
 
== Creating a Fact Table ==
 +
In this example, we will provide a separate column for each car-type, rather than including that as a relational dimension.  The numbers in the columns are the cost.
  
In this example, well provide a separate column for each car-type, rather than including that as a relational dimension.  The numbers in the columns are the cost.
+
:<code>Index Row := 1..Sum(1, Mpg, Time)</code>
 
+
:<code>Index Label := [Mpg, Time, 'VW', 'Honda', 'BMW'];</code>
Index Row := 1..[[Sum]](1,Mpg,Time)
+
:<code>Variable Cost := MdArrayToTable(Cost_in_time, Row, Label, valueIndex: Car_type)</code>
Index Label := [Mpg,Time,'VW','Honda','BMW'];
+
:<code>Cost &rarr;</code>
Variable Cost := [[MdArrayToTable]](Cost_in_time,Row,Label,valueIndex:Car_type)
+
:{| class="wikitable"
 
+
! !! colspan=5| Cols &#9654;
 
+
|-
{| border="0"
+
!Rows &#9660; !!Mpg !!Time !! VW !! Honda !! BMW
| Cost &rarr;
 
|
 
{| border="1"
 
|+ Rows v, Cols >>
 
!
 
!Mpg !!Time !! VW !! Honda !! BMW
 
 
|-
 
|-
 
!1  
 
!1  
Line 164: Line 157:
 
|35 ||4 ||3829 || 4230 || 5175
 
|35 ||4 ||3829 || 4230 || 5175
 
|}
 
|}
|}
 
  
= See Also =
+
== Details ==
 +
=== Textual Index Names vs. Handles ===
 +
You can specify indexes for the first ''N'' columns of «col» using text index names, or using [[handle]]s.  Which is better? The advantage of using  [[handle]]s is that, if you rename one of the indexes, it will automatically update the name in the definition, as it does in other definitions referring to that identifier. If you use text names, it will cause an "Unidentified identifier" error unless or until you edit the name in «col».
 +
 
 +
When defining «col» using [[handle]]s, you should define an index as a List (not a list-of-labels), and then enter the index identifiers into the first ''N'' cells.  For the final cell, you can enter anything, but I often just enter the identifier for the array being transformed.  When using a list-of-identifiers like this, it is ''essential'' to set the [[MetaOnly]] attribute to 1.  With the Index node selected in the Diagram,
 +
# open the Attribute panel,
 +
# click the Attribute pulldown,
 +
# select ''metaOnly'' and
 +
# enter 1 into the panel. 
 +
 
 +
This prevents it from evaluating the variables in the list, and makes sure that the identifiers ([[handle]]s actually)rather than their values that get passed as the value of «col».
 +
 
 +
=== Preserving Array-abstraction ===
 +
Most Analytica functions are [[Array Abstraction|array-abstractable]], meaning that they adhere to the ''law of array-abstraction'', a very powerful principle that, when composed properly, results in models that are ''array-abstractable''.  The law of array abstraction states that for any unit of computation, <code>f(x, y,..)</code>, that computes a result based on inputs ''x, y,...'', and which does not operate over index <code>I</code>, it holds that:
 +
 
 +
:<code>f(x[I = v], y[I = v], ..) = f(x, y, ...)[I = v]</code>
 +
 
 +
The law ensures that we can add a dimension to the inputs of computational unit, while preserving a consistency with the results before the dimension was present: For any input [[slice]] having the original values, the corresponding output slice will compute the original value.
 +
 
 +
This principle becomes particular powerful from the fact that the composition of array-abstractable expressions is array-abstractable.  Thus, as long as you emply only array-abstractable constructs, your entire will be fully array-abstractable.  This enable you to rely on power analyses, including parametric, sensitivity, uncertainty, and scenario analyses. 
 +
 
 +
[[MdArrayToTable]] is an example of an ''array-abstractable'' function when you specify «row» and «col» parameters explicitly with a fixed number of indexes in «col».  For example, you might list two indexes to be transformed (so that the index of «col» has length three, the last column to hold the value).  If we later supply a three dimensional array, two of the dimensions are transformed to a relational table, the extra third dimension is array-abstracted, resulting in an array of relational tables.  This transformation takes a while to get your head around -- but it is an exact application of the law of array-abstraction above.
 +
 
 +
It is common to use MdArrayToTable() to transform ''all'' dimensions of an array into the relational table, often by omitting the «row» and «col» parameters.  This method violates the ''law of array-abstraction'' -- and may cause problems when used for parametric or sensitivity analysis.  But there are certainly legitimate uses of a full transformation -- for example when you interface with databases, spreadsheets, or programs external to Analytica that can handle only arrays with only one or two dimensions.
 +
 
 +
== History ==
 +
The optional parameters «Row» and «Col» were required before [[Analytica 4.5]] and were previously named «I» and «L».
  
 +
== See Also ==
 +
* [[Flatten]]
 
* [[MdTable]]
 
* [[MdTable]]
 
* [[ConcatRows]]
 
* [[ConcatRows]]
 +
* [[Relational tables and multiD arrays]]

Revision as of 02:29, 17 June 2022


MdArrayToTable(a, row, col, valueIndex, positional, omitZero, omitNull)

Transforms a multi-dimensional array, «a», into a two-dimensional relational table. A relational table contains a row corresponding to each cell of «a» and a column for each index of «a». Each row shows the values of each index identifying that cell, plus a final column 'Value' with the cell value. By default, it generates local indexes .Row and .Col, unless you specify global indexes as «row» or «col» parameters.

By default, it omits rows for any cells of «a» that are empty, i.e. contain 0 or Null. Hence, a relational table is a more compact representation for sparse arrays in which a large fraction of cells are empty. On the other hand, the standard Analytica array representation is usually more compact and easier to handle for arrays with few or no empty cells.

All parameters except the first are optional.

Use the optional «row» and «col» parameters to provide the indexes for the final result explicitly. When you omit these, the function does not abide by the law of array-abstraction (see Preserving Array-abstraction).

The «col» parameter

The «col» parameter is an index specifying the columns of the result. If you omit «col», it automatically creates a local index, called .Col, which includes all the indexes of the array, plus 'Value' that contains the measure of each cell of the array. When omitted, it is similar to you having specified the parameter as a index defined as Index Col := Concat(IndexesOf(A), 'Value').

Reasons to specify «col» are when you want to control the sequence of the columns that contain the first indexes (other than the «valueIndex») or when you want to use only a subset of the indexes. When you omit one or more indexes, it flattens array «a» only over the indexes that appear in «col». The other indexes are array abstracted, and are retained as indexes in the result.

When you specify «col», each of the first n elements of «col» identifies an index of «a», as a handle to the index or as the textual identifier of the index. It's safer to use handles in case the identifier of an index gets changed. If you want to have text values rather than handles or identifiers in the «col» index of the result, you can provide «col» as a 1-D array, whose index contains the desired text values (with 'Value' or any text value in the last item), and whose cells contain the index handles (or identifiers).

The «row» parameter

The «row» parameter is the index for rows in the result. If you omit the parameter, the result has a local index .Row, with an element for each combination of the index values of «a».

Sparse Tables with «omitZero» or «omitNull»

The result normally excludes any rows whose original cell values are 0 or NULL , unless you specify «omitZero» or «omitNull» as False, respectively. It will also include all the rows, including those with 0 or Null values, if you specify «row» as an index whose length is equal to Size(«a»).

If you want to include zeroes, but not Nulls, then size «row» for the number of non-Null values and specify:

MdArrayToTable(..., omitZero: false)

Or, if you want to include Null cells as rows, but not zeros, specify:

MdArrayToTable(..., omitNull: false)

Fact tables and «valueIndex»

MdArrayToTable() can also produce a fact table -- an extension of a relational table, in which each row, shows the the values of all the cells over one index of «a» instead of a single 'Value' cell. You specify the index to use as the «valueIndex» parameter. A fact table is 2D, like a relational table, but it is more compact, reducing the number of rows by a factor equal to the size of «valueIndex». The first Size(Col) - Size(«valueIndex») columns of «col» are then the coordinates, and the final Size(«valueIndex») columns of «col» hold the values.

If you specify the «col» parameter, it must contain all (or some) of the indexes of «a» excluding the «valueIndex», concatenated with the values of the «valueIndex». If «a» has n indexes, including the «valueIndex», «col» contains up to n-1 + k elements, where k is the number of elements in «valueIndex». You could generate this index Concat(#SetDifference(\IndexesOf(A), \Handle(valueIndex)), valueIndex). If you omit the «col» parameter, it does this automatically.

A fact table normally omits rows whose values along the «valueIndex» are all zero or Null, unless you specify «omitZero» or «omitNull» as False.

Positional Coordinates

MdArrayToTable(..., positional: true)

Normally, the first N columns of the result contain coordinates using index labels. When you specify parameter «positional» as true, it returns the positions of each index value rather than its label.

Customizing Column Names

Normally, the column labels in «col» are just the index identifiers. However, you may need to use different labels, such as labels to match column names in an external database, or more human-readable names. To do this, define an index, say Column, with the desired labels. Then create a Table indexed by Column, call it L, and fill in the index names in the table. Pass L as the third parameter to MdArrayToTable

Examples

Define these global objects (as indexes or variables):

Rows := sequence(1, size(Cost_in_time))
Cols := ["Mpg", "Time", "Car_type", "Cost"]
MDArrayToTable(Cost_in_time, Rows, Cols) →
Cols ▶
Rows ▼ Mpg Time Car_type Cost
1 26 0 VW 2185
2 26 0 Honda 2385
3 26 0 BMW 3185
4 26 1 VW 2294
5 26 1 Honda 2314
6 26 1 BMW 3294
7 26 2 VW 2409
...
45 35 4 BMW 5175

Note: The expression for doing the same transformation using local indexes looks like

Index Rows := Sequence(1, Size(Cost_in_time));
Index Cols := ["Mpg", "Time", "Car_type", "Cost"];
MdArrayToTable(Cost_in_time, Rows, Cols)

Positional example

Changing the previous example to:

MdArrayToTable(Cost_in_time, Rows, Cols, positional: true)

causes the coordinates in the first 3 columns to be expressed as index positions, rather than index labels.

Cols ▶
Rows ▼ Mpg Time Car_type Cost
1 1 1 1 2185
2 1 1 2 2385
3 1 1 3 3185
4 1 2 1 2294
5 1 2 2 2314
6 1 2 3 3294
7 1 3 1 2409
...
45 3 5 3 5175

Creating a Fact Table

In this example, we will provide a separate column for each car-type, rather than including that as a relational dimension. The numbers in the columns are the cost.

Index Row := 1..Sum(1, Mpg, Time)
Index Label := [Mpg, Time, 'VW', 'Honda', 'BMW'];
Variable Cost := MdArrayToTable(Cost_in_time, Row, Label, valueIndex: Car_type)
Cost →
Cols ▶
Rows ▼ Mpg Time VW Honda BMW
1 26 0 2185 2385 3185
2 26 1 2294 2314 3294
3 26 2 2409 2512 3359
...
15 35 4 3829 4230 5175

Details

Textual Index Names vs. Handles

You can specify indexes for the first N columns of «col» using text index names, or using handles. Which is better? The advantage of using handles is that, if you rename one of the indexes, it will automatically update the name in the definition, as it does in other definitions referring to that identifier. If you use text names, it will cause an "Unidentified identifier" error unless or until you edit the name in «col».

When defining «col» using handles, you should define an index as a List (not a list-of-labels), and then enter the index identifiers into the first N cells. For the final cell, you can enter anything, but I often just enter the identifier for the array being transformed. When using a list-of-identifiers like this, it is essential to set the MetaOnly attribute to 1. With the Index node selected in the Diagram,

  1. open the Attribute panel,
  2. click the Attribute pulldown,
  3. select metaOnly and
  4. enter 1 into the panel.

This prevents it from evaluating the variables in the list, and makes sure that the identifiers (handles actually)rather than their values that get passed as the value of «col».

Preserving Array-abstraction

Most Analytica functions are array-abstractable, meaning that they adhere to the law of array-abstraction, a very powerful principle that, when composed properly, results in models that are array-abstractable. The law of array abstraction states that for any unit of computation, f(x, y,..), that computes a result based on inputs x, y,..., and which does not operate over index I, it holds that:

f(x[I = v], y[I = v], ..) = f(x, y, ...)[I = v]

The law ensures that we can add a dimension to the inputs of computational unit, while preserving a consistency with the results before the dimension was present: For any input slice having the original values, the corresponding output slice will compute the original value.

This principle becomes particular powerful from the fact that the composition of array-abstractable expressions is array-abstractable. Thus, as long as you emply only array-abstractable constructs, your entire will be fully array-abstractable. This enable you to rely on power analyses, including parametric, sensitivity, uncertainty, and scenario analyses.

MdArrayToTable is an example of an array-abstractable function when you specify «row» and «col» parameters explicitly with a fixed number of indexes in «col». For example, you might list two indexes to be transformed (so that the index of «col» has length three, the last column to hold the value). If we later supply a three dimensional array, two of the dimensions are transformed to a relational table, the extra third dimension is array-abstracted, resulting in an array of relational tables. This transformation takes a while to get your head around -- but it is an exact application of the law of array-abstraction above.

It is common to use MdArrayToTable() to transform all dimensions of an array into the relational table, often by omitting the «row» and «col» parameters. This method violates the law of array-abstraction -- and may cause problems when used for parametric or sensitivity analysis. But there are certainly legitimate uses of a full transformation -- for example when you interface with databases, spreadsheets, or programs external to Analytica that can handle only arrays with only one or two dimensions.

History

The optional parameters «Row» and «Col» were required before Analytica 4.5 and were previously named «I» and «L».

See Also

Comments


You are not allowed to post comments.