Difference between revisions of "Relational to Array Dialog"

Latest revision as of 22:51, 12 June 2025

This dialog is a "wizard" to make it easier to convert a relational table or "wide table" into a multi-dimensional array. It creates a call to MdTable-- the Analytica function that does this conversion -- but you can do it from the dialog without have to write out the function parameters and more.

Relational tables

Relational tables are commonly used in databases and spreadsheets. A relational table is a two-dimensional table with one "record" on each row. The initial column(s) contain the values of an index (or key field). The last column contains the corresponding value, often a number, but it can be text.

A relational table often depicts quantities that are effectively multi-dimensional quantities. However, the multi-dimensionality is only implied, where some columns effectively specify the "coordinate" of the record in a multi-dimensional space. In such case, it is usually far more convenient in Analytica to convert it into an actual multi-dimensional array, which Analytica loves to manipulate directly. This accomplishes two things: (a) Is separates our heterogenous data into separate variables, where each variable represents one type of quantity only, with one consistent unit of measurement, and (b) turns the implicit "coordinates" that appear in the index-like columns of a relational table into actual explicit indexes. Once the quantities are in a homogeneous array form, they become really easy to manipulate using Analytica's various array functions.

It is worth noting, however, that for an array with many dimensions, and many index combinations that are absemt (a sparse array), a relational table can be a much more space-efficient.

Here is an example of a simple 3-column relational table.

Project Route Length relational table.png

The project and route columns are "index-like", where as the Length column contains quantitative data.

Wide tables

The dialog also handles the conversion of a wide table to an array (formerly termed a fact table.) A wide table is a relational table having more than one value column, and in which the multiple value columns are all instances of the same quantity varying across a "value index" (often a time index). A wide table allows one dimension to be embedded in each record, rather than having to have a separate row for every cell. It is more compact than a full (long) relational table, that has only one value per row, and so more rows by a factor equal to the length of the value index.

(TO DO: Insert an image)

Accessing the dialog

To bring up the Relational table to array dialog, select Relational table to array... on the right-mouse context menu while viewing a relational table in a result window.

Relational table to array menu option.png

For the option to be present and enabled,

You must be in edit mode.
The module containing the table's variable must be editable (it cannot be locked, browse-only). This is where it will create new variables.
It must be a result table (not an edit table)
The table must appear to be a relational table.
- The table must be pivoted with the columns horizontally and the records down the vertical rows.
- Column names must be unique (no duplicates)
- (A few others, TBD)

Column types

In the main part of the dialog, you specify for each column of your relational table whether the column should be treated as an index, as a value or ignored. For each index column, you have the options of using an existing index in your model, creating a new index, or using a new local index. These are each covered in turn below.

One of the nicest things about using this dialog is that it can take care of creating the index nodes for you, eliminating a few steps when writing an expression yourself that calls MdTable.

An index should have only unique values (no duplicate elements). An index column in a relational table will have many duplicates. In general, an index should have all the values that appear in the corresponding column of the relational table.

The dialog looks at your data and makes an educated first guess about how to treat each column.

Existing index

Use this option when you already have an index in your model for the column. The existing index should contain all the values that appear in the column.

If the column label is a handle to an index, or if there is already an index that matches the name of your column and has the values that appear in your relational column, it will auto-select the existing index.

When you first select this option, you may see «click to select» as seen here

The blue underline indicates that the item can be clicked. You'll need to select the existing index in the index finder that pops up.

Once you've selected an index, or it has found one for you, the selected index identifier is blue underlined, for example

To change it to a different existing index, click the blue link and again use the index selector to locate the correct index.

When you select an existing index, the index definition is not modified my the dialog.

New index

This option creates a new global index for you, and defines it automatically to be the sorted unique values that appear in the column.

Your relational column name is used to create the identifier for the new index. You can change this later by renaming the created node if necessary.

The new index is defined by an expression that computes the sorted unique values. Thus, if you want to tweak this in any way, you can simply modify the definition later.

Local index

This option creates a new local index, using the sorted unique values that appear in the column. The local index's name is based on the name of your relational column.

The local index for each separate value variable is a different local index, even though they might have the same name. Thus, use this option with care. Typically, it is much more convenient to use have a global index in Analytica, so consider using New index instead.

Value

A value column contains the data for the resulting array. Value columns usually contain numeric data.

You need to have at least one Value column (with one exception -- you don't need any value column when using a Count type).

You can specify multiple Value columns. When you do so, there are two options:

Create a separate variable for each value column.
Combine all the specified value columns into one result variable using a Value index. See #Value indexes.

You should use the first option -- separate variables -- when each value column contains different types of quantities. Use a value index (also known as a fact index) only when your source table is structured as a fact table, where the value columns are used to embed an index (such as a time-like index) in each record.

When you mark more than one column as Value, the value index selector appears. Set this to '"No value index"' when you want to create a separate variable for each column.

Ignore

Columns marked Ignore do not participate in the transformation. Use this when the column should not appear as an index in the final result. Ignoring a column may cause some rows to map to the same index coordinates, in which case the values are aggregated (see [[#Aggregation type]).

Aggregation type

In is not uncommon for multiple records to have the same values in every index column. When this happens, you have multiple records mapping to the same result cell. To deal with this case, the values much be combined. Words such as aggregated and conglomerated are sometimes used for this process.

You have several standard options for the 'Type of aggregation (called simply the type):

Sum (the default)
Min
Max
Average
Product
First
Last
Count

Of these, only Count does not require a Value column.

Other custom types are possible as well (i.e., any array reducing function), but don't appear on the dialog's dropdown. Since the dialog produces a variable with an explicit call to MdTable, if the need arises to use a custom type, you can modify the definition. See MdTable for details on using a custom «type».

Multiple values columns

When you mark more than one column as Value, the value index selector appears. Set this to '"No value index"' when you want to create a separate variable for each column.

Value indexes

Sometimes you want a subset of value columns to be in the same result using an index for the value columns. This occurs when transforming a so-called fact table. We illustrate this with an example.

Each row in the following STAR table contains data for one student.

(This data was downloaded from this collection of datasets.)

We wish to extract the math test scores, indexed by gender, ethnicity and birth (quarter). There are 4 columns with math scores. The dialog entries are set as follows.

There are several columns that cannot be seen in this image because you have to scroll down. All were set to Ignore in this example).

Notice that the four math test score columns have been marked as Value, and we have specified that a new Value index be created named Math_test. After pressing OK, and renaming the new variable to Math_scores, the result is

A single Variable was created, along with four indexes: Math_test, ethnicity, birth and gender.

@@ Line 3: / Line 3: @@
 ''New to [[Analytica 6.3]]''
-This dialog is used to convert a relational table or "fact table" to a multi-dimensional array.
+This dialog is a "wizard" to make it easier to convert a relational table or "wide table" into a multi-dimensional array. It creates a call to [[MdTable]]-- the Analytica function that does this conversion -- but you can do it from the dialog without have to write out the function parameters and more.
 <center>[[image:MdTable Wizard Project Route Length.png]]</center>
@@ Line 9: / Line 10: @@
 == Relational tables ==
-Relational tables are commonly used in databases and spreadsheets. A ''relational table'' is a two-dimensional heterogeneous tables with one "record" on each row, and different quantities relating to each record in a different column. They are often heterogeneous, with some columns containing index-like labels, and other columns containing numerical quantities with varying units of measurement and varying meanings.
+Relational tables are commonly used in databases and spreadsheets. A ''relational table'' is a two-dimensional table with one "record" on each row. The initial column(s) contain the values of an index (or key field). The last column contains the corresponding value, often a number, but it can be text.
-A relational table often depicts quantities that are effectively multi-dimensional quantities. However, the multi-dimensionality is only implied, where some columns effectively specify the "coordinate" of the record in a multi-dimensional space. In such a case, it is usually far more convenient in Analytica to convert these into actual multi-dimensional arrays, which Analytica loves to manipulate directly. This accomplishes two things: (a) Is separates our heterogenous data into separate variables, where each variable represents one type of quantity only, with one consistent unit of measurement, and (b) turns the implicit "coordinates" that appear in the index-like columns of a relational table into actual explicit indexes. Once the quantities are in a homogeneous array form, they become really easy to manipulate using Analytica's various array functions.
+A relational table often depicts quantities that are effectively multi-dimensional quantities. However, the multi-dimensionality is only implied, where some columns effectively specify the "coordinate" of the record in a multi-dimensional space. In such case, it is usually far more convenient in Analytica to convert it into an actual multi-dimensional array, which Analytica loves to manipulate directly. This accomplishes two things: (a) Is separates our heterogenous data into separate variables, where each variable represents one type of quantity only, with one consistent unit of measurement, and (b) turns the implicit "coordinates" that appear in the index-like columns of a relational table into actual explicit indexes. Once the quantities are in a homogeneous array form, they become really easy to manipulate using Analytica's various array functions.
-It is worth noting, however, that for some extremely high-dimensional cases, a relational table can be a very space-efficient sparse representation.
+It is worth noting, however, that for an array with many dimensions, and many index combinations that are absemt (a sparse array), a relational table can be a much more space-efficient.
 Here is an example of a simple 3-column relational table.
 :[[image:Project Route Length relational table.png]]
 The project and route columns are "index-like", where as the Length column contains quantitative data.
+== Wide tables ==
+The dialog also handles the conversion of a ''wide table'' to an array (formerly termed a ''fact table''.)  A wide table is a relational table having more than one value column, and in which the multiple value columns are all instances of the same quantity varying across a "value index" (often a time index). A wide table allows one dimension to be embedded in each record, rather than having to have a separate row for every cell. It is more compact than a full (long) relational table, that has only one value per row, and so more rows by a factor equal to the length of the value index.
+(TO DO: Insert an image)
 == Accessing the dialog ==
@@ Line 32: / Line 39: @@
 ** Column names must be unique (no duplicates)
 ** (A few others, TBD)
+== Column types ==
+In the main part of the dialog, you specify for each column of your relational table whether the column should be treated as an index, as a value or ignored.  For each index column, you have the options of using an existing index in your model, creating a new index, or using a new local index. These are each covered in turn below.
+One of the nicest things about using this dialog is that it can take care of creating the index nodes for you, eliminating a few steps when writing an expression yourself that calls [[MdTable]].
+An index should have only unique values (no duplicate elements). An index column in a relational table will have many duplicates. In general, an index should have all the values that appear in the corresponding column of the relational table.
+The dialog looks at your data and makes an educated first guess about how to treat each column.
+=== Existing index ===
+Use this option when you already have an index in your model for the column. The existing index should contain all the values that appear in the column.
+If the column label is a handle to an index, or if there is already an index that matches the name of your column and has the values that appear in your relational column, it will auto-select the existing index.
+When you first select this option, you may see «click to select» as seen here
+:[[image:Click to select existing index.png]]
+The blue underline indicates that the item can be clicked. You'll need to select the existing index in the index finder that pops up.
+Once you've selected an index, or it has found one for you, the selected index identifier is blue underlined, for example
+:[[image:Existing route index selected.png]]
+To change it to a different existing index, click the blue link and again use the index selector to locate the correct index.
+When you select an existing index, the index definition is not modified my the dialog.
+=== New index ===
+This option creates a new global index for you, and defines it automatically to be the sorted unique values that appear in the column.
+Your relational column name is used to create the identifier for the new index. You can change this later by renaming the created node if necessary.
+The new index is defined by an expression that computes the sorted unique values. Thus, if you want to tweak this in any way, you can simply modify the definition later.
+=== Local index ===
+This option creates a new [[Local Indexes|local index]], using the sorted unique values that appear in the column. The local index's name is based on the name of your relational column.
+The local index for each separate value variable is a ''different'' local index, even though they might have the same name. Thus, use this option with care. Typically, it is much more convenient to use have a global index in Analytica, so consider using ''New index'' instead.
+=== Value ===
+A value column contains the data for the resulting array. Value columns usually contain numeric data.
+You need to have at least one Value column (with one exception -- you don't need any value column when using a <code>Count</code> type).
+You can specify multiple Value columns. When you do so, there are two options:
+* Create a separate variable for each value column.
+* Combine all the specified value columns into one result variable using a ''Value index''.  See [[#Value indexes]].
+You should use the first option -- separate variables -- when each value column contains different types of quantities. Use a value index (also known as a fact index) only when your source table is structured as a fact table, where the value columns are used to embed an index (such as a time-like index) in each record.
+When you mark more than one column as Value, the value index selector appears. Set this to '"No value index"' when you want to create a separate variable for each column.
+=== Ignore ===
+Columns marked ''Ignore'' do not participate in the transformation. Use this when the column should not appear as an index in the final result. Ignoring a column may cause some rows to map to the same index coordinates, in which case the values are aggregated (see [[#Aggregation type]).
+== Aggregation type ==
+In is not uncommon for multiple records to have the same values in every index column. When this happens, you have multiple records mapping to the ''same'' result cell. To deal with this case, the values much be combined. Words such as ''aggregated'' and ''conglomerated'' are sometimes used for this process.
+You have several standard options for the ''''Type''' of aggregation (called simply the ''type''):
+* [[Sum]] (the default)
+* [[Min]]
+* [[Max]]
+* [[Average]]
+* [[Product]]
+* First
+* Last
+* [[Count]]
+Of these, only [[Count]] does not require a Value column.
+Other custom types are possible as well (i.e., any array reducing function), but don't appear on the dialog's dropdown. Since the dialog produces a variable with an explicit call to [[MdTable]], if the need arises to use a custom type, you can modify the definition. See [[MdTable]] for details on using a custom «type».
+== Multiple values columns ==
+When you mark more than one column as Value, the value index selector appears. Set this to '"No value index"' when you want to create a separate variable for each column.
+=== Value indexes ===
+Sometimes you want a subset of value columns to be in the same result using an index for the value columns. This occurs when transforming a so-called ''fact table''.  We illustrate this with an example.
+Each row in the following STAR table contains data for one student.
+:[[image:STAR relational.png]]
+(This data was [https://vincentarelbundock.github.io/Rdatasets/csv/AER/STAR.csv downloaded] from [https://vincentarelbundock.github.io/Rdatasets/datasets.html this collection of datasets].)
+We wish to extract the math test scores, indexed by gender, ethnicity and birth (quarter). There are 4 columns with math scores. The dialog entries are set as follows.
+:[[image:MdTable Wizard for Math Scores.png]]
+There are several columns that cannot be seen in this image because you have to scroll down. All were set to '''Ignore''' in this example).
+Notice that the four math test score columns have been marked as <code>Value</code>, and we have specified that a new Value index be created named Math_test. After pressing '''OK''', and renaming the new variable to Math_scores, the result is
+:[[image:Math scores array.png]]
+A single Variable was created, along with four indexes: <code>Math_test</code>, <code>ethnicity</code>, <code>birth</code> and <code>gender</code>.
+== See Also ==
+* [[MdTable]]
+* [[New variable from column]]
+* [[Import and Export data]]