Export-Import data format
under construction -- do not rely on this information yet
This page provides a detailed specification of Analytica's multidimensional data file format, which we refer to as the Export-Import data format. This format is also described in the Analytica User Guide, Chapter 18, largely by way of examples. Here we provide a more complete and detailed specification of the format as it exists as of Analytica 4.5. There have been some revisions to this spec in Analytica 4.5 -- see Differences from 4.4 and earlier.
The format is used by
- File→Export... menu command
- File→Import... menu command
- Edit→Copy Table menu command
- Export typescript command
- Import typescript command
Note: it would be nice to have ReadExportFile() and WriteExportFile() functions.
Format specification
Conventions
In the format specification that follows, we use the following conventions:
- <name> : Indicates a pattern that is defined separately
- +, *, ? : When these characters appear at the end of a line, it means the entire line is repeated a variable number of times. These qualifiers have the following meanings:
- + : Repeated 1 or more times
- * : Repeated 0 or more times
- ? : Repeated 0 or 1 times
- | : Disjunction (or) -- A|B means A or B can appear at that spot.
- LiteralText : These characters appear in the file verbatim.
- Newline characters in the file are explicitly specified as <newline>. When two formal patterns appear on separate lines, this does not imply a newline at that position. There often will be one, but it will be given explicitly, either in the pattern or in the subpattern.
For example, when a pattern is given as:
- <item><valueList><newline>+
it means that all three subpatterns are repeated one or more times (it isn't just the last one).
Formal Specification
The export file has the following format:
TextTable <view> <ident> <newline> <tab><block>+
This says that the file format consists of a first line that starts with the word TextTable. After that, it consists of repeated <block>s. Note that the beginning of each <block> is marked with a <tab> as the first character on the line. <ident> is the Analytica identifier of the table that was exported.
<view> := Definition|EditTable|DetermTable|ProbTable|IntraTable|Value|Mid|Mean|ProbBands|Statistics|PDF|CDF|Sample|ProbValue
<view> identifies the type of table that was exported. It has no real impact on the import; it is there mainly for information.
<newline> : May be CR, LF or CRLF. In files, the PC convention is usually CRLF. CR is ASCII 13, LF is ASCII 10.
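As a minimal sketch of how a reader might handle the first line, the following Python splits file text on any of the three newline conventions and checks the header. The function names split_lines and parse_header are hypothetical, and the assumption that the three header fields are separated by whitespace is inferred from the example files rather than stated by the specification.

```python
import re

# The <view> keywords listed in the specification.
VIEWS = {
    "Definition", "EditTable", "DetermTable", "ProbTable", "IntraTable",
    "Value", "Mid", "Mean", "ProbBands", "Statistics", "PDF", "CDF",
    "Sample", "ProbValue",
}

def split_lines(text):
    """Split file text on CRLF, CR, or LF (CRLF is tried first)."""
    return re.split(r"\r\n|\r|\n", text)

def parse_header(line):
    """Return (view, ident) from a 'TextTable <view> <ident>' line."""
    parts = line.split()  # assumption: fields are whitespace-separated
    if len(parts) != 3 or parts[0] != "TextTable":
        raise ValueError("not a TextTable header: %r" % line)
    _, view, ident = parts
    if view not in VIEWS:
        raise ValueError("unknown view: %r" % view)
    return view, ident
```

Since <view> is informational, a more lenient reader could accept an unknown view instead of raising.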
<block> :=
<slicerList>?
<colIdent><valueList><newline>? { the column headers }
<rowIdent><newline>?
<item><valueList><newline>+ { the table data, the first <item> is the row header index value }
The format captures the array in a certain pivot, so that one index is a column index and one index is a row index. All other indexes are slicers. The slicer indexes are listed first, then the column index and then the row index. The column headers line lists the index values for the column index. It is possible to have a pivot in which there is no column index -- for example, a 1-D array would have only one index, which would usually be the row index. When there is no column index, the column headers line does not appear (this is why the ? appears at the end). Similarly, a given pivot might not have a row index. The row identifier line appears if and only if a row index is present. The row index values appear in the <item> column of the data rows. Each block will have one data row for each row index item. If there is no row index, then there will be only one data row, and <item> will be empty so that the line will actually start with a <tab> character.
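Because tabs inside a cell must be escaped, a data row can be split on the tab character alone. The sketch below illustrates this, including the case where there is no row index and the line starts with a <tab>. The name split_data_row is hypothetical, and the cells are returned as raw, undecoded tokens.

```python
def split_data_row(line):
    """Split one '<item><valueList>' line into (item, values).

    item is None when the row has no row-index item, i.e. the line
    starts with a tab. Values are returned as raw token strings.
    """
    fields = line.split("\t")
    if len(fields) < 2:
        raise ValueError("a data row needs at least one <tab><value>")
    item = fields[0] if fields[0] != "" else None
    return item, fields[1:]
```

For instance, split_data_row('"Prop tax"\t3400') gives the item '"Prop tax"' with the single value '3400', while a line such as '\t1\t2' gives no item and the values '1' and '2'.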
The column headers row and the <rowIdent> row usually appear in every block, and appear identically in every case. This information is therefore redundant. It is an error for these lines to contain different information in two different <block>s: if the identifier or the index values differ, the file is not in a valid format. The row headers, i.e., the <item>s, are also duplicated identically in every block, so they are redundant as well, and again they must be identical in every block or the file is not in a valid format.
If you are implementing a reader of this file format, you should allow the column header line and the row ident line to be optional after the first block. When the row ident line is omitted, the <item>s should also be omitted (so that each of those lines would actually start with a <tab>). Although import may not support this today, we want to leave the omission of those lines (to reduce character count) as an option for the future.
<slicerList> :=
<indexIdent><tab><value><newline>+
The slicer list provides the dimensions other than the row and column indexes, and for each block it also specifies the slicer value that pertains to this block. Each <block> contains a 2-D table of data, so a 3-D table consists of several blocks where the third dimension is a slicer index, and each block is one slice along the slicer index. A 4-D table would have two slicer indexes, so the slicer list would specify two "coordinates". In general, each block can actually contain the data of a single scalar value, a 1-D vector, or a 2-D table, depending on whether row and column indexes are present in the pivot.
The <indexIdent> is the identifier of the index. The export format requires unique names for all indexes (in theory, if local indexes are used, it is possible to have multiple indexes with the same identifier, but you can't export those). <value> is an index value -- the slicer positions are recorded by value, not by position. This also means that elements of the indexes you use as slicers must be unique.
In the first block, the list of slicers must include all slicer indexes. In the blocks after that, only the slicers that have different values from the previous block need to be listed.
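The carry-forward rule for slicers can be sketched as follows. This is an illustration under assumptions: each block's slicer list is taken to be already parsed into a dict mapping index identifier to value, and the name resolve_slicers and the index names in the usage note are hypothetical.

```python
def resolve_slicers(listed_per_block):
    """Expand per-block slicer lists into full slicer coordinates.

    listed_per_block: one dict per block, holding only the slicers
    that the block actually lists. The first block must list all of
    them; later blocks inherit unlisted values from the previous block.
    """
    current = {}
    resolved = []
    for n, listed in enumerate(listed_per_block):
        if n > 0:
            unknown = set(listed) - set(current)
            if unknown:
                raise ValueError(
                    "slicer(s) not listed in first block: %s" % sorted(unknown))
        current.update(listed)          # changed values override old ones
        resolved.append(dict(current))  # snapshot for this block
    return resolved
```

With blocks listing {"Year": 2020, "Region": "East"}, then {"Region": "West"}, then {"Year": 2021, "Region": "East"}, the second block resolves to Year 2020, Region "West".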
<valueList> :=
<tab><value>+
<value> := <number>|"<text>"|<expression>|~<expression>~|«null»|<blank>
<item> := <number>|"<text>"|<non-ident expression>|~<expression>~|«null»
A <value> encodes one cell in a table. An <item> encodes one item (cell) of an index. A <number> is any of Analytica's numeric formats, including suffix format, or the canonical date-time format.
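A reader can tell most of these alternatives apart from the first character of the cell. The sketch below (classify_token is a hypothetical name) does only that coarse classification. It does not attempt to separate numbers from unquoted expressions, since that would require Analytica's full numeric syntax.

```python
def classify_token(token):
    """Coarsely classify a raw <value> token by its leading character."""
    if token == "":
        return "blank"              # <blank>, valid for <value> only
    if token == "«null»":
        return "null"
    if token[0] in "\"'":
        return "text"               # "<text>" or '<text>'
    if token[0] == "~":
        return "tilde-expression"   # ~<expression>~
    return "number-or-expression"
```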
"<text>"
Text values are delimited with quotation marks. The text inside cannot contain any literal tabs, newlines, internal quotes, or backslashes. If the text contains any of these characters, they need to be escaped using \t, \n, \r, \\ or \". As an alternative, a single quote (') can be used for the start and stop quote, in which case any single quote in the text must be escaped using \'. These are the only escape sequences recognized. As an example, a two-line text that contains a tab character and internal quotes would be encoded as "I said:\t\"Hello\rworld\"".
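The escaping rules for text can be sketched in Python as a pair of functions. The names encode_text and decode_text are hypothetical. The encoder always uses double quotes, and the decoder accepts either quote style and rejects any escape sequence not listed above.

```python
_ESCAPE = {"\t": "\\t", "\n": "\\n", "\r": "\\r", "\\": "\\\\", '"': '\\"'}
_UNESCAPE = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\", '"': '"', "'": "'"}

def encode_text(s):
    """Encode a string as a double-quoted "<text>" token."""
    return '"' + "".join(_ESCAPE.get(ch, ch) for ch in s) + '"'

def decode_text(token):
    """Decode a "<text>" or '<text>' token back into a string."""
    if len(token) < 2 or token[0] not in "\"'" or token[-1] != token[0]:
        raise ValueError("not a quoted text token: %r" % token)
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # A backslash must be followed by one of the known escapes.
            if i + 1 >= len(body) or body[i + 1] not in _UNESCAPE:
                raise ValueError("bad escape in %r" % token)
            out.append(_UNESCAPE[body[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Encoding the two-line example text with this sketch reproduces the token shown above, and decoding that token returns the original text.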
<expression>
<non-ident expression>
An <expression> is any Analytica expression, as might appear in a definition or edit table cell. An example might be Normal(10,3). Expressions include as special cases numbers, text, identifiers, function calls, references, Null, etc. We've called out <number> separately even though it would actually fall into this case. We've also called out "<text>" separately even though it too is a special case of expression; however, a pure text string has some slightly different character escape rules, and thus it makes sense to pull it out separately. The <non-ident expression> in <item> means any expression other than a single identifier. So, for example, Price would not qualify as a <non-ident expression>, while Price+Tax would.
When you have an Analytica expression that does not qualify given the restrictions on what can appear in <expression> or <non-ident expression>, it can be surrounded by tildes (~) -- in the same way <text> is surrounded by quotes, and then problematic characters within the expression can be escaped. The surrounding tildes are not part of the expression itself. Tabs, newlines and backslashes must be replaced with \t, \r, \n, or \\. Tildes may optionally be escaped using \~. These are the only escape sequences recognized.
The restrictions and the tilde convention are there to prevent potentially ambiguous situations.
A multi-line expression might appear as:
<tab>~Index x:=1..100;\rConcat(x,x+1000)~
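The tilde convention can be sketched the same way. The names encode_expression and decode_expression are hypothetical. This encoder always escapes tildes, which the rules above permit but do not require.

```python
import re

_ESCAPE = {"\t": "\\t", "\n": "\\n", "\r": "\\r", "\\": "\\\\", "~": "\\~"}
_UNESCAPE = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\", "~": "~"}

def encode_expression(expr):
    """Wrap an expression in tildes, escaping problematic characters."""
    return "~" + "".join(_ESCAPE.get(ch, ch) for ch in expr) + "~"

def _unescape(match):
    ch = match.group(1)
    if ch not in _UNESCAPE:
        raise ValueError("unrecognized escape: \\%s" % ch)
    return _UNESCAPE[ch]

def decode_expression(token):
    """Strip the surrounding tildes and undo the escape sequences."""
    if len(token) < 2 or token[0] != "~" or token[-1] != "~":
        raise ValueError("not a tilde-delimited expression: %r" % token)
    return re.sub(r"\\(.)", _unescape, token[1:-1], flags=re.DOTALL)
```

Applied to the two-line expression above (with a carriage return between the statements), encode_expression yields the same token as shown, without the leading <tab>.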
Examples
Column vector
Here is an example of a one-dimensional array, pivoted so that the sole index is the row-index. There is only one <block> since there are no slicers, and there is no column index.
TextTable EditTable House_cost_inputs
House_inputs ← index identifier
"Prop tax" 3400
"Tax rate" 0.44
"Maintenance" 4000
"Interest" 0.105
"Appreciation" 0.08
The underlying model for this is:
Index House_inputs := ["Prop tax","Tax rate","Maintenance","Interest","Appreciation"]
Variable House_cost_inputs := Table(House_inputs)(3400,0.44,4000,0.105,0.08)
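As an illustration only, the following Python reads this particular example back into a dictionary. It rests on several assumptions: the item and value on each data row are separated by a single tab, the row identifier line carries a leading tab (a literal reading of the <tab><block> rule, tolerated here by stripping whitespace), the items are plain quoted text without escapes, and the values are ordinary decimal numbers. The name read_column_vector is hypothetical.

```python
EXAMPLE = (
    "TextTable EditTable House_cost_inputs\n"
    "\tHouse_inputs\n"            # leading tab is an assumption
    '"Prop tax"\t3400\n'
    '"Tax rate"\t0.44\n'
    '"Maintenance"\t4000\n'
    '"Interest"\t0.105\n'
    '"Appreciation"\t0.08\n'
)

def read_column_vector(text):
    """Read a single-block, row-index-only file into a dict."""
    lines = [ln for ln in text.splitlines() if ln != ""]
    keyword, view, ident = lines[0].split()
    if keyword != "TextTable":
        raise ValueError("not a TextTable file")
    row_index = lines[1].strip()
    data = {}
    for line in lines[2:]:
        item, value = line.split("\t")
        # Simplification: strip the quotes without handling escapes.
        data[item.strip('"')] = float(value)
    return ident, row_index, data

ident, row_index, data = read_column_vector(EXAMPLE)
```

A general reader would also need to handle slicers, column headers, and the full <value> syntax described earlier.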
Differences from 4.4 and earlier