Export-Import data format

Revision as of 19:25, 1 February 2013 by Lchrisman (talk | contribs) (→‎Formal Specification: continued.)

under construction -- do not rely on this information yet

This page provides a detailed specification of Analytica's multidimensional data file format, which we refer to as the Export-Import data format. This format is also described in the Analytica User Guide, Chapter 18, largely by way of examples. Here we provide a more complete and detailed specification of the format as it exists as of Analytica 4.5. There have been some revisions to this spec in Analytica 4.5 -- see Differences from 4.4 and earlier.

The format is used by

  • File→Export... menu command
  • File→Import... menu command
  • Edit→Copy Table menu command
  • Export typescript command
  • Import typescript command

Note: it would be nice to have ReadExportFile() and WriteExportFile() functions.

Format specification

Conventions

In the format specification that follows, we use the following conventions:

  • <name> : Indicates a pattern that is defined separately
  • +, *, ? : When one of these characters appears at the end of a line, the entire line is repeated a variable number of times. These qualifiers have the following meanings:
      • + : Repeated 1 or more times
      • * : Repeated 0 or more times
      • ? : Repeated 0 or 1 times
  • | : Disjunction (or) -- A|B means A or B can appear at that spot.
  • LiteralText : These characters appear in the file verbatim.
  • Newline characters in the file are specified explicitly as <newline>. When two formal patterns appear on separate lines, this does not imply a newline at that position. There often will be one, but it is always given explicitly, either in the pattern or in a subpattern.

For example, when a pattern is given as:

<item><valueList><newline>+

it means that the whole sequence of three subpatterns is repeated one or more times (not just the last subpattern).
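For readers used to regular expressions, the difference can be shown with a small Python sketch. The three subpatterns below are hypothetical stand-ins chosen only for this illustration (letters for an item, tab-prefixed digits for a value list); they are not the real definitions from this specification.

```python
import re

# Hypothetical stand-ins for the subpatterns, for illustration only.
ITEM = r"[A-Za-z]+"
VALUE_LIST = r"(?:\t[0-9]+)+"
NEWLINE = r"(?:\r\n|\r|\n)"

# In this page's notation, the trailing + repeats the whole line ...
whole_line_repeated = re.compile(rf"(?:{ITEM}{VALUE_LIST}{NEWLINE})+")
# ... whereas in ordinary regex syntax it would bind only to the last subpattern.
last_only_repeated = re.compile(rf"{ITEM}{VALUE_LIST}{NEWLINE}+")

two_rows = "a\t1\t2\nb\t3\t4\n"
print(whole_line_repeated.fullmatch(two_rows) is not None)
print(last_only_repeated.fullmatch(two_rows) is not None)
```

Only the first pattern accepts two complete rows, which is the reading intended by the conventions above.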

Formal Specification

The export file has the following format:

TextTable <view> <ident> <newline>
<tab><block>+

This says that the file format consists of a first line that starts with the word TextTable. After that, it consists of repeated <block>s. Note that the beginning of each <block> is marked with a <tab> as the first character on the line. <ident> is the Analytica identifier of the table that was exported.

<view> := Definition|EditTable|DetermTable|ProbTable|IntraTable|Value|Mid|Mean|ProbBands|Statistics|PDF|CDF|Sample|ProbValue

<view> identifies the type of table that was exported. It does not really affect the import; it is mainly informational.
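As a rough illustration, the first line could be parsed as follows. This is a minimal sketch, not Analytica's own implementation: the function name parse_header is hypothetical, and the code assumes that the three fields are separated by whitespace, as the pattern above suggests.

```python
# The view names listed in the <view> pattern above.
VIEWS = {
    "Definition", "EditTable", "DetermTable", "ProbTable", "IntraTable",
    "Value", "Mid", "Mean", "ProbBands", "Statistics", "PDF", "CDF",
    "Sample", "ProbValue",
}

def parse_header(line: str) -> tuple[str, str]:
    """Split 'TextTable <view> <ident>' into (view, ident).

    Assumption: fields are whitespace-separated.
    """
    parts = line.rstrip("\r\n").split()
    if len(parts) != 3 or parts[0] != "TextTable":
        raise ValueError(f"not a TextTable header: {line!r}")
    _, view, ident = parts
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view!r}")
    return view, ident

# 'Profit' is a made-up identifier used only for this example.
print(parse_header("TextTable Value Profit\r\n"))
```

Because the view is informational only, a more lenient reader might choose to accept unknown view names instead of raising an error.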

<newline> : May be CR, LF, or CRLF. In files, the PC convention is usually CRLF. CR is ASCII 13, LF is ASCII 10.
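A reader therefore has to accept all three line endings. The sketch below is one way to do that in Python; split_lines is a hypothetical helper name. It uses an explicit regular expression rather than str.splitlines(), because str.splitlines() also breaks on other characters (such as form feed and vertical tab) that this format does not treat as newlines.

```python
import re

def split_lines(text: str) -> list[str]:
    """Split on CRLF, CR, or LF only, keeping empty lines in the middle."""
    # CRLF is listed first so that it is consumed as one terminator, not two.
    lines = re.split(r"\r\n|\r|\n", text)
    # A trailing terminator leaves one empty string at the end; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines

print(split_lines("a\r\nb\rc\nd"))
```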
<block> := 
       <slicerList>?
       <colIdent><valueList><newline>?          { the column headers }
       <rowIdent><newline>?
       <item><valueList><newline>+              { the table data, the first <item> is the row header index value }

The format captures the array in a certain pivot, so that one index is a column index and one index is a row index. All other indexes are slicers. The slicer indexes are listed first, then the column index and then the row index. The column headers line lists the index values for the column index. It is possible to have a pivot in which there is no column index -- for example, a 1-D array would have only one index, which would usually be the row index. When there is no column index, the column headers line does not appear (this is why the ? appears at the end). Similarly, a given pivot might not have a row index. The row identifier line appears if and only if a row index is present. The row index values appear in the <item> column of the data rows. Each block will have one data row for each row index item. If there is no row index, then there will be only one data row, and <item> will be empty so that the line will actually start with a <tab> character.
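The data rows described above could be read with something like the following sketch. It rests on an assumption that this page does not yet confirm: that <valueList> is a sequence of values each preceded by a <tab>, so that a data row is simply a tab-separated line whose first field is the row item. The name parse_data_row is hypothetical.

```python
def parse_data_row(line: str) -> tuple[str, list[str]]:
    """Return (item, values) for one data row.

    Assumption: <valueList> is tab-prefixed values, so the row is
    tab-separated and the first field is the row index item.
    """
    fields = line.split("\t")
    # With no row index, the line starts with a tab and the item is "".
    return fields[0], fields[1:]

print(parse_data_row("2013\t10\t20"))
print(parse_data_row("\t42"))
```

Values are returned as text; converting them to numbers is left to the caller, since the cells of an exported table need not all be numeric.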

The column headers row and the <rowIdent> row usually appear in every block, and appear identically in every case. This information is therefore redundant. It is an error for these lines to contain different information in two different <block>s. If the identifier is different or the index values are different, then the file is not in a valid format. The row headers, i.e., <item>, are also duplicated identically in every block, so these are also redundant, and again, they must be identical in every block or the file is not in a valid format.
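A reader might enforce this rule once all blocks have been parsed. The sketch below assumes a hypothetical in-memory representation in which each block is a dict with the keys "col_header", "row_ident", and "row_items"; neither the function name nor that representation comes from Analytica.

```python
def check_blocks_consistent(blocks: list[dict]) -> None:
    """Raise ValueError if any block disagrees with the first block."""
    if not blocks:
        return
    first = blocks[0]
    for n, block in enumerate(blocks[1:], start=2):
        for key in ("col_header", "row_ident", "row_items"):
            if block[key] != first[key]:
                raise ValueError(f"block {n}: {key} differs from block 1")

# Made-up example data: two blocks that agree on all redundant lines.
blocks = [
    {"col_header": ("Year", ["2012", "2013"]), "row_ident": "Product",
     "row_items": ["A", "B"]},
    {"col_header": ("Year", ["2012", "2013"]), "row_ident": "Product",
     "row_items": ["A", "B"]},
]
check_blocks_consistent(blocks)  # passes silently
```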

If you are implementing a reader of this file format, you should allow the column header line and the row ident line to be optional after the first block. When the row ident line is omitted, the <item>s should also be omitted (so that each of those lines would actually start with a <tab>). Although import may not support this today, we want to leave the omission of those lines (to reduce character count) as an option for the future.
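One way a lenient reader could handle such omissions is to carry the missing information forward from the first block. The sketch below assumes a hypothetical representation in which each parsed block is a dict, and an omitted line is recorded as None; the function name is likewise an assumption.

```python
def fill_omitted_headers(blocks: list[dict]) -> list[dict]:
    """Copy column and row header information from the first block into
    later blocks where it was omitted (recorded here as None)."""
    if not blocks:
        return []
    first = blocks[0]
    filled = []
    for block in blocks:
        block = dict(block)  # work on a copy, leave the input untouched
        if block.get("col_header") is None:
            block["col_header"] = first["col_header"]
        if block.get("row_ident") is None:
            # Row items are omitted together with the row ident line.
            block["row_ident"] = first["row_ident"]
            block["row_items"] = first["row_items"]
        filled.append(block)
    return filled

# Made-up example: the second block omits its redundant lines.
parsed = [
    {"col_header": ["2012", "2013"], "row_ident": "Product",
     "row_items": ["A", "B"]},
    {"col_header": None, "row_ident": None, "row_items": None},
]
print(fill_omitted_headers(parsed)[1])
```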

<slicerList> :=
       <indexIdent><tab><value><newline>+

The slicer list provides the dimensions other than the row and column indexes, and for each block it also specifies the slicer value that pertains to this block. Each <block> contains a 2-D table of data, so a 3-D table consists of several blocks where the third dimension is a slicer index, and each block is one slice along the slicer index. A 4-D table would have two slicer indexes, so the slicer list would specify two "coordinates". In general, each block can actually contain the data of a single scalar value, a 1-D vector, or a 2-D table, depending on whether row and column indexes are present in the pivot.
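To make the block count concrete: a table with several slicer indexes needs one block for every combination of slicer values. The sketch below enumerates those combinations. The index names and values are made up, the function name is hypothetical, and the order in which Analytica actually writes the blocks is not stated here, so the ordering produced below is only an assumption.

```python
from itertools import product

def slicer_coordinates(slicers: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one {indexIdent: value} mapping per block."""
    names = list(slicers)
    return [
        dict(zip(names, combo))
        for combo in product(*(slicers[name] for name in names))
    ]

# A 4-D table: two slicer indexes plus a row index and a column index.
coords = slicer_coordinates({
    "Region": ["East", "West"],
    "Scenario": ["Low", "Base", "High"],
})
print(len(coords))  # 2 * 3 blocks
```

With no slicer indexes at all, the function returns a single empty mapping, matching the case where the whole table fits in one block.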

The <indexIdent> is the identifier of the index. The export format requires unique names for all indexes (in theory, if local indexes are used, it is possible to have multiple indexes with the same identifier, but you can't export those). <value> is an index value -- the slicer positions are recorded by value, not by position. This also means that elements of the indexes you use as slicers must be unique.
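A single slicer line can be split at its first tab, as in the following sketch. The function name parse_slicer_line is hypothetical, and the example identifier and value are made up.

```python
def parse_slicer_line(line: str) -> tuple[str, str]:
    """Split '<indexIdent><tab><value>' into (indexIdent, value)."""
    ident, sep, value = line.partition("\t")
    if not sep or not ident:
        raise ValueError(f"not a slicer line: {line!r}")
    return ident, value

print(parse_slicer_line("Region\tEast"))
```

Because slicer positions are recorded by value, the returned value would then be looked up among the index's elements, which is why those elements have to be unique.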

Differences from 4.4 and earlier
