Parsing and formatting data
After reading a data file using ReadTextFile, you'll need to parse it to separate the contents into individual values, and to convert some of the fields to numbers and dates. There are many data file formats, ranging from common standards such as CSV, XML, and JSON to custom formats. When you write data to a text file using WriteTextFile, you'll need to put your output into the desired standard or custom format.
Data in rows and columns (CSV)
Data files are often organized into rows and columns. Each "cell" contains one datum, which may be a number, a date, or text. The first line of the file may (or may not) contain column headings rather than actual data. This type of organization is broadly referred to as a CSV format. CSV stands for Comma-Separated Values, since a common convention is for the values on each line to be separated by commas, but the term is broadly applied even when a different separator, such as a tab character, is used.
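To make the row-and-column structure concrete, here is a small sketch in Python (not Analytica) that splits tab-separated text into a heading line and data rows. The sample text and the names `sample` and `parse_table` are made up for illustration, and the sketch assumes the first line holds column headings.

```python
import csv
import io

# Hypothetical sample data: a heading line followed by two data rows,
# with a tab character as the separator.
sample = "Name\tQuantity\tPrice\nWidget\t3\t2.50\nGadget\t10\t0.75\n"

def parse_table(text, separator=","):
    """Split text into (header, rows); every cell is still text at this point."""
    reader = csv.reader(io.StringIO(text), delimiter=separator)
    table = list(reader)
    # Assumes the first line contains column headings rather than data.
    return table[0], table[1:]

header, rows = parse_table(sample, separator="\t")
print(header)
print(rows)
```

Every cell comes back as text, which is why a separate step is needed to convert fields such as Quantity and Price to numbers.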
Even though CSV is one of the most widely used data formats, there is no official CSV standard. While all CSV conventions have a lot in common, particularly the 2-D structure of the data, there are many details that can and do vary among applications. Foremost among these are the conventions regarding when quotes are placed around cells, how separator, new-line, and other special characters are escaped within single-cell text values, and how quoted cells are interpreted. The ParseCSV and MakeCSV functions in Analytica 5.0 and later parse and produce CSV using Excel's conventions by default, and their optional parameters let you adapt to other CSV conventions. These functions also handle the conversion from text to numbers and dates and vice versa.
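The quoting and escaping details are easier to see in a concrete case. The sketch below uses Python's standard `csv` module and its "excel" dialect (not Analytica) to write one row whose cells contain a separator, a double quote, and a new-line, and then reads it back. The cell values are invented for illustration, and this does not confirm that ParseCSV and MakeCSV match Python's "excel" dialect in every detail.

```python
import csv
import io

# Hypothetical cells containing a separator, a double quote, and a new-line.
row = ["plain", "has, comma", 'has "quotes"', "two\nlines"]

# Write the row using Python's "excel" dialect: cells with special
# characters are wrapped in double quotes, and embedded quotes are doubled.
buffer = io.StringIO()
csv.writer(buffer, dialect="excel").writerow(row)
encoded = buffer.getvalue()
print(repr(encoded))

# Reading the text back recovers the original cell values, including
# the new-line inside the last cell.
decoded = next(csv.reader(io.StringIO(encoded), dialect="excel"))
print(decoded == row)
```

A parser that simply split each line on commas would break the second and fourth cells apart, which is why the quoting convention has to be agreed on by the writer and the reader.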
Reading and parsing a CSV file that uses commas as separators is done as follows:
ParseCSV(ReadTextFile( "MyFile.csv" ) )
The result is a 2-D array, indexed by local indexes named .Row and .Column. For a CSV file that uses a tab character as a separator, use
ParseCSV(ReadTextFile( "MyFile.csv" ), separator:Chr(9) )
ParseCSV includes many other options, some of which are likely to be necessary or convenient in a particular case. You may wish to use an existing index for the column index or row index, take the row index labels from a specific column in the data, adopt different quoting conventions, use different international/regional conventions, or extract only a subset of the columns. See ParseCSV for details.
Writing a 2-D array, x, to a CSV file is done as follows:
WriteTextFile("MyFile.csv", MakeCSV( x, I, J ) )
where I and J are the indexes of x. To write a tab-separated file, use
WriteTextFile("MyFile.csv", MakeCSV( x, I, J, separator:Chr(9) ) )
MakeCSV supports many additional conventions. See MakeCSV for details.
XML
JSON
Custom data formats
See Also
- ReadTextFile
- WriteTextFile
- ParseCSV
- MakeCSV
- ParseJSON
- MakeJSON
- Extracting data from an XML file
- For custom-format parsing or formatting