Difference between revisions of "ParseCSV"

m
(→‎See Also: -- Parsing and formatting data)
 
(30 intermediate revisions by 2 users not shown)
Line 1: Line 1:
[[category:Text functions]]
+
[[category:Text Functions]]
  
''new to [[Analytica 5.0]]''
+
Converts a text value read in from a file with CSV (Comma-Separated Values) format into a 2D array.
 +
By default, it converts numbers and dates into Analytica numbers and dates. It offers lots of optional parameters to control the column and row indexes, and handle the many variants of the CSV format -- for example, nonstandard characters that separate columns, rows, and for the quotes around text.  
  
== ParseCSV( csvText'', columnIndex, rowIndex, separator, firstLineIsHeader, trimCells, matchColumnLabels, rowIndexLabelColumn, parseFlags '' ) ==
+
== ParseCSV( csvText'', columnIndex, rowIndex, separator, firstLineIsHeader, trimCells, columnsToKeep, rowIndexLabelColumn, parseFlags, dateTemplate, decimal, quote '' ) ==
  
Parses text that is in a CSV-format (Comma Separated Values format) and returns a two-dimensional array containing the data. The CSV format in inherently two-dimensional, with items on each row separated by a comma or other specified «separator», and each row on a new line (separated by new-line characters, CR, LF or CRLF). Numerical value or dates are automatically parsed as appropriate.
+
Converts text «csvText» in a CSV (Comma-Separated Values) format into a two-dimensional array. You usually use it with [[ReadTextFile]]() to read the file:
  
=== Indexes for result ===
+
<code>ParseCSV(ReadTextFile( filename ))</code>
  
When called with only one parameter, it parses using the most common CSV conventions in the same style as Excel. Local indexes named <code>Column</code> and <code>Row</code> are created for you. Use «columnIndex» and «rowIndex» to specify existing indexes for the result. If «columnIndex» has fewer columns than actually exist in the data, then only a subset of the columns are returned. If it has more, then the extra columns are null-padded. Similarly for «rowIndex» -- if shorter than the number of rows in the data, then only the first N rows are parsed, and if it is longer, the final rows are null-padded.
+
It assumes these defaults:
 +
* The usual conventions for CSV files (like Excel):  Comma to separate fields, newline to separate rows, and double quotes (<code>"</code>) to enclose elements that may contain commas or newlines.
 +
* It returns a 2D array with local indexes, <code>.Column</code> and <code>.Row</code>.
 +
* Local index <code>.Column</code> gets the column headers from the first row.  
 +
* Local index <code>.Row</code> is numbered 1 to n-1, where n is the number of rows.  
  
=== Non-comma separator ===
+
You can override any of these defaults with these optional parameters:
If values are delineated be something other than a comma, specify «separator». Commonly occurring non-comma separators include <code>'|'</code>, <code>[[Chr]](9)</code> (tab), <code>';'</code>. «Separator» can be multiple characters.
 
  
=== Header column in data ===
+
=== «columnIndex» and «rowIndex» ===
In some CSV data files, the first line contains columns names. When you set «firstLineIsHeader» to true, the values in the first line are used to construct the local column index (if you didn't specify «columnIndex»), and to identify specify columns when you've specified «matchColumnLabels» or «rowIndexLabelsColumn», but the first line data does not appear in the resulting array itself. When «firstLineIsHeader» is set to false, then the first line is included as part of the array data, and not used to name the columns. When left unspecified, the value is inferred based on the other options specified, defaulting to true when the column names are used or referenced (i.e., when a local column index is created automatically, or «matchColumnLabels» or «rowIndexLabelsColumn» are specified).
 
  
=== Spaces around values ===
+
Specify «columnIndex» or «rowIndex» to use an existing index instead of the default local indexes <code>.Column</code> or <code>.Row</code>. If you specify a «columnIndex» it treats the first row as data and assumes that the columns are in the correct order. If the first row contains column headers that you want to ignore, perhaps because you want to rename some or all with «columnIndex»,  set «firstLineIsHeader» to <code>true</code>.  If «columnIndex»  (or «rowIndex») is shorter than the number of columns (rows) in the original, it ignores the extra columns or rows. If it is longer, it pads the result with NULL for the extra columns (rows.)
«trimCells» controls whether leading or trailing space around each value are included or not. This defaults to true (surrounding space is not included). Spaces inside quotes are never trimmed.
 
  
=== Subset of or re-ordered columns ===
+
You can use ParseCSV to update indexes «columnIndex» and/or «rowIndex» with values from the column or row headers from the CSV.  To do this, you define the global «columnIndex» and/or «rowIndex» as [[ComputedBy]] the variable or function containing the ParseCSV() call. For example, suppose you define, a variable TableFromCSV as:
If «matchColumnLabels» is set to true, «columnIndex» must be specified, and in this case, the names of the columns in the first line of the data are matched with the index labels (the comparison is case-sensitive). Only the columns that appear in the index are returned, and the order of the  columns does not have to be the same as in the data.
+
<code>
 +
Variable TableFromCSV := ParseCSV(ReadTextFile('data.csv'),  Cols,  Rows,  firstLineIsHeader: True,  rowIndexLabelColumn: 1)
 +
Index Cols := ComputedBy(TableFromCSV)
 +
Index Rows := ComputedBy(TableFromCSV)
 +
</code>
 +
It automatically updates <code>Cols</code> and <code>rows</code> with the header values from the csv table, when one of those indexes or <code>TableFromCSV</code> is first computed.
  
=== Using a column for row labels ===
+
=== «firstLineIsHeader» ===
You can use «rowIndexLabelsColumn» specify a column to be used for the row index labels, which a local row index is automatically created for you. The column used is removed from the array (unless you specify «columnIndex» with «matchColumnLabels» and explicitly include that column). For example, <code>ParseCSV(csvText, rowIndexLabelsColumn:1)</code> treats the first column as row index labels rather than as array cells.
 
  
=== Parse flags ===
+
If the CSV has no column headers, set «firstLineIsHeader» to <code>False</code>, so that it treats the first row as data.  If you don't specify a «columnIndex», it uses local index <code>.Column</code> containing numbers 1 to M (the number of columns). 
«parseFlags» is a bit-field of flags that control things such as for which cells an attempt is made to parse the cell into a number or date. The default (0) corresponds to the standard Excel conventions. The flags can be added, and are as follows.
+
 
* 0 = Parse both quoted ("52") and unquoted (52) cells.
+
=== «columnsToKeep» to select, reorder, or rename columns ===
* 1 = Don't parse any cells. E.g., <code>="52"</code>, <code>"52"</code> and <code>52</code> will all be return as text.
+
 
* 2 = Don't parse quoted cells. E.g., <code>"52"</code> is text, but <code>52</code> is numeric.
+
You can select just a single column from «csvText» by specifying «columnsToKeep» as a column name from the first line of the csv file, or as the column number. You should then omit «columnIndex».  It returns a vector (1D table) with no column index.
* 4 = Disable the '=' prefix (<code>="0042"</code> normally suppressed parsing).
+
 
* 8 = Recognize backslash-escaped quotes inside quotes. (in addition to the doubling of quotes, which is the usual CSV standard).
+
You can also select several columns and/or reorder columns by setting «columnIndex» to an existing index containing only the headers you want in the order you want. If it contains column names from the CSV (in any order), you can specify it also to «columnsToKeep», e.g.,
* 16 = Disable generalized date parsing. When on, only the format <code>6-May-2016</code> is recognized as a date but not other formats. This flag dramatically speeds up parsing times for data with no other date formats.
+
 
 +
<code>[[ParseCSV]](Csv, Col, columnsToKeep: Col)</code>
 +
 
 +
If you want to rename the columns, you can pass «columnsToKeep» a 1D array containing the column names or numbers from the CSV text indexed by «columnIndex» that contains the new names. (Some could be the same and some different.)
 +
 
 +
If you pass a list or array to «columnsToKeep» it's best to also specify «columnIndex». Without a «columnIndex» it will still extract the specified columns, but less efficiently because array abstraction repeats the call for each element in the array passed to «columnsToKeep», and it parses the «csvText» anew each time.
 +
 
 +
===  «rowIndexLabelsColumn» for row labels ===
 +
 
 +
By default, the local index <code>.Row</code> contains numbers 1 to the number of rows. But, you can also use a column to use for the labels for the local row index by setting «rowIndexLabelsColumn» to the number or label of that column.  For example,
 +
 
 +
<code>ParseCSV(CsvText, rowIndexLabelsColumn: 1)</code>
 +
 
 +
treats the first column as row index labels rather than as array cells. It removes the selected column from the result array unless you  include that column in «columnIndex» or «columnsToKeep».
 +
 
 +
=== Non-comma separator: «separator» ===
 +
 
 +
You might expect that CSV ("comma-separated value") format would ''always'' use commas to separate values! But, sometimes "CSV" files use another separator character, such as code>'|'</code>, <code>'[[Chr]](9)'</code> (tab), or <code>';'</code>.  It is convenient to choose a separator that never appears within a value, as <code>','</code> often does.  If the CSV ''does'' use comma separators, you must put the «quote»  character (default double quote) around the value to avoid confusing the parser.
 +
 
 +
=== Spaces around values: «trimCells» ===
 +
 
 +
By default it trims away any leading or trailing spaces around each value. Set «trimCells» to False if you want to retain those spaces. It never trims spaces inside quotes.
 +
 
 +
=== «parseFlags» to interpret numbers and dates ===
 +
 
 +
By default ParseCSV converts each number, whether enclosed in quotes or not,  into a number or date, following Excel conventions. It treats any cell that starts with "=" (e.g. an Excel formula) as text. You can change these defaults with «parseFlags». This parameter is a bit-field of flags, which means that you can add flags together to combine their effects: 
 +
* 0 = (default) Return quoted ("52") and unquoted (52) numbers as numbers, and cells that look like dates ('<code>6-May-2016</code>' or '9/4/2020 15:04') as dates.  
 +
* 1 = Don't parse any cells -- e.g. return <code>="52"</code>, <code>"52"</code> and <code>52</code> as text.
 +
* 2 = Don't parse quoted cells -- e.g. treat <code>"52"</code> as text, but <code>52</code> as a number.
 +
* 4 = Disable the '=' prefix (<code>="0042"</code> normally suppresses parsing).
 +
* 8 = Recognize backslash-escaped quotes to allow quotes inside a quote (in addition to the doubling of quotes, which is the usual CSV standard). For example "He said \"Hello\"." would return the text "He said "Hello"."
 +
* 16 = Disable generalized date parsing. When you do this, it still recognizes format <code>6-May-2016</code> as a date but not other formats. This flag dramatically speeds up parsing times for data with no other date formats.
 +
 
 +
=== «dateTemplate» Date Formats ===
 +
 
 +
Use the «dateTemplate» parameter to specify the ordering for international dates. Use the letters "d", "M" and "y" to specify the ordering of these components. For example, with «dateTemplate» of "d/M/y", parses "11/10/9" as 11-Oct-2009, but with «dateTemplate» of "y/M/d" it parses it as 9-Oct-2011, and with "M/d/y" it would be 10-Nov-2009.
 +
 
 +
This parameter uses the same conventions used in the Custom date formats template in the [[Number Format Dialog]], but only the relative ordering of the day, month and year patterns matter. Hence, all these work equivalently: "dMy", "dd/MM/yyyy", "d-M-yy", "d M yyyy". Also, upper/lower case matters!  "M" must be uppercase (because "m" means minutes").
 +
 
 +
=== «decimal» point and thousands separators ===
 +
 +
By default, it assumes that numbers look like <code>"2,525,948.77"</code> with a decimal point <code>'.'</code> (dot) to separate the whole and decimal part, and commas <code>','</code> to group digits into threes (thousands, millions, billions, etc.). That is the convention in English-speaking countries. To use the opposite convention -- e.g.  <code>"2.525.948,77"</code> -- common in the rest of the World, set the «decimal» parameter to  <code>','</code> (comma) instead of the default <code>'.'</code>. 
 +
 
 +
When using comma for «decimal», it's usual to specify «separator» as semi-colon <code>';'</code> or something else other than comma. If «decimal» and «separator» are both commas, you need to use quotes around any numbers containing decimals.
 +
 
 +
=== «quote» character ===
 +
 
 +
The default quote character to enclose numbers or text that may contain commas, new lines, or other separators is double quote (<code>"</code>). You can specify a different character or text to the «quote» parameter.  The most common other quote character is the single quote ('). To type a single quote in Analytica, you can type doubleQuote SingleQuote doubleQuote (<code>"'"</code>). Analytica lets you use either matching single quotes or matching double quotes to enclose a text string.
  
 
== The CSV format ==
 
== The CSV format ==
  
There is no single well-defined CSV standard. Undocumented conventions used by Excel are often seen as the most definitive "specification". The article at [http://edoceo.com/utilitas/csv-file-format Comma Separated Values (CSV) Standard File Format] is relatively easy to read and covers the basics well, without being the most comprehensive reference.
+
There is no single well-defined CSV standard. Undocumented conventions used by Excel are often treated as the "definitive" specification. The article at [http://edoceo.com/utilitas/csv-file-format Comma Separated Values (CSV) Standard File Format] is a good overview, but not comprehensive.
  
Each "record" (or row) is a CSV file is delineated by a newline, either CR, LF or CRLF.  Each value within a row is separated from the next field by a separator character, usually comma (hence the name "Comma-separated values"), although other characters or character sequences are also sometimes used, including commonly TAB (<code>[[Chr]](9)</code>), bar (<code>'|'</code>), semi-colon (<code>';'</code>) and even space.  
+
Analytica's ParseCSV() and MakeCSV() follows these Excel conventions  (which you can override with parseFlags):
 +
* Each "record" (or row) in a CSV file is delineated by a newline, either CR, LF or CRLF.   
 +
* Each value within a row is separated from the next field by a separator character, usually comma (hence the name "Comma-separated values"), although other characters or character sequences are also sometimes used, including commonly TAB (<code>[[Chr]](9)</code>), bar (<code>'|'</code>), semi-colon (<code>';'</code>) and even a simple space.  
  
 
A row of data might look like this:
 
A row of data might look like this:
 
:<code>California, CA, "39,144,818", 163696, Sacramento, "Edmund Gerald ""Jerry"" Brown, Jr.", ="95814", 9/9/1850</code>
 
:<code>California, CA, "39,144,818", 163696, Sacramento, "Edmund Gerald ""Jerry"" Brown, Jr.", ="95814", 9/9/1850</code>
  
Both textual and numeric values may or may not be surrounded by quotes. Any value containing the separator character or a newline must be surrounded by quotes. Quotes are always the double-quote character (<code>"</code>). A value may also be a date (e.g., <code>9/9/1850</code>).  Spaces to the left or right of a value are normally trimmed (unless you set «textTrim» to false).
+
* Text and numeric values may or may not be surrounded by quotes. Any value containing the separator character or a newline must be surrounded by quotes.  
 
+
* Quotes are always the double-quote character (<code>"</code>).  
When a value is read, if it parses as a valid date, it will be parsed as a date, otherwise if it parses as a valid number, it will parse as a number, and otherwise it will be read as text. The «parseFlags» parameter can be used to alter this behavior somewhat. To force a value to be read as text, even when it would be a valid number, use the equal-quote form, <code>="95814"</code>.
+
* A value may also be a date (e.g., <code>9/9/1850</code>).   
 
+
* If a value parses as a valid date, ParseCSV() treats it as a date, otherwise it tries to parse it as a number. If that fails, it treats it as text. The «parseFlags» parameter can be used to alter this behavior somewhat. T
Note that new lines can appear within quoted values, so simply splitting CSV data on newlines does not reliably separate the data into rows.  
+
* To force a value to be read as text, even when it would be a valid number, use the equal-quote form, <code>="95814"</code>.
 
+
* It trims spaces to the left or right of a value, unless you set «textTrim» to False.
The first line of a CSV file often consists of columns names, with data starting on the second row. But this is not always the case, some CSV files have no column headers with data starting on the first line.  
+
* A new line can appear within a quoted value, so simply splitting CSV data on newlines does not reliably separate the data into rows.  
 +
* The first line of a CSV file often consists of columns names, with data starting on the second row. But, some CSV files have no column headers and the data starts on the first line.
  
 
== Examples ==
 
== Examples ==
Line 63: Line 119:
 
== History ==
 
== History ==
  
This built-in function was introduced in [[Analytica 5.0]]. For earlier releases, the [[Flat-file library]] contains [[ReadCSVFile]] and [[ParseCsvText]] functions. The [[ParseCSV]] is a substantial improvement.
+
[[Analytica 5.3]] introduced the ability to update the columns and rows at the same time you parse the CSV, using the [[ComputedBy]] function.
 +
 
 +
We introduced this function in [[Analytica 5.0]]. For earlier releases, the [[Standard_libraries#Flat_File_Library|Flat-file library]] contains [[ReadCsvFile]] and [[ParseCsvText]] functions. [[ParseCSV]] is a substantial improvement on these.
  
  
 
== See Also ==
 
== See Also ==
  
 +
* [[Parsing and formatting data]]
 
* [[ReadTextFile]]
 
* [[ReadTextFile]]
 +
* [[MakeCSV]]
 +
* [[SplitText]], [[FindInText]]
 +
* [[ParseNumber]]
 +
* [[ParseDate]]
 +
* [[ReadCsvFile]], [[ParseCsvText]] -- these are made obsolete by [[ParseCSV]], but are of use in [[Analytica 4.7]] and earlier.

Latest revision as of 16:04, 27 May 2021


Converts a text value read in from a file with CSV (Comma-Separated Values) format into a 2D array. By default, it converts numbers and dates into Analytica numbers and dates. It offers lots of optional parameters to control the column and row indexes, and handle the many variants of the CSV format -- for example, nonstandard characters that separate columns, rows, and for the quotes around text.

ParseCSV( csvText, columnIndex, rowIndex, separator, firstLineIsHeader, trimCells, columnsToKeep, rowIndexLabelColumn, parseFlags, dateTemplate, decimal, quote )

Converts text «csvText» in a CSV (Comma-Separated Values) format into a two-dimensional array. You usually use it with ReadTextFile() to read the file:

ParseCSV(ReadTextFile( filename ))

It assumes these defaults:

  • The usual conventions for CSV files (like Excel): Comma to separate fields, newline to separate rows, and double quotes (") to enclose elements that may contain commas or newlines.
  • It returns a 2D array with local indexes, .Column and .Row.
  • Local index .Column gets the column headers from the first row.
  • Local index .Row is numbered 1 to n-1, where n is the number of rows.

You can override any of these defaults with these optional parameters:

«columnIndex» and «rowIndex»

Specify «columnIndex» or «rowIndex» to use an existing index instead of the default local indexes .Column or .Row. If you specify a «columnIndex» it treats the first row as data and assumes that the columns are in the correct order. If the first row contains column headers that you want to ignore, perhaps because you want to rename some or all with «columnIndex», set «firstLineIsHeader» to true. If «columnIndex» (or «rowIndex») is shorter than the number of columns (rows) in the original, it ignores the extra columns or rows. If it is longer, it pads the result with NULL for the extra columns (rows.)

You can use ParseCSV to update indexes «columnIndex» and/or «rowIndex» with values from the column or row headers from the CSV. To do this, you define the global «columnIndex» and/or «rowIndex» as ComputedBy the variable or function containing the ParseCSV() call. For example, suppose you define, a variable TableFromCSV as: Variable TableFromCSV := ParseCSV(ReadTextFile('data.csv'), Cols, Rows, firstLineIsHeader: True, rowIndexLabelColumn: 1) Index Cols := ComputedBy(TableFromCSV) Index Rows := ComputedBy(TableFromCSV) It automatically updates Cols and rows with the header values from the csv table, when one of those indexes or TableFromCSV is first computed.

«firstLineIsHeader»

If the CSV has no column headers, set «firstLineIsHeader» to False, so that it treats the first row as data. If you don't specify a «columnIndex», it uses local index .Column containing numbers 1 to M (the number of columns).

«columnsToKeep» to select, reorder, or rename columns

You can select just a single column from «csvText» by specifying «columnsToKeep» as a column name from the first line of the csv file, or as the column number. You should then omit «columnIndex». It returns a vector (1D table) with no column index.

You can also select several columns and/or reorder columns by setting «columnIndex» to an existing index containing only the headers you want in the order you want. If it contains column names from the CSV (in any order), you can specify it also to «columnsToKeep», e.g.,

ParseCSV(Csv, Col, columnsToKeep: Col)

If you want to rename the columns, you can pass «columnsToKeep» a 1D array containing the column names or numbers from the CSV text indexed by «columnIndex» that contains the new names. (Some could be the same and some different.)

If you pass a list or array to «columnsToKeep» it's best to also specify «columnIndex». Without a «columnIndex» it will still extract the specified columns, but less efficiently because array abstraction repeats the call for each element in the array passed to «columnsToKeep», and it parses the «csvText» anew each time.

«rowIndexLabelsColumn» for row labels

By default, the local index .Row contains numbers 1 to the number of rows. But, you can also use a column to use for the labels for the local row index by setting «rowIndexLabelsColumn» to the number or label of that column. For example,

ParseCSV(CsvText, rowIndexLabelsColumn: 1)

treats the first column as row index labels rather than as array cells. It removes the selected column from the result array unless you include that column in «columnIndex» or «columnsToKeep».

Non-comma separator: «separator»

You might expect that CSV ("comma-separated value") format would always use commas to separate values! But, sometimes "CSV" files use another separator character, such as code>'|', 'Chr(9)' (tab), or ';'. It is convenient to choose a separator that never appears within a value, as ',' often does. If the CSV does use comma separators, you must put the «quote» character (default double quote) around the value to avoid confusing the parser.

Spaces around values: «trimCells»

By default it trims away any leading or trailing spaces around each value. Set «trimCells» to False if you want to retain those spaces. It never trims spaces inside quotes.

«parseFlags» to interpret numbers and dates

By default ParseCSV converts each number, whether enclosed in quotes or not, into a number or date, following Excel conventions. It treats any cell that starts with "=" (e.g. an Excel formula) as text. You can change these defaults with «parseFlags». This parameter is a bit-field of flags, which means that you can add flags together to combine their effects:

  • 0 = (default) Return quoted ("52") and unquoted (52) numbers as numbers, and cells that look like dates ('6-May-2016' or '9/4/2020 15:04') as dates.
  • 1 = Don't parse any cells -- e.g. return ="52", "52" and 52 as text.
  • 2 = Don't parse quoted cells -- e.g. treat "52" as text, but 52 as a number.
  • 4 = Disable the '=' prefix (="0042" normally suppresses parsing).
  • 8 = Recognize backslash-escaped quotes to allow quotes inside a quote (in addition to the doubling of quotes, which is the usual CSV standard). For example "He said \"Hello\"." would return the text "He said "Hello"."
  • 16 = Disable generalized date parsing. When you do this, it still recognizes format 6-May-2016 as a date but not other formats. This flag dramatically speeds up parsing times for data with no other date formats.

«dateTemplate» Date Formats

Use the «dateTemplate» parameter to specify the ordering for international dates. Use the letters "d", "M" and "y" to specify the ordering of these components. For example, with «dateTemplate» of "d/M/y", parses "11/10/9" as 11-Oct-2009, but with «dateTemplate» of "y/M/d" it parses it as 9-Oct-2011, and with "M/d/y" it would be 10-Nov-2009.

This parameter uses the same conventions used in the Custom date formats template in the Number Format Dialog, but only the relative ordering of the day, month and year patterns matter. Hence, all these work equivalently: "dMy", "dd/MM/yyyy", "d-M-yy", "d M yyyy". Also, upper/lower case matters! "M" must be uppercase (because "m" means minutes").

«decimal» point and thousands separators

By default, it assumes that numbers look like "2,525,948.77" with a decimal point '.' (dot) to separate the whole and decimal part, and commas ',' to group digits into threes (thousands, millions, billions, etc.). That is the convention in English-speaking countries. To use the opposite convention -- e.g. "2.525.948,77" -- common in the rest of the World, set the «decimal» parameter to ',' (comma) instead of the default '.'.

When using comma for «decimal», it's usual to specify «separator» as semi-colon ';' or something else other than comma. If «decimal» and «separator» are both commas, you need to use quotes around any numbers containing decimals.

«quote» character

The default quote character to enclose numbers or text that may contain commas, new lines, or other separators is double quote ("). You can specify a different character or text to the «quote» parameter. The most common other quote character is the single quote ('). To type a single quote in Analytica, you can type doubleQuote SingleQuote doubleQuote ("'"). Analytica lets you use either matching single quotes or matching double quotes to enclose a text string.

The CSV format

There is no single well-defined CSV standard. Undocumented conventions used by Excel are often treated as the "definitive" specification. The article at Comma Separated Values (CSV) Standard File Format is a good overview, but not comprehensive.

Analytica's ParseCSV() and MakeCSV() follows these Excel conventions (which you can override with parseFlags):

  • Each "record" (or row) in a CSV file is delineated by a newline, either CR, LF or CRLF.
  • Each value within a row is separated from the next field by a separator character, usually comma (hence the name "Comma-separated values"), although other characters or character sequences are also sometimes used, including commonly TAB (Chr(9)), bar ('|'), semi-colon (';') and even a simple space.

A row of data might look like this:

California, CA, "39,144,818", 163696, Sacramento, "Edmund Gerald ""Jerry"" Brown, Jr.", ="95814", 9/9/1850
  • Text and numeric values may or may not be surrounded by quotes. Any value containing the separator character or a newline must be surrounded by quotes.
  • Quotes are always the double-quote character (").
  • A value may also be a date (e.g., 9/9/1850).
  • If a value parses as a valid date, ParseCSV() treats it as a date, otherwise it tries to parse it as a number. If that fails, it treats it as text. The «parseFlags» parameter can be used to alter this behavior somewhat. T
  • To force a value to be read as text, even when it would be a valid number, use the equal-quote form, ="95814".
  • It trims spaces to the left or right of a value, unless you set «textTrim» to False.
  • A new line can appear within a quoted value, so simply splitting CSV data on newlines does not reliably separate the data into rows.
  • The first line of a CSV file often consists of columns names, with data starting on the second row. But, some CSV files have no column headers and the data starts on the first line.

Examples

The simplest usage is:

ParseCSV( csvText )

which follows the most common format and Excel standards.

To read in a CSV file:

ParseCSV( ReadTextFile( filename ) )


History

Analytica 5.3 introduced the ability to update the columns and rows at the same time you parse the CSV, using the ComputedBy function.

We introduced this function in Analytica 5.0. For earlier releases, the Flat-file library contains ReadCsvFile and ParseCsvText functions. ParseCSV is a substantial improvement on these.


See Also

Comments


You are not allowed to post comments.