Text functions

Analytica User GuideText, Date, Math, and Financial FunctionsText functions

These functions work with text values (sometimes known as strings), available in the built-in Text library.

Asc(t)

Returns the numeric character code of the first character in text value «t». This is occasionally useful, for example to understand the alphabetic ordering of text values. Unicode characters may have values greater that 255.

Asc('d') → 100

Asc('δ') → 948 { or, in hex, 0x3b4 }

Chr(n)

Returns the character corresponding to the numeric unicode code «n». Chr() and Asc() are inverses of each other, for example:

Chr(65) → 'A', Asc(Chr(65)) → 65

Asc('A') → 65, Chr(Asc('A')) → 'A'

Chr() is useful for creating characters that cannot easily be typed, such as Tab, which is Chr(9) and carriage return (CR), which is Chr(13). For example, if you read in a text file, x, you can use SplitText(x, Chr(13)) to generate an array of lines from a multiline text file.

TextLength(t)

Returns the number of characters in text «t».

SelectText(t, m, n)

Returns text containing the «m»'th through the «n»'th character of text «t» (where the first character is «m»=1). If you omit «n» it returns characters from the «m»'th through the end of «t».

FindInText(substr, text, start, caseInsensitive, re, return, subpattern, repeat, repeatSubpattern, repeatIndex)

Returns the position of the first occurrence of «substr» within «text», as the number of characters to the first character of text. If it can't find «substr» in text, it returns 0.

Variable People := 'Amy, Betty, Carla'

FindInText('Amy', People) → 1

FindInText('Betty', People) → 6

FindInText('Fred', People) → 0

Optional parameters

CaseInsensitive

FindInText() is case-sensitive unless the optional parameter «caseInsensitive» is true:

FindInText('amy', People) → 0

FindInText('amy', People, caseInsensitive: true) → 1

Start

The optional third parameter, «start», specifies the position to start searching at, for example, if you want to find a second occurrence of «substr» after you have found the first one.

FindInText('i','Supercalifragilisticexpialidocious') → 9

FindInText('i','Supercalifragilisticexpialidocious', 10) → 14

Repeat, RepeatIndex

Normally FindInText() returns information on the first match, but by using any of the three optional repeat parameters can be used to find all matches in text. When it finds multiple matches, the result is an array. If you specifyrepeat: true, the resulthas a local index named .Repeat with elements 1..n.

Alternatively, you can specify a preexisting index as «repeatIndex» parameter. If that index has n elements, it returns only the first n matches.

This example parses XML text, returning an array of ages with a local index named «.Name», where the labels of the local index are the names of each person:

FindInText('<person.*?name = "(?<name>(.*?))".*?>.*?' & '<age>(?<age>.*?)</age>.*?' & '</person>', xmlText, re: true, return: 'S', repeatSubpattern: 'name', subpattern: 'age')

Re (Regular expression), Return, and Subpattern

If you set optional parameter, «re», to True,it interprets «substr» as a regular expression. The regular expression language is widely used (not just in Analytica) to specify match criteria for text. It offers "wild cards" that match any letter, digit, or other character, identify separate words, and a lot more. See Regular expressions for details.

Parentheses within a regular expression denote subpatterns, numbered in a depth first fashion. Subpatterns can also be named using the regular expression format “(?<name>...)”. You can the match information for any subpattern by specifying the subpattern number or subpattern name in the optional «subpattern» parameter.

FindInText('d.*T', 'FindInText', re:1) → 4

FindInText('d.*T', 'FindInText', re: 1, return:['L', 'S']) → [4, 'dInT']

FindInText('(\d\d\d)-(\d\d\d\d)', '650-212-1212', re:1, return:'S',  subpattern:[0, 1, 2]) → ['212-1212', ’212’, ’1212’]

FindInText('a*(?<bcd>b+(c+)(d*))','zyabdaabbcccfd', re:1, subpattern:['bcd', 0, 1, 2, 3]) → [8, 6, 8, 10, 13]

The «return» parameter alters what is returned, according to what letter you provide as the parameter:

‘P’ (or ‘Position’): The position of the matching text or subpattern (default)
‘L’ (or ‘Length’): The length of the matching text or subpattern.
'S’ (or ‘Subpattern’): The text matched the regular expression or subpattern.
‘#’ (or ‘#Subpatterns’): The number of subpatterns in the regular expression.

TextTrim(t, leftOnly, rightOnly, trimChars)

Removes leading and trailing spaces from the text «t».

TextTrim(' Hello World ') → 'Hello World'

Set optional parameter «leftOnly» to True to remove only preceding spaces, or set «rightOnly» to True to remove only followingspaces:

TextTrim(' Hello World ', leftOnly: True) → 'Hello World'

TextTrim(' Hello World ', rightOnly: True) → ' Hello World'

To remove characters other than spaces, specify those characters in a text for the optional «trimChars» parameter:

TextTrim(' [One, Two, Three] ', trimChars: ' []') → 'One, Two, Three'

TextReplace(text, pattern, substr, all, caseInsensitive, re)

Returns text with the first occurrence of «pattern» replaced by «substr».

TextReplace('StringReplace, StringLength', 'String', 'Text') → 'TextReplace, StringLength'

If «all» is True, it returns «text» with all occurrences of text «pattern» replaced by «subst».

TextReplace('StringReplace, StringLength', 'String', 'Text', All: True) → 'TextReplace, TextLength'

Matches are case-sensitive unless «caseInsensitive» is True.

Re (Regular expression)

When the optional «re» parameter is True, it treats «pattern» as a regular expression. In this mode, it replaces the character sequence \0 in «subst» by the matching text, and it replaces \1, \2, ..., \9 by the subtext matched by the corresponding numbered subpattern in the regular expression. The character sequence <name> in «subst» is replaced by the subtext matched to the indicated named subpattern.

TextReplace('Hello world', '\w+', '«\0»', all:True, re: True) → '«Hello» «world»'

TextReplace('Hello world', '(.{1, 7}).*', '\1…', re: True) → 'Hello w…'

TextReplace(text: 'swap first and last', pattern: '(?<first>\w+)(?<mid>.*)(?<last>\b\w+)', subst:'<last><mid><first>', re: True) → 'last first and swap’

TextReplace('swap first and last', '(\w)(\w*)(\w)', '\3\2\1', re: 1, all: 1 ) → 'pwas tirsf dna tasl'

Joining Text: a & b

The & operator joins (concatenates) two text values to form a single text value, for example:

'What is the' & ' number' & '?' → 'What is the number?'

If one or both operands are numbers, it converts them to text using the number format of the variable whose definition contains this function call (or the default suffix format if none is set), for example:

'The number is ' & 10^8 → 'The number is 100M'

This is also useful for converting (or “coercing”) numbers to text. The NumberToText function is also useful for converting numbers to text when you want to specify the number format explicitly.

The result of concatenation of text with the special value Null changed in the Analytica 5.0 release:

'The value is ' & Null → 'The value is Null' {in Analytica 5.0 and later}

'The value is ' & Null → «null» {in Analytica 4.6 and earlier}

JoinText(a, i, separator, finalSeparator, default, textForNull)

Returns the elements of array «a» joined together into a single text value over index «i». If elements of «a» are numeric, JoinText() first converts them to text using the number format settings for the variable whose definition contains this function call. For example:

I := ['A', 'B', 'C']

JoinText(I, I) → 'ABC'

A := Array(I, ['VW', 'Honda', 'BMW'])

B := Array(I, ['one', Null, 'two'])

JoinText(A, I) → 'VWHondaBMW'

If the optional parameter «separator» is specified, it is inserted as a separator between successive elements, for example:

JoinText(A, I, ', ') → 'VW, Honda, BMW'

The optional parameter «finalSeparator», if present, specifies a different separator between the second-to-last and last elements of «a».

JoinText(A, I, '; ', '; and ') → 'VW; Honda; and BMW'

Null values in «a» are ignored unless the optional parameter «textForNull» is specified.

JoinText(B, I, ', ') → 'one, two'

JoinText(B, I, ', ', textForNull: ) → 'one, , two'

JoinText(B, I, ', ' , textForNull: 'NULL') → 'one, NULL, two'

The optional «default» parameter is returned when all values are ignored, or «a» has a zero length.

JoinText([Null, Null, Null], default: Null) → «null»

SplitText(text, separator, caseInsensitive, re)

Returns a list of text values formed by splitting the elements of text value text at each occurrence of separator «separator». For example:

SplitText('VW, Honda, BMW', ', ') → ['VW', 'Honda', 'BMW']

SplitText() is the inverse of JoinText(), if you use the same separators. For example:

Var x := SplitText('Humpty Dumpty sat on a wall.', ' ') → ['Humpty', 'Dumpty', 'sat', 'on', 'a', 'wall.']

JoinText(x, , ' ') → 'Humpty Dumpty sat on a wall.'

When «separator» contains letters, setting «caseInsensitive» to True matches in a lower/uppercase-insensitive manner. When the «re» parameter is True, separator is interpreted as a Perl-compatible regular expression.

Variable s := 'Yes, Virginia. There is a Santa Claus!'

SplitText(s, '[\s, \.!]+', re: 1) → ['Yes', 'Virginia', 'There', 'is', 'a', 'Santa', 'Claus', ]

SplitText(TextTrim(s, trimChars: ' , .!'), '[\s, \.!]+', re:1) → ['Yes', 'Virginia', 'There', 'is', 'a', 'Santa', 'Claus']

Tip

With SplitText(), «text» must be a single text value, not an array. Otherwise, it might generate an array of arrays of different length. See Functions expecting atomic parameters on what to do if you want apply it to an array.

TextLowerCase(t)

TextLowerCase() returns the text «t» with all letters as lowercase. For example:

TextLowerCase('What does XML mean?') → 'what does xml mean?'

TextUpperCase(t)

TextUpperCase() returns the text «t» with all letters as uppercase. For example:

TextUpperCase('What does XML mean?') → 'WHAT DOES XML MEAN?'

TextSentenceCase(Text, preserveUC)

TextSentenceCase() returns the text «t» with the first character (if a letter) as uppercase, and any other letters as lowercase. For example:

TextSentenceCase('mary ann FRED Maylene') → 'Mary ann fred maylene'

TextSentenceCase(SplitText('mary ann FRED Maylene', ' ')) → ['Mary', 'Ann', 'Fred', 'Maylene']

TextSentenceCase('they are Fred and Maylene', true) → 'They are Fred and Maylene'

NumberToText(x, format)

NumberToText() converts number «x» to text using the specified «format». It provides all the settings and options available in the Number formats dialog for displaying a number (including a date number).

Possible values for «format» are: 'Suffix', 'Exponential', 'Fixed Point', 'Integer', 'Percent', 'Date', 'Boolean', or 'Hexadecimal'. You can just use the first letter of a format to keep it brief:

NumberToText(3.45M, ['S', 'E', 'F', 'I', 'H']) → ['3.45M', '3.45.e+006', '3450000', '3450000', '0x34a490']

NumberToText(0.0012, ['S', 'E', 'F', 'P']) → ['1.2m', '1.2e-003', '0', 0.12%']

NumberToText() offers these optional parameters:

«digits» specifies the precision for Suffix and Exponential formats, and the digits to the right of the decimal for Fixed Point and Percent formats, for example:

NumberToText(Pi, 'Suffix', digits: 5) → '3.1416'

NumberToText(Pi, 'Fixed Point', digits: 5) → '3.14159'

NumberToText(Pi, 'Percent', digits: 5) → '314.15927%'

«showZeros» forces the inclusion of trailing zeros:

NumberToText(1/4, 'Percent', digits: 2, showZeros: True) → '25.00%'

Set «thousandsSeparators» to True to separate digits into groups of three:

NumberToText(7^12,'I', thousandsSeparators: True) → '13,841,287,201'

«currency» specifies a template with the currency symbol and its placement relative to the number and the minus sign.

NumberToText(-372, 'F', currency: ['-£#', 'US$-#', '#£-', '($#)']) → ['-£372', 'US$-372', '372£-', '($372)']

«dateFormat» provides a date format template for converting date numbers to text:

NumberToText(Today(), dateFormat: 'yyyy MMMM dd (wwww)') → '2013 July 01 (Monday)'

«fullPrecision»: Set this to true, to include all digits to ensure that the full precision of the number.

NumberToText(Pi, fullPrecision: True) → '3.141592653589793'

ParseNumber(text, badVal)

Parses a text value into a number. (For dates, use ParseDate()). The result is independent of the number format setting. Values that are already numeric are returned. If the number is unparseable, it returns Null, unless you specify a different value to the optional parameter «badVal».

ParseNumber('12.43K') → 12.43K

ParseNumber('hello') → «null»

ParseNumber(14.3) → 14.3

If you use ParseNumber(x, x) it will simply return any unparseable text values in «x» as the original text value:

VAR x := ['3, 214', 14, 'foo'] DO ParseNumber(x, x) → [ 3214, 14, 'foo']

TextCharacterEncode(type,text)

(New to Analytica 5.0 ) Transforms «text» into an alternative text encoding specified by «type». The text can be URL encoded for inclusion in a URL or URL decoded, XML encoded, put into one of the four normalized unicode forms, or converted to or from a UTF-8 encoding. See TextCharacterEncode.

TextCharacterEncode('UTF-8', 'γσh') → 'Î³Ïh'

TextCharacterDecode('-UTF-8','Î³Ïh') → 'γσh'

TextCharacterEncode('URL','3:4 = 3/4 & 1/2 = 0.5') → '3%3A4+%3D+3%2F4+%26+1%2F2+%3D+0.5'

TextDistance(t1,t2)

(New to Analytica 5.0 ) Returns a distance measure indicating how different two text values are. The result is the number of edit steps (substitutions, deletions, insertions, etc.) are required to transform «t1» into «t2». Many optional parameters can be used to specify which editing operations are allowed (see TextDistance()), allowing various standard distance measures including Levenshtein distance, Demerau-Levenshtein distance, Hamming distance, and Longest common subsequence among others.

TextDistance("portland", "orlando") → 3

Note: "portland" → "ortland" → "orland" → "orlando"

Data file parsing and creation functions

When you exchange data with other applications (with Analytica Enterprise edition or better), you may need to parse input data, or format data for output. Commonly used formats include CSV (which stands for Comma Separated Values, but includes other separators such as tabs), XML and JSON. These functions parse or create data in these formats:

ParseCSV and MakeCSV
ParseJSON and MakeJSON
The Microsoft XML DOM parser. See Parsing an XML file. This uses COM Integration.

ParseCSV( ReadTextFile( "MyData.csv" ) ) → { 2-D array indexed by local indexes .Column and .Row}