Text functions

Revision as of 21:31, 31 January 2016 by Max

These functions, available in the built-in Text library, work with text values (sometimes known as strings).

Asc(t)

Returns the ASCII code (a number between 0 and 255) of the first character in text value «t». This is occasionally useful, for example to understand the alphabetic ordering of text values.

See also Asc().

Chr(n)

Returns the character corresponding to the numeric ASCII code «n» (a number between 0 and 255). Chr() and Asc() are inverses of each other, for example:

Chr(65) → 'A', Asc(Chr(65)) → 65
Asc('A') → 65, Chr(Asc('A')) → 'A'

Chr() is useful for creating characters that cannot easily be typed, such as Tab, which is Chr(9), and carriage return (CR), which is Chr(13). For example, if x contains the contents of a multiline text file, SplitText(x, Chr(13)) returns an array of its lines.
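For readers who know Python, a rough analogue for illustration only: Python's built-in ord() and chr() correspond to Asc() and Chr() for codes 0 to 127. They work with Unicode code points, so codes 128 to 255 may not map to the same characters as in Analytica. The helper names asc and chr_ are hypothetical.

```python
def asc(t: str) -> int:
    """Code of the first character of t, like Asc(t)."""
    return ord(t[0])


def chr_(n: int) -> str:
    """Character for code n, like Chr(n)."""
    return chr(n)


# Asc and Chr are inverses of each other.
print(chr_(65), asc(chr_(65)))  # A 65
print(asc("A"), chr_(asc("A")))  # 65 A

# Splitting a multiline text on carriage return, Chr(13).
text = "line one" + chr_(13) + "line two"
lines = text.split(chr_(13))
print(lines)  # ['line one', 'line two']
```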

TextLength(t)

Returns the number of characters in text «t».

See also TextLength().

TextLength('supercalifragilisticexpialidocious') → 34

SelectText(t, m, n)

Returns the «m»th through the «n»th characters of text «t», where the first character is numbered 1. If «n» is omitted, it returns the characters from the «m»th through the end of «t».

See also SelectText().

SelectText('One or two', 1, 3) → 'One'
SelectText('One or two', 8) → 'two'
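Analytica counts characters from 1 and includes both endpoints, whereas Python slices are 0-based and exclude the end index. A minimal sketch of the SelectText() semantics, using a hypothetical select_text helper:

```python
from typing import Optional


def select_text(t: str, m: int, n: Optional[int] = None) -> str:
    """Characters m through n of t (1-based, inclusive), like SelectText(t, m, n).

    If n is omitted, return everything from character m to the end.
    """
    if n is None:
        return t[m - 1:]
    return t[m - 1:n]


print(select_text("One or two", 1, 3))  # One
print(select_text("One or two", 8))  # two
print(len("supercalifragilisticexpialidocious"))  # 34, like TextLength()
```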

FindInText(substr, text, start, caseInsensitive, re, return, subpattern, repeat, repeatSubpattern, repeatIndex)

Returns the position of the first occurrence of «substr» within «text», counting the first character of «text» as position 1. If it can't find «substr» in «text», it returns 0.

Variable People := 'Amy, Betty, Carla'
FindInText('Amy', People) → 1
FindInText('Betty', People) → 6
FindInText('Fred', People) → 0

Optional parameters

CaseInsensitive

FindInText() is case-sensitive unless the optional parameter «caseInsensitive» is true:

FindInText('amy', People) → 0
FindInText('amy', People, caseInsensitive: true) → 1

Start

The optional third parameter, «start», specifies the position at which to start searching. This is useful, for example, to find a second occurrence of «substr» after you have found the first one.

FindInText('i','Supercalifragilisticexpialidocious') → 9
FindInText('i','Supercalifragilisticexpialidocious', 10) → 14
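The plain-text behavior described so far (1-based positions, 0 when not found, «caseInsensitive», and «start») can be sketched in Python with str.find(), which returns a 0-based index or -1. The find_in_text helper is hypothetical and assumes ASCII text, since lowercasing some non-ASCII characters changes their length.

```python
def find_in_text(substr: str, text: str, start: int = 1,
                 case_insensitive: bool = False) -> int:
    """1-based position of substr in text at or after start; 0 if not found."""
    if case_insensitive:
        substr, text = substr.lower(), text.lower()
    # str.find gives a 0-based index or -1, so adding 1 yields a 1-based position or 0.
    return text.find(substr, start - 1) + 1


people = "Amy, Betty, Carla"
print(find_in_text("Amy", people))  # 1
print(find_in_text("Betty", people))  # 6
print(find_in_text("Fred", people))  # 0
print(find_in_text("amy", people, case_insensitive=True))  # 1

word = "Supercalifragilisticexpialidocious"
print(find_in_text("i", word))  # 9
print(find_in_text("i", word, 10))  # 14
```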

Repeat, RepeatIndex

Normally FindInText() returns information about the first match, but you can use any of the three optional repeat parameters («repeat», «repeatSubpattern», and «repeatIndex») to find all matches in «text». When it finds multiple matches, the result is an array. If you specify repeat: true, the result has a local index named .Repeat with elements 1..n, where n is the number of matches.

Alternatively, you can specify a preexisting index as the «repeatIndex» parameter. If that index has n elements, it returns only the first n matches.

This example uses «repeatSubpattern» to parse XML text. It returns an array of ages with a local index named .Name, whose labels are the names of the people found:

FindInText('<person.*?name = "(?<name>(.*?))".*?>.*?' & '<age>(?<age>.*?)</age>.*?' & '</person>', xmlText, re: true, return: 'S', repeatSubpattern: 'name', subpattern: 'age')
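A comparable idea in Python, as a sketch only: re.finditer() yields every match, and collecting one named group as keys and another as values plays the role of «repeatSubpattern» and «subpattern». The xml_text input is a hypothetical sample, Python writes named groups as (?P<name>...), and multiline input would also need the re.DOTALL flag.

```python
import re

# Hypothetical single-line sample input with the structure the pattern expects.
xml_text = ('<person name="Amy"><age>34</age></person>'
            '<person name="Betty"><age>29</age></person>')

pattern = (r'<person.*?name="(?P<name>.*?)".*?>.*?'
           r'<age>(?P<age>.*?)</age>.*?'
           r'</person>')

# One entry per match: the "name" subpattern labels each result (like
# repeatSubpattern: 'name') and the "age" subpattern's text is the value
# (like return: 'S', subpattern: 'age').
ages = {m.group("name"): m.group("age") for m in re.finditer(pattern, xml_text)}
print(ages)  # {'Amy': '34', 'Betty': '29'}
```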

Re (Regular expression), Return, and Subpattern

If you set the optional parameter «re» to True, FindInText() interprets «substr» as a regular expression. Regular expressions are widely used (not just in Analytica) to specify match criteria for text. They offer wildcards that match any letter, digit, or other character, patterns that identify separate words, and much more.

The «return» parameter selects what is returned, according to the letter (or word) you provide:

  • 'P' (or 'Position'): The position of the matching text or subpattern (default).
  • 'L' (or 'Length'): The length of the matching text or subpattern.
  • 'S' (or 'Subpattern'): The text matched by the regular expression or subpattern.
  • '#' (or '#Subpatterns'): The number of subpatterns in the regular expression.

Parentheses within a regular expression denote subpatterns, numbered in depth-first order (by the position of their opening parentheses). Subpatterns can also be named using the regular expression syntax “(?<name>...)”. You can get the match information for any subpattern by specifying its number or name in the optional «subpattern» parameter. Subpattern 0 refers to the entire match.

FindInText('d.*T', 'FindInText', re:1) → 4
FindInText('d.*T', 'FindInText', re: 1, return:['L', 'S']) → [4, 'dInT']
FindInText('(\d\d\d)-(\d\d\d\d)', '650-212-1212', re:1, return:'S', subpattern:[0, 1, 2]) → ['212-1212', '212', '1212']
FindInText('a*(?<bcd>b+(c+)(d*))','zyabdaabbcccfd', re:1, subpattern:['bcd', 0, 1, 2, 3]) → [8, 6, 8, 10, 13]
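The same subpattern bookkeeping can be reproduced with Python's re module, as an illustration rather than Analytica's own implementation. Python positions are 0-based, so the sketch adds 1; a match object's group(), start(), and re.groups roughly correspond to return: 'S', 'P', and '#'.

```python
import re

m = re.search(r"d.*T", "FindInText")
# return: ['L', 'S'] corresponds to the match length and the matched text.
length, matched = len(m.group()), m.group()

m2 = re.search(r"(\d\d\d)-(\d\d\d\d)", "650-212-1212")
parts = [m2.group(g) for g in (0, 1, 2)]  # like return: 'S', subpattern: [0, 1, 2]

# Python spells a named group (?P<name>...).
m3 = re.search(r"a*(?P<bcd>b+(c+)(d*))", "zyabdaabbcccfd")
# Add 1 to convert Python's 0-based positions to 1-based ones.
positions = [m3.start(g) + 1 for g in ("bcd", 0, 1, 2, 3)]
n_subpatterns = m3.re.groups  # like return: '#'

print(m.start() + 1, length, matched)  # 4 4 dInT
print(parts)  # ['212-1212', '212', '1212']
print(positions)  # [8, 6, 8, 10, 13]
print(n_subpatterns)  # 3
```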
Tip
Consult the Analytica Wiki for more detailed information on using regular expressions. The Wiki covers regular expression syntax and the more advanced matching options in far more detail; see the pages on Regular Expressions and FindInText.

TextTrim(t, leftOnly, rightOnly, trimChars)

Removes leading and trailing spaces from the text «t».

TextTrim(' Hello World ') → 'Hello World'

Set the optional parameter «leftOnly» to True to remove only leading spaces, or set «rightOnly» to True to remove only trailing spaces:

TextTrim(' Hello World ', leftOnly: True) → 'Hello World '
TextTrim(' Hello World ', rightOnly: True) → ' Hello World'

To remove characters other than spaces, list the characters to remove in a text value passed as the optional «trimChars» parameter:

TextTrim(' [One, Two, Three] ', trimChars: ' []') → 'One, Two, Three'
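Python's str.strip(), lstrip(), and rstrip() accept a set of characters to remove, which makes a close analogue of TextTrim(). This text_trim helper is a hypothetical sketch; it assumes the default trims spaces only (as described above) rather than all whitespace.

```python
def text_trim(t: str, left_only: bool = False, right_only: bool = False,
              trim_chars: str = " ") -> str:
    """Remove trim_chars from the ends of t, like TextTrim()."""
    if left_only:
        return t.lstrip(trim_chars)  # leading characters only
    if right_only:
        return t.rstrip(trim_chars)  # trailing characters only
    return t.strip(trim_chars)


print(repr(text_trim(" Hello World ")))  # 'Hello World'
print(repr(text_trim(" Hello World ", left_only=True)))  # 'Hello World '
print(repr(text_trim(" Hello World ", right_only=True)))  # ' Hello World'
print(text_trim(" [One, Two, Three] ", trim_chars=" []"))  # One, Two, Three
```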

TextReplace(text, pattern, subst, all, caseInsensitive, re)

Returns «text» with the first occurrence of «pattern» replaced by «subst».

TextReplace('StringReplace, StringLength', 'String', 'Text') → 'TextReplace, StringLength'

If «all» is True, it returns «text» with all occurrences of «pattern» replaced by «subst».

TextReplace('StringReplace, StringLength', 'String', 'Text', All: True) → 'TextReplace, TextLength'
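For plain (non-regular-expression), case-sensitive replacement, Python's str.replace() behaves similarly: its optional count argument limits how many occurrences are replaced. The text_replace helper below is a hypothetical illustration.

```python
def text_replace(text: str, pattern: str, subst: str, all_: bool = False) -> str:
    """Replace the first (or every) occurrence of pattern, like TextReplace()."""
    # str.replace's third argument caps the number of replacements.
    return text.replace(pattern, subst) if all_ else text.replace(pattern, subst, 1)


print(text_replace("StringReplace, StringLength", "String", "Text"))
# TextReplace, StringLength
print(text_replace("StringReplace, StringLength", "String", "Text", all_=True))
# TextReplace, TextLength
```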

Matches are case-sensitive unless «caseInsensitive» is True.

Re (Regular expression)

When the optional «re» parameter is True, it treats «pattern» as a regular expression. In this mode, it replaces the character sequence \0 in «subst» by the matching text, and it replaces \1, \2, ..., \9 by the subtext matched by the corresponding numbered subpattern in the regular expression. The character sequence <name> in «subst» is replaced by the subtext matched to the indicated named subpattern.

TextReplace('Hello world', '\w+', '«\0»', all:True, re: True) → '«Hello» «world»'
TextReplace('Hello world', '(.{1,7}).*', '\1…', re: True) → 'Hello w…'
TextReplace(text: 'swap first and last', pattern: '(?<first>\w+)(?<mid>.*)(?<last>\b\w+)', subst: '<last><mid><first>', re: True) → 'last first and swap'
TextReplace('swap first and last', '(\w)(\w*)(\w)', '\3\2\1', re: 1, all: 1) → 'pwas tirsf dna tasl'
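Python's re.sub() performs the same kind of regular expression substitution, though its replacement syntax differs: the whole match is \g<0> rather than \0, and a named group is \g<name> rather than <name>. With count=1 it replaces only the first match, like TextReplace() without «all». This is a sketch for comparison, not Analytica's own behavior.

```python
import re

r1 = re.sub(r"\w+", r"«\g<0>»", "Hello world")  # all matches
r2 = re.sub(r"(.{1,7}).*", r"\1…", "Hello world", count=1)
r3 = re.sub(r"(?P<first>\w+)(?P<mid>.*)(?P<last>\b\w+)",
            r"\g<last>\g<mid>\g<first>", "swap first and last", count=1)
r4 = re.sub(r"(\w)(\w*)(\w)", r"\3\2\1", "swap first and last")  # all matches

print(r1)  # «Hello» «world»
print(r2)  # Hello w…
print(r3)  # last first and swap
print(r4)  # pwas tirsf dna tasl
```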

Joining Text: a & b

The & operator joins (concatenates) two text values to form a single text value, for example:

'What is the' & ' number' & '?' → 'What is the number?'

If one or both operands are numbers, & converts them to text using the number format of the variable whose definition contains the expression (or the default Suffix format if none is set), for example:

'The number is ' & 10^8 → 'The number is 100M'

This is also useful for converting (or “coercing”) numbers to text.

JoinText(a, i, separator, finalSeparator, default, textForNull)

Returns the elements of array «a» joined together into a single text value over index «i». If elements of «a» are numeric, JoinText() first converts them to text using the number format settings for the variable whose definition contains this function call. For example:

I := ['A', 'B', 'C']
JoinText(I, I) → 'ABC'
A := Array(I, ['VW', 'Honda', 'BMW'])
B := Array(I, ['VW', Null, 'BMW'])
JoinText(A, I) → 'VWHondaBMW'

If the optional parameter «separator» is specified, it is inserted as a separator between successive elements, for example:

JoinText(A, I, ', ') → 'VW, Honda, BMW'

The optional parameter «finalSeparator», if present, specifies a different separator between the second-to-last and last elements of «a».

JoinText(A, I, '; ', '; and ') → 'VW; Honda; and BMW'

Null values in «a» are ignored unless the optional parameter «textForNull» is specified.

JoinText(B, I, ', ') → 'VW, BMW'
JoinText(B, I, ', ', textForNull: '') → 'VW, , BMW'
JoinText(B, I, ', ', textForNull: 'NULL') → 'VW, NULL, BMW'

The value of the optional «default» parameter is returned when all values are ignored or «a» has zero length.

JoinText([Null, Null, Null], default: Null) → «null»
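The JoinText() options above can be mimicked over a Python list, with None standing in for Null. This join_text helper is a hypothetical sketch: it has no index parameter (a list has only one dimension), it converts numbers with str() rather than a number format, and it assumes an empty result of '' when no «default» is given.

```python
def join_text(a, separator="", final_separator=None, default="",
              text_for_null=None):
    """Join items of a into one text, roughly like JoinText()."""
    # Substitute text_for_null for None, then drop any remaining None values.
    items = [text_for_null if x is None else x for x in a]
    items = [str(x) for x in items if x is not None]
    if not items:
        return default
    if final_separator is None or len(items) < 2:
        return separator.join(items)
    return separator.join(items[:-1]) + final_separator + items[-1]


cars = ["VW", "Honda", "BMW"]
b = ["VW", None, "BMW"]
print(join_text(cars, ", "))  # VW, Honda, BMW
print(join_text(cars, "; ", "; and "))  # VW; Honda; and BMW
print(join_text(b, ", "))  # VW, BMW
print(join_text(b, ", ", text_for_null="NULL"))  # VW, NULL, BMW
print(join_text([None, None, None], default=None))  # None
```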

SplitText(text, separator, caseInsensitive, re)

Returns a list of text values formed by splitting the text value «text» at each occurrence of «separator». For example:

SplitText('VW, Honda, BMW', ', ') → ['VW', 'Honda', 'BMW']

SplitText() is the inverse of JoinText(), if you use the same separators. For example:

Var x := SplitText('Humpty Dumpty sat on a wall.', ' ') → ['Humpty', 'Dumpty', 'sat', 'on', 'a', 'wall.']
JoinText(x, , ' ') → 'Humpty Dumpty sat on a wall.'

When «separator» contains letters, setting «caseInsensitive» to True matches them regardless of case. When «re» is True, «separator» is interpreted as a Perl-compatible regular expression.

Variable s := 'Yes, Virginia. There is a Santa Claus!'
SplitText(s, '[\s,\.!]+', re: 1) → ['Yes', 'Virginia', 'There', 'is', 'a', 'Santa', 'Claus', '']
SplitText(TextTrim(s, trimChars: ' ,.!'), '[\s,\.!]+', re:1) → ['Yes', 'Virginia', 'There', 'is', 'a', 'Santa', 'Claus']
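Python's re.split() shows the same effect, offered here only as a comparison: a separator at the very end of the text leaves an empty final element, and trimming those characters first avoids it. re.escape() with re.IGNORECASE gives a case-insensitive literal separator, and str.split() with str.join() round-trips when the same separator is used.

```python
import re

s = "Yes, Virginia. There is a Santa Claus!"
# The trailing "!" is a separator, so the last element is empty.
parts = re.split(r"[\s,.!]+", s)
# Trimming the separator characters first removes the empty element.
trimmed = re.split(r"[\s,.!]+", s.strip(" ,.!"))

# A literal separator matched regardless of case.
ci = re.split(re.escape("and"), "salt AND pepper and vinegar", flags=re.IGNORECASE)

# Splitting and joining with the same separator are inverses.
words = "Humpty Dumpty sat on a wall.".split(" ")
rejoined = " ".join(words)

print(parts)  # ['Yes', 'Virginia', 'There', 'is', 'a', 'Santa', 'Claus', '']
print(trimmed)  # ['Yes', 'Virginia', 'There', 'is', 'a', 'Santa', 'Claus']
print(ci)  # ['salt ', ' pepper ', ' vinegar']
print(rejoined)  # Humpty Dumpty sat on a wall.
```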
Tip
With SplitText(), «text» must be a single text value, not an array, because splitting each element of an array could produce lists of different lengths. See Functions expecting atomic parameters for what to do if you want to apply it to an array.

TextLowerCase(t)

TextLowerCase() returns the text «t» with all letters as lowercase. For example:

TextLowerCase('What does XML mean?') → 'what does xml mean?'

TextUpperCase(t)

TextUpperCase() returns the text «t» with all letters as uppercase. For example:

TextUpperCase('What does XML mean?') → 'WHAT DOES XML MEAN?'

TextSentenceCase(t, preserveUC)

TextSentenceCase() returns the text «t» with the first character (if a letter) as uppercase, and any other letters as lowercase. If the optional parameter «preserveUC» is True, letters that are already uppercase stay uppercase. For example:

TextSentenceCase('mary ann FRED Maylene') → 'Mary ann fred maylene'
TextSentenceCase(SplitText('mary ann FRED Maylene', ' ')) → ['Mary', 'Ann', 'Fred', 'Maylene']
TextSentenceCase('they are Fred and Maylene', true) → 'They are Fred and Maylene'
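A small Python sketch of the behavior shown in these examples. The text_sentence_case helper is hypothetical, and it assumes «preserveUC» means "leave the rest of the text as it is" (so only the first character changes), which matches the last example above.

```python
def text_sentence_case(t: str, preserve_uc: bool = False) -> str:
    """Uppercase the first character; lowercase the rest unless preserve_uc."""
    if not t:
        return t
    rest = t[1:] if preserve_uc else t[1:].lower()
    return t[0].upper() + rest  # upper() leaves a non-letter unchanged


print(text_sentence_case("mary ann FRED Maylene"))  # Mary ann fred maylene
print([text_sentence_case(w) for w in "mary ann FRED Maylene".split(" ")])
# ['Mary', 'Ann', 'Fred', 'Maylene']
print(text_sentence_case("they are Fred and Maylene", True))
# They are Fred and Maylene
```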

NumberToText(x, format)

NumberToText() converts a number to text using the format specified in the second parameter. The possible formats are: 'Suffix', 'Exponential', 'Fixed Point', 'Integer', 'Percent', 'Date', 'Boolean', or 'Hexadecimal'. For conciseness, the first letter of each may be used:

NumberToText(3.45M, ['S', 'E', 'F', 'I', 'H']) → ['3.45M', '3.45e+006', '3450000', '3450000', '0x34a490']
NumberToText(0.0012, ['S', 'E', 'F', 'P']) → ['1.2m', '1.2e-003', '0', '0.12%']

Several optional parameters control aspects of each format. The «digits» parameter specifies the precision for the Suffix and Exponential formats, and the number of digits to the right of the decimal point for the Fixed Point and Percent formats.

NumberToText(Pi, 'Suffix', digits: 5) → '3.1416'
NumberToText(Pi, 'Fixed Point', digits: 5) → '3.14159'
NumberToText(Pi, 'Percent', digits: 5) → '314.15927%'

You can specify «showZeros» as true to force the inclusion of trailing zeros.

NumberToText(1/4, 'Percent', digits: 2, showZeros: True) → '25.00%'

To include thousands separators, pass True to the «thousandsSeparators» parameter.

NumberToText(7^12, 'I', thousandsSeparators: True) → '13,841,287,201'
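Several of these options have rough counterparts in Python's format specification mini-language, shown here only for comparison: a precision with 'g' gives significant digits, with 'f' or '%' gives digits after the decimal point, and ',' inserts thousands separators. Python has no Suffix format and writes exponents with two digits rather than three.

```python
import math

sig5 = f"{math.pi:.5g}"  # 5 significant digits
fixed5 = f"{math.pi:.5f}"  # 5 digits after the decimal point
pct5 = f"{math.pi:.5%}"  # percent with 5 decimals
quarter = f"{1/4:.2%}"  # trailing zeros are kept
grouped = f"{7**12:,}"  # thousands separators
hexed = hex(3_450_000)  # hexadecimal
expo = f"{3.45e6:.2e}"  # exponential, two-digit exponent

print(sig5, fixed5, pct5)  # 3.1416 3.14159 314.15927%
print(quarter, grouped)  # 25.00% 13,841,287,201
print(hexed, expo)  # 0x34a490 3.45e+06
```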

The «currency» parameter specifies a template that includes the currency symbol and its placement relative to the number and the minus sign.

NumberToText(-372, 'F', currency: ['-£#', 'US$-#', '#£-', '($#)']) → ['-£372', 'US$-372', '372£-', '($372)']

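When converting date-time numbers, the «dateFormat» parameter provides the date format template. Because this example uses Today(), its result depends on the date it is evaluated; here that was 1 July 2013: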

NumberToText(Today(), dateFormat: 'yyyy MMMM dd (wwww)') → '2013 July 01 (Monday)'
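For comparison, Python's datetime.strftime() uses a different template syntax for the same idea. Assuming the default C locale (English month and day names), %Y, %B, %d, and %A roughly correspond to yyyy, MMMM, dd, and wwww. A fixed date replaces Today() so the output is reproducible.

```python
from datetime import date

d = date(2013, 7, 1)  # fixed date instead of today's date
formatted = d.strftime("%Y %B %d (%A)")
print(formatted)  # 2013 July 01 (Monday)
```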

In some situations, such as exporting data, you need numbers written at full precision. When the «fullPrecision» parameter is True, NumberToText() includes extra digits when necessary so that the full precision of the number is output.

NumberToText(Pi, fullPrecision: True) → '3.141592653589793'
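Python faces the same issue when exporting floating-point numbers. As an analogy: repr() produces the shortest text that converts back to exactly the same number, while a fixed 17 significant digits is always enough for a 64-bit float but may show more digits than needed.

```python
import math

shortest = repr(math.pi)  # shortest text that round-trips
digits17 = f"{math.pi:.17g}"  # always enough digits, sometimes more than needed

print(shortest)  # 3.141592653589793
print(digits17)  # 3.1415926535897931
# Both parse back to exactly the same float.
print(float(shortest) == math.pi, float(digits17) == math.pi)  # True True
```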

ParseNumber(text, badVal)

Parses a text value into a number. ParseNumber() does not parse dates; use ParseDate() for those. The result does not depend on the number format setting. Values that are already numeric are returned unchanged. The optional «badVal» parameter specifies the value returned when the text cannot be parsed; it defaults to Null. When «x» is an array, ParseNumber(x, x) passes unparseable entries through unchanged.

ParseNumber('12.43K') → 12.43K
ParseNumber('hello') → «null»
ParseNumber(14.3) → 14.3
Var x := ['3,214', 14, 'foo'] Do ParseNumber(x, x) → [3214, 14, 'foo']
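The pass-through idiom can be sketched in Python with a hypothetical parse_number helper, where None stands in for Null. The suffix table is an illustrative subset chosen for this sketch, not Analytica's full list, and the comma handling assumes commas are only thousands separators.

```python
# Hypothetical subset of number suffixes, for illustration only.
SUFFIXES = {"K": 1e3, "M": 1e6, "G": 1e9, "m": 1e-3, "u": 1e-6}


def parse_number(x, bad_val=None):
    """Rough analogue of ParseNumber(x, badVal)."""
    if isinstance(x, (int, float)):
        return x  # already numeric: returned unchanged
    s = str(x).strip().replace(",", "")  # drop thousands separators
    scale = 1.0
    if s and s[-1] in SUFFIXES:
        scale = SUFFIXES[s[-1]]
        s = s[:-1]
    try:
        return float(s) * scale
    except ValueError:
        return bad_val  # unparseable text


xs = ["3,214", 14, "foo"]
print([parse_number(v, v) for v in xs])  # [3214.0, 14, 'foo']
print(parse_number("hello"))  # None
```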
