TextCharacterEncode

Revision as of 01:47, 28 February 2017 by Lchrisman

(new to Analytica 5.0)

TextCharacterEncode( type, text )

Converts «text» into a special encoded or unencoded form according to «type». Possible values for «type» include these:

  • For encoding or decoding URLs: 'URL', 'IRI', 'URL%', '-URL'
  • For encoding XML or HTML: 'XML', '-XML'
  • For UTF-8 encodings: 'UTF-8', '-UTF-8'
  • For Unicode normalized forms: 'NFC', 'NFD', 'NFKC', 'NFKD'.

By convention, «type» options that start with a minus sign invert the corresponding encoding.

Encoding text for inclusion in a URL

«Type» options 'URL', 'IRI' and 'URL%' encode data for inclusion in a URL. The option '-URL' decodes URL-encoded data.

Data is often passed in the query string portion of a URL, such as "John Doe" in the following URL:

http://acme.com/somePage?name=John+Doe

Notice that the space has been converted to a '+' before being inserted into the URL. The characters "!*'();:@&=+$,/?#[]%" have special meaning in a URL, so data containing them must be converted into text that does not use those characters. The «type» value 'URL' encodes data according to the RFC 3986 standard.

If you ever need to pass a URL as a data item in another URL, then all its special characters need to be encoded so they aren't interpreted as part of the outer URL.

This same encoding appears in other standards as well, including in the standard for submitting form data in HTTP, and in the JSON standard, among others.

When using TextCharacterEncode('URL', text), encode only the value that will be placed after an equals sign in the query string, and nothing more. For example, you should write:

'http://acme.com/somePage?name=' & TextCharacterEncode( 'URL', 'John Doe' )

and not

TextCharacterEncode( 'URL', 'http://acme.com/somePage?name=John Doe' )

since in the latter case the characters ?, =, :, / and so on would all be encoded as well, which is not what you want.
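
For comparison with another environment, the same rule can be sketched with Python's standard library. This is only an analogy: `urllib.parse.quote_plus` is a Python function, not part of Analytica, and it is assumed here to match the 'URL' option for these particular inputs.

```python
from urllib.parse import quote_plus

base = "http://acme.com/somePage?name="

# Correct: encode only the value placed after the equals sign.
good_url = base + quote_plus("John Doe")

# Incorrect: encoding the whole URL also escapes ':', '/', '?' and '='.
bad_url = quote_plus(base + "John Doe")

print(good_url)  # http://acme.com/somePage?name=John+Doe
print(bad_url)   # http%3A%2F%2Facme.com%2FsomePage%3Fname%3DJohn+Doe
```

The second result is no longer usable as a URL, because the scheme and path separators have been escaped along with the data.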

A drawback of the 'URL' encoding is that every character except letters, digits, and -._~ is percent-encoded, which makes URLs hard to read, especially for non-English sites. The 'IRI' option (Internationalized Resource Identifier) percent-encodes only the reserved characters ("!*'();:@&=+$,/?#[]%") and preserves everything else, which generally still works correctly for URLs.
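
The idea behind the 'IRI' option can be illustrated with a small Python sketch. The function name `iri_like_encode` and the constant `RESERVED` are hypothetical, and the sketch assumes that only the reserved characters listed above are escaped. It leaves spaces and every other character untouched, which may differ from what Analytica actually does.

```python
# Reserved characters as listed in the text (assumption: this is the full set).
RESERVED = set("\"!*'();:@&=+$,/?#[]%")

def iri_like_encode(text: str) -> str:
    """Percent-encode only reserved characters; keep everything else as-is."""
    parts = []
    for ch in text:
        if ch in RESERVED:
            # Encode each UTF-8 byte of the character as %XX (uppercase hex).
            parts.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            parts.append(ch)
    return "".join(parts)

print(iri_like_encode("test@中文.com"))  # test%40中文.com
```

The non-ASCII characters pass through unchanged, so the result stays readable, while '@' is still escaped.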

The standard URL encoding changes space to a plus character. The 'URL%' option uses percent encoding for space (%20) instead.

The type option '-URL' converts the URL-encoded text back into the original text. It works for any of the encodings 'URL', 'IRI' and 'URL%'.
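
The difference between the plus form and the percent form for spaces, and the fact that one decoder handles both, can be seen in a short Python sketch. The functions `quote`, `quote_plus` and `unquote_plus` belong to Python's `urllib.parse`. Treating them as stand-ins for 'URL%', 'URL' and '-URL' is an assumption made for illustration only.

```python
from urllib.parse import quote, quote_plus, unquote_plus

text = "(1+2) = 3"

plus_form = quote_plus(text)         # space becomes '+'
percent_form = quote(text, safe="")  # space becomes '%20'

print(plus_form)     # %281%2B2%29+%3D+3
print(percent_form)  # %281%2B2%29%20%3D%203

# A single decoder recovers the original text from either form.
print(unquote_plus(plus_form))     # (1+2) = 3
print(unquote_plus(percent_form))  # (1+2) = 3
```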

Examples

TextCharacterEncode('URL', '(1+2) = 3') → "%281%2B2%29+%3D+3"
TextCharacterEncode('URL%', '(1+2) = 3') → "%281%2B2%29%20%3D%203"
TextCharacterEncode('-URL','%281%2B2%29+%3D+3') → "(1+2) = 3"
TextCharacterEncode('-URL','%281%2B2%29%20%3D%203') → "(1+2) = 3"
TextCharacterEncode('URL', 'test@中文.com') → "test%40%E4%B8%AD%E6%96%87.com"
TextCharacterEncode('IRI', 'test@中文.com') → "test%40中文.com"
Variable email := "John_Doe@yahoo.com"
Variable website := "http://acme.com?name=johnDoe&type=student"
Variable cityToFind := "San Francisco, CA"
Variable UrlToRead := "http://dataSource.com/query?email=" & TextCharacterEncode( 'URL', email ) & "&site=" & TextCharacterEncode('URL', website) & "&city=" & TextCharacterEncode('URL', cityToFind)
UrlToRead → "http://dataSource.com/query?email=John_Doe%40yahoo.com&site=http%3A%2F%2Facme.com%3Fname%3DjohnDoe%26type%3Dstudent&city=San+Francisco%2C+CA"
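
As a cross-check outside Analytica, a query string with several parameters can be assembled in Python with `urllib.parse.urlencode`, which encodes each value separately and joins the pairs with '&'. This is a sketch under the assumption that Python's form encoding matches the 'URL' option for these inputs. The host name dataSource.com is taken from the example above.

```python
from urllib.parse import urlencode

params = {
    "email": "John_Doe@yahoo.com",
    "site": "http://acme.com?name=johnDoe&type=student",
    "city": "San Francisco, CA",
}

# Each value is encoded on its own, so the '&' and '=' that appear inside
# the nested URL cannot be confused with the outer separators.
url_to_read = "http://dataSource.com/query?" + urlencode(params)
print(url_to_read)
```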

Encoding text in XML or HTML

The option 'XML' for «type» encodes data for insertion in XML or HTML. Without this encoding, the XML or HTML parser would attempt to interpret special characters such as '<', '>', '&', and quotes. In addition, a few characters that fall in control ranges (below ASCII 32, or between 128 and 159) are automatically converted to entities as required by the standards.

The '-XML' option performs the inverse decoding.

Examples

TextCharacterEncode( 'XML', 'One < Two, <b>Three & Four</b> are "Bigger"' ) → "One &lt; Two, &lt;b&gt;Three &amp; Four&lt;/b&gt; are &quot;Bigger&quot;"
TextCharacterEncode( '-XML', 'One &lt; Two, &lt;b&gt;Three &amp; Four&lt;/b&gt; are &quot;Bigger&quot;' ) → 'One < Two, <b>Three & Four</b> are "Bigger"'
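
For readers comparing with other tools, Python's `html.escape` and `html.unescape` perform a similar pair of conversions. This is an analogy rather than a description of Analytica's implementation. In particular, `html.escape` does not convert control-range characters to entities, so it only illustrates the handling of '<', '>', '&' and quotes.

```python
import html

raw = 'x < y & "z"'

# Replace the markup-significant characters with entities.
encoded = html.escape(raw)
print(encoded)  # x &lt; y &amp; &quot;z&quot;

# Decoding restores the original text.
decoded = html.unescape(encoded)
print(decoded)  # x < y & "z"
```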

UTF-8 encoding

Unicode normalization

See Also
