TextDistance

Revision as of 03:12, 1 March 2017 by Lchrisman (talk | contribs) (First cut)


(New in Analytica 5.0)

TextDistance( t1, t2, substitution, deletion, insertion, transpose, default, L1, L2 )

Returns a measure of how different text «t1» is from text «t2», known as the lexical distance or edit distance between the two texts.

With no optional parameters specified, the result of TextDistance(t1,t2) is known as the Levenshtein distance, which is the minimum number of character insertions, deletions or substitutions required to change «t1» into «t2». The result is 0 when «t1» and «t2» are equal, or the number of edit steps otherwise.
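As an illustration of what the Levenshtein distance counts, here is a minimal Python sketch of the standard dynamic-programming computation. It is not Analytica code and is not claimed to be how TextDistance is implemented; the function name `levenshtein` is chosen only for this example.

```python
def levenshtein(t1: str, t2: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to change t1 into t2."""
    # prev[j] = distance between the first i-1 characters of t1
    # and the first j characters of t2.
    prev = list(range(len(t2) + 1))
    for i, c1 in enumerate(t1, start=1):
        curr = [i]  # turning i characters into "" takes i deletions
        for j, c2 in enumerate(t2, start=1):
            curr.append(min(
                prev[j] + 1,                # delete c1
                curr[j - 1] + 1,            # insert c2
                prev[j - 1] + (c1 != c2),   # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("same", "same"))       # 0
```

The three edits in the first example are k→s, e→i, and the insertion of a final g.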

The number of edits required to change «t1» into «t2» when only character insertions and deletions are allowed (i.e., substitutions are not allowed) is given by

TextDistance(t1, t2, substitution:false )

The Hamming distance between two strings of equal length, equal to the number of positions at which the symbols are different, is given by

TextDistance(t1, t2, insertion:false, deletion:false, substitution:true )
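For comparison, the Hamming distance can be sketched directly in Python. This is an illustrative sketch rather than Analytica code; the helper name `hamming` is hypothetical, and returning infinity for unequal lengths is an assumption chosen to mirror the Inf result this page describes for impossible transformations.

```python
import math


def hamming(t1: str, t2: str) -> float:
    """Number of positions at which t1 and t2 differ."""
    if len(t1) != len(t2):
        # With substitutions only, texts of different lengths
        # cannot be transformed into one another.
        return math.inf
    return sum(a != b for a, b in zip(t1, t2))


print(hamming("karolin", "kathrin"))  # 3
```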

The Damerau-Levenshtein distance, which also allows transpositions of adjacent characters, is obtained using

TextDistance(t1, t2, transpose:true )
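The effect of allowing transpositions can be sketched in Python as follows. This page does not state which transposition variant TextDistance uses, so the sketch assumes the restricted "optimal string alignment" variant, in which a transposed pair of characters is not edited again. The function name `osa_distance` is hypothetical.

```python
def osa_distance(t1: str, t2: str) -> int:
    """Edit distance with insertions, deletions, substitutions, and
    transpositions of adjacent characters (optimal string alignment)."""
    n, m = len(t1), len(t2)
    # d[i][j] = distance between t1[:i] and t2[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = t1[i - 1] != t2[j - 1]
            d[i][j] = min(
                d[i - 1][j] + 1,          # deletion
                d[i][j - 1] + 1,          # insertion
                d[i - 1][j - 1] + cost,   # substitution or match
            )
            # Swap of two adjacent characters counts as one edit.
            if (i > 1 and j > 1
                    and t1[i - 1] == t2[j - 2]
                    and t1[i - 2] == t2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


print(osa_distance("ab", "ba"))  # 1 (one transposition instead of two substitutions)
```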

The length of the longest common subsequence is obtained using

(TextLength(t1&t2) - TextDistance(t1, t2, substitution:false) ) / 2

The longest common subsequence is the longest ordered sequence of characters, not necessarily adjacent, found in both «t1» and «t2».
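The formula works because, when only insertions and deletions are allowed, every character of «t1» outside the common subsequence must be deleted and every character of «t2» outside it must be inserted, so the distance equals the combined length minus twice the subsequence length. The Python sketch below computes the subsequence length directly so the relationship can be checked; the name `lcs_length` is hypothetical and this is not Analytica code.

```python
def lcs_length(t1: str, t2: str) -> int:
    """Length of the longest common subsequence of t1 and t2."""
    # prev[j] = LCS length of the processed prefix of t1 and t2[:j]
    prev = [0] * (len(t2) + 1)
    for c1 in t1:
        curr = [0]
        for j, c2 in enumerate(t2, start=1):
            if c1 == c2:
                curr.append(prev[j - 1] + 1)           # extend a match
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip one character
        prev = curr
    return prev[-1]


t1, t2 = "kitten", "sitting"
common = lcs_length(t1, t2)
# Insert/delete-only distance implied by the identity above.
indel = len(t1) + len(t2) - 2 * common
print(common, indel)  # 4 5
```

Here the common subsequence is "ittn", so the insertion/deletion-only distance is 6 + 7 - 2×4 = 5, and (13 - 5) / 2 recovers 4.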

Other combinations are possible by specifying True (1) or False (0) for the optional parameters:

  • «substitution»: (Default: 1) Set to 0 to disable character substitutions as an edit operation, or 1 to enable.
  • «deletion»: (Default: 1) Set to 0 to disable character deletions, or 1 to enable.
  • «insertion»: (Default: 1) Set to 0 to disable character insertions, or 1 to enable.
  • «transpose»: (Default: 0) Set to 0 to disable transposition of adjacent characters, 1 to enable. When enabled, minimal edit distance is guaranteed only when «deletion» and «insertion» are also enabled.

With some combinations, it might not be possible to transform «t1» into «t2», for example when the lengths are different and only character substitutions are allowed. In such cases, Inf is returned.
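To show how enabling and disabling operations interacts with the Inf result, here is a hedged Python sketch of an edit distance with switchable operations. The function `edit_distance` and its keyword names are hypothetical; they mirror the parameter names on this page only for readability, and transpositions are left out to keep the sketch short. It is not a description of how TextDistance is implemented.

```python
import math


def edit_distance(t1: str, t2: str, substitution: bool = True,
                  deletion: bool = True, insertion: bool = True) -> float:
    """Unit-cost edit distance using only the enabled operations.
    Returns math.inf when t1 cannot be transformed into t2."""
    n, m = len(t1), len(t2)
    # d[i][j] = cost of turning t1[:i] into t2[:j]; inf = unreachable
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = math.inf
            if deletion and i > 0:
                best = min(best, d[i - 1][j] + 1)
            if insertion and j > 0:
                best = min(best, d[i][j - 1] + 1)
            if i > 0 and j > 0:
                if t1[i - 1] == t2[j - 1]:
                    best = min(best, d[i - 1][j - 1])      # match, no cost
                elif substitution:
                    best = min(best, d[i - 1][j - 1] + 1)  # substitute
            d[i][j] = best
    return d[n][m]


# Substitutions only, different lengths: no valid edit sequence exists.
print(edit_distance("abc", "abcd", insertion=False, deletion=False))  # inf
# Substitutions only, equal lengths: the Hamming distance.
print(edit_distance("abc", "abd", insertion=False, deletion=False))   # 1
```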

Non-equal costs for edit operations

Examples

See Also
