TextDistance
(New to Analytica 5.0)
TextDistance( t1, t2, substitution, deletion, insertion, transpose, default, L1, L2 )
Returns a measure of how different text «t1» is from text «t2». This measure is called the lexical distance or edit distance between the two texts.
With no optional parameters specified, TextDistance(t1, t2) returns the Levenshtein distance: the minimum number of single-character insertions, deletions, or substitutions required to change «t1» into «t2». The result is 0 when «t1» and «t2» are identical; otherwise it is the number of edit steps needed.
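The Levenshtein distance is usually computed with a dynamic program over prefixes of the two texts (the Wagner-Fischer algorithm). The Python sketch below illustrates the idea; it is not Analytica's implementation, and the function name `levenshtein` is hypothetical.

```python
# Hypothetical helper for illustration only; not part of Analytica.
def levenshtein(t1: str, t2: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn t1 into t2 (Wagner-Fischer DP)."""
    # prev[j] = distance between the current prefix of t1 and t2[:j];
    # turning "" into t2[:j] takes j insertions.
    prev = list(range(len(t2) + 1))
    for i, c1 in enumerate(t1, start=1):
        curr = [i]  # turning t1[:i] into "" takes i deletions
        for j, c2 in enumerate(t2, start=1):
            curr.append(min(
                prev[j] + 1,               # delete c1
                curr[j - 1] + 1,           # insert c2
                prev[j - 1] + (c1 != c2),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the length of «t2» rather than with the product of both lengths.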
The number of edits needed to change «t1» into «t2» when only character insertions and deletions are allowed (i.e., substitutions are not allowed) is given by
TextDistance(t1, t2, substitution:false)
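Disallowing substitutions amounts to dropping the "substitute" move from the same dynamic program while keeping free matches of equal characters. A minimal Python sketch, using the hypothetical name `indel_distance`:

```python
# Hypothetical helper for illustration only; not part of Analytica.
def indel_distance(t1: str, t2: str) -> int:
    """Edit distance when only insertions and deletions are allowed."""
    prev = list(range(len(t2) + 1))
    for i, c1 in enumerate(t1, start=1):
        curr = [i]
        for j, c2 in enumerate(t2, start=1):
            best = min(prev[j] + 1, curr[j - 1] + 1)  # delete c1 or insert c2
            if c1 == c2:
                best = min(best, prev[j - 1])  # equal characters match for free
            curr.append(best)
        prev = curr
    return prev[-1]

print(indel_distance("kitten", "sitting"))  # 5
```

For "kitten" and "sitting" this gives 5, compared with a Levenshtein distance of 3, because each substitution must now be carried out as a deletion plus an insertion.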
The Hamming distance between two texts of equal length, which is the number of positions at which the corresponding characters differ, is given by
TextDistance(t1, t2, insertion:false, deletion:false, substitution:true )
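With insertions and deletions disabled, only characters at the same position can be compared, so the distance reduces to counting mismatches. A hypothetical Python sketch (`hamming` is an invented name), which mirrors the Inf result that TextDistance gives when the lengths differ:

```python
import math

# Hypothetical helper for illustration only; not part of Analytica.
def hamming(t1: str, t2: str) -> float:
    """Number of positions at which two equal-length texts differ.
    Returns math.inf when the lengths differ, since substitutions
    alone cannot change the length of a text."""
    if len(t1) != len(t2):
        return math.inf
    return sum(c1 != c2 for c1, c2 in zip(t1, t2))

print(hamming("karolin", "kathrin"))  # 3
print(hamming("abc", "ab"))           # inf
```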
The Damerau-Levenshtein distance, which also allows transpositions of adjacent characters, is obtained using
TextDistance(t1, t2, transpose:true )
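Transpositions add one more move to the dynamic program: swapping two adjacent characters at a cost of 1. The sketch below implements the common "optimal string alignment" variant, which does not allow a substring to be edited more than once. Which variant TextDistance uses is not stated here, and the two can differ: for "ca" and "abc" this sketch returns 3, while the unrestricted Damerau-Levenshtein distance is 2. The name `osa_distance` is hypothetical.

```python
# Hypothetical helper for illustration only; not part of Analytica.
def osa_distance(t1: str, t2: str) -> int:
    """Levenshtein distance plus transposition of adjacent characters
    (optimal string alignment variant)."""
    n, m = len(t1), len(t2)
    # d[i][j] = distance between t1[:i] and t2[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if t1[i - 1] == t2[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1 and t1[i - 1] == t2[j - 2]
                    and t1[i - 2] == t2[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[n][m]

print(osa_distance("ab", "ba"))   # 1
print(osa_distance("ca", "abc"))  # 3
```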
The length of the longest common subsequence is obtained using
(TextLength(t1&t2) - TextDistance(t1, t2, substitution:false)) / 2
The longest common subsequence is the longest ordered sequence of characters, not necessarily adjacent, found in both «t1» and «t2».
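The formula works because, when only insertions and deletions are allowed, the characters outside a longest common subsequence are exactly the ones that must be deleted from «t1» or inserted from «t2», so the distance equals TextLength(t1&t2) minus twice the subsequence length. A hypothetical Python sketch (`lcs_length` is an invented name) that computes the subsequence length directly, for comparison:

```python
# Hypothetical helper for illustration only; not part of Analytica.
def lcs_length(t1: str, t2: str) -> int:
    """Length of the longest common subsequence of t1 and t2
    (characters in order, not necessarily adjacent)."""
    prev = [0] * (len(t2) + 1)  # LCS lengths for the previous prefix of t1
    for c1 in t1:
        curr = [0]
        for j, c2 in enumerate(t2, start=1):
            if c1 == c2:
                curr.append(prev[j - 1] + 1)  # extend a common subsequence
            else:
                curr.append(max(prev[j], curr[j - 1]))  # skip c1 or c2
        prev = curr
    return prev[-1]

print(lcs_length("kitten", "sitting"))  # 4
```

For "kitten" and "sitting" the longest common subsequence is "ittn", of length 4. The insertion/deletion-only distance for this pair is 5, and (6 + 7 - 5) / 2 = 4, as the formula predicts.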
Other combinations are possible by specifying True or False for the optional parameters:
- «substitution»: (Default: 1) Set to 0 to disable character substitutions as an edit operation, or 1 to enable.
- «deletion»: (Default: 1) Set to 0 to disable character deletions, or 1 to enable.
- «insertion»: (Default: 1) Set to 0 to disable character insertions, or 1 to enable.
- «transpose»: (Default: 0) Set to 0 to disable transposition of adjacent characters, or 1 to enable. When enabled, the result is guaranteed to be the minimal edit distance only when «deletion» and «insertion» are also enabled.
With some combinations, it may be impossible to transform «t1» into «t2», for example when the lengths differ and only character substitutions are allowed. In such a case, TextDistance returns Inf.
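The options above can be combined in a single dynamic program in which each move is switched on or off, and cells that no allowed sequence of edits can reach stay infinite. The following Python sketch is a hypothetical illustration of that idea, not Analytica's implementation: the function name `text_distance` is invented, its parameters mirror TextDistance's optional parameters, the transposition step uses the optimal string alignment rule, and matching characters are assumed to be free even when «substitution» is disabled.

```python
import math

# Hypothetical sketch for illustration only; not Analytica's implementation.
def text_distance(t1: str, t2: str, substitution: bool = True,
                  deletion: bool = True, insertion: bool = True,
                  transpose: bool = False) -> float:
    """Edit distance where each edit operation can be switched off.
    Returns math.inf when t1 cannot be turned into t2."""
    n, m = len(t1), len(t2)
    # d[i][j] = cost of turning t1[:i] into t2[:j]; inf means unreachable
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = math.inf
            if deletion and i > 0:
                best = min(best, d[i - 1][j] + 1)
            if insertion and j > 0:
                best = min(best, d[i][j - 1] + 1)
            if i > 0 and j > 0:
                if t1[i - 1] == t2[j - 1]:
                    best = min(best, d[i - 1][j - 1])      # match: no cost
                elif substitution:
                    best = min(best, d[i - 1][j - 1] + 1)  # substitution
            if (transpose and i > 1 and j > 1
                    and t1[i - 1] == t2[j - 2] and t1[i - 2] == t2[j - 1]):
                best = min(best, d[i - 2][j - 2] + 1)      # adjacent swap
            d[i][j] = best
    return d[n][m]

print(text_distance("kitten", "sitting"))                           # 3
print(text_distance("abc", "ab", insertion=False, deletion=False))  # inf
```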