HPCC-13883 ulUnicodeLocaleEditDistanceWithinRadius can access uninitialized data

ulUnicodeLocaleEditDistanceWithinRadius implements an optimized version of
the Levenshtein distance algorithm that works correctly only when the
difference in the number of characters of the strings being compared does
not exceed the radius. The current implementation of
ulUnicodeLocaleEditDistanceWithinRadius does not enforce that restriction
when a Locale is provided, and enforces it incorrectly when no Locale is
provided.
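
For illustration, one reason a length check on raw UTF-16 code units can
disagree with a check on characters is that a single character may occupy
more than one UChar. The sketch below is not taken from unicodelib.cpp: it
uses std::u16string instead of ICU's UnicodeString, and countCodePoints is a
hypothetical helper that counts code points only (it does not handle the
multi-code-point characters a locale-aware iterator may produce).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of unicodelib.cpp): counts Unicode code
// points in a UTF-16 string by skipping the low (trailing) half of each
// surrogate pair.
std::size_t countCodePoints(const std::u16string & s)
{
    std::size_t n = 0;
    for (char16_t c : s)
    {
        // 0xDC00..0xDFFF are low surrogates; each one belongs to the
        // code point started by the preceding high surrogate.
        if (c < 0xDC00 || c > 0xDFFF)
            ++n;
    }
    return n;
}
```

With this helper, a string such as u"a\U0001F600b" has four UTF-16 code
units but three code points, so a length difference computed on code units
can overstate the difference in characters.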

Add code to correctly enforce the above restriction for
ulUnicodeLocaleEditDistanceWithinRadius with and without Locale.
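
A minimal sketch of the guard, assuming the strings have already been
converted to one element per character (std::u32string stands in for the
character sets used in the plugin). The names editDistance and
editDistanceWithinRadius are hypothetical, and the plain dynamic-programming
routine replaces the optimized one in unicodelib.cpp. The guard relies on the
fact that the edit distance is never smaller than the difference in lengths.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Plain, unoptimized Levenshtein distance using two rolling rows.
// Illustration only; this is not the algorithm from unicodelib.cpp.
unsigned editDistance(const std::u32string & a, const std::u32string & b)
{
    std::vector<unsigned> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j)
        prev[j] = static_cast<unsigned>(j);
    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        cur[0] = static_cast<unsigned>(i);
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            unsigned subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0u : 1u);
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, subst});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Hypothetical wrapper: returns early when the length difference alone
// already exceeds the radius, so the bounded algorithm is never run on
// inputs it cannot handle.
unsigned editDistanceWithinRadius(const std::u32string & a,
                                  const std::u32string & b,
                                  unsigned radius)
{
    unsigned leftLen = static_cast<unsigned>(a.size());
    unsigned rightLen = static_cast<unsigned>(b.size());
    unsigned minED = (leftLen < rightLen) ? rightLen - leftLen : leftLen - rightLen;
    if (minED > radius)
        return minED;   // lower bound on the true distance
    return editDistance(a, b);
}
```

For example, comparing U"a" with U"abcdef" at radius 2 returns at the guard
because the lengths differ by 5, while U"kitten" and U"sitting" at radius 3
pass the guard and go through the full computation.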


Signed-off-by: Edin Muharemagic <edin.muharemagic@lexisnexis.com>
Edin Muharemagic 10 years ago
commit 8f9b9f048d
1 changed file with 5 additions and 8 deletions

+ 5 - 8
plugins/unicodelib/unicodelib.cpp

@@ -614,14 +614,6 @@ unsigned unicodeEditDistanceV4(UnicodeString & left, UnicodeString & right, unsi
     unsigned leftLen = left.length();
     unsigned rightLen = right.length();
 
-    // this shortcut is not applicable in the bi mode because unicode characters could take more than 2 UChars
-    if (!bi)
-    {
-        unsigned minED = (leftLen < rightLen)? rightLen - leftLen: leftLen - rightLen;
-        if (minED > radius)
-            return minED>255?255:minED;
-    }
-
     if (leftLen > 255)
         leftLen = 255;
 
@@ -640,9 +632,14 @@ unsigned unicodeEditDistanceV4(UnicodeString & left, UnicodeString & right, unsi
     if (leftCs.isInvalid() || rightCs.isInvalid())
         return DISTANCE_ON_ERROR;
 
+    // get Unicode character lengths
     leftLen = leftCs.length();
     rightLen = rightCs.length();
 
+    unsigned minED = (leftLen < rightLen)? rightLen - leftLen: leftLen - rightLen;
+    if (minED > radius)
+        return minED;
+
     /*
     This function applies two optimizations over the function above.
     a) Adding a character (next row) can at most decrease the edit distance by 1, so short circuit when