
HPCC-13883 ulUnicodeLocaleEditDistanceWithinRadius can access uninitialized data

ulUnicodeLocaleEditDistanceWithinRadius implements an optimized version of
the Levenshtein distance algorithm that works correctly only when the
difference in the number of characters between the strings being compared
does not exceed the radius. The current implementation does not enforce that
restriction when a Locale is provided, and enforces it incorrectly when no
Locale is provided.

Enforce this restriction correctly in
ulUnicodeLocaleEditDistanceWithinRadius, both with and without a Locale.
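
The restriction rests on a simple bound: each insertion or deletion changes a
string's length by one, so the edit distance can never be smaller than the
difference in character counts. The sketch below illustrates that early exit
in isolation. It is not the code in unicodelib.cpp: the helper names
lengthDifference, plainEditDistance and editDistanceWithinRadius are
hypothetical, and std::u32string is assumed here so that one element
corresponds to one code point (the patch takes its lengths from
leftCs/rightCs for a similar reason, per its "Unicode character lengths"
comment).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: the difference in character counts is a lower bound
// on the edit distance between two strings.
unsigned lengthDifference(unsigned leftLen, unsigned rightLen)
{
    return (leftLen < rightLen) ? rightLen - leftLen : leftLen - rightLen;
}

// Plain, unoptimized Levenshtein distance over code points. This stands in
// for the optimized routine, which is not reproduced here.
unsigned plainEditDistance(const std::u32string & left, const std::u32string & right)
{
    std::vector<unsigned> prev(right.size() + 1), curr(right.size() + 1);
    for (size_t j = 0; j <= right.size(); j++)
        prev[j] = static_cast<unsigned>(j);
    for (size_t i = 1; i <= left.size(); i++)
    {
        curr[0] = static_cast<unsigned>(i);
        for (size_t j = 1; j <= right.size(); j++)
        {
            unsigned cost = (left[i - 1] == right[j - 1]) ? 0 : 1;
            curr[j] = std::min({prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, curr);
    }
    return prev[right.size()];
}

// Hypothetical wrapper: apply the length check before running the
// radius-dependent algorithm, using character counts rather than
// code-unit counts.
unsigned editDistanceWithinRadius(const std::u32string & left, const std::u32string & right, unsigned radius)
{
    unsigned minED = lengthDifference(static_cast<unsigned>(left.size()),
                                      static_cast<unsigned>(right.size()));
    if (minED > radius)
        return minED;   // distance is at least minED, so it already exceeds the radius
    return plainEditDistance(left, right);
}
```

With this shape, a call such as editDistanceWithinRadius(U"a", U"abcdef", 2)
returns the length difference without ever entering the distance loop, which
is the case the optimized algorithm cannot handle safely.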


Signed-off-by: Edin Muharemagic <edin.muharemagic@lexisnexis.com>
Edin Muharemagic 10 years ago
parent
commit
8f9b9f048d
1 changed file with 5 additions and 8 deletions

+ 5 - 8
plugins/unicodelib/unicodelib.cpp

@@ -614,14 +614,6 @@ unsigned unicodeEditDistanceV4(UnicodeString & left, UnicodeString & right, unsi
     unsigned leftLen = left.length();
     unsigned rightLen = right.length();
 
-    // this shortcut is not applicable in the bi mode because unicode characters could take more than 2 UChars
-    if (!bi)
-    {
-        unsigned minED = (leftLen < rightLen)? rightLen - leftLen: leftLen - rightLen;
-        if (minED > radius)
-            return minED>255?255:minED;
-    }
-
     if (leftLen > 255)
         leftLen = 255;
 
@@ -640,9 +632,14 @@ unsigned unicodeEditDistanceV4(UnicodeString & left, UnicodeString & right, unsi
     if (leftCs.isInvalid() || rightCs.isInvalid())
         return DISTANCE_ON_ERROR;
 
+    // get Unicode character lengths
     leftLen = leftCs.length();
     rightLen = rightCs.length();
 
+    unsigned minED = (leftLen < rightLen)? rightLen - leftLen: leftLen - rightLen;
+    if (minED > radius)
+        return minED;
+
     /*
     This function applies two optimizations over the function above.
     a) Adding a character (next row) can at most decrease the edit distance by 1, so short circuit when