Merge pull request #11284 from JamesDeFabia/HPCC-18345UincodeImplementations

HPCC-18345 Document new Unicode methods in Std library

Reviewed-by: Gavin Halliday <ghalliday@hpccsystems.com>
Gavin Halliday 7 years ago
commit 4acbdcb261

+ 19 - 2
docs/EN_US/ECLStandardLibraryReference/SLR-Mods/CountWords.xml

@@ -10,8 +10,15 @@
       <primary>Str.CountWords</primary>
     </indexterm><indexterm>
       <primary>CountWords</primary>
-    </indexterm>(</emphasis> <emphasis>source, separator </emphasis><emphasis
-  role="bold">)</emphasis></para>
+    </indexterm>(</emphasis> <emphasis>source, separator [, allow_blank]
+  </emphasis><emphasis role="bold">)</emphasis></para>
+
+  <para><emphasis role="bold">STD.Uni.CountWords<indexterm>
+      <primary>STD.Uni.CountWords</primary>
+    </indexterm><indexterm>
+      <primary>Uni.CountWords</primary>
+    </indexterm>(</emphasis> <emphasis>source, separator [, allow_blank]
+  </emphasis><emphasis role="bold">)</emphasis></para>
 
   <informaltable colsep="1" frame="all" rowsep="1">
     <tgroup cols="2">
@@ -33,6 +40,13 @@
         </row>
 
         <row>
+          <entry><emphasis>allow_blank</emphasis></entry>
+
+          <entry>Optional. A BOOLEAN value indicating whether empty/blank
+          string items are included in the results. Defaults to FALSE.</entry>
+        </row>
+
+        <row>
           <entry>Return:</entry>
 
           <entry>CountWords returns an integer value.</entry>
@@ -45,6 +59,9 @@
   number of words in the <emphasis>source</emphasis> string based on the
   specified <emphasis>separator</emphasis>.</para>
 
+  <para>Words are separated by one or more separator strings. No spaces are
+  stripped from either string before matching.</para>
+
   <para>Example:</para>
 
   <programlisting format="linespecific">IMPORT Std;
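
Reviewer note: the counting rule added above (runs of separators, optional blank items, no space stripping) can be modelled in a few lines. The following is a minimal Python sketch of the documented behaviour, not the ECL implementation; the function name `count_words` and the rejection of an empty separator are assumptions made for illustration.

```python
def count_words(source: str, separator: str, allow_blank: bool = False) -> int:
    """Python model of the documented CountWords behaviour (not the ECL code)."""
    if not separator:
        # Assumption: this sketch simply rejects an empty separator.
        raise ValueError("separator must be non-empty in this sketch")
    # Split on the exact separator string; no spaces are stripped first.
    items = source.split(separator)
    if not allow_blank:
        # Runs of separators collapse: empty items are not counted.
        items = [item for item in items if item]
    return len(items)


print(count_words("a,b,,c", ","))                    # 3
print(count_words("a,b,,c", ",", allow_blank=True))  # 4
print(count_words("a , b", ","))                     # 2 (items are "a " and " b")
```

The last call shows why the "no spaces are stripped" sentence matters: the items keep their surrounding spaces, so a separator of `', '` would behave differently from `','`.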

+ 28 - 5
docs/EN_US/ECLStandardLibraryReference/SLR-Mods/EndsWith.xml

@@ -10,9 +10,15 @@
       <primary>Str.EndsWith</primary>
     </indexterm><indexterm>
       <primary>EndsWith</primary>
-    </indexterm>(</emphasis> <emphasis>source, suffix</emphasis>
-  <emphasis role="bold">)</emphasis></para>
+    </indexterm>(</emphasis> <emphasis>src, suffix</emphasis> <emphasis
+  role="bold">)</emphasis></para>
 
+  <para><emphasis role="bold">STD.Uni.EndsWith<indexterm>
+      <primary>STD.Uni.EndsWith</primary>
+    </indexterm><indexterm>
+      <primary>Uni.EndsWith</primary>
+    </indexterm>(</emphasis> <emphasis>src, suffix, form</emphasis> <emphasis
+  role="bold">)</emphasis></para>
 
   <informaltable colsep="1" frame="all" rowsep="1">
     <tgroup cols="2">
@@ -22,7 +28,7 @@
 
       <tbody>
         <row>
-          <entry><emphasis>source</emphasis></entry>
+          <entry><emphasis>src</emphasis></entry>
 
           <entry>The string to search.</entry>
         </row>
@@ -34,6 +40,13 @@
         </row>
 
         <row>
+          <entry><emphasis>form</emphasis></entry>
+
+          <entry>The type of Unicode normalization to be employed (NFC, NFD,
+          NFKC, or NFKD).</entry>
+        </row>
+
+        <row>
           <entry>Return:<emphasis> </emphasis></entry>
 
           <entry>EndsWith returns a BOOLEAN value.</entry>
@@ -42,10 +55,20 @@
     </tgroup>
   </informaltable>
 
-  <para>The <emphasis role="bold">EndsWith</emphasis> function
-  returns TRUE if the <emphasis>source</emphasis> ends with the text in the 
+  <para>The <emphasis role="bold">EndsWith</emphasis> function returns TRUE if
+  the <emphasis>src</emphasis> ends with the text in the
   <emphasis>suffix</emphasis> parameter.</para>
 
+  <para>Leading and trailing spaces are stripped from the suffix before
+  matching.</para>
+
+  <para>For the Unicode version, no normalization is applied unless a
+  <emphasis>form</emphasis> is specified. However, ECL performs its own
+  normalization on a declared Unicode string unless the string is initialized
+  as hex and then converted to Unicode using TRANSFER.</para>
+
+  <para></para>
+
   <para>Example:</para>
 
   <programlisting format="linespecific">IMPORT STD;
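
Reviewer note: the effect of the new form parameter is easiest to see with a string whose final character has both a composed and a decomposed encoding. The sketch below uses Python's `unicodedata` module to model the idea; it is not the ECL implementation, and normalizing both arguments with the same form is an assumption about how the comparison is made.

```python
import unicodedata


def ends_with(src: str, suffix: str, form: str = "") -> bool:
    """Sketch of suffix matching with optional Unicode normalization."""
    # Leading and trailing spaces are stripped from the suffix only.
    suffix = suffix.strip(" ")
    if form:
        # form is one of "NFC", "NFD", "NFKC", "NFKD".
        # Assumption: both strings are normalized with the same form.
        src = unicodedata.normalize(form, src)
        suffix = unicodedata.normalize(form, suffix)
    return src.endswith(suffix)


decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent
composed = "\u00e9"        # precomposed e-acute

print(ends_with(decomposed, composed))         # False: code points differ
print(ends_with(decomposed, composed, "NFC"))  # True
print(ends_with(decomposed, composed, "NFD"))  # True
```

Without a form the comparison is made on raw code points, which is why two visually identical strings can fail to match.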

+ 22 - 4
docs/EN_US/ECLStandardLibraryReference/SLR-Mods/ExcludeFirstWord.xml

@@ -13,6 +13,13 @@
     </indexterm>(</emphasis> <emphasis>text</emphasis> <emphasis
   role="bold">)</emphasis></para>
 
+  <para><emphasis role="bold">STD.Uni.ExcludeFirstWord<indexterm>
+      <primary>STD.Uni.ExcludeFirstWord</primary>
+    </indexterm><indexterm>
+      <primary>Uni.ExcludeFirstWord</primary>
+    </indexterm>(</emphasis> <emphasis>text, localename</emphasis> <emphasis
+  role="bold">)</emphasis></para>
+
   <informaltable colsep="1" frame="all" rowsep="1">
     <tgroup cols="2">
       <colspec colwidth="80.50pt" />
@@ -27,18 +34,29 @@
         </row>
 
         <row>
+          <entry><emphasis>localename</emphasis></entry>
+
+          <entry>Optional. The locale to use for the break semantics. Defaults
+          to ''</entry>
+        </row>
+
+        <row>
           <entry>Return:</entry>
 
-          <entry>ExcludeFirstWord returns a STRING value.</entry>
+          <entry>ExcludeFirstWord returns a STRING or UNICODE value, as
+          appropriate.</entry>
         </row>
       </tbody>
     </tgroup>
   </informaltable>
 
   <para>The <emphasis role="bold">ExcludeFirstWord </emphasis>function returns
-  the <emphasis>text</emphasis> string with the first word removed. Words are
-  separated by one or more whitespace characters. Whitespace before the first
-  word is also removed.</para>
+  the <emphasis>text</emphasis> string with the first word removed.</para>
+
+  <para>Words are separated by one or more whitespace characters. For the
+  Unicode version, words are marked by the Unicode break semantics.</para>
+
+  <para>Whitespace before the first word is also removed.</para>
 
   <para>Example:</para>
 

+ 23 - 5
docs/EN_US/ECLStandardLibraryReference/SLR-Mods/ExcludeLastWord.xml

@@ -13,6 +13,13 @@
     </indexterm>(</emphasis> <emphasis>text</emphasis> <emphasis
   role="bold">)</emphasis></para>
 
+  <para><emphasis role="bold">STD.Uni.ExcludeLastWord<indexterm>
+      <primary>STD.Uni.ExcludeLastWord</primary>
+    </indexterm><indexterm>
+      <primary>Uni.ExcludeLastWord</primary>
+    </indexterm>(</emphasis> <emphasis>text, localename</emphasis> <emphasis
+  role="bold">)</emphasis></para>
+
   <informaltable colsep="1" frame="all" rowsep="1">
     <tgroup cols="2">
       <colspec colwidth="80.50pt" />
@@ -23,22 +30,33 @@
         <row>
           <entry><emphasis>text</emphasis></entry>
 
-          <entry>A string containing words separated by whitespace. </entry>
+          <entry>A string containing words separated by whitespace.</entry>
+        </row>
+
+        <row>
+          <entry><emphasis>localename</emphasis></entry>
+
+          <entry>Optional. The locale to use for the break semantics. Defaults
+          to ''</entry>
         </row>
 
         <row>
           <entry>Return:</entry>
 
-          <entry>ExcludeLastWord returns a STRING value.</entry>
+          <entry>ExcludeLastWord returns a STRING or UNICODE value, as
+          appropriate.</entry>
         </row>
       </tbody>
     </tgroup>
   </informaltable>
 
   <para>The <emphasis role="bold">ExcludeLastWord</emphasis> function returns
-  the <emphasis>text</emphasis> string with the last word removed. Words are
-  separated by one or more whitespace characters. Whitespace after
-  the last word is also removed.</para>
+  the <emphasis>text</emphasis> string with the last word removed.</para>
+
+  <para>Words are separated by one or more whitespace characters. For the
+  Unicode version, words are marked by the Unicode break semantics.</para>
+
+  <para>Whitespace after the last word is also removed.</para>
 
   <para>Example:</para>
 

+ 30 - 4
docs/EN_US/ECLStandardLibraryReference/SLR-Mods/ExcludeNthWord.xml

@@ -13,6 +13,13 @@
     </indexterm>(</emphasis> <emphasis>text, n</emphasis> <emphasis
   role="bold">)</emphasis></para>
 
+  <para><emphasis role="bold">STD.Uni.ExcludeNthWord<indexterm>
+      <primary>STD.Uni.ExcludeNthWord</primary>
+    </indexterm><indexterm>
+      <primary>Uni.ExcludeNthWord</primary>
+    </indexterm>(</emphasis> <emphasis>text, n, localename</emphasis>
+  <emphasis role="bold">)</emphasis></para>
+
   <informaltable colsep="1" frame="all" rowsep="1">
     <tgroup cols="2">
       <colspec colwidth="80.50pt" />
@@ -34,9 +41,17 @@
         </row>
 
         <row>
+          <entry><emphasis>localename</emphasis></entry>
+
+          <entry>Optional. The locale to use for the break semantics. Defaults
+          to ''</entry>
+        </row>
+
+        <row>
           <entry>Return:<emphasis> </emphasis></entry>
 
-          <entry>ExcludeNthWord returns a STRING value.</entry>
+          <entry>ExcludeNthWord returns a STRING or UNICODE value, as
+          appropriate.</entry>
         </row>
       </tbody>
     </tgroup>
@@ -44,9 +59,20 @@
 
   <para>The <emphasis role="bold">ExcludeNthWord </emphasis>function returns
   the <emphasis>text</emphasis> string with the <emphasis>n</emphasis>th word
-  removed. Words are separated by one or more whitespace characters.
-  Whitespace after the <emphasis>n</emphasis>th word is also removed (along
-  with whitespace before, if <emphasis>n</emphasis>=1).</para>
+  removed.</para>
+
+  <para>Words are separated by one or more whitespace characters. For the
+  Unicode version, words are marked by the Unicode break semantics.</para>
+
+  <para>Trailing whitespace is always removed with the word. Leading
+  whitespace is removed with the word only if the <emphasis>n</emphasis>th
+  word is the first word.</para>
+
+  <para>Returns a blank string if the <emphasis>text</emphasis> string
+  contains no words. Returns the <emphasis>text</emphasis> string unchanged if
+  it contains fewer than <emphasis>n</emphasis> words.</para>
+
+  <para></para>
 
   <para>Example:</para>
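
Reviewer note: the whitespace and boundary rules added in this hunk can be checked against a small model. The following Python sketch covers only the non-Unicode, whitespace-delimited case described above; the name `exclude_nth_word` and the handling of `n` below 1 are assumptions, and locale-specific break semantics are not modelled.

```python
import re


def exclude_nth_word(text: str, n: int) -> str:
    """Python model of the documented ExcludeNthWord rules (not the ECL code)."""
    words = list(re.finditer(r"\S+", text))
    if not words:
        return ""        # no words in the string: blank result
    if n < 1 or n > len(words):
        return text      # fewer than n words (n < 1 is an assumption): unchanged
    word = words[n - 1]
    # Leading whitespace goes with the word only when it is the first word.
    start = 0 if n == 1 else word.start()
    # Trailing whitespace is always removed with the word.
    end = word.end()
    while end < len(text) and text[end].isspace():
        end += 1
    return text[:start] + text[end:]


print(repr(exclude_nth_word("The quick brown fox", 2)))  # 'The brown fox'
print(repr(exclude_nth_word("  The quick", 1)))          # 'quick'
print(repr(exclude_nth_word("one two", 5)))              # 'one two'
print(repr(exclude_nth_word("   ", 1)))                  # ''
```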
 

+ 19 - 5
docs/EN_US/ECLStandardLibraryReference/SLR-Mods/FindCount.xml

@@ -10,7 +10,14 @@
       <primary>Str.FindCount</primary>
     </indexterm><indexterm>
       <primary>FindCount</primary>
-    </indexterm>(</emphasis> <emphasis>source, target</emphasis> <emphasis
+    </indexterm>(</emphasis> <emphasis>src, sought</emphasis> <emphasis
+  role="bold">)</emphasis></para>
+
+  <para><emphasis role="bold">STD.Uni.FindCount<indexterm>
+      <primary>STD.Uni.FindCount</primary>
+    </indexterm><indexterm>
+      <primary>Uni.FindCount</primary>
+    </indexterm>(</emphasis> <emphasis>src, sought, form</emphasis> <emphasis
   role="bold">)</emphasis></para>
 
   <informaltable colsep="1" frame="all" rowsep="1">
@@ -21,18 +28,25 @@
 
       <tbody>
         <row>
-          <entry><emphasis>source</emphasis></entry>
+          <entry><emphasis>src</emphasis></entry>
 
           <entry>A string containing the data to search.</entry>
         </row>
 
         <row>
-          <entry><emphasis>target </emphasis></entry>
+          <entry><emphasis>sought</emphasis></entry>
 
           <entry>A string containing the substring to search for.</entry>
         </row>
 
         <row>
+          <entry><emphasis>form</emphasis></entry>
+
+          <entry>The type of Unicode normalization to be employed (NFC, NFD,
+          NFKC, or NFKD).</entry>
+        </row>
+
+        <row>
           <entry>Return:<emphasis> </emphasis></entry>
 
           <entry>FindCount returns an INTEGER value.</entry>
@@ -42,8 +56,8 @@
   </informaltable>
 
   <para>The <emphasis role="bold">FindCount </emphasis>function returns the
-  number of non-overlapping instances of the <emphasis>target
-  </emphasis>string within the <emphasis>source</emphasis> string.</para>
+  number of non-overlapping instances of the <emphasis>sought
+  </emphasis>string within the <emphasis>src</emphasis> string.</para>
 
   <para>Example:</para>
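
Reviewer note: "non-overlapping" is the part of this description that most often surprises readers. A minimal Python sketch of that counting rule follows; it models the description only, and returning 0 for an empty sought string is an assumption rather than documented behaviour.

```python
def find_count(src: str, sought: str) -> int:
    """Count non-overlapping occurrences of sought in src."""
    if not sought:
        return 0  # assumption: an empty search string matches nothing
    count = 0
    pos = src.find(sought)
    while pos != -1:
        count += 1
        # Resume after the whole match, so matches never overlap.
        pos = src.find(sought, pos + len(sought))
    return count


print(find_count("abcabcabc", "abc"))  # 3
print(find_count("aaaa", "aa"))        # 2, not 3
```

In `"aaaa"` the substring `"aa"` starts at three positions, but only two of those matches can be taken without sharing a character.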
 

+ 24 - 8
docs/EN_US/ECLStandardLibraryReference/SLR-Mods/RemoveSuffix.xml

@@ -10,7 +10,14 @@
       <primary>Str.RemoveSuffix</primary>
     </indexterm><indexterm>
       <primary>RemoveSuffix</primary>
-    </indexterm>(</emphasis> <emphasis>source, suffix</emphasis> <emphasis
+    </indexterm>(</emphasis> <emphasis>src, suffix</emphasis> <emphasis
+  role="bold">)</emphasis></para>
+
+  <para><emphasis role="bold">STD.Uni.RemoveSuffix<indexterm>
+      <primary>STD.Uni.RemoveSuffix</primary>
+    </indexterm><indexterm>
+      <primary>Uni.RemoveSuffix</primary>
+    </indexterm>(</emphasis> <emphasis>src, suffix, form</emphasis> <emphasis
   role="bold">)</emphasis></para>
 
   <informaltable colsep="1" frame="all" rowsep="1">
@@ -21,18 +28,25 @@
 
       <tbody>
         <row>
-          <entry><emphasis>source</emphasis></entry>
+          <entry><emphasis>src</emphasis></entry>
 
           <entry>The string to search.</entry>
         </row>
 
         <row>
-          <entry><emphasis>suffix</emphasis></entry>
+          <entry><emphasis>suffix </emphasis></entry>
 
           <entry>The ending string to remove.</entry>
         </row>
 
         <row>
+          <entry><emphasis>form</emphasis></entry>
+
+          <entry>The type of Unicode normalization to be employed (NFC, NFD,
+          NFKC, or NFKD).</entry>
+        </row>
+
+        <row>
           <entry>Return:<emphasis> </emphasis></entry>
 
           <entry>RemoveSuffix returns a string value.</entry>
@@ -41,11 +55,13 @@
     </tgroup>
   </informaltable>
 
-  <para>The <emphasis role="bold">RemoveSuffix</emphasis> function returns 
-  the <emphasis>source</emphasis> string with the ending text in the
-  <emphasis>suffix</emphasis> parameter removed. If the <emphasis>source</emphasis> string 
-  does not end with the <emphasis>suffix</emphasis>, then the <emphasis>source</emphasis> string 
-  is returned unchanged.</para>
+  <para>The <emphasis role="bold">RemoveSuffix</emphasis> function returns the
+  <emphasis>src</emphasis> string with the ending text in the
+  <emphasis>suffix</emphasis> parameter removed. If the
+  <emphasis>src</emphasis> string does not end with the
+  <emphasis>suffix</emphasis>, then the <emphasis>src</emphasis> string is
+  returned unchanged. Trailing spaces are stripped from both strings before
+  matching.</para>
 
   <para>Example:</para>
 

+ 8 - 1
docs/EN_US/ECLStandardLibraryReference/SLR-Mods/Repeat.xml

@@ -13,6 +13,13 @@
     </indexterm>(</emphasis> <emphasis>text</emphasis>, <emphasis>n</emphasis>
   <emphasis role="bold">)</emphasis> <emphasis role="bold"></emphasis></para>
 
+  <para><emphasis role="bold">STD.Uni.Repeat<indexterm>
+      <primary>STD.Uni.Repeat</primary>
+    </indexterm><indexterm>
+      <primary>Uni.Repeat</primary>
+    </indexterm>(</emphasis> <emphasis>text</emphasis>, <emphasis>n</emphasis>
+  <emphasis role="bold">)</emphasis> <emphasis role="bold"></emphasis></para>
+
   <informaltable colsep="1" frame="all" rowsep="1">
     <tgroup cols="2">
       <colspec colwidth="80.50pt" />
@@ -44,7 +51,7 @@
   </informaltable>
 
   <para>The <emphasis role="bold">Repeat </emphasis>function returns the
-  <emphasis>source</emphasis> string repeated n times. </para>
+  <emphasis>text</emphasis> string repeated n times.</para>
 
   <para>Example:</para>
 

+ 17 - 7
docs/EN_US/ECLStandardLibraryReference/SLR-Mods/SplitWords.xml

@@ -10,10 +10,18 @@
       <primary>Str.SplitWords</primary>
     </indexterm><indexterm>
       <primary>SplitWords</primary>
-    </indexterm>(</emphasis> <emphasis>source, separator </emphasis><emphasis
-  role="bold">[ </emphasis><emphasis>, allowblank</emphasis><emphasis
+    </indexterm>(</emphasis> <emphasis>src, separator </emphasis><emphasis
+  role="bold">[ </emphasis><emphasis>, allow_blank</emphasis><emphasis
   role="bold"> ] )</emphasis></para>
 
+  <para><emphasis role="bold">STD.Uni.SplitWords<indexterm>
+      <primary>STD.Uni.SplitWords</primary>
+    </indexterm><indexterm>
+      <primary>Uni.SplitWords</primary>
+    </indexterm><indexterm></indexterm>(</emphasis> <emphasis>src, separator
+  </emphasis><emphasis role="bold">[ </emphasis><emphasis>,
+  allow_blank</emphasis><emphasis role="bold"> ] )</emphasis></para>
+
   <informaltable colsep="1" frame="all" rowsep="1">
     <tgroup cols="2">
       <colspec colwidth="80.50pt" />
@@ -22,7 +30,7 @@
 
       <tbody>
         <row>
-          <entry><emphasis>source</emphasis></entry>
+          <entry><emphasis>src</emphasis></entry>
 
           <entry>A string containing the words to extract.</entry>
         </row>
@@ -34,7 +42,7 @@
         </row>
 
         <row>
-          <entry><emphasis>allowblank</emphasis></entry>
+          <entry><emphasis>allow_blank</emphasis></entry>
 
           <entry>Optional. If TRUE, specifies allowing blank items in the
           result. If omitted, the default is FALSE.</entry>
@@ -43,15 +51,17 @@
         <row>
           <entry>Return:</entry>
 
-          <entry>SplitWords returns a SET OF STRING values.</entry>
+          <entry>SplitWords returns a SET OF STRING or a UnicodeSet, as
+          appropriate.</entry>
         </row>
       </tbody>
     </tgroup>
   </informaltable>
 
   <para>The <emphasis role="bold">SplitWords</emphasis> function returns the
-  list of words in the <emphasis>source</emphasis> string split out by the
-  specified <emphasis>separator</emphasis>.</para>
+  list of words in the <emphasis>src</emphasis> string split out by the
+  specified <emphasis>separator</emphasis>. No spaces are stripped from either
+  string before matching.</para>
 
   <para>Example:</para>
 

+ 25 - 3
docs/EN_US/ECLStandardLibraryReference/SLR-Mods/StartsWith.xml

@@ -10,7 +10,14 @@
       <primary>Str.StartsWith</primary>
     </indexterm><indexterm>
       <primary>StartsWith</primary>
-    </indexterm>(</emphasis> <emphasis>source, prefix</emphasis> <emphasis
+    </indexterm>(</emphasis> <emphasis>src, prefix</emphasis> <emphasis
+  role="bold">)</emphasis></para>
+
+  <para><emphasis role="bold">STD.Uni.StartsWith<indexterm>
+      <primary>STD.Uni.StartsWith</primary>
+    </indexterm><indexterm>
+      <primary>Uni.StartsWith</primary>
+    </indexterm>(</emphasis> <emphasis>src, prefix, form</emphasis> <emphasis
   role="bold">)</emphasis></para>
 
   <informaltable colsep="1" frame="all" rowsep="1">
@@ -21,7 +28,7 @@
 
       <tbody>
         <row>
-          <entry><emphasis>source</emphasis></entry>
+          <entry><emphasis>src</emphasis></entry>
 
           <entry>The string to search.</entry>
         </row>
@@ -33,6 +40,13 @@
         </row>
 
         <row>
+          <entry><emphasis>form</emphasis></entry>
+
+          <entry>The type of Unicode normalization to be employed (NFC, NFD,
+          NFKC, or NFKD).</entry>
+        </row>
+
+        <row>
           <entry>Return:<emphasis> </emphasis></entry>
 
           <entry>StartsWith returns a BOOLEAN value.</entry>
@@ -42,9 +56,17 @@
   </informaltable>
 
   <para>The <emphasis role="bold">StartsWith</emphasis> function returns TRUE
-  if the <emphasis>source</emphasis> starts with the text in the
+  if the <emphasis>src</emphasis> starts with the text in the
   <emphasis>prefix</emphasis> parameter.</para>
 
+  <para>Leading and trailing spaces are stripped from the prefix before
+  matching.</para>
+
+  <para>For the Unicode version, no normalization is applied unless a
+  <emphasis>form</emphasis> is specified. However, ECL performs its own
+  normalization on a declared Unicode string unless the string is initialized
+  as hex and then converted to Unicode using TRANSFER.</para>
+
   <para>Example:</para>
 
   <programlisting format="linespecific">IMPORT STD;

+ 15 - 7
docs/EN_US/ECLStandardLibraryReference/SLR-Mods/Translate.xml

@@ -10,7 +10,14 @@
       <primary>Str.Translate</primary>
     </indexterm><indexterm>
       <primary>Translate</primary>
-    </indexterm>(</emphasis> <emphasis>source, search, replacement</emphasis>
+    </indexterm>(</emphasis> <emphasis>src, search, replacement</emphasis>
+  <emphasis role="bold">)</emphasis> <emphasis role="bold"></emphasis></para>
+
+  <para><emphasis role="bold">STD.Uni.Translate<indexterm>
+      <primary>STD.Uni.Translate</primary>
+    </indexterm><indexterm>
+      <primary>Uni.Translate</primary>
+    </indexterm>(</emphasis> <emphasis>src, search, replacement</emphasis>
   <emphasis role="bold">)</emphasis> <emphasis role="bold"></emphasis></para>
 
   <informaltable colsep="1" frame="all" rowsep="1">
@@ -21,7 +28,7 @@
 
       <tbody>
         <row>
-          <entry><emphasis>source</emphasis></entry>
+          <entry><emphasis>src</emphasis></entry>
 
           <entry>A string containing the characters to search.</entry>
         </row>
@@ -43,22 +50,23 @@
         <row>
           <entry>Return:<emphasis> </emphasis></entry>
 
-          <entry>Translate returns a STRING value.</entry>
+          <entry>Translate returns a STRING or UNICODE value, as
+          appropriate.</entry>
         </row>
       </tbody>
     </tgroup>
   </informaltable>
 
   <para>The <emphasis role="bold">Translate </emphasis>functions return the
-  <emphasis>source</emphasis> string with the <emphasis>replacement</emphasis>
-  character substituted for all characters in the <emphasis>source</emphasis>
+  <emphasis>src</emphasis> string with <emphasis>replacement</emphasis>
+  characters substituted for the matching characters in the <emphasis>src</emphasis>
   string. The <emphasis>search</emphasis> string characters are replaced by
   the characters in the equivalent position in the
   <emphasis>replacement</emphasis> string.</para>
 
   <para>If no <emphasis>search</emphasis> string characters are in the
-  <emphasis>source</emphasis> string, it returns the
-  <emphasis>source</emphasis> string unaltered.</para>
+  <emphasis>src</emphasis> string, it returns the <emphasis>src</emphasis>
+  string unaltered.</para>
 
   <para>Example:</para>
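
Reviewer note: the position-by-position mapping described above corresponds closely to a character translation table. The sketch below models it with Python's `str.maketrans`; it is an illustration of the description, not the ECL implementation, and raising an error when the two strings differ in length is an assumption (the text above does not say what the ECL functions do in that case).

```python
def translate(src: str, search: str, replacement: str) -> str:
    """Replace each search character with the replacement character
    at the same position; all other characters are left alone."""
    if len(search) != len(replacement):
        # Assumption: this sketch requires equal-length strings.
        raise ValueError("search and replacement must have equal length")
    table = str.maketrans(search, replacement)
    return src.translate(table)


print(translate("hello world", "lo", "01"))  # he001 w1r0d
print(translate("xyz", "ab", "12"))          # xyz (no search characters present)
```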