<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<sect1 id="Getting_Things_Done">
<title><emphasis>Getting Things Done</emphasis></title>
<sect2 id="Scanning_Landing_Zone_Files">
<title><emphasis role="bold">Scanning Landing Zone
Files</emphasis></title>
<para><emphasis>Here’s the scenario—you’ve just received a data file from
someone and it has been put on your landing zone. Before you spray that
file to your Thor cluster and start to work with it, you want to have a
quick look to see exactly what kind of data it contains and whether the
format of that data matches the format that you were given by the
supplier. There are a number of ways to do this, including mapping a drive
to your landing zone and using a text/hex editor to open the file and look
at the contents. </emphasis></para>
<para>This article will show you how to accomplish this from within
QueryBuilder using ECL. Here’s the code (contained in the
Default.ProgGuide MODULE attribute):</para>
<programlisting>EXPORT MAC_ScanFile(IP, infile, scansize) := MACRO
ds := DATASET(FileServices.ExternalLogicalFileName(IP, infile),
              {DATA1 S},
              THOR)[1..scansize];
OUTPUT(TABLE(ds,{hex := ds.s,txt := (STRING1)ds.s}),ALL);
Rec := RECORD
  UNSIGNED2 C;
  DATA S {MAXLENGTH(8*1024)};
END;
Rec XF1(ds L,INTEGER C) := TRANSFORM
  SELF.C := C;
  SELF.S := L.s;
END;
ds2 := PROJECT(ds,XF1(LEFT,COUNTER));
Rec XF2(Rec L,Rec R) := TRANSFORM
  SELF.S := L.S[1 .. R.C-1] + R.S[1];
  SELF := L;
END;
Rolled := ROLLUP(ds2,TRUE,XF2(LEFT,RIGHT));
OUTPUT(TRANSFER(Rolled[1].S,STRING));
ENDMACRO;
</programlisting>
<para>This is written as a MACRO because you could have multiple Landing
Zones, and you certainly are going to want to look into different files
each time. Therefore, a MACRO that generates the standard process code to
scan the file is precisely what’s needed here.</para>
<para>This MACRO takes three parameters:</para>
<itemizedlist>
<listitem>
<para>The IP of the landing zone containing the file</para>
</listitem>
<listitem>
<para>The fully-qualified path to that file on the landing zone</para>
</listitem>
<listitem>
<para>The number of bytes to read (maximum 8K)</para>
</listitem>
</itemizedlist>
<para>The initial DATASET declaration uses the
FileServices.ExternalLogicalFileName function to name the file. Defining
the RECORD structure as a single DATA1 field is necessary to ensure that
both text and binary fields can be read correctly. Specifying the DATASET
as a THOR file (no matter what type of file it actually is) makes it
simple to read as a fixed-length record file. The square brackets at the
end of the DATASET declaration automatically limit the number of 1-byte
records read to the first <emphasis>scansize</emphasis> number of bytes in
the file.</para>
<para>The first OUTPUT action allows you to see the raw Hexadecimal data
from the file.</para>
<para>The TABLE function doubles up the input data, producing a DATA1
field that displays the Hex value and a STRING1 field that type casts each
byte for display. Viewing the raw Hex value is necessary because most
binary fields will not contain text-displayable characters (and those that
do may mislead you as to the actual contents of the field).
Non-displayable binary characters show up as a square box in the text
column display.</para>
<para>Next, we’ll construct a more text-friendly view of the data. To do
that we’ll start with the Rec RECORD structure, which defines a
byte-counter field (UNSIGNED2 C) and a variable-length field (DATA S
{MAXLENGTH(8*1024)}) to contain the text representation of the data as a
single horizontal line of text.</para>
<para>The XF1 TRANSFORM and its associated PROJECT move the data from the
input format into the format needed to roll up that data into a single
text string. Adding the byte-counter field is necessary to ensure that
blank spaces are not accidentally trimmed out of the final display.</para>
<para>The XF2 TRANSFORM and its associated ROLLUP function perform the
actual data append. The TRUE condition parameter ensures that only one
record will result, containing all the input bytes rolled into a single
record.</para>
<para>The last OUTPUT action uses the TRANSFER function instead of type
casting to ensure that all the text characters in the original data are
accurately represented.</para>
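<para>To see the difference, here is a minimal sketch (not part of the
MAC_ScanFile code above) contrasting a type cast with TRANSFER on a
single-byte value:</para>
<programlisting>n := (UNSIGNED1)65;          //the byte value 41 hex
OUTPUT((STRING1)n);          //type cast converts the value: '6' (first character of '65')
OUTPUT(TRANSFER(n,STRING1)); //type transfer keeps the bit pattern: 'A'
</programlisting>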
<para>You call this MACRO like this:</para>
<programlisting>ProgGuide.MAC_ScanFile( '10.173.9.4',
'C:\\training\\import\\BOCA.XML', 200)
</programlisting>
<para>When viewing the result, the QueryBuilder Result_1 tab displays a
column of hexadecimal values and the text character (if any) next to it in
the second column. This byte-by-byte view of the data is designed to allow
you to see the raw Hexadecimal values of each byte alongside its text
representation. This is the primary view to use when looking at the
contents of files containing binary data.</para>
<para>The QueryBuilder Result_2 tab displays a single record with a single
field. You can click on that field to highlight it, right-click and select
“Copy” from the popup menu, then paste the text into any text editor to
view. Binary fields will appear as square blocks or “garbage” characters,
depending on their hex value. Once pasted into a text editor, you can
easily look for data patterns that indicate the start of fields or records
and validate that the data layout information provided by the data vendor
is accurate (or not).</para>
</sect2>
<sect2 id="Cartesian_Product_of_Two-Datasets">
<title><emphasis role="bold">Cartesian Product of Two
Datasets</emphasis></title>
<para><emphasis>A Cartesian Product is the product of two non-empty sets
in terms of ordered pairs. As an example, if we take the set of values, A,
B and C, and a second set of values, 1, 2, and 3, the Cartesian Product of
these two sets would be the set of ordered pairs, A1, A2, A3, B1, B2, B3,
C1, C2, C3.</emphasis></para>
<para>The ECL code to produce this kind of result from any two input
datasets would look like this (contained in Cartesian.ECL):</para>
<programlisting>OutFile1 := '~PROGGUIDE::OUT::CP1';
rec := RECORD
  STRING1 Letter;
END;
Inds1 := DATASET([{'A'},{'B'},{'C'},{'D'},{'E'},
                  {'F'},{'G'},{'H'},{'I'},{'J'},
                  {'K'},{'L'},{'M'},{'N'},{'O'},
                  {'P'},{'Q'},{'R'},{'S'},{'T'},
                  {'U'},{'V'},{'W'},{'X'},{'Y'}],
                 rec);
Inds2 := DATASET([{'A'},{'B'},{'C'},{'D'},{'E'},
                  {'F'},{'G'},{'H'},{'I'},{'J'},
                  {'K'},{'L'},{'M'},{'N'},{'O'},
                  {'P'},{'Q'},{'R'},{'S'},{'T'},
                  {'U'},{'V'},{'W'},{'X'},{'Y'}],
                 rec);
CntInDS2 := COUNT(Inds2);
SetInDS2 := SET(Inds2,Letter);
outrec := RECORD
  STRING1 LeftLetter;
  STRING1 RightLetter;
END;
outrec CartProd(rec L, INTEGER C) := TRANSFORM
  SELF.LeftLetter := L.Letter;
  SELF.RightLetter := SetInDS2[C];
END;
//Run the small datasets
CP1 := NORMALIZE(Inds1,CntInDS2,CartProd(LEFT,COUNTER));
OUTPUT(CP1,,OutFile1,OVERWRITE);
</programlisting>
<para>The core structure of this code is the NORMALIZE that will produce
the Cartesian Product. The two input datasets each have twenty-five
records, so the number of result records will be six hundred twenty-five
(twenty-five squared).</para>
<para>Each record in the LEFT input dataset to the NORMALIZE will execute
the TRANSFORM once for each entry in the SET of values. Making the values
a SET is the key to allowing NORMALIZE to perform this operation;
otherwise, you would need to do a JOIN whose join condition is simply the
keyword TRUE to accomplish this task. However, in testing this with
sizable datasets (as in the next instance of this code below), the
NORMALIZE version was about 25% faster than using JOIN. If there is more
than one field, then multiple SETs may be defined and the process stays
the same, as sketched below.</para>
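<para>For example, here is a minimal sketch (using small hypothetical
datasets, not part of Cartesian.ECL) of how defining one SET per field
extends the same NORMALIZE technique to multi-field records:</para>
<programlisting>rec2 := RECORD
  STRING1 Letter;
  UNSIGNED1 Num;
END;
LeftDS  := DATASET([{'A',1},{'B',2},{'C',3}],rec2);
RightDS := DATASET([{'X',7},{'Y',8},{'Z',9}],rec2);
CntRight   := COUNT(RightDS);
SetLetters := SET(RightDS,Letter);   //one SET per field of the right-hand dataset
SetNums    := SET(RightDS,Num);
outrec2 := RECORD
  STRING1   LeftLetter;
  STRING1   RightLetter;
  UNSIGNED1 RightNum;
END;
outrec2 CartProd2(rec2 L, INTEGER C) := TRANSFORM
  SELF.LeftLetter  := L.Letter;
  SELF.RightLetter := SetLetters[C]; //the same COUNTER indexes every SET
  SELF.RightNum    := SetNums[C];
END;
OUTPUT(NORMALIZE(LeftDS,CntRight,CartProd2(LEFT,COUNTER)));
</programlisting>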
<para>This next example does the same operation as above, but first
generates two sizeable datasets to work with (also contained in
Cartesian.ECL):</para>
<programlisting>InFile1 := '~PROGGUIDE::IN::CP1';
InFile2 := '~PROGGUIDE::IN::CP2';
OutFile2 := '~PROGGUIDE::OUT::CP2';
//generate data files
rec BuildFile(rec L, INTEGER C) := TRANSFORM
  SELF.Letter := Inds2[C].Letter;
END;
GenCP1 := NORMALIZE(Inds1,CntInDS2,BuildFile(LEFT,COUNTER));
GenCP2 := NORMALIZE(GenCP1,CntInDS2,BuildFile(LEFT,COUNTER));
GenCP3 := NORMALIZE(GenCP2,CntInDS2,BuildFile(LEFT,COUNTER));
Out1 := OUTPUT(DISTRIBUTE(GenCP3,RANDOM()),,InFile1,OVERWRITE);
Out2 := OUTPUT(DISTRIBUTE(GenCP2,RANDOM()),,InFile2,OVERWRITE);
// Use the generated datasets in a cartesian join:
ds1 := DATASET(InFile1,rec,THOR);
ds2 := DATASET(InFile2,rec,THOR);
CntDS2 := COUNT(ds2);
SetDS2 := SET(ds2,Letter);
CP2 := NORMALIZE(ds1,CntDS2,CartProd(LEFT,COUNTER));
Out3 := OUTPUT(CP2,,OutFile2,OVERWRITE);
SEQUENTIAL(Out1,Out2,Out3);
</programlisting>
<para>Using NORMALIZE in this case to generate the datasets is the same
type of usage previously described in the Creating Example Data article.
After that, the process to achieve the Cartesian Product is exactly the
same as the previous example.</para>
<para>Here’s an example of how this same operation can be done using JOIN
(also contained in Cartesian.ECL):</para>
<programlisting>outrec joinEm(rec L, rec R) := TRANSFORM
  SELF.LeftLetter := L.Letter;
  SELF.RightLetter := R.Letter;
END;
ds4 := JOIN(ds1, ds2, TRUE, joinEm(LEFT, RIGHT), ALL);
OUTPUT(ds4);
</programlisting>
</sect2>
<sect2 id="Records_Containing_Any_of_a-Set_of_Words">
<title><emphasis role="bold">Records Containing Any of a Set of
Words</emphasis></title>
<para><emphasis>Part of the data cleanup problem is the possible presence
of profanity or cartoon character names in the data. This can become an
issue whenever you are working with data that originated from direct input
by end-users to a website. The following code (contained in the
BadWordSearch.ECL file) will detect the presence of any of a set of “bad”
words in a given field:</emphasis></para>
<programlisting>SetBadWords := ['JUNK', 'GARBAGE', 'CRAP'];
BadWordDS := DATASET(SetBadWords,{STRING10 word});
SearchDS := DATASET([{1,'FRED','FLINTSTONE'},
                     {2,'GEORGE','JETSON'},
                     {3,'CRAPOLA','NASTYGUY'},
                     {4,'JUNKER','JUNKEE'},
                     {5,'GARBAGEGUY','JUNKMAN'},
                     {6,'FREDDY','KRUEGER'},
                     {7,'TIM','JONES'},
                     {8,'JOHN','SMITH'},
                     {9,'MIKE','MALARKEY'},
                     {10,'GEORGE','KRUEGER'}
                    ],{UNSIGNED6 ID,STRING10 firstname,STRING10 lastname});
outrec := RECORD
  SearchDS.ID;
  SearchDS.firstname;
  BOOLEAN FoundWord;
END;
{BOOLEAN Found} FindWord(BadWordDS L, STRING10 inword) := TRANSFORM
  SELF.Found := StringLib.StringFind(inword,TRIM(L.word),1) &gt; 0;
END;
outrec CheckWords(SearchDS L) := TRANSFORM
  SELF.FoundWord := EXISTS(PROJECT(BadWordDS,
                                   FindWord(LEFT,L.firstname))(Found=TRUE));
  SELF := L;
END;
result := PROJECT(SearchDS,CheckWords(LEFT));
OUTPUT(result(FoundWord=TRUE),NAMED('BadWordsInFirstName'));
OUTPUT(result(FoundWord=FALSE),NAMED('NoBadWordsInFirstName'));
</programlisting>
<para>This code is a simple PROJECT of each record that you want to
search. The result will be a record set containing the record ID field,
the firstname search field, and a BOOLEAN FoundWord flag field indicating
whether any “bad” word was found.</para>
<para>The search itself is done by a nested PROJECT of the field to be
searched against the DATASET of “bad” words. The FoundWord flag field is
set by using the EXISTS function to detect whether that PROJECT returns
any records in which the Found field is TRUE.</para>
<para>The StringLib.StringFind function simply detects the presence
anywhere within the search string of any of the “bad” words. The OUTPUT of
the set of records where FoundWord is TRUE allows post-processing to
evaluate whether the record is worth keeping or is garbage (probably
requiring human intervention).</para>
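<para>As a quick illustration of that behavior, here is a small sketch
(not part of BadWordSearch.ECL): StringLib.StringFind returns the position
of the first occurrence of the target string, or zero when it is not
found.</para>
<programlisting>OUTPUT(StringLib.StringFind('GARBAGEGUY','GARBAGE',1)); //1 -- found at position 1
OUTPUT(StringLib.StringFind('CRAPOLA','CRAP',1));        //1 -- a "contains" match can also start the string
OUTPUT(StringLib.StringFind('FRED','JUNK',1));           //0 -- not found
</programlisting>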
<para>The preceding example is a specific instance of this technique, but
it would be much more useful to have a MACRO that accomplishes this task,
something like this one (also contained in the BadWordSearch.ECL
file):</para>
<programlisting>MAC_FindBadWords(BadWordSet,InFile,IDfld,SeekFld,ResAttr,MatchType=1) := MACRO
  #UNIQUENAME(BadWordDS)
  %BadWordDS% := DATASET(BadWordSet,{STRING word{MAXLENGTH(50)}});
  #UNIQUENAME(outrec)
  %outrec% := RECORD
    InFile.IDfld;
    InFile.SeekFld;
    BOOLEAN FoundWord := FALSE;
    UNSIGNED2 FoundPos := 0;
  END;
  #UNIQUENAME(ChkTbl)
  %ChkTbl% := TABLE(InFile,%outrec%);
  #UNIQUENAME(FindWord)
  {BOOLEAN Found,UNSIGNED2 FoundPos} %FindWord%(%BadWordDS% L,
                                                INTEGER C,
                                                STRING inword) := TRANSFORM
    #IF(MatchType=1) //"contains" search
    SELF.Found := StringLib.StringFind(inword,TRIM(L.word),1) &gt; 0;
    #END
    #IF(MatchType=2) //"exact match" search
    SELF.Found := inword = L.word;
    #END
    #IF(MatchType=3) //"starts with" search
    SELF.Found := StringLib.StringFind(inword,TRIM(L.word),1) = 1;
    #END
    SELF.FoundPos := IF(SELF.Found=TRUE,C,0);
  END;
  #UNIQUENAME(CheckWords)
  %outrec% %CheckWords%(%ChkTbl% L) := TRANSFORM
    WordDS := PROJECT(%BadWordDS%,%FindWord%(LEFT,COUNTER,L.SeekFld));
    SELF.FoundWord := EXISTS(WordDS(Found=TRUE));
    SELF.FoundPos := WordDS(Found=TRUE)[1].FoundPos;
    SELF := L;
  END;
  ResAttr := PROJECT(%ChkTbl%,%CheckWords%(LEFT));
ENDMACRO;
</programlisting>
<para>This MACRO does a bit more than the previous example. It begins by
passing in:</para>
<itemizedlist>
<listitem>
<para>The set of words to find</para>
</listitem>
<listitem>
<para>The file to search</para>
</listitem>
<listitem>
<para>The unique identifier field for the search record</para>
</listitem>
<listitem>
<para>The field to search in</para>
</listitem>
<listitem>
<para>The attribute name of the resulting recordset</para>
</listitem>
<listitem>
<para>The type of matching to do (defaulting to 1)</para>
</listitem>
</itemizedlist>
<para>Passing in the set of words to seek allows the MACRO to operate
against any given set of strings. Specifying the result attribute name
allows easy post-processing of the data.</para>
<para>Where this MACRO starts going beyond the previous example is in the
MatchType parameter, which allows the MACRO to use the Template Language
#IF function to generate three different kinds of searches from the same
codebase: a “contains” search (the default), an exact match, and a “starts
with” search.</para>
<para>It also has an expanded output RECORD structure that includes a
FoundPos field to contain the pointer to the first entry in the passed-in
set that matched. This allows post-processing to detect positional matches
within the set so that “matched pairs” of words can be detected, as in
this example (also contained in the BadWordSearch.ECL file):</para>
<programlisting>SetCartoonFirstNames := ['GEORGE','FRED', 'FREDDY'];
SetCartoonLastNames := ['JETSON','FLINTSTONE','KRUEGER'];
MAC_FindBadWords(SetCartoonFirstNames,SearchDS,ID,firstname,Res1,2)
MAC_FindBadWords(SetCartoonLastNames,SearchDS,ID,lastname,Res2,2)
Cartoons := JOIN(Res1(FoundWord=TRUE),
                 Res2(FoundWord=TRUE),
                 LEFT.ID=RIGHT.ID AND LEFT.FoundPos=RIGHT.FoundPos);
MAC_FindBadWords(SetBadWords,SearchDS,ID,firstname,Res3,3)
MAC_FindBadWords(SetBadWords,SearchDS,ID,lastname,Res4)
SetBadGuys := SET(Cartoons,ID) +
              SET(Res3(FoundWord=TRUE),ID) +
              SET(Res4(FoundWord=TRUE),ID);
GoodGuys := SearchDS(ID NOT IN SetBadGuys);
BadGuys := SearchDS(ID IN SetBadGuys);
OUTPUT(BadGuys,NAMED('BadGuys'));
OUTPUT(GoodGuys,NAMED('GoodGuys'));
</programlisting>
<para>Notice that the positions of the cartoon character names in their
separate sets define a single character name to search for in multiple
passes. Calling the MACRO twice, searching for the first and last names
separately, allows you to post-process their results with a simple inner
JOIN where the same record was found in each and, most importantly, the
positional values of the matches are the same. This prevents “GEORGE
KRUEGER” from being mislabeled as a cartoon character name.</para>
</sect2>
<sect2 id="Simple_Random_Samples">
<title><emphasis role="bold">Simple Random Samples</emphasis></title>
<para><emphasis>There is a statistical concept called a “Simple Random
Sample” in which a statistically “random” (different from simply using the
RANDOM() function) sample of records is generated from any dataset. The
algorithm implemented in the following code example was provided by a
customer.</emphasis></para>
<para>This code is implemented as a MACRO to allow multiple samples to be
produced in the same workunit (contained in the SimpleRandomSamples.ECL
file):</para>
<programlisting>SimpleRandomSample(InFile,UID_Field,SampleSize,Result) := MACRO
  //build a table of the UIDs
  #UNIQUENAME(Layout_Plus_RecID)
  %Layout_Plus_RecID% := RECORD
    UNSIGNED8 RecID := 0;
    InFile.UID_Field;
  END;
  #UNIQUENAME(InTbl)
  %InTbl% := TABLE(InFile,%Layout_Plus_RecID%);
  //then assign unique record IDs to the table entries
  #UNIQUENAME(IDRecs)
  %Layout_Plus_RecID% %IDRecs%(%Layout_Plus_RecID% L, INTEGER C) := TRANSFORM
    SELF.RecID := C;
    SELF := L;
  END;
  #UNIQUENAME(UID_Recs)
  %UID_Recs% := PROJECT(%InTbl%,%IDRecs%(LEFT,COUNTER));
  //discover the number of records
  #UNIQUENAME(WholeSet)
  %WholeSet% := COUNT(InFile) : GLOBAL;
  //then generate the unique record IDs to include in the sample
  #UNIQUENAME(BlankSet)
  %BlankSet% := DATASET([{0}],{UNSIGNED8 seq});
  #UNIQUENAME(SelectEm)
  TYPEOF(%BlankSet%) %SelectEm%(%BlankSet% L, INTEGER c) := TRANSFORM
    SELF.seq := ROUNDUP(%WholeSet% * (((RANDOM()%100000)+1)/100000));
  END;
  #UNIQUENAME(selected)
  %selected% := NORMALIZE( %BlankSet%, SampleSize,
                           %SelectEm%(LEFT, COUNTER));
  //then filter the original dataset by the selected UIDs
  #UNIQUENAME(SetSelectedRecs)
  %SetSelectedRecs% := SET(%UID_Recs%(RecID IN SET(%selected%,seq)),
                           UID_Field);
  Result := InFile(UID_Field IN %SetSelectedRecs%);
ENDMACRO;
</programlisting>
<para>This MACRO takes four parameters:</para>
<itemizedlist>
<listitem>
<para>The name of the file to sample</para>
</listitem>
<listitem>
<para>The name of the unique identifier field in that file</para>
</listitem>
<listitem>
<para>The size of the sample to generate</para>
</listitem>
<listitem>
<para>The name of the attribute for the result, so that it may be
post-processed</para>
</listitem>
</itemizedlist>
<para>The algorithm itself is fairly simple. We first create a TABLE of
uniquely numbered unique identifier fields. Then we use NORMALIZE to
produce a recordset of the candidate records. Which candidate is chosen
each time the TRANSFORM function is called is determined by generating a
“random” value between zero and one (using modulus division by one hundred
thousand on the return from the RANDOM() function), multiplying that value
by the number of records to sample from, then rounding up to the next
larger integer. This determines the position of the field identifier to
use. Once the set of positions within the TABLE is determined, they are
used to define the SET of unique fields to use in the final result.</para>
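<para>For example, the position calculation described above reduces to the
following stand-alone sketch (assuming a hypothetical count of 1000
candidate records):</para>
<programlisting>TotalRecs    := 1000;                                //stand-in for the GLOBAL COUNT of the input file
RandFraction := ((RANDOM() % 100000) + 1) / 100000;  //a value greater than zero and at most one
RandPosition := ROUNDUP(TotalRecs * RandFraction);   //a record position from 1 to TotalRecs
OUTPUT(RandPosition);
</programlisting>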
<para>This algorithm is designed to produce a sample “with replacement,”
so it is possible to have a smaller number of records returned than the
sample size requested. To produce exactly the size of sample you need
(that is, a “without replacement” sample), you can request a larger sample
size (say, 10% larger) and then use the CHOOSEN function to retrieve only
the actual number of records required, as in this example (also contained
in the SimpleRandomSamples.ECL file).</para>
<programlisting>SomeFile := DATASET([{'A1'},{'B1'},{'C1'},{'D1'},{'E1'},
                     {'F1'},{'G1'},{'H1'},{'I1'},{'J1'},
                     {'K1'},{'L1'},{'M1'},{'N1'},{'O1'},
                     {'P1'},{'Q1'},{'R1'},{'S1'},{'T1'},
                     {'U1'},{'V1'},{'W1'},{'X1'},{'Y1'},
                     {'A2'},{'B2'},{'C2'},{'D2'},{'E2'},
                     {'F2'},{'G2'},{'H2'},{'I2'},{'J2'},
                     {'K2'},{'L2'},{'M2'},{'N2'},{'O2'},
                     {'P2'},{'Q2'},{'R2'},{'S2'},{'T2'},
                     {'U2'},{'V2'},{'W2'},{'X2'},{'Y2'},
                     {'A3'},{'B3'},{'C3'},{'D3'},{'E3'},
                     {'F3'},{'G3'},{'H3'},{'I3'},{'J3'},
                     {'K3'},{'L3'},{'M3'},{'N3'},{'O3'},
                     {'P3'},{'Q3'},{'R3'},{'S3'},{'T3'},
                     {'U3'},{'V3'},{'W3'},{'X3'},{'Y3'},
                     {'A4'},{'B4'},{'C4'},{'D4'},{'E4'},
                     {'F4'},{'G4'},{'H4'},{'I4'},{'J4'},
                     {'K4'},{'L4'},{'M4'},{'N4'},{'O4'},
                     {'P4'},{'Q4'},{'R4'},{'S4'},{'T4'},
                     {'U4'},{'V4'},{'W4'},{'X4'},{'Y4'}
                    ],{STRING2 Letter});
ds := DISTRIBUTE(SomeFile,HASH(Letter[2]));
SimpleRandomSample(ds,Letter,6, res1) //ask for 6
SimpleRandomSample(ds,Letter,6, res2)
SimpleRandomSample(ds,Letter,6, res3)
OUTPUT(CHOOSEN(res1,5)); //actually need 5
OUTPUT(CHOOSEN(res2,5));
OUTPUT(CHOOSEN(res3,5));
</programlisting>
</sect2>
<sect2 id="Hex_String_to_Decimal_String">
<title><emphasis role="bold">Hex String to Decimal
String</emphasis></title>
<para><emphasis>An email request came to me asking for a way to convert a
string containing Hexadecimal values to a string containing the decimal
equivalent of that value. The problem was that this code needed to run in
Roxie and the StringLib.String2Data plugin library function was not
available for use in Roxie queries at that time. Therefore, an all-ECL
solution was needed.</emphasis></para>
<para>This example function (contained in the Hex2Decimal.ECL file)
provides that functionality, while at the same time demonstrating
practical usage of BIG ENDIAN integers and type transfer.</para>
<programlisting>HexStr2Decimal(STRING HexIn) := FUNCTION
  //type re-definitions to make code more readable below
  BE1 := BIG_ENDIAN UNSIGNED1;
  BE2 := BIG_ENDIAN UNSIGNED2;
  BE3 := BIG_ENDIAN UNSIGNED3;
  BE4 := BIG_ENDIAN UNSIGNED4;
  BE5 := BIG_ENDIAN UNSIGNED5;
  BE6 := BIG_ENDIAN UNSIGNED6;
  BE7 := BIG_ENDIAN UNSIGNED7;
  BE8 := BIG_ENDIAN UNSIGNED8;
  TrimHex := TRIM(HexIn,ALL);
  HexLen := LENGTH(TrimHex);
  UseHex := IF(HexLen % 2 = 1,'0','') + TrimHex;
  //a sub-function to translate two hex chars to a packed hex format
  STRING1 Str2Data(STRING2 Hex) := FUNCTION
    UNSIGNED1 N1 :=
        CASE( Hex[1],
              '0'=&gt;00x,'1'=&gt;10x,'2'=&gt;20x,'3'=&gt;30x,
              '4'=&gt;40x,'5'=&gt;50x,'6'=&gt;60x,'7'=&gt;70x,
              '8'=&gt;80x,'9'=&gt;90x,'A'=&gt;0A0x,'B'=&gt;0B0x,
              'C'=&gt;0C0x,'D'=&gt;0D0x,'E'=&gt;0E0x,'F'=&gt;0F0x,00x);
    UNSIGNED1 N2 :=
        CASE( Hex[2],
              '0'=&gt;00x,'1'=&gt;01x,'2'=&gt;02x,'3'=&gt;03x,
              '4'=&gt;04x,'5'=&gt;05x,'6'=&gt;06x,'7'=&gt;07x,
              '8'=&gt;08x,'9'=&gt;09x,'A'=&gt;0Ax,'B'=&gt;0Bx,
              'C'=&gt;0Cx,'D'=&gt;0Dx,'E'=&gt;0Ex,'F'=&gt;0Fx,00x);
    RETURN (&gt;STRING1&lt;)(N1 | N2);
  END;
  UseHexLen := LENGTH(TRIM(UseHex));
  InHex2 := Str2Data(UseHex[1..2]);
  InHex4 := InHex2 + Str2Data(UseHex[3..4]);
  InHex6 := InHex4 + Str2Data(UseHex[5..6]);
  InHex8 := InHex6 + Str2Data(UseHex[7..8]);
  InHex10 := InHex8 + Str2Data(UseHex[9..10]);
  InHex12 := InHex10 + Str2Data(UseHex[11..12]);
  InHex14 := InHex12 + Str2Data(UseHex[13..14]);
  InHex16 := InHex14 + Str2Data(UseHex[15..16]);
  RETURN CASE(UseHexLen,
              2 =&gt; (STRING)(&gt;BE1&lt;)InHex2,
              4 =&gt; (STRING)(&gt;BE2&lt;)InHex4,
              6 =&gt; (STRING)(&gt;BE3&lt;)InHex6,
              8 =&gt; (STRING)(&gt;BE4&lt;)InHex8,
              10 =&gt; (STRING)(&gt;BE5&lt;)InHex10,
              12 =&gt; (STRING)(&gt;BE6&lt;)InHex12,
              14 =&gt; (STRING)(&gt;BE7&lt;)InHex14,
              16 =&gt; (STRING)(&gt;BE8&lt;)InHex16,
              'ERROR');
END;
</programlisting>
<para>This HexStr2Decimal FUNCTION takes a variable-length STRING
parameter containing the hexadecimal value to evaluate. It begins by
re-defining the eight possible sizes of unsigned BIG ENDIAN integers. This
re-definition is purely for cosmetic purposes—to make the subsequent code
more readable.</para>
<para>The next three attributes detect whether an even or odd number of
hexadecimal characters has been passed. If an odd number is passed, then a
“0” character is prepended to the passed value to ensure the hex values
are placed in the correct nibbles.</para>
<para>The Str2Data FUNCTION takes a two-character STRING parameter and
translates each character into the appropriate hexadecimal value for each
nibble of the resulting 1-character STRING that it returns. The first
character defines the first nibble and the second defines the second.
These two values are ORed together (using the bitwise | operator), and the
result is then type-transferred to a one-character string, using the
shorthand syntax— (&gt;STRING1&lt;) —so that the bit pattern remains
untouched. The RETURN result from this FUNCTION is a STRING1 because each
succeeding two-character portion of the HexStr2Decimal FUNCTION’s input
parameter will pass through the Str2Data FUNCTION and be concatenated with
all the preceding results.</para>
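<para>The following stand-alone sketch (not part of Hex2Decimal.ECL)
illustrates that nibble composition and the equivalence of the shorthand
syntax and the TRANSFER function:</para>
<programlisting>UNSIGNED1 HiNibble := 40x;                  //high nibble already shifted, as in the N1 CASE above
UNSIGNED1 LoNibble := 01x;                  //low nibble, as in the N2 CASE above
UNSIGNED1 Packed   := HiNibble | LoNibble;  //bitwise OR yields the byte 41 hex
OUTPUT((&gt;STRING1&lt;)Packed);             //'A' -- shorthand type transfer
OUTPUT(TRANSFER(Packed,STRING1));           //'A' -- equivalent long-hand TRANSFER
</programlisting>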
<para>The UseHexLen attribute determines the appropriate size of BIG
ENDIAN integer to use to translate the hex into decimal, while the InHex2
through InHex16 attributes define the final packed-hexadecimal value to
evaluate. The CASE function then uses UseHexLen to determine which InHex
attribute to use for the number of bytes of hex value passed in. Because
odd-length values have already been given a leading zero, UseHexLen is
always an even number, and the maximum number of hex characters allowed is
sixteen (representing an eight-byte packed hexadecimal value to
translate); anything longer returns the 'ERROR' string.</para>
<para>In all cases, the result from the InHex attribute is
type-transferred to the appropriately sized BIG ENDIAN integer. The
standard type cast to STRING then performs the actual value translation
from hexadecimal to decimal.</para>
<para>The following calls return the indicated results:</para>
<programlisting>OUTPUT(HexStr2Decimal('0101'));               // 257
OUTPUT(HexStr2Decimal('FF'));                 // 255
OUTPUT(HexStr2Decimal('FFFF'));               // 65535
OUTPUT(HexStr2Decimal('FFFFFF'));             // 16777215
OUTPUT(HexStr2Decimal('FFFFFFFF'));           // 4294967295
OUTPUT(HexStr2Decimal('FFFFFFFFFF'));         // 1099511627775
OUTPUT(HexStr2Decimal('FFFFFFFFFFFF'));       // 281474976710655
OUTPUT(HexStr2Decimal('FFFFFFFFFFFFFF'));     // 72057594037927935
OUTPUT(HexStr2Decimal('FFFFFFFFFFFFFFFF'));   // 18446744073709551615
OUTPUT(HexStr2Decimal('FFFFFFFFFFFFFFFFFF')); // ERROR
</programlisting>
</sect2>
</sect1>