- <?xml version="1.0" encoding="UTF-8"?>
- <!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
- "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
- <sect1 id="Getting_Things_Done">
- <title><emphasis>Getting Things Done</emphasis></title>
- <sect2 id="Scanning_Landing_Zone_Files">
- <title><emphasis role="bold">Scanning Landing Zone
- Files</emphasis></title>
- <para><emphasis>Here’s the scenario—you’ve just received a data file from
- someone and it has been put on your landing zone. Before you spray that
- file to your Thor cluster and start to work with it, you want to have a
- quick look to see exactly what kind of data it contains and whether the
- format of that data matches the format that you were given by the
- supplier. There are a number of ways to do this, including mapping a drive
- to your landing zone and using a text/hex editor to open the file and look
- at the contents. </emphasis></para>
- <para>This article will show you how to accomplish this from within
- QueryBuilder using ECL. Here’s the code (contained in the
- Default.ProgGuide MODULE attribute):</para>
- <programlisting>EXPORT MAC_ScanFile(IP, infile, scansize) := MACRO
- ds := DATASET(FileServices.ExternalLogicalFileName(IP, infile),
- {DATA1 S},
- THOR )[1..scansize];
- OUTPUT(TABLE(ds,{hex := ds.s,txt := (STRING1)ds.s}),ALL);
- Rec := RECORD
- UNSIGNED2 C;
- DATA S {MAXLENGTH(8*1024)};
- END;
- Rec XF1(ds L,INTEGER C) := TRANSFORM
- SELF.C := C;
- SELF.S := L.s;
- END;
- ds2 := PROJECT(ds,XF1(LEFT,COUNTER));
- Rec XF2(Rec L,Rec R) := TRANSFORM
- SELF.S := L.S[1 .. R.C-1] + R.S[1];
- SELF := L;
- END;
- Rolled := ROLLUP(ds2,TRUE,XF2(LEFT,RIGHT));
- OUTPUT(TRANSFER(Rolled[1].S,STRING));
- ENDMACRO;
- </programlisting>
- <para>This is written as a MACRO because you could have multiple Landing
- Zones, and you certainly are going to want to look into different files
- each time. Therefore, a MACRO that generates the standard process code to
- scan the file is precisely what’s needed here.</para>
- <para>This MACRO takes three parameters: the IP of the landing zone
- containing the file, the fully-qualified path to that file on the landing
- zone, and the number of bytes to read (maximum 8K).</para>
- <para>The initial DATASET declaration uses the
- FileServices.ExternalLogicalFileName function to name the file. Defining
- the RECORD structure as a single DATA1 field is necessary to ensure that
- both text and binary fields can be read correctly. Specifying the DATASET
- as a THOR file (no matter what type of file it actually is) makes it
- simple to read as a fixed-length record file. The square brackets at the
- end of the DATASET declaration automatically limit the number of 1-byte
- records read to the first <emphasis>scansize</emphasis> number of bytes in
- the file.</para>
- <para>The first OUTPUT action allows you to see the raw Hexadecimal data
- from the file.</para>
- <para>The TABLE function doubles up the input data, producing a DATA1
- field that displays the Hex value and a STRING1 field that type casts the
- same byte to text for display. Viewing the raw Hex value is necessary because most
- binary fields will not contain text-displayable characters (and those that
- do may mislead you as to the actual contents of the field).
- Non-displayable binary characters show up as a square box in the text
- column display.</para>
- <para>Next, we’ll construct a more text-friendly view of the data. To do
- that we’ll start with the Rec RECORD structure, which defines a
- byte-counter field (UNSIGNED2 C) and a variable-length field (DATA S
- {MAXLENGTH(8*1024)}) to contain the text representation of the data as a
- single horizontal line of text.</para>
- <para>The XF1 TRANSFORM and its associated PROJECT move the data from the
- input format into the format needed to roll up that data into a single
- text string. Adding the byte-counter field is necessary to ensure that
- blank spaces are not accidentally trimmed out of the final display.</para>
- <para>The XF2 TRANSFORM and its associated ROLLUP function perform the
- actual data append. The TRUE condition parameter makes every pair of
- adjacent records match, so the result is a single record containing all
- the input bytes.</para>
- <para>The last OUTPUT action uses the TRANSFER function instead of type
- casting to ensure that all the text characters in the original data are
- accurately represented.</para>
- <para>You call this MACRO like this:</para>
- <programlisting>ProgGuide.MAC_ScanFile( '10.173.9.4',
- 'C:\\training\\import\\BOCA.XML', 200)
- </programlisting>
- <para>When viewing the result, the QueryBuilder Result_1 tab displays a
- column of hexadecimal values and the text character (if any) next to it in
- the second column. This byte-by-byte view of the data is designed to allow
- you to see the raw Hexadecimal values of each byte alongside its text
- representation. This is the primary view to use when looking at the
- contents of files containing binary data.</para>
- <para>The QueryBuilder Result_2 tab displays a single record with a single
- field. You can click on that field to highlight it, right-click and select
- “Copy” from the popup menu, then paste the text into any text editor to
- view. Binary fields will appear as square blocks or “garbage” characters,
- depending on their hex value. Once pasted into a text editor, you can
- easily look for data patterns that indicate the start of fields or
- records and validate that the data layout information provided by the data
- vendor is accurate (or not).</para>
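- <para>The roll-up step described earlier in this article can be tried
- without a landing zone file. The following is a minimal sketch, assuming a
- hypothetical four-byte inline dataset (the names ByteRec, SampleBytes,
- Numbered, and AsLine are illustrative and are not part of the ProgGuide
- code). It applies the same PROJECT and ROLLUP logic as the MACRO:</para>
- <programlisting>ByteRec := {DATA1 s};
- //hypothetical sample bytes: 'H', 'i', a space, and '!'
- SampleBytes := DATASET([{x'48'},{x'69'},{x'20'},{x'21'}],ByteRec);
- Rec := RECORD
-   UNSIGNED2 C;
-   DATA S {MAXLENGTH(8*1024)};
- END;
- //number each byte so the rollup knows how many bytes are already kept
- Rec XF1(ByteRec L,INTEGER C) := TRANSFORM
-   SELF.C := C;
-   SELF.S := L.s;
- END;
- Numbered := PROJECT(SampleBytes,XF1(LEFT,COUNTER));
- //append the next byte after the R.C-1 bytes accumulated so far
- Rec XF2(Rec L,Rec R) := TRANSFORM
-   SELF.S := L.S[1 .. R.C-1] + R.S[1];
-   SELF := L;
- END;
- AsLine := ROLLUP(Numbered,TRUE,XF2(LEFT,RIGHT));
- OUTPUT(TRANSFER(AsLine[1].S,STRING)); //the four bytes as one string
- </programlisting>
- <para>Because the third sample byte is a space, this is a quick way to
- check that embedded blanks survive the roll-up.</para>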
- </sect2>
- <sect2 id="Cartesian_Product_of_Two-Datasets">
- <title><emphasis role="bold">Cartesian Product of Two
- Datasets</emphasis></title>
- <para><emphasis>A Cartesian Product is the product of two non-empty sets
- in terms of ordered pairs. As an example, if we take the set of values, A,
- B and C, and a second set of values, 1, 2, and 3, the Cartesian Product of
- these two sets would be the set of ordered pairs, A1, A2, A3, B1, B2, B3,
- C1, C2, C3.</emphasis></para>
- <para>The ECL code to produce this kind of result from any two input
- datasets would look like this (contained in Cartesian.ECL):</para>
- <programlisting>OutFile1 := '~PROGGUIDE::OUT::CP1';
- rec := RECORD
- STRING1 Letter;
- END;
- Inds1 := DATASET([{'A'},{'B'},{'C'},{'D'},{'E'},
- {'F'},{'G'},{'H'},{'I'},{'J'},
- {'K'},{'L'},{'M'},{'N'},{'O'},
- {'P'},{'Q'},{'R'},{'S'},{'T'},
- {'U'},{'V'},{'W'},{'X'},{'Y'}],
- rec);
- Inds2 := DATASET([{'A'},{'B'},{'C'},{'D'},{'E'},
- {'F'},{'G'},{'H'},{'I'},{'J'},
- {'K'},{'L'},{'M'},{'N'},{'O'},
- {'P'},{'Q'},{'R'},{'S'},{'T'},
- {'U'},{'V'},{'W'},{'X'},{'Y'}],
- rec);
- CntInDS2 := COUNT(Inds2);
- SetInDS2 := SET(inds2,letter);
- outrec := RECORD
- STRING1 LeftLetter;
- STRING1 RightLetter;
- END;
- outrec CartProd(rec L, INTEGER C) := TRANSFORM
- SELF.LeftLetter := L.Letter;
- SELF.RightLetter := SetInDS2[C];
- END;
- //Run the small datasets
- CP1 := NORMALIZE(Inds1,CntInDS2,CartProd(LEFT,COUNTER));
- OUTPUT(CP1,,OutFile1,OVERWRITE);
- </programlisting>
- <para>The core structure of this code is the NORMALIZE that will produce
- the Cartesian Product. The two input datasets each have twenty-five
- records, so the number of result records will be six hundred twenty-five
- (twenty-five squared).</para>
- <para>Each record in the LEFT input dataset to the NORMALIZE will execute
- the TRANSFORM once for each entry in the SET of values. Making the values
- a SET is the key to allowing NORMALIZE to perform this operation.
- Otherwise, you would need to do a JOIN where the join condition is the
- keyword TRUE to accomplish this task. In testing this with
- sizable datasets (as in the next instance of this code below), the
- NORMALIZE version was about 25% faster than using JOIN. If there is more
- than one field, then multiple SETs may be defined and the process stays
- the same.</para>
- <para>This next example does the same operation as above, but first
- generates two sizeable datasets to work with (also contained in
- Cartesian.ECL):</para>
- <programlisting>InFile1 := '~PROGGUIDE::IN::CP1';
- InFile2 := '~PROGGUIDE::IN::CP2';
- OutFile2 := '~PROGGUIDE::OUT::CP2';
- //generate data files
- rec BuildFile(rec L, INTEGER C) := TRANSFORM
- SELF.Letter := Inds2[C].Letter;
- END;
- GenCP1 := NORMALIZE(InDS1,CntInDS2,BuildFile(LEFT,COUNTER));
- GenCP2 := NORMALIZE(GenCP1,CntInDS2,BuildFile(LEFT,COUNTER));
- GenCP3 := NORMALIZE(GenCP2,CntInDS2,BuildFile(LEFT,COUNTER));
- Out1 := OUTPUT(DISTRIBUTE(GenCP3,RANDOM()),,InFile1,OVERWRITE);
- Out2 := OUTPUT(DISTRIBUTE(GenCP2,RANDOM()),,InFile2,OVERWRITE);
- // Use the generated datasets in a cartesian join:
- ds1 := DATASET(InFile1,rec,thor);
- ds2 := DATASET(InFile2,rec,thor);
- CntDS2 := COUNT(ds2);
- SetDS2 := SET(ds2,letter);
- CP2 := NORMALIZE(ds1,CntDS2,CartProd(LEFT,COUNTER));
- Out3 := OUTPUT(CP2,,OutFile2,OVERWRITE);
- SEQUENTIAL(Out1,Out2,Out3);
- </programlisting>
- <para>Using NORMALIZE in this case to generate the datasets is the same
- type of usage previously described in the Creating Example Data article.
- After that, the process to achieve the Cartesian Product is exactly the
- same as the previous example.</para>
- <para>Here’s an example of how this same operation can be done using JOIN
- (also contained in Cartesian.ECL):</para>
- <programlisting>// outrec joinEm(rec L, rec R) := TRANSFORM
- // SELF.LeftLetter := L.Letter;
- // SELF.RightLetter := R.Letter;
- // END;
- // ds4 := JOIN(ds1, ds2, TRUE, joinEM(LEFT, RIGHT), ALL);
- // OUTPUT(ds4);
- </programlisting>
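- <para>The earlier remark that multiple SETs handle more than one field can
- be sketched as follows. This is a minimal illustration, assuming two small
- hypothetical inline datasets (LeftDS, RightDS, and every other name here
- are invented for the example and are not in Cartesian.ECL). Both SETs are
- built from the same dataset, on the assumption that the same index then
- addresses the same source record in each:</para>
- <programlisting>LeftDS := DATASET([{'X'},{'Y'}],{STRING1 Letter});
- RightDS := DATASET([{'A',1},{'B',2},{'C',3}],
-                    {STRING1 Letter,UNSIGNED1 Num});
- CntRight := COUNT(RightDS);
- //one SET per field of the right-hand dataset
- SetLetters := SET(RightDS,Letter);
- SetNums := SET(RightDS,Num);
- PairRec := RECORD
-   STRING1 LeftLetter;
-   STRING1 RightLetter;
-   UNSIGNED1 RightNum;
- END;
- //COUNTER indexes both SETs, so each call picks one right-hand record
- PairRec MakePair(LeftDS L,INTEGER C) := TRANSFORM
-   SELF.LeftLetter := L.Letter;
-   SELF.RightLetter := SetLetters[C];
-   SELF.RightNum := SetNums[C];
- END;
- Pairs := NORMALIZE(LeftDS,CntRight,MakePair(LEFT,COUNTER));
- OUTPUT(Pairs); //2 x 3 = 6 records
- </programlisting>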
- </sect2>
- <sect2 id="Records_Containing_Any_of_a-Set_of_Words">
- <title><emphasis role="bold">Records Containing Any of a Set of
- Words</emphasis></title>
- <para><emphasis>Part of the data cleanup problem is the possible presence
- of profanity or cartoon character names in the data. This can become an
- issue whenever you are working with data that originated from direct input
- by end-users to a website. The following code (contained in the
- BadWordSearch.ECL file) will detect the presence of any of a set of “bad”
- words in a given field:</emphasis></para>
- <programlisting>SetBadWords := ['JUNK', 'GARBAGE', 'CRAP'];
- BadWordDS := DATASET(SetBadWords,{STRING10 word});
- SearchDS := DATASET([{1,'FRED','FLINTSTONE'},
- {2,'GEORGE','JETSON'},
- {3,'CRAPOLA','NASTYGUY'},
- {4,'JUNKER','JUNKEE'},
- {5,'GARBAGEGUY','JUNKMAN'},
- {6,'FREDDY','KRUEGER'},
- {7,'TIM','JONES'},
- {8,'JOHN','SMITH'},
- {9,'MIKE','MALARKEY'},
- {10,'GEORGE','KRUEGER'}
- ],{UNSIGNED6 ID,STRING10 firstname,STRING10 lastname});
- outrec := RECORD
- SearchDS.ID;
- SearchDS.firstname;
- BOOLEAN FoundWord;
- END;
- {BOOLEAN Found} FindWord(BadWordDS L, STRING10 inword) := TRANSFORM
- SELF.Found := StringLib.StringFind(inword,TRIM(L.word),1) > 0;
- END;
- outrec CheckWords(SearchDS L) := TRANSFORM
- SELF.FoundWord := EXISTS(PROJECT(BadWordDS,
- FindWord(LEFT,L.firstname))(Found=TRUE));
- SELF := L;
- END;
- result := PROJECT(SearchDS,CheckWords(LEFT));
- OUTPUT(result(FoundWord=TRUE),NAMED('BadWordsInFirstName'));
- OUTPUT(result(FoundWord=FALSE),NAMED('NoBadWordsInFirstName'));
- </programlisting>
- <para>This code is a simple PROJECT of each record that you want to
- search. The result will be a record set containing the record ID field,
- the firstname search field, and a BOOLEAN FoundWord flag field indicating
- whether any “bad” word was found.</para>
- <para>The search itself is done by a nested PROJECT of the field to be
- searched against the DATASET of “bad” words. Using the EXISTS function to
- detect if any records are returned from that PROJECT where the returned
- Found field is TRUE sets the FoundWord flag field value.</para>
- <para>The StringLib.StringFind function simply detects the presence
- anywhere within the search string of any of the “bad” words. The OUTPUT of
- the set of records where the FoundWord is TRUE allows post-processing to
- evaluate whether the record is worth keeping or garbage (probably
- requiring human intervention).</para>
- <para>The above code is a specific example of this technique, but it would
- be much more useful to have a MACRO that accomplishes this task, something
- like this one (also contained in the BadWordSearch.ECL file):</para>
- <programlisting>MAC_FindBadWords(BadWordSet,InFile,IDfld,SeekFld,ResAttr,MatchType=1)
- := MACRO
- #UNIQUENAME(BadWordDS)
- %BadWordDS% := DATASET(BadWordSet,{STRING word{MAXLENGTH(50)}});
- #UNIQUENAME(outrec)
- %outrec% := RECORD
- InFile.IDfld;
- InFile.SeekFld;
- BOOLEAN FoundWord := FALSE;
- UNSIGNED2 FoundPos := 0;
- END;
- #UNIQUENAME(ChkTbl)
- %ChkTbl% := TABLE(InFile,%outrec%);
- #UNIQUENAME(FindWord)
- {BOOLEAN Found,UNSIGNED2 FoundPos} %FindWord%(%BadWordDS% L,
- INTEGER C,
- STRING inword)
- := TRANSFORM
- #IF(MatchType=1) //"contains" search
- SELF.Found := StringLib.StringFind(inword,TRIM(L.word),1) > 0;
- #END
- #IF(MatchType=2) //"exact match" search
- SELF.Found := inword = L.word;
- #END
- #IF(MatchType=3) //"starts with" search
- SELF.Found := StringLib.StringFind(inword,TRIM(L.word),1) = 1;
- #END
- SELF.FoundPos := IF(SELF.FOUND=TRUE,C,0);
- END;
- #UNIQUENAME(CheckWords)
- %outrec% %CheckWords%(%ChkTbl% L) := TRANSFORM
- WordDS := PROJECT(%BadWordDS%,%FindWord%(LEFT,COUNTER,L.SeekFld));
- SELF.FoundWord := EXISTS(WordDS(Found=TRUE));
- SELF.FoundPos := WordDS(Found=TRUE)[1].FoundPos;
- SELF := L;
- END;
- ResAttr := PROJECT(%ChkTbl%,%CheckWords%(LEFT));
- ENDMACRO; </programlisting>
- <para>This MACRO does a bit more than the previous example. It takes
- these parameters:</para>
- <para>* The set of words to find</para>
- <para>* The file to search</para>
- <para>* The unique identifier field for the search record</para>
- <para>* The field to search in</para>
- <para>* The attribute name of the resulting recordset</para>
- <para>* The type of matching to do (defaulting to 1)</para>
- <para>Passing in the set of words to seek allows the MACRO to operate
- against any given set of strings. Specifying the result attribute name
- allows easy post-processing of the data.</para>
- <para>Where this MACRO starts going beyond the previous example is in the
- MatchType parameter, which allows the MACRO to use the Template Language
- #IF function to generate three different kinds of searches from the same
- codebase: a “contains” search (the default), an exact match, and a “starts
- with” search.</para>
- <para>It also has an expanded output RECORD structure that includes a
- FoundPos field to contain the position of the first entry in the passed-in
- set that matched. This allows post-processing to detect positional matches
- within the set so that “matched pairs” of words can be detected, as in
- this example (also contained in the BadWordSearch.ECL file):</para>
- <programlisting>SetCartoonFirstNames := ['GEORGE','FRED', 'FREDDY'];
- SetCartoonLastNames := ['JETSON','FLINTSTONE','KRUEGER'];
- MAC_FindBadWords(SetCartoonFirstNames,SearchDS,ID,firstname,Res1,2)
- MAC_FindBadWords(SetCartoonLastNames,SearchDS,ID,lastname,Res2,2)
- Cartoons := JOIN(Res1(FoundWord=TRUE),
- Res2(FoundWord=TRUE),
- LEFT.ID=RIGHT.ID AND LEFT.FoundPos=RIGHT.FoundPos);
- MAC_FindBadWords(SetBadWords,SearchDS,ID,firstname,Res3,3)
- MAC_FindBadWords(SetBadWords,SearchDS,ID,lastname,Res4)
- SetBadGuys := SET(Cartoons,ID) +
- SET(Res3(FoundWord=TRUE),ID) +
- SET(Res4(FoundWord=TRUE),ID);
- GoodGuys := SearchDS(ID NOT IN SetBadGuys);
- BadGuys := SearchDS(ID IN SetBadGuys);
- OUTPUT(BadGuys,NAMED('BadGuys'));
- OUTPUT(GoodGuys,NAMED('GoodGuys'));
- </programlisting>
- <para>Notice that the position of the cartoon character names in their
- separate sets defines a single character name to search for in multiple
- passes. Calling the MACRO twice, searching for the first and last names
- separately, allows you to post-process their results with a simple inner
- JOIN where the same record was found in each and, most importantly, the
- positional values of the matches are the same. This prevents “GEORGE
- KRUEGER” from being mislabeled a cartoon character name.</para>
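- <para>The three MatchType searches in the MACRO differ only in how they
- test a string. As a minimal sketch, assuming the StringLib plugin is
- available to the workunit in the same way as in the code in this article,
- and using hypothetical sample strings that are not in BadWordSearch.ECL:</para>
- <programlisting>//StringFind returns the position of the requested instance, or zero
- OUTPUT(StringLib.StringFind('SCRAPYARD','CRAP',1)); //2: found, not at start
- OUTPUT(StringLib.StringFind('CRAPOLA','CRAP',1));   //1: found at the start
- OUTPUT(StringLib.StringFind('FRED','CRAP',1));      //0: not found
- //string equality ignores trailing spaces
- OUTPUT('CRAP      ' = 'CRAP');                      //TRUE
- </programlisting>
- <para>A “contains” search therefore tests for a result greater than zero,
- a “starts with” search tests for a result equal to one, and the exact
- match needs no TRIM because trailing spaces do not affect the
- comparison.</para>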
- </sect2>
- <sect2 id="Simple_Random_Samples">
- <title><emphasis role="bold">Simple Random Samples</emphasis></title>
- <para><emphasis>There is a statistical concept called a “Simple Random
- Sample” in which a statistically “random” (different from simply using the
- RANDOM() function) sample of records is generated from any dataset. The
- algorithm implemented in the following code example was provided by a
- customer.</emphasis></para>
- <para>This code is implemented as a MACRO to allow multiple samples to be
- produced in the same workunit (contained in the SimpleRandomSamples.ECL
- file):</para>
- <programlisting>SimpleRandomSample(InFile,UID_Field,SampleSize,Result) := MACRO
- //build a table of the UIDs
- #UNIQUENAME(Layout_Plus_RecID)
- %Layout_Plus_RecID% := RECORD
- UNSIGNED8 RecID := 0;
- InFile.UID_Field;
- END;
- #UNIQUENAME(InTbl)
- %InTbl% := TABLE(InFile,%Layout_Plus_RecID%);
- //then assign unique record IDs to the table entries
- #UNIQUENAME(IDRecs)
- %Layout_Plus_RecID% %IDRecs%(%Layout_Plus_RecID% L, INTEGER C) :=
- TRANSFORM
- SELF.RecID := C;
- SELF := L;
- END;
- #UNIQUENAME(UID_Recs)
- %UID_Recs% := PROJECT(%InTbl%,%IDRecs%(LEFT,COUNTER));
- //discover the number of records
- #UNIQUENAME(WholeSet)
- %WholeSet% := COUNT(InFile) : GLOBAL;
- //then generate the unique record IDs to include in the sample
- #UNIQUENAME(BlankSet)
- %BlankSet% := DATASET([{0}],{UNSIGNED8 seq});
- #UNIQUENAME(SelectEm)
- TYPEOF(%BlankSet%) %SelectEm%(%BlankSet% L, INTEGER c) := TRANSFORM
- SELF.seq := ROUNDUP(%WholeSet% * (((RANDOM()%100000)+1)/100000));
- END;
- #UNIQUENAME(selected)
- %selected% := NORMALIZE( %BlankSet%, SampleSize,
- %SelectEm%(LEFT, COUNTER));
- //then filter the original dataset by the selected UIDs
- #UNIQUENAME(SetSelectedRecs)
- %SetSelectedRecs% := SET(%UID_Recs%(RecID IN SET(%selected%,seq)),
- UID_Field);
- result := infile(UID_Field IN %SetSelectedRecs% );
- ENDMACRO;
- </programlisting>
- <para>This MACRO takes four parameters:</para>
- <para>* The name of the file to sample</para>
- <para>* The name of the unique identifier field in that file</para>
- <para>* The size of the sample to generate</para>
- <para>* The name of the attribute for the result, so that it may be
- post-processed</para>
- <para>The algorithm itself is fairly simple. We first create a TABLE of
- uniquely numbered unique identifier fields. Then we use NORMALIZE to
- produce a recordset of the candidate records. Which candidate is chosen
- each time the TRANSFORM function is called is determined by generating a
- “random” value greater than zero and no larger than one (the remainder from
- dividing the return from the RANDOM() function by one hundred thousand, plus
- one, all divided by one hundred thousand), then multiplying that result by the
- number of records to sample from and rounding up to the next
- larger integer. This determines the position of the field identifier to
- use. Once the set of positions within the TABLE is determined, they are
- used to define the SET of unique fields to use in the final result.</para>
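- <para>The position arithmetic can be checked in isolation. The following
- is a minimal sketch, assuming a hypothetical record count of 100 and a
- hypothetical helper named PickPos that takes the RANDOM() return value as
- a parameter so that the results are repeatable (neither name is part of
- the MACRO):</para>
- <programlisting>//hypothetical record count, standing in for the WholeSet attribute
- SampleCount := 100;
- //same expression as the SelectEm TRANSFORM, with r in place of RANDOM()
- UNSIGNED8 PickPos(UNSIGNED4 r) :=
-   ROUNDUP(SampleCount * (((r % 100000) + 1) / 100000));
- OUTPUT(PickPos(0));     //1, the first record
- OUTPUT(PickPos(49999)); //50
- OUTPUT(PickPos(99999)); //100, the last record
- </programlisting>
- <para>Every result falls between one and the record count. Because the
- expression can produce at most one hundred thousand distinct values, it
- appears that a file with more than one hundred thousand records would
- contain positions that can never be selected.</para>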
- <para>This algorithm draws its positions “with replacement,” so the same
- position can be selected more than once. Because the final filter returns
- each selected record only once, it is possible to have a smaller number of
- records returned than the sample size requested. To produce a sample of
- the size you need (that is, a “without replacement” sample), you can
- request a larger sample size (say, 10% larger), which makes a shortfall
- unlikely, then use the CHOOSEN function to retrieve only the
- actual number of records required, as in this example (also contained in
- the SimpleRandomSamples.ECL file):</para>
- <programlisting>SomeFile := DATASET([{'A1'},{'B1'},{'C1'},{'D1'},{'E1'},
- {'F1'},{'G1'},{'H1'},{'I1'},{'J1'},
- {'K1'},{'L1'},{'M1'},{'N1'},{'O1'},
- {'P1'},{'Q1'},{'R1'},{'S1'},{'T1'},
- {'U1'},{'V1'},{'W1'},{'X1'},{'Y1'},
- {'A2'},{'B2'},{'C2'},{'D2'},{'E2'},
- {'F2'},{'G2'},{'H2'},{'I2'},{'J2'},
- {'K2'},{'L2'},{'M2'},{'N2'},{'O2'},
- {'P2'},{'Q2'},{'R2'},{'S2'},{'T2'},
- {'U2'},{'V2'},{'W2'},{'X2'},{'Y2'},
- {'A3'},{'B3'},{'C3'},{'D3'},{'E3'},
- {'F3'},{'G3'},{'H3'},{'I3'},{'J3'},
- {'K3'},{'L3'},{'M3'},{'N3'},{'O3'},
- {'P3'},{'Q3'},{'R3'},{'S3'},{'T3'},
- {'U3'},{'V3'},{'W3'},{'X3'},{'Y3'},
- {'A4'},{'B4'},{'C4'},{'D4'},{'E4'},
- {'F4'},{'G4'},{'H4'},{'I4'},{'J4'},
- {'K4'},{'L4'},{'M4'},{'N4'},{'O4'},
- {'P4'},{'Q4'},{'R4'},{'S4'},{'T4'},
- {'U4'},{'V4'},{'W4'},{'X4'},{'Y4'}
- ],{STRING2 Letter});
- ds := DISTRIBUTE(SomeFile,HASH(letter[2]));
- SimpleRandomSample(ds,Letter,6, res1) //ask for 6
- SimpleRandomSample(ds,Letter,6, res2)
- SimpleRandomSample(ds,Letter,6, res3)
- OUTPUT(CHOOSEN(res1,5)); //actually need 5
- OUTPUT(CHOOSEN(res3,5));
- </programlisting>
- </sect2>
- <sect2 id="Hex_String_to_Decimal_String">
- <title><emphasis role="bold">Hex String to Decimal
- String</emphasis></title>
- <para><emphasis>An email request came to me to suggest a way to convert a
- string containing Hexadecimal values to a string containing the decimal
- equivalent of that value. The problem was that this code needed to run in
- Roxie and the StringLib.String2Data plugin library function was not
- available for use in Roxie queries at that time. Therefore, an all-ECL
- solution was needed.</emphasis></para>
- <para>This example function (contained in the Hex2Decimal.ECL file)
- provides that functionality, while at the same time demonstrating
- practical usage of BIG ENDIAN integers and type transfer.</para>
- <programlisting>HexStr2Decimal(STRING HexIn) := FUNCTION
- //type re-definitions to make code more readable below
- BE1 := BIG_ENDIAN UNSIGNED1;
- BE2 := BIG_ENDIAN UNSIGNED2;
- BE3 := BIG_ENDIAN UNSIGNED3;
- BE4 := BIG_ENDIAN UNSIGNED4;
- BE5 := BIG_ENDIAN UNSIGNED5;
- BE6 := BIG_ENDIAN UNSIGNED6;
- BE7 := BIG_ENDIAN UNSIGNED7;
- BE8 := BIG_ENDIAN UNSIGNED8;
- TrimHex := TRIM(HexIn,ALL);
- HexLen := LENGTH(TrimHex);
- UseHex := IF(HexLen % 2 = 1,'0','') + TrimHex;
- //a sub-function to translate two hex chars to a packed hex format
- STRING1 Str2Data(STRING2 Hex) := FUNCTION
- UNSIGNED1 N1 :=
- CASE( Hex[1],
- '0'=>00x,'1'=>10x,'2'=>20x,'3'=>30x,
- '4'=>40x,'5'=>50x,'6'=>60x,'7'=>70x,
- '8'=>80x,'9'=>90x,'A'=>0A0x,'B'=>0B0x,
- 'C'=>0C0x,'D'=>0D0x,'E'=>0E0x,'F'=>0F0x,00x);
- UNSIGNED1 N2 :=
- CASE( Hex[2],
- '0'=>00x,'1'=>01x,'2'=>02x,'3'=>03x,
- '4'=>04x,'5'=>05x,'6'=>06x,'7'=>07x,
- '8'=>08x,'9'=>09x,'A'=>0Ax,'B'=>0Bx,
- 'C'=>0Cx,'D'=>0Dx,'E'=>0Ex,'F'=>0Fx,00x);
- RETURN (>STRING1<)(N1 | N2);
- END;
- UseHexLen := LENGTH(TRIM(UseHex));
- InHex2 := Str2Data(UseHex[1..2]);
- InHex4 := InHex2 + Str2Data(UseHex[3..4]);
- InHex6 := InHex4 + Str2Data(UseHex[5..6]);
- InHex8 := InHex6 + Str2Data(UseHex[7..8]);
- InHex10 := InHex8 + Str2Data(UseHex[9..10]);
- InHex12 := InHex10 + Str2Data(UseHex[11..12]);
- InHex14 := InHex12 + Str2Data(UseHex[13..14]);
- InHex16 := InHex14 + Str2Data(UseHex[15..16]);
- RETURN CASE(UseHexLen,
- 2 => (STRING)(>BE1<)InHex2,
- 4 => (STRING)(>BE2<)InHex4,
- 6 => (STRING)(>BE3<)InHex6,
- 8 => (STRING)(>BE4<)InHex8,
- 10 => (STRING)(>BE5<)InHex10,
- 12 => (STRING)(>BE6<)InHex12,
- 14 => (STRING)(>BE7<)InHex14,
- 16 => (STRING)(>BE8<)InHex16,
- 'ERROR');
- END;
- </programlisting>
- <para>This HexStr2Decimal FUNCTION takes a variable-length STRING
- parameter containing the hexadecimal value to evaluate. It begins by
- re-defining the eight possible sizes of unsigned BIG ENDIAN integers. This
- re-definition is purely for cosmetic purposes—to make the subsequent code
- more readable.</para>
- <para>The next three attributes detect whether an even or odd number of
- hexadecimal characters have been passed. If an odd number is passed, then
- a “0” character is prepended to the passed value to ensure the hex values
- are placed in the correct nibbles.</para>
- <para>The Str2Data FUNCTION takes a two-character STRING parameter and
- translates each character into the appropriate hexadecimal value for each
- nibble of the resulting 1-character STRING that it returns. The first
- character defines the first nibble and the second defines the second.
- These two values are ORed together (using the bitwise | operator) then the
- result is type transferred to a one-character string, using the shorthand
- syntax— (>STRING1<) —so that the bit pattern remains untouched. The
- RETURN result from this FUNCTION is a STRING1 because each succeeding
- two-character portion of the HexStr2Decimal FUNCTION’s input parameter
- will pass through the Str2Data FUNCTION and be concatenated with all the
- preceding results.</para>
- <para>The UseHexLen attribute determines the appropriate size of BIG
- ENDIAN integer to use to translate the hex into decimal, while the InHex2
- through InHex16 attributes define the final packed-hexadecimal value to
- evaluate. The CASE function then uses that UseHexLen to determine which
- InHex attribute to use for the number of bytes of hex value passed in.
- UseHexLen is always an even number, because the UseHex attribute has
- already added a leading zero to any odd-length input, and the maximum
- number of characters allowed is sixteen
- (representing an eight-byte packed hexadecimal value to translate). Any
- other length returns the string “ERROR”.</para>
- <para>In all cases, the result from the InHex attribute is
- type-transferred to the appropriately sized BIG ENDIAN integer. The
- standard type cast to STRING then performs the actual value translation
- from the hexadecimal to decimal.</para>
- <para>The following calls return the indicated results:</para>
- <programlisting>OUTPUT(HexStr2Decimal('0101')); // 257
- OUTPUT(HexStr2Decimal('FF')); // 255
- OUTPUT(HexStr2Decimal('FFFF')); // 65535
- OUTPUT(HexStr2Decimal('FFFFFF')); // 16777215
- OUTPUT(HexStr2Decimal('FFFFFFFF')); // 4294967295
- OUTPUT(HexStr2Decimal('FFFFFFFFFF')); // 1099511627775
- OUTPUT(HexStr2Decimal('FFFFFFFFFFFF')); // 281474976710655
- OUTPUT(HexStr2Decimal('FFFFFFFFFFFFFF')); // 72057594037927935
- OUTPUT(HexStr2Decimal('FFFFFFFFFFFFFFFF')); // 18446744073709551615
- OUTPUT(HexStr2Decimal('FFFFFFFFFFFFFFFFFF')); // ERROR
- </programlisting>
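- <para>A few further calls show the edge cases. These results are traced by
- hand from the function definition in this article and are not taken from
- the Hex2Decimal.ECL file. The last call assumes that
- StringLib.StringToUpperCase is available in your environment:</para>
- <programlisting>OUTPUT(HexStr2Decimal('1FF'));   // 511, padded to '01FF' first
- OUTPUT(HexStr2Decimal(' F F ')); // 255, TRIM(HexIn,ALL) removes the spaces
- OUTPUT(HexStr2Decimal('ff'));    // 0, lowercase falls to the CASE default
- OUTPUT(HexStr2Decimal(StringLib.StringToUpperCase('ff'))); // 255
- </programlisting>
- <para>Since lowercase digits are silently translated as zero, it is
- probably safest to uppercase the input before calling the function.</para>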
- </sect2>
- </sect1>