<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<sect1 id="Working_with_XML_Data">
<title>Working with XML Data</title>
<para>Data is not always handed to you in nice, easy-to-work-with,
fixed-length flat files; it comes in many forms. One form growing in usage
every day is XML. ECL has a number of ways of handling XML data, some obvious
and some not so obvious.</para>
<para><emphasis role="bold">NOTE:</emphasis> XML reading and parsing can
consume a large amount of memory, depending on the usage. In particular, if
the specified XPATH matches a very large amount of data, then a
correspondingly large data structure is passed to the transform. The more
you match, the more resources you consume per match. For example, if you
have a very large document and you match an element near the root that
encompasses virtually the whole document, then the whole document is
constructed in memory as a referenceable structure that your ECL code can
access.</para>
<sect2 id="Simple_XML_Data_Handling">
<title>Simple XML Data Handling</title>
<para>The XML options on DATASET and OUTPUT allow you to easily work with
simple XML data. For example, consider an XML file that looks like this
(this data is generated by the code in GenData.ECL):</para>
<programlisting>&lt;?xml version=1.0 ...?&gt;
&lt;timezones&gt;
  &lt;area&gt;
    &lt;code&gt;
      215
    &lt;/code&gt;
    &lt;state&gt;
      PA
    &lt;/state&gt;
    &lt;description&gt;
      Pennsylvania (Philadelphia area)
    &lt;/description&gt;
    &lt;zone&gt;
      Eastern Time Zone
    &lt;/zone&gt;
  &lt;/area&gt;
  &lt;area&gt;
    &lt;code&gt;
      216
    &lt;/code&gt;
    &lt;state&gt;
      OH
    &lt;/state&gt;
    &lt;description&gt;
      Ohio (Cleveland area)
    &lt;/description&gt;
    &lt;zone&gt;
      Eastern Time Zone
    &lt;/zone&gt;
  &lt;/area&gt;
&lt;/timezones&gt;
</programlisting>
<para>This file can be declared for use in your ECL code like this (it is
the TimeZonesXML DATASET declared in the DeclareData MODULE
structure):</para>
<programlisting>EXPORT TimeZonesXML :=
  DATASET('~PROGGUIDE::EXAMPLEDATA::XML_timezones',
          {STRING code,
           STRING state,
           STRING description,
           STRING timezone{XPATH('zone')}},
          XML('timezones/area'));
</programlisting>
<para>This makes the data contained within each XML tag in the file
available for use in your ECL code just like any flat-file dataset. The
field names in the RECORD structure (in this case, in-lined in the DATASET
declaration) duplicate the tag names in the file. The use of the XPATH
modifier on the timezone field allows us to specify that the field comes
from the &lt;zone&gt; tag. This mechanism allows us to name fields
differently from their tag names.</para>
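<para>As a minimal sketch of the same field-to-tag mapping that does not
depend on the example file, the following uses the FROMXML function on an
inline XML string. The AreaRec layout and the sample string are made up for
illustration; only the code and zone tags mirror the file above.</para>
<programlisting>// Illustrative layout (not part of DeclareData): timezone is read from &lt;zone&gt;
AreaRec := RECORD
  STRING code;
  STRING timezone {XPATH('zone')};
END;

// Hypothetical inline sample; FROMXML converts a single XML row
x := '&lt;Row&gt;&lt;code&gt;215&lt;/code&gt;&lt;zone&gt;Eastern Time Zone&lt;/zone&gt;&lt;/Row&gt;';

r := FROMXML(AreaRec, x);
OUTPUT(r);
</programlisting>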
<para>By defining the fields as STRING types without specifying their
length, you can be sure you're getting all the data, including any
carriage returns, line feeds, and tabs in the XML file that are contained
within the field tags (as are present in this file). This simple OUTPUT
shows the result (this and all subsequent code examples in this article
are contained in the XMLcode.ECL file).</para>
<programlisting>IMPORT $;
ds := $.DeclareData.timezonesXML;
OUTPUT(ds);</programlisting>
<para>Notice that the result displayed in the ECL IDE program contains
squares in the data. These are the carriage returns, line feeds, and tabs
in the data. You can get rid of these extraneous characters by simply
passing the records through a PROJECT operation, like this:</para>
<programlisting>// The pattern has no capture group, so the replacement is simply an empty string
StripIt(STRING str) := REGEXREPLACE('[\r\n\t]',str,'');
RECORDOF(ds) DoStrip(ds L) := TRANSFORM
  SELF.code := StripIt(L.code);
  SELF.state := StripIt(L.state);
  SELF.description := StripIt(L.description);
  SELF.timezone := StripIt(L.timezone);
END;
StrippedRecs := PROJECT(ds,DoStrip(LEFT));
OUTPUT(StrippedRecs);
</programlisting>
<para>The use of the REGEXREPLACE function makes the process very simple.
Its first parameter is a standard Perl regular expression representing the
characters to look for: carriage return (\r), line feed (\n), and tab
(\t). Its second parameter is the text to search, and its third is the text
that replaces each match.</para>
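<para>To see the effect in isolation, here is a small sketch that applies
the same pattern to an inline string. The sample value is made up, and an
empty replacement string is assumed to be the intent, so that every matched
character is simply dropped.</para>
<programlisting>// Hypothetical value with the kind of whitespace found inside the field tags
raw := '\r\n\t215\r\n';

// Drop every carriage return, line feed, and tab
clean := REGEXREPLACE('[\r\n\t]', raw, '');

OUTPUT(raw);    // shows the unprintable characters around 215
OUTPUT(clean);  // 215
</programlisting>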
<para>You can now operate on the StrippedRecs recordset (or the
$.DeclareData.TimeZonesXML dataset) just as you would with any other. For
example, you might want to simply filter out unnecessary fields and
records and write the result to a new XML file to pass on, something like
this:</para>
<programlisting>InterestingRecs := StrippedRecs((INTEGER)code BETWEEN 301 AND 303);
OUTPUT(InterestingRecs,{code,timezone},
       '~PROGGUIDE::EXAMPLEDATA::OUT::timezones300',
       XML('area',HEADING('&lt;?xml version=1.0 ...?&gt;\n&lt;timezones&gt;\n','&lt;/timezones&gt;')),OVERWRITE);
</programlisting>
<para>The resulting XML file looks like this:</para>
<programlisting>&lt;?xml version=1.0 ...?&gt;
&lt;timezones&gt;
&lt;area&gt;&lt;code&gt;301&lt;/code&gt;&lt;zone&gt;Eastern Time Zone&lt;/zone&gt;&lt;/area&gt;
&lt;area&gt;&lt;code&gt;302&lt;/code&gt;&lt;zone&gt;Eastern Time Zone&lt;/zone&gt;&lt;/area&gt;
&lt;area&gt;&lt;code&gt;303&lt;/code&gt;&lt;zone&gt;Mountain Time Zone&lt;/zone&gt;&lt;/area&gt;
&lt;/timezones&gt;
</programlisting>
</sect2>
<sect2 id="Complex_XML_Data_Handling">
<title>Complex XML Data Handling</title>
<para>You can create much more complex XML output by using the CSV option
on OUTPUT instead of the XML option. The XML option will only produce the
straightforward style of XML shown above. However, some applications
require the use of XML attributes inside the tags. This code demonstrates
how to produce that format:</para>
<programlisting>CRLF := (STRING)x'0D0A';
OutRec := RECORD
  STRING Line;
END;
OutRec DoComplexXML(InterestingRecs L) := TRANSFORM
  SELF.Line := ' &lt;area code="' + L.code + '"&gt;' + CRLF +
               ' &lt;zone&gt;' + L.timezone + '&lt;/zone&gt;' + CRLF +
               ' &lt;/area&gt;';
END;
ComplexXML := PROJECT(InterestingRecs,DoComplexXML(LEFT));
OUTPUT(ComplexXML,,'~PROGGUIDE::EXAMPLEDATA::OUT::Complextimezones301',
       CSV(HEADING('&lt;?xml version=1.0 ...?&gt;'+CRLF+'&lt;timezones&gt;'+CRLF,'&lt;/timezones&gt;')),OVERWRITE);
</programlisting>
<para>The RECORD structure defines a single output field to contain each
logical XML record that you build with the TRANSFORM function. The PROJECT
operation builds all of the individual output records. The CSV option
on the OUTPUT action then specifies the file header and footer records (in
this case, the XML file tags), and you get the result shown here:</para>
<programlisting>&lt;?xml version=1.0 ...?&gt;
&lt;timezones&gt;
&lt;area code="301"&gt;
&lt;zone&gt;Eastern Time Zone&lt;/zone&gt;
&lt;/area&gt;
&lt;area code="302"&gt;
&lt;zone&gt;Eastern Time Zone&lt;/zone&gt;
&lt;/area&gt;
&lt;area code="303"&gt;
&lt;zone&gt;Mountain Time Zone&lt;/zone&gt;
&lt;/area&gt;
&lt;/timezones&gt;
</programlisting>
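<para>One caveat when building XML text by hand this way is that the field
values are concatenated as-is. As a hedged sketch, the built-in XMLENCODE
function can be applied to each value so that characters such as &amp; and
&lt; do not break the generated tags. The MakeArea function and its sample
arguments are hypothetical and only illustrate the idea.</para>
<programlisting>// Hypothetical helper: builds one &lt;area&gt; element with encoded values
MakeArea(STRING code, STRING zone) :=
  '&lt;area code="' + XMLENCODE(code) + '"&gt;' +
  '&lt;zone&gt;' + XMLENCODE(zone) + '&lt;/zone&gt;' +
  '&lt;/area&gt;';

// The ampersand in the made-up zone name is written out as an XML entity
OUTPUT(MakeArea('301', 'Eastern &amp; Central'));
</programlisting>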
<para>So, if using the CSV option is the way to OUTPUT complex XML data
formats, how can you access existing complex-format XML data and use ECL
to work with it?</para>
<para>The answer lies in using the XPATH option on field definitions in
the input RECORD structure, like this:</para>
<programlisting>NewTimeZones :=
  DATASET('~PROGGUIDE::EXAMPLEDATA::OUT::Complextimezones301',
          {STRING area {XPATH('&lt;&gt;')}},
          XML('timezones/area'));
</programlisting>
<para>The specified {XPATH('&lt;&gt;')} option basically says “give me
everything that's in this XML tag, including the tags themselves” so that
you can then use ECL to parse through the text to do your work. When you do
a simple OUTPUT and copy a NewTimeZones record to a text editor, it looks
like this (including all the carriage returns and line feeds):</para>
<programlisting>&lt;area code="301"&gt;
&lt;zone&gt;Eastern Time Zone&lt;/zone&gt;
&lt;/area&gt;</programlisting>
<para>You can then use any of the string handling functions in ECL or the
Service Library functions in StringLib or UnicodeLib (see the
<emphasis>Services Library Reference</emphasis>) to work with the text.
However, the more powerful ECL text parsing tool is the PARSE function,
which allows you to define regular expressions and/or ECL PATTERN attribute
definitions to process the data.</para>
<para>This example uses the TRANSFORM version of PARSE to get at the XML
data:</para>
<programlisting>{ds.code, ds.timezone} Xform(NewTimeZones L) := TRANSFORM
  SELF.code := XMLTEXT('@code');
  SELF.timezone := XMLTEXT('zone');
END;
ParsedZones := PARSE(NewTimeZones,area,Xform(LEFT),XML('area'));
OUTPUT(ParsedZones);
</programlisting>
<para>In this code we're using the XML form of PARSE and its associated
XMLTEXT function to parse the data from the complex XML structure. The
parameter to XMLTEXT is the XPATH to the data we're interested in (the
major subset of the XPATH standard that ECL supports is documented in the
Language Reference in the RECORD structure discussion).</para>
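<para>The same technique can be tried without any files. The following
sketch runs the XML form of PARSE against a single inline record; the InRec
and OutRec layouts and the sample text are assumptions made for
illustration.</para>
<programlisting>// Hypothetical one-field layout holding raw XML text
InRec := RECORD
  STRING area;
END;
OutRec := RECORD
  STRING code;
  STRING timezone;
END;

// Inline sample record shaped like the NewTimeZones data
inDS := DATASET([{'&lt;area code="301"&gt;&lt;zone&gt;Eastern Time Zone&lt;/zone&gt;&lt;/area&gt;'}], InRec);

OutRec Xform(InRec L) := TRANSFORM
  SELF.code     := XMLTEXT('@code');  // attribute on the matched tag
  SELF.timezone := XMLTEXT('zone');   // text of the child tag
END;

parsed := PARSE(inDS, area, Xform(LEFT), XML('area'));
OUTPUT(parsed);
</programlisting>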
</sect2>
<sect2 id="Input_with_Complex_XML_Formats">
<title>Input with Complex XML Formats</title>
<para>XML data comes in many possible formats, and some of them make use
of “child datasets” such that a given tag may contain multiple instances
of other tags that contain individual field tags themselves.</para>
<para>Here's an example of such a complex structure using UCC data. An
individual Filing may contain one or more Transactions, which in turn may
contain multiple Debtor and SecuredParty records:</para>
<programlisting>&lt;UCC&gt;
  &lt;Filing number='5200105'&gt;
    &lt;Transaction ID='5'&gt;
      &lt;StartDate&gt;08/01/2001&lt;/StartDate&gt;
      &lt;LapseDate&gt;08/01/2006&lt;/LapseDate&gt;
      &lt;FormType&gt;UCC 1 FILING STATEMENT&lt;/FormType&gt;
      &lt;AmendType&gt;NONE&lt;/AmendType&gt;
      &lt;AmendAction&gt;NONE&lt;/AmendAction&gt;
      &lt;EnteredDate&gt;08/02/2002&lt;/EnteredDate&gt;
      &lt;ReceivedDate&gt;08/01/2002&lt;/ReceivedDate&gt;
      &lt;ApprovedDate&gt;08/02/2002&lt;/ApprovedDate&gt;
      &lt;Debtor entityId='19'&gt;
        &lt;IsBusiness&gt;true&lt;/IsBusiness&gt;
        &lt;OrgName&gt;&lt;![CDATA[BOGUS LABORATORIES, INC.]]&gt;&lt;/OrgName&gt;
        &lt;Status&gt;ACTIVE&lt;/Status&gt;
        &lt;Address1&gt;&lt;![CDATA[334 SOUTH 900 WEST]]&gt;&lt;/Address1&gt;
        &lt;Address4&gt;&lt;![CDATA[SALT LAKE CITY 45 84104]]&gt;&lt;/Address4&gt;
        &lt;City&gt;&lt;![CDATA[SALT LAKE CITY]]&gt;&lt;/City&gt;
        &lt;State&gt;UTAH&lt;/State&gt;
        &lt;Zip&gt;84104&lt;/Zip&gt;
        &lt;OrgType&gt;CORP&lt;/OrgType&gt;
        &lt;OrgJurisdiction&gt;&lt;![CDATA[SALT LAKE CITY]]&gt;&lt;/OrgJurisdiction&gt;
        &lt;OrgID&gt;654245-0142&lt;/OrgID&gt;
        &lt;EnteredDate&gt;08/02/2002&lt;/EnteredDate&gt;
      &lt;/Debtor&gt;
      &lt;Debtor entityId='7'&gt;
        &lt;IsBusiness&gt;false&lt;/IsBusiness&gt;
        &lt;FirstName&gt;&lt;![CDATA[FRED]]&gt;&lt;/FirstName&gt;
        &lt;LastName&gt;&lt;![CDATA[JONES]]&gt;&lt;/LastName&gt;
        &lt;Status&gt;ACTIVE&lt;/Status&gt;
        &lt;Address1&gt;&lt;![CDATA[1038 E. 900 N.]]&gt;&lt;/Address1&gt;
        &lt;Address4&gt;&lt;![CDATA[OGDEN 45 84404]]&gt;&lt;/Address4&gt;
        &lt;City&gt;&lt;![CDATA[OGDEN]]&gt;&lt;/City&gt;
        &lt;State&gt;UTAH&lt;/State&gt;
        &lt;Zip&gt;84404&lt;/Zip&gt;
        &lt;OrgType&gt;NONE&lt;/OrgType&gt;
        &lt;EnteredDate&gt;08/02/2002&lt;/EnteredDate&gt;
      &lt;/Debtor&gt;
      &lt;SecuredParty entityId='20'&gt;
        &lt;IsBusiness&gt;true&lt;/IsBusiness&gt;
        &lt;OrgName&gt;&lt;![CDATA[WELLS FARGO BANK]]&gt;&lt;/OrgName&gt;
        &lt;Status&gt;ACTIVE&lt;/Status&gt;
        &lt;Address1&gt;&lt;![CDATA[ATTN: LOAN OPERATIONS CENTER]]&gt;&lt;/Address1&gt;
        &lt;Address3&gt;&lt;![CDATA[P.O. BOX 9120]]&gt;&lt;/Address3&gt;
        &lt;Address4&gt;&lt;![CDATA[BOISE 13 83707-2203]]&gt;&lt;/Address4&gt;
        &lt;City&gt;&lt;![CDATA[BOISE]]&gt;&lt;/City&gt;
        &lt;State&gt;IDAHO&lt;/State&gt;
        &lt;Zip&gt;83707-2203&lt;/Zip&gt;
        &lt;Status&gt;ACTIVE&lt;/Status&gt;
        &lt;EnteredDate&gt;08/02/2002&lt;/EnteredDate&gt;
      &lt;/SecuredParty&gt;
      &lt;Collateral&gt;
        &lt;Action&gt;ADD&lt;/Action&gt;
        &lt;Description&gt;&lt;![CDATA[ALL ACCOUNTS]]&gt;&lt;/Description&gt;
        &lt;EffectiveDate&gt;08/01/2002&lt;/EffectiveDate&gt;
      &lt;/Collateral&gt;
    &lt;/Transaction&gt;
    &lt;Transaction ID='375799'&gt;
      &lt;StartDate&gt;08/01/2002&lt;/StartDate&gt;
      &lt;LapseDate&gt;08/01/2006&lt;/LapseDate&gt;
      &lt;FormType&gt;UCC 3 AMENDMENT&lt;/FormType&gt;
      &lt;AmendType&gt;TERMINATION BY DEBTOR&lt;/AmendType&gt;
      &lt;AmendAction&gt;NONE&lt;/AmendAction&gt;
      &lt;EnteredDate&gt;02/23/2004&lt;/EnteredDate&gt;
      &lt;ReceivedDate&gt;02/18/2004&lt;/ReceivedDate&gt;
      &lt;ApprovedDate&gt;02/23/2004&lt;/ApprovedDate&gt;
    &lt;/Transaction&gt;
  &lt;/Filing&gt;
&lt;/UCC&gt;
</programlisting>
<para>The key to working with this type of complex XML data is the set of
RECORD structures that define the layout of the XML data.</para>
<programlisting>CollateralRec := RECORD
  STRING Action {XPATH('Action')};
  STRING Description {XPATH('Description')};
  STRING EffectiveDate {XPATH('EffectiveDate')};
END;
PartyRec := RECORD
  STRING PartyID {XPATH('@entityId')};
  STRING IsBusiness {XPATH('IsBusiness')};
  STRING OrgName {XPATH('OrgName')};
  STRING FirstName {XPATH('FirstName')};
  STRING LastName {XPATH('LastName')};
  STRING Status {XPATH('Status[1]')};
  STRING Address1 {XPATH('Address1')};
  STRING Address2 {XPATH('Address2')};
  STRING Address3 {XPATH('Address3')};
  STRING Address4 {XPATH('Address4')};
  STRING City {XPATH('City')};
  STRING State {XPATH('State')};
  STRING Zip {XPATH('Zip')};
  STRING OrgType {XPATH('OrgType')};
  STRING OrgJurisdiction {XPATH('OrgJurisdiction')};
  STRING OrgID {XPATH('OrgID')};
  STRING10 EnteredDate {XPATH('EnteredDate')};
END;
TransactionRec := RECORD
  STRING TransactionID {XPATH('@ID')};
  STRING10 StartDate {XPATH('StartDate')};
  STRING10 LapseDate {XPATH('LapseDate')};
  STRING FormType {XPATH('FormType')};
  STRING AmendType {XPATH('AmendType')};
  STRING AmendAction {XPATH('AmendAction')};
  STRING10 EnteredDate {XPATH('EnteredDate')};
  STRING10 ReceivedDate {XPATH('ReceivedDate')};
  STRING10 ApprovedDate {XPATH('ApprovedDate')};
  DATASET(PartyRec) Debtors {XPATH('Debtor')};
  DATASET(PartyRec) SecuredParties {XPATH('SecuredParty')};
  CollateralRec Collateral {XPATH('Collateral')}
END;
UCC_Rec := RECORD
  STRING FilingNumber {XPATH('@number')};
  DATASET(TransactionRec) Transactions {XPATH('Transaction')};
END;
UCC := DATASET('~PROGGUIDE::EXAMPLEDATA::XML_UCC',UCC_Rec,XML('UCC/Filing'));
</programlisting>
<para>Building from the bottom up, these RECORD structures combine to
create the final UCC_Rec layout that defines the entire format of this XML
data.</para>
<para>The XML option on the final DATASET declaration specifies the XPATH
to the record tag (Filing), and the child DATASET “field” definitions in
the RECORD structures handle the multiple-instance issues. Because ECL is
case insensitive and XML syntax is case sensitive, it is necessary to use
the XPATH to define all the field tags. The PartyRec RECORD structure
works with both the Debtors and SecuredParties child DATASET fields
because both contain the same tags and information.</para>
<para>Once you've defined the layout, how can you extract the data into a
normalized relational structure to work with it in the supercomputer?
NORMALIZE is the answer. In the form used here, NORMALIZE needs to know how
many times to call its TRANSFORM, so you use the TABLE function to get the
counts, like this:</para>
<programlisting>XactTbl := TABLE(UCC,{INTEGER XactCount := COUNT(Transactions), UCC});
OUTPUT(XactTbl);</programlisting>
<para>This TABLE function gets the counts of the multiple Transaction
records per Filing so that we can use NORMALIZE to extract them into a
table of their own.</para>
<programlisting>Out_Transacts := RECORD
  STRING FilingNumber;
  STRING TransactionID;
  STRING10 StartDate;
  STRING10 LapseDate;
  STRING FormType;
  STRING AmendType;
  STRING AmendAction;
  STRING10 EnteredDate;
  STRING10 ReceivedDate;
  STRING10 ApprovedDate;
  DATASET(PartyRec) Debtors;
  DATASET(PartyRec) SecuredParties;
  CollateralRec Collateral;
END;
Out_Transacts Get_Transacts(XactTbl L, INTEGER C) := TRANSFORM
  SELF.FilingNumber := L.FilingNumber;
  SELF := L.Transactions[C];
END;
Transacts := NORMALIZE(XactTbl,LEFT.XactCount,Get_Transacts(LEFT,COUNTER));
OUTPUT(Transacts);
</programlisting>
<para>This NORMALIZE extracts all the Transactions into a separate
recordset with just one Transaction per record with the parent information
(the Filing number) appended. However, each record here still contains
multiple Debtor and SecuredParty child records.</para>
<programlisting>PartyCounts := TABLE(Transacts,
                     {INTEGER DebtorCount := COUNT(Debtors),
                      INTEGER PartyCount := COUNT(SecuredParties),
                      Transacts});
OUTPUT(PartyCounts);
</programlisting>
<para>This TABLE function gets the counts of the multiple Debtor and
SecuredParty records for each Transaction.</para>
<programlisting>Out_Parties := RECORD
  STRING FilingNumber;
  STRING TransactionID;
  PartyRec;
END;
Out_Parties Get_Debtors(PartyCounts L, INTEGER C) := TRANSFORM
  SELF.FilingNumber := L.FilingNumber;
  SELF.TransactionID := L.TransactionID;
  SELF := L.Debtors[C];
END;
TransactDebtors := NORMALIZE(PartyCounts,
                             LEFT.DebtorCount,
                             Get_Debtors(LEFT,COUNTER));
OUTPUT(TransactDebtors);
</programlisting>
<para>This NORMALIZE extracts all the Debtors into a separate
recordset.</para>
<programlisting>Out_Parties Get_Parties(PartyCounts L, INTEGER C) := TRANSFORM
  SELF.FilingNumber := L.FilingNumber;
  SELF.TransactionID := L.TransactionID;
  SELF := L.SecuredParties[C];
END;
TransactParties := NORMALIZE(PartyCounts,
                             LEFT.PartyCount,
                             Get_Parties(LEFT,COUNTER));
OUTPUT(TransactParties);
</programlisting>
<para>This NORMALIZE extracts all the SecuredParties into a separate
recordset. With this, we've now broken out all the child records into
their own normalized relational structure that we can work with
easily.</para>
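<para>The NORMALIZE pattern used in this section can be tried on a small
inline dataset. The following is a sketch only: the ChildRec, ParentRec, and
FlatRec layouts and the sample values are made up for illustration. For
comparison, it also shows the form of NORMALIZE that takes the child dataset
itself (LEFT.Transactions) instead of a count; see the NORMALIZE entry in
the Language Reference for the details of both forms.</para>
<programlisting>// Hypothetical layouts: one parent row holding a child dataset
ChildRec := RECORD
  STRING TransactionID;
END;
ParentRec := RECORD
  STRING FilingNumber;
  DATASET(ChildRec) Transactions;
END;
FlatRec := RECORD
  STRING FilingNumber;
  STRING TransactionID;
END;

// Made-up sample: one Filing with two Transactions
parents := DATASET([{'5200105', [{'5'}, {'375799'}]}], ParentRec);

// Count-driven form: the TRANSFORM is called COUNT(Transactions) times
FlatRec ByCount(ParentRec L, INTEGER C) := TRANSFORM
  SELF.FilingNumber := L.FilingNumber;
  SELF.TransactionID := L.Transactions[C].TransactionID;
END;
flat1 := NORMALIZE(parents, COUNT(LEFT.Transactions), ByCount(LEFT, COUNTER));

// Child-dataset form: the TRANSFORM is called once per child record (RIGHT)
FlatRec ByChild(ParentRec L, ChildRec R) := TRANSFORM
  SELF.FilingNumber := L.FilingNumber;
  SELF.TransactionID := R.TransactionID;
END;
flat2 := NORMALIZE(parents, LEFT.Transactions, ByChild(LEFT, RIGHT));

OUTPUT(flat1);
OUTPUT(flat2);
</programlisting>
<para>Both results should contain one row per Transaction, each carrying
the parent Filing number.</para>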
</sect2>
<sect2 id="Piping_to_Third-Party_Tools">
<title>Piping to Third-Party Tools</title>
<para>One other way to work with XML data is to use third-party tools that
you have adapted for use in the supercomputer. This gives you the
advantage of working with previously proven technology and the benefit of
running that technology in parallel on all the supercomputer nodes at
once.</para>
<para>The technique is simple: just define the input file as a data stream
and use the PIPE option on DATASET to process the data in its native form.
Once the processing is complete, you can OUTPUT the result in whatever
form it comes out of the third-party tool, something like this example
code (non-functional):</para>
<programlisting>Rec := RECORD
  STRING1 char;
END;
TimeZones := DATASET('timezones.xml',Rec,PIPE('ThirdPartyTool.exe'));
OUTPUT(TimeZones,,'ProcessedTimezones.xml');
</programlisting>
<para>The key to this technique is the STRING1 field definition. This
makes the input and output just a 1-byte-at-a-time data stream that flows
into the third-party tool and back out of your ECL code in its native
format. You don't even need to know what that format is. You could also
use this technique with the PIPE option on OUTPUT.</para>
</sect2>
</sect1>