DataTutorial.xml 46 KB

  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  3. "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
  4. <book lang="en_US" xml:base="../">
  5. <title>HPCC Data Tutorial</title>
  6. <bookinfo>
  7. <title>HPCC Data Tutorial</title>
  8. <mediaobject>
  9. <imageobject>
  10. <imagedata fileref="images/redswooshWithLogo3.jpg" />
  11. </imageobject>
  12. </mediaobject>
  13. <author>
  14. <surname>Boca Raton Documentation Team</surname>
  15. </author>
  16. <legalnotice>
<para>We welcome your comments and feedback about this document via
email to <email>docfeedback@hpccsystems.com</email>. Please include
<emphasis role="bold">Documentation Feedback</emphasis> in the subject
line and reference the document name, page numbers, and current version
number in the text of the message.</para>
  22. <para>LexisNexis and the Knowledge Burst logo are registered trademarks
  23. of Reed Elsevier Properties Inc., used under license. Other products and
  24. services may be trademarks or registered trademarks of their respective
  25. companies. All names and example data used in this manual are
  26. fictitious. Any similarity to actual persons, living or dead, is purely
  27. coincidental.</para>
  28. <para></para>
  29. </legalnotice>
  30. <xi:include href="common/Version.xml" xpointer="FooterInfo"
  31. xmlns:xi="http://www.w3.org/2001/XInclude" />
  32. <!--Release Info makes a running page footer: now an include! -->
  33. <!--The following include statement pulls in the date_ver from version.xml-->
  34. <xi:include href="common/Version.xml" xpointer="DateVer"
  35. xmlns:xi="http://www.w3.org/2001/XInclude" />
  36. <corpname>HPCC Systems</corpname>
  37. <!--corpname never prints-->
  38. <xi:include href="common/Version.xml" xpointer="Copyright"
  39. xmlns:xi="http://www.w3.org/2001/XInclude" />
  40. <!--Copyright tag inserts the symbol automatically: Now an Include!-->
  41. <mediaobject role="logo">
  42. <imageobject>
  43. <imagedata fileref="images/LN_Rightjustified.jpg" />
  44. </imageobject>
  45. </mediaobject>
  46. </bookinfo>
  47. <chapter>
  48. <title>Introduction</title>
  49. <sect1 id="Introduction_Case-Study-and-Tutorial" role="nobrk">
  50. <title>The ECL Development Process</title>
<para>This tutorial provides a walk-through of the development process,
from beginning to end, and is designed to be an introduction to working
with data on any HPCC Systems HPCC<footnote>
<para><emphasis role="bold">H</emphasis>igh <emphasis
role="bold">P</emphasis>erformance <emphasis
role="bold">C</emphasis>omputing <emphasis
role="bold">C</emphasis>luster (HPCC) is a massively parallel
processing computing platform that solves Big Data problems. See
http://www.hpccsystems.com/Why-HPCC/How-it-works for more
details.</para>
</footnote>. We will write code in ECL<footnote>
<para><emphasis role="bold">E</emphasis>nterprise <emphasis
role="bold">C</emphasis>ontrol <emphasis
role="bold">L</emphasis>anguage (ECL) is a declarative, data-centric
programming language used to manage all aspects of the massive data
joins, sorts, and builds that truly differentiate HPCC (High
Performance Computing Cluster) from other technologies in its
ability to provide flexible data analysis on a massive scale.</para>
</footnote> to process our data and query it.</para>
  70. <para>This tutorial assumes:</para>
<itemizedlist>
<listitem>
<para>You have a running HPCC platform. This can be a VM Edition or a
single-node or multi-node HPCC platform.</para>
</listitem>
<listitem>
<para>You have the ECL IDE<footnote>
<para>The ECL IDE (Integrated Development Environment) is the tool
used to create queries into your data and ECL files with which to
build your queries.</para>
</footnote> installed and configured.</para>
</listitem>
</itemizedlist>
  82. <para>In this tutorial, we will:</para>
  83. <itemizedlist mark="bullet">
  84. <listitem>
  85. <para>Download a raw data file</para>
<para>Links to the data file are available at <ulink
url="http://hpccsystems.com/community/docs/data-tutorial-guide">http://hpccsystems.com/community/docs/data-tutorial-guide</ulink>.</para>
  88. <para>The download is approximately 30 MB (compressed) and is
  89. available in either ZIP or .tar.gz format. Choose the appropriate
  90. link.</para>
  91. </listitem>
  92. <listitem>
<para>Spray the file to a Data Refinery cluster. HPCC clusters
"spray" data into file parts on each node.</para>
  95. <para>A <emphasis>spray</emphasis> or <emphasis>import</emphasis> is
  96. the relocation of a data file from one location to an HPCC cluster.
  97. The term spray was adopted due to the nature of the file movement –
  98. the file is partitioned across all nodes within a cluster.</para>
  99. </listitem>
  100. <listitem>
  101. <para>Examine the data and determine the pre-processing we need to
  102. perform</para>
  103. </listitem>
  104. <listitem>
  105. <para>Pre-process the data to produce a new data file</para>
  106. </listitem>
  107. <listitem>
  108. <para>Determine the types of queries we want</para>
  109. </listitem>
  110. <listitem>
  111. <para>Create the queries</para>
  112. </listitem>
  113. <listitem>
  114. <para>Test the queries</para>
  115. </listitem>
  116. <listitem>
<para>Deploy them to a Rapid Data Delivery Engine (RDDE) cluster,
also known as a Roxie cluster.</para>
  119. </listitem>
  120. </itemizedlist>
  121. </sect1>
  122. </chapter>
  123. <chapter id="Working_with_Data">
  124. <title>Working with Data</title>
  125. <sect1 id="The_Original_Data" role="nobrk">
  126. <title>The Original Data</title>
  127. <para>In this scenario, we receive a structured data file containing
  128. records with people's names and addresses. The HPCC also supports
  129. unstructured data, but this example is simpler. This file is documented
  130. in the following table:</para>
  131. <para></para>
  132. <para><informaltable colsep="1" frame="all" rowsep="1">
  133. <tgroup cols="3">
  134. <colspec colwidth="147.60pt" />
  135. <colspec colwidth="147.60pt" />
  136. <colspec colwidth="147.60pt" />
  137. <thead>
  138. <row>
  139. <entry align="left">Field Name</entry>
  140. <entry align="left">Type</entry>
  141. <entry align="left">Description</entry>
  142. </row>
  143. </thead>
  144. <tbody>
  145. <row>
  146. <entry>FirstName</entry>
  147. <entry>15 Character String</entry>
  148. <entry>First Name</entry>
  149. </row>
  150. <row>
  151. <entry>LastName</entry>
  152. <entry>25 Character String</entry>
  153. <entry>Last name</entry>
  154. </row>
  155. <row>
  156. <entry>MiddleName</entry>
  157. <entry>15 Character String</entry>
  158. <entry>Middle Name</entry>
  159. </row>
  160. <row>
  161. <entry>Zip</entry>
  162. <entry>5 Character String</entry>
  163. <entry>ZIP Code</entry>
  164. </row>
  165. <row>
  166. <entry>Street</entry>
  167. <entry>42 Character String</entry>
  168. <entry>Street Address</entry>
  169. </row>
  170. <row>
  171. <entry>City</entry>
  172. <entry>20 Character String</entry>
  173. <entry>City</entry>
  174. </row>
  175. <row>
  176. <entry>State</entry>
  177. <entry>2 Character String</entry>
  178. <entry>State</entry>
  179. </row>
  180. </tbody>
  181. </tgroup>
  182. </informaltable></para>
  183. <para>This gives us a record length of 124 (the total of all field
  184. lengths). You will need to know this length for the <emphasis
  185. role="bold">File Spray</emphasis> process.</para>
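<para>As a quick check of that arithmetic, the sketch below declares the
seven fields from the table and asks ECL for the size of the resulting
fixed-length record. The name <emphasis>PersonLayoutCheck</emphasis> is
only an illustrative label and is not used elsewhere in this tutorial. You
can run it from an ECL IDE Builder window once your environment is set
up.</para>
<programlisting>// Illustrative only: mirrors the field table above.
PersonLayoutCheck := RECORD
  STRING15 FirstName;
  STRING25 LastName;
  STRING15 MiddleName;
  STRING5  Zip;
  STRING42 Street;
  STRING20 City;
  STRING2  State;
END;

// 15 + 25 + 15 + 5 + 42 + 20 + 2 = 124 bytes per record
OUTPUT(SIZEOF(PersonLayoutCheck), NAMED('RecordLength'));</programlisting>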
  186. <para></para>
  187. <sect2 id="Uploading_a_file">
  188. <title>Load the Incoming Data File to your Landing Zone</title>
  189. <para>A Landing Zone (or Drop Zone) is a physical storage location
  190. defined in your HPCC's environment. A daemon (DaFileSrv) must be
  191. running on that server to enable file sprays and desprays.</para>
<para>For smaller data files (up to 2 GB), you can use the
upload/download file utility in ECL Watch (a Web-based interface to
your HPCC platform). The sample data file is approximately 100 MB.</para>
  195. <orderedlist>
  196. <listitem>
  197. <para>Download the sample data file from the HPCC Systems
  198. portal.</para>
  199. <para>The data file is available from links found on <ulink
  200. url="http://hpccsystems.com/community/docs/data-tutorial-guide">http://hpccsystems.com/community/docs/data-tutorial-guide</ulink>.
  201. The download is approximately 30 MB (compressed) and is available
  202. in either ZIP or tar.gz format (<emphasis
  203. role="bold">OriginalPerson.tar.gz</emphasis> or <emphasis
  204. role="bold">OriginalPerson.zip</emphasis>)</para>
  205. </listitem>
  206. <listitem>
  207. <para>Extract it to a folder on your local machine.</para>
  208. </listitem>
  209. <listitem>
<para>In your browser, go to the <emphasis role="bold">ECL
Watch</emphasis> URL. For example, http://nnn.nnn.nnn.nnn:8010,
where nnn.nnn.nnn.nnn is your ESP<footnote>
<para>The ESP (Enterprise Services Platform) Server is the
communication layer server in your HPCC environment.</para>
</footnote> Server's IP address.</para>
  216. <para><informaltable colsep="1" frame="all" rowsep="1">
  217. <?dbfo keep-together="always"?>
  218. <tgroup cols="2">
  219. <colspec colwidth="49.50pt" />
  220. <colspec />
  221. <tbody>
  222. <row>
  223. <entry><inlinegraphic
  224. fileref="images/caution.png" /></entry>
  225. <entry>Your IP address could be different from the ones
  226. provided in the example images. Please use the IP
  227. address provided by <emphasis
  228. role="bold">your</emphasis> installation.</entry>
  229. </row>
  230. </tbody>
  231. </tgroup>
  232. </informaltable></para>
  233. </listitem>
  234. <listitem>
  235. <?dbfo keep-together="always"?>
<para>From the ECL Watch page, click the <emphasis
role="bold">Upload/download File</emphasis> link in the menu on
the left side.</para>
  239. <para><figure>
  240. <title>Upload/download</title>
  241. <mediaobject>
  242. <imageobject>
  243. <imagedata fileref="images/LZimg03-1.jpg" />
  244. </imageobject>
  245. </mediaobject>
  246. </figure></para>
  247. <para>Once you click on the Upload/download file link, it will
  248. take you to the Dropzones and Files page, where you can choose to
  249. <emphasis role="bold">Browse</emphasis> your machine for a file to
  250. upload:</para>
  251. <para><figure>
  252. <title>Dropzones and Files</title>
  253. <mediaobject>
  254. <imageobject>
  255. <imagedata fileref="images/LZimg04.jpg" />
  256. </imageobject>
  257. </mediaobject>
  258. </figure></para>
  259. </listitem>
  260. <listitem>
  261. <para>Press the <emphasis role="bold">Browse</emphasis> button to
  262. browse the files on your local machine, select the file to upload
  263. and then press the <emphasis role="bold">Open</emphasis>
  264. button.</para>
<para>The file you selected should appear in the <emphasis
role="bold">Select a file to upload:</emphasis> field. The data
file is named <emphasis role="bold">OriginalPerson</emphasis>.</para>
  269. </listitem>
  270. <listitem>
<para>Press <emphasis role="bold">Upload Now</emphasis> to
complete the file upload.</para>
  273. </listitem>
  274. </orderedlist>
  275. </sect2>
  276. <sect2 id="Spray_the_Data_File_to_your_DR-THOR_Cluster">
  277. <title>Spray the Data File to your THOR Cluster</title>
  278. <para>To use the data file in our HPCC cluster, we must first “spray”
  279. it to a Thor cluster. A <emphasis>spray</emphasis> or
  280. <emphasis>import</emphasis> is the relocation of a data file from one
  281. location to a Thor cluster. The term spray was adopted due to the
  282. nature of the file movement – the file is partitioned across all nodes
  283. within a cluster.</para>
<para>In this example, the file is on your Landing Zone and is named
<emphasis role="bold">OriginalPerson</emphasis>.</para>
<para>We are going to spray it to our Thor cluster and give it a
logical name of <emphasis
role="bold">tutorial::YN::OriginalPerson</emphasis>, where <emphasis
role="bold">YN</emphasis> are your initials. The Distributed File
Utility maintains a list of logical files and their corresponding
physical file locations.</para>
  293. <orderedlist>
  294. <listitem>
  295. <para>Open ECL Watch in your browser using the following
  296. URL:</para>
<para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp</emphasis>
(where nnn.nnn.nnn.nnn is your ESP Server’s IP address and pppp is
the port. The default port is 8010.)</para>
  301. </listitem>
  302. <listitem>
  303. <para>Click on the <emphasis role="bold">Spray Fixed</emphasis>
  304. hyperlink under the DFU Files menu on the left.</para>
  305. <para>The <emphasis role="bold">DFU Spray Fixed</emphasis> page
  306. displays.</para>
  307. </listitem>
  308. <listitem>
  309. <para>Using the Source <emphasis
  310. role="bold">Machine/dropzone</emphasis> drop-down list, select the
  311. Landing Zone where the file was placed.</para>
  312. <para>In the VM or Community Edition, there is only one Landing
  313. Zone.</para>
  314. <para>The IP Address is automatically filled and the Local Path is
  315. partially filled with the default folder on your landing
  316. zone.</para>
  317. </listitem>
  318. <listitem>
<para>Complete the <emphasis role="bold">Local Path</emphasis> to
include the complete file name, or use the <emphasis
role="bold">Choose File</emphasis> button to select the file from
a list of files in the folder. The file to choose is
<emphasis>OriginalPerson</emphasis>.</para>
  324. </listitem>
  325. <listitem>
  326. <para>Fill in the <emphasis role="bold">Record Length</emphasis>
  327. (124).</para>
  328. </listitem>
  329. <listitem>
  330. <para>Fill in the <emphasis role="bold">Label</emphasis> using the
  331. naming convention described earlier: <emphasis
  332. role="bold">tutorial::</emphasis><emphasis
  333. role="bold">YN</emphasis><emphasis
  334. role="bold">::OriginalPerson</emphasis> (remember, <emphasis
  335. role="bold">YN</emphasis> are your initials).</para>
  336. </listitem>
  337. <listitem>
  338. <?dbfo keep-together="always"?>
<para>Make sure the <emphasis role="bold">Replicate</emphasis> box
is checked.</para>
  342. <para><emphasis role="bold">Note:</emphasis> If replication is
  343. disabled in your Thor settings, this checkbox does not
  344. appear.</para>
  345. <para><figure>
  346. <title>Dropzones and Files</title>
  347. <mediaobject>
  348. <imageobject>
  349. <imagedata fileref="images/DTimg01.jpg" scale="85" />
  350. </imageobject>
  351. </mediaobject>
  352. </figure></para>
  353. </listitem>
  354. <listitem>
<para>Press the <emphasis role="bold">Submit</emphasis>
button.</para>
  357. </listitem>
  358. <listitem>
  359. <?dbfo keep-together="always"?>
<para>Click on the <emphasis role="bold">View Progress</emphasis>
hyperlink.</para>
  362. <para><figure>
  363. <title>View Progress</title>
  364. <mediaobject>
  365. <imageobject>
  366. <imagedata fileref="images/DTimg02.jpg" />
  367. </imageobject>
  368. </mediaobject>
  369. </figure>The Workunit progress page displays.</para>
  370. <para><figure>
  371. <title>Spray Complete</title>
  372. <mediaobject>
  373. <imageobject>
  374. <imagedata fileref="images/DTimg03.jpg" />
  375. </imageobject>
  376. </mediaobject>
  377. </figure> Once the spray is complete, we can proceed.</para>
  378. </listitem>
  379. </orderedlist>
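<para>The same spray can also be requested from ECL code rather than
through ECL Watch. The following is only a sketch, not part of the
tutorial steps: the landing zone IP address, the file path, and the target
group name (<emphasis>mythor</emphasis>) are placeholder assumptions that
you must replace with the values from your own environment, and
<emphasis>YN</emphasis> again stands for your initials. The logical name
is written the same way as the Label entered on the Spray Fixed
page.</para>
<programlisting>IMPORT Std;

// Placeholder values (assumptions): replace with your own settings.
LandingZoneIP := '10.0.0.1';
SourcePath    := '/var/lib/HPCCSystems/mydropzone/OriginalPerson';
TargetGroup   := 'mythor';

// Spray the fixed-length file (124 bytes per record) to the cluster.
Std.File.SprayFixed(LandingZoneIP, SourcePath, 124, TargetGroup,
                    'tutorial::YN::OriginalPerson');</programlisting>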
  380. </sect2>
  381. </sect1>
  382. <sect1 id="Begin_Coding">
  383. <title>Begin Coding</title>
  384. <para>In this portion of the tutorial, we will write ECL code to define
  385. the data file and execute simple queries on it so we can evaluate it and
  386. determine any necessary pre-processing.</para>
  387. <orderedlist>
  388. <listitem>
<para>Start the ECL IDE (Start &gt;&gt; All Programs &gt;&gt; HPCC
Systems &gt;&gt; ECL IDE).</para>
  391. </listitem>
  392. <listitem>
<para>Log in to your environment.</para>
<para>For purposes of this tutorial, let’s create a folder called
<emphasis role="bold">TutorialYourName</emphasis> (where
<emphasis>YourName</emphasis> is your name).</para>
  398. </listitem>
  399. <listitem>
  400. <?dbfo keep-together="always"?>
<para>Right-click on the <emphasis role="bold">My Files</emphasis>
folder in the Repository window, and select <emphasis
role="bold">Insert Folder</emphasis> from the pop-up menu.</para>
  405. <para><figure>
  406. <title>Insert Folder</title>
  407. <mediaobject>
  408. <imageobject>
  409. <imagedata fileref="images/DTimg04.jpg" />
  410. </imageobject>
  411. </mediaobject>
  412. </figure></para>
  413. </listitem>
  414. <listitem>
  415. <?dbfo keep-together="always"?>
<para>Enter <emphasis role="bold">TutorialYourName</emphasis> (where
<emphasis>YourName</emphasis> is your name) for the label, then press
the OK button.</para>
  420. <para><figure>
  421. <title>Enter Folder Label</title>
  422. <mediaobject>
  423. <imageobject>
  424. <imagedata fileref="images/DTimg05.jpg" />
  425. </imageobject>
  426. </mediaobject>
  427. </figure></para>
  428. </listitem>
  429. <listitem>
<para>Right-click on the <emphasis
role="bold">TutorialYourName</emphasis> folder, and select <emphasis
role="bold">Insert File</emphasis> from the pop-up menu.</para>
  434. </listitem>
  435. <listitem>
  436. <?dbfo keep-together="always"?>
  437. <para>Enter <emphasis role="bold">Layout_People</emphasis> for the
  438. label, then press the OK button.</para>
  439. <para><figure>
  440. <title>Insert File</title>
  441. <mediaobject>
  442. <imageobject>
  443. <imagedata fileref="images/DTimg06.jpg" />
  444. </imageobject>
  445. </mediaobject>
  446. </figure></para>
  447. <para>A Builder Window opens.</para>
  448. <para><figure>
  449. <title>Layout People in Builder</title>
  450. <mediaobject>
  451. <imageobject>
  452. <imagedata fileref="images/DTimg07.jpg" />
  453. </imageobject>
  454. </mediaobject>
  455. </figure></para>
  456. <para>Notice that some text has been written for you in the window.
  457. This helps you to remember that the name of the file (Layout_People)
  458. <emphasis>must always exactly match</emphasis> the name of the
  459. single EXPORT definition (Layout_People) contained in that file.
  460. This is a requirement -- one EXPORT definition per file, and its
  461. name must match the filename.</para>
  462. </listitem>
  463. <listitem>
  464. <?dbfo keep-together="always"?>
  465. <para>Write the following code in the Builder workspace:</para>
<para><programlisting>// Record layout: field widths match the file description table (124 bytes total).
EXPORT Layout_People := RECORD
  STRING15 FirstName;
  STRING25 LastName;
  STRING15 MiddleName;
  STRING5  Zip;
  STRING42 Street;
  STRING20 City;
  STRING2  State;
END;</programlisting> <figure>
  475. <title>Code in Builder Window</title>
  476. <mediaobject>
  477. <imageobject>
  478. <imagedata fileref="images/DTimg08.jpg" />
  479. </imageobject>
  480. </mediaobject>
  481. </figure></para>
  482. </listitem>
  483. <listitem>
  484. <para>Press the syntax check button on the main toolbar (or press
  485. F7).</para>
  486. <para>It is always a good idea to check syntax before
  487. submitting.</para>
  488. <para><figure>
  489. <title>Check Syntax</title>
  490. <mediaobject>
  491. <imageobject>
  492. <imagedata fileref="images/DTimg23.jpg" />
  493. </imageobject>
  494. </mediaobject>
  495. </figure></para>
  496. <para>This file defines the record structure for the data file.
  497. Next, we will examine the data.</para>
  498. </listitem>
  499. </orderedlist>
  500. <sect2 id="Examine_the_Data" role="brk">
  501. <title>Examine the Data</title>
  502. <para>In this section, we will look at the data and determine if there
  503. is any pre-processing we want to perform on the data. This is the step
  504. in the development process where we convert the raw data into a form
  505. we can use.</para>
  506. <orderedlist>
  507. <listitem>
<para>Right-click on the <emphasis
role="bold">TutorialYourName</emphasis> folder, and select <emphasis
role="bold">Insert File</emphasis> from the pop-up menu.</para>
  512. </listitem>
  513. <listitem>
  514. <para>Enter <emphasis role="bold">File_OriginalPerson</emphasis>
  515. for the label, then press the OK button.</para>
  516. <para><figure>
  517. <title>Insert File</title>
  518. <mediaobject>
  519. <imageobject>
  520. <imagedata fileref="images/DTimg09.jpg" />
  521. </imageobject>
  522. </mediaobject>
  523. </figure>A Builder Window opens.</para>
  524. </listitem>
  525. <listitem>
<para>Write the following code (remember to replace
<emphasis>YN</emphasis> with your initials and
<emphasis>YourName</emphasis> with your name):</para>
<para><programlisting>IMPORT TutorialYourName;
EXPORT File_OriginalPerson :=
  DATASET('~tutorial::YN::OriginalPerson', TutorialYourName.Layout_People, THOR);
</programlisting></para>
  532. <para><figure>
  533. <title>File_OriginalPerson.ecl</title>
  534. <mediaobject>
  535. <imageobject>
  536. <imagedata fileref="images/DTimg10.jpg" />
  537. </imageobject>
  538. </mediaobject>
  539. </figure></para>
  540. </listitem>
  541. <listitem>
  542. <para>Press the syntax check button on the main toolbar (or press
  543. F7) to check the syntax.</para>
  544. <para>This defines the Dataset. Next, we will examine the
  545. data.</para>
  546. </listitem>
  547. <listitem>
<para>Open a new Builder Window (CTRL+N) and write the following
code (remember to replace <emphasis>YourName</emphasis> with your
name):</para>
<programlisting>IMPORT TutorialYourName;
COUNT(TutorialYourName.File_OriginalPerson);
</programlisting>
  554. </listitem>
  555. <listitem>
  556. <para>Press the syntax check button on the main toolbar (or press
  557. F7) to check the syntax.</para>
  558. </listitem>
  559. <listitem>
  560. <?dbfo keep-together="always"?>
  561. <para>Make sure the selected cluster is your Thor cluster, then
  562. press the <emphasis role="bold">Submit</emphasis> button. Note
  563. that your target cluster might have a different name.</para>
  564. <para><figure>
  565. <title>Target Thor</title>
  566. <mediaobject>
  567. <imageobject>
  568. <imagedata fileref="images/DTimg11.jpg" />
  569. </imageobject>
  570. </mediaobject>
  571. </figure></para>
  572. </listitem>
  573. <listitem>
  574. <para>When the Workunit completes, it displays a green checkmark
  575. <inlinegraphic fileref="images/DT173-15.jpg" />.</para>
  576. </listitem>
  577. <listitem>
  578. <para>Select the Workunit tab (the one with the number next to the
  579. checkmark) and select the <emphasis role="bold">Result
  580. 1</emphasis> tab (it may already be selected).</para>
  581. <para><figure>
  582. <title>Result tab</title>
  583. <mediaobject>
  584. <imageobject>
  585. <imagedata fileref="images/DT173-16.png" />
  586. </imageobject>
  587. </mediaobject>
  588. </figure>This shows us that there are 841,400 records in the
  589. data file.</para>
  590. </listitem>
  591. <listitem>
  592. <para>Select the Builder tab and change COUNT to OUTPUT, as shown
  593. below:</para>
<para><programlisting>IMPORT TutorialYourName;
<emphasis role="bold">OUTPUT</emphasis>(TutorialYourName.File_OriginalPerson);</programlisting></para>
  596. <para>Note: The modified portion is shown in <emphasis
  597. role="bold">bold</emphasis>.</para>
  598. </listitem>
  599. <listitem>
<para>Check the syntax. If there are no errors, press the <emphasis
role="bold">Submit</emphasis> button.</para>
  602. </listitem>
  603. <listitem>
  604. <?dbfo keep-together="always"?>
  605. <para>When it completes, select the Workunit tab, then select the
  606. <emphasis role="bold">Result 1</emphasis> tab.</para>
  607. <para><figure>
  608. <title>Output Results</title>
  609. <mediaobject>
  610. <imageobject>
  611. <imagedata fileref="images/DT173-17.png" />
  612. </imageobject>
  613. </mediaobject>
  614. </figure></para>
  615. <para>Notice the names are in mixed case.</para>
<para>For our purposes, it will be easier to have all the names in
uppercase. This demonstrates one of the steps in the basic process of
preparing data (Extract, Transform, and Load, or ETL) using
ECL.</para>
  620. </listitem>
  621. <listitem>
  622. <para>Close the Builder Window.</para>
  623. </listitem>
  624. </orderedlist>
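<para>When you only want to eyeball a few records, you can limit how many
are returned. The following is a minimal sketch, assuming the
<emphasis>TutorialYourName</emphasis> folder and the
<emphasis>File_OriginalPerson</emphasis> definition created above; the
names <emphasis>FirstRecords</emphasis> and
<emphasis>SamplePeople</emphasis> and the record count of 25 are arbitrary
choices made for illustration.</para>
<programlisting>IMPORT TutorialYourName;

// Keep only the first 25 records of the dataset.
FirstRecords := CHOOSEN(TutorialYourName.File_OriginalPerson, 25);

// NAMED gives the result tab a descriptive label instead of "Result 1".
OUTPUT(FirstRecords, NAMED('SamplePeople'));</programlisting>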
  625. </sect2>
  626. <sect2 id="Process_the_Data">
  627. <title>Process the Data</title>
  628. <para>In this section, we will write code to convert the original data
  629. so that all names are in uppercase. We will then write this new file
  630. to our Thor cluster.</para>
  631. <orderedlist>
  632. <listitem>
<para>Right-click on the <emphasis
role="bold">TutorialYourName</emphasis> folder, and select <emphasis
role="bold">Insert File</emphasis> from the pop-up menu.</para>
  637. </listitem>
  638. <listitem>
  639. <para>Name this one <emphasis
  640. role="bold">BWR_ProcessRawData</emphasis> and write the following
  641. code (changing YN and YourName as before):</para>
<para><programlisting>IMPORT TutorialYourName, Std;

// Convert the three name fields to uppercase; copy the other fields as-is.
TutorialYourName.Layout_People toUpperPlease(TutorialYourName.Layout_People pInput)
  := TRANSFORM
  SELF.FirstName  := Std.Str.ToUpperCase(pInput.FirstName);
  SELF.LastName   := Std.Str.ToUpperCase(pInput.LastName);
  SELF.MiddleName := Std.Str.ToUpperCase(pInput.MiddleName);
  SELF.Zip        := pInput.Zip;
  SELF.Street     := pInput.Street;
  SELF.City       := pInput.City;
  SELF.State      := pInput.State;
END;

OrigDataset    := TutorialYourName.File_OriginalPerson;
// Apply the transform to every record of the original dataset.
UpperedDataset := PROJECT(OrigDataset, toUpperPlease(LEFT));
// Write the result to a new logical file, replacing any existing copy.
OUTPUT(UpperedDataset,,'~tutorial::YN::TutorialPerson',OVERWRITE);
</programlisting></para>
  657. </listitem>
  658. <listitem>
<para>Check the syntax. If there are no errors, press the <emphasis
role="bold">Submit</emphasis> button.</para>
  661. </listitem>
  662. <listitem>
  663. <para>When it completes, select the Workunit tab, then select the
  664. Result 1 tab.</para>
  665. <para><figure>
  666. <title>Process Result</title>
  667. <mediaobject>
  668. <imageobject>
  669. <imagedata fileref="images/DT173-18.jpg" />
  670. </imageobject>
  671. </mediaobject>
  672. </figure></para>
  673. <para>The results show that the process has successfully converted
  674. the name fields to uppercase.</para>
  675. </listitem>
  676. <listitem>
  677. <para>After you examine the results, close the Builder
  678. window.</para>
  679. </listitem>
  680. </orderedlist>
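<para>As an optional check, the following sketch reads the new file back
and counts any records whose name fields are not fully uppercase. It
assumes the logical file name and record layout used in
BWR_ProcessRawData. The names Uppered, NotUpper, TotalRecords, and
NotUppercase are illustrative only and are not part of the
tutorial.</para>
<para><programlisting>IMPORT TutorialYourName, Std;

// Read back the file written by BWR_ProcessRawData
// (assumes the same logical file name and layout as above)
Uppered := DATASET('~tutorial::YN::TutorialPerson',
                   TutorialYourName.Layout_People, THOR);

// Keep only records where a name field differs from its uppercase form
NotUpper := Uppered(FirstName  != Std.Str.ToUpperCase(FirstName) OR
                    LastName   != Std.Str.ToUpperCase(LastName)  OR
                    MiddleName != Std.Str.ToUpperCase(MiddleName));

OUTPUT(COUNT(Uppered),  NAMED('TotalRecords'));
// This count should be zero if every name was converted
OUTPUT(COUNT(NotUpper), NAMED('NotUppercase'));
</programlisting></para>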
  681. </sect2>
  682. <sect2 id="Using_our_Data">
  683. <title>Using our New Data</title>
  684. <para></para>
  685. <para>Now that we have our data in a useful format and the file is in
  686. place, we can write more code to use the new data file. We will
  687. determine the indexes we will need and create them. For this tutorial,
  688. let’s assume the field we need to index is the Zip code field.</para>
  689. <para></para>
<para>In the DATASET definition, we will add a virtual field to the
RECORD structure to hold each record's position in the file. The index
we build stores this position so that the matching record can be
fetched later.</para>
  693. <para></para>
  694. <orderedlist>
  695. <listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> Folder. Name it <emphasis
role="bold">File_TutorialPerson</emphasis> and write this code (changing
<emphasis>YN</emphasis> and <emphasis>YourName</emphasis> as
before):</para>
  702. <para></para>
<para><programlisting>IMPORT TutorialYourName;
// Same layout as before, plus a virtual field holding each record's file position
EXPORT File_TutorialPerson :=
  DATASET('~tutorial::YN::TutorialPerson',
          {TutorialYourName.Layout_People,
           UNSIGNED8 fpos {virtual(fileposition)}},THOR);
</programlisting></para>
  709. </listitem>
  710. <listitem>
<para>Check the syntax and, if there are no errors, press the <emphasis
role="bold">Submit</emphasis> button.</para>
  713. </listitem>
  714. <listitem>
  715. <para>When it completes, it displays a green checkmark
  716. <inlinegraphic fileref="images/DT173-15.jpg" />.</para>
  717. </listitem>
  718. </orderedlist>
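<para>To see what the virtual field contains, a short Builder window
sketch such as the following displays the first few records, including
fpos. The limit of 10 records is arbitrary.</para>
<para><programlisting>IMPORT TutorialYourName;

// Show the first 10 records; each row includes the virtual fpos field
OUTPUT(CHOOSEN(TutorialYourName.File_TutorialPerson, 10));
</programlisting></para>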
  719. </sect2>
  720. <sect2 id="Index_the_Data">
  721. <title>Index the Data</title>
  722. <para>Next, we will define the INDEX.</para>
  723. <orderedlist>
  724. <listitem>
  725. <para>Insert a File into your Tutorial Folder. Name it <emphasis
  726. role="bold">IDX_PeopleByZip</emphasis><emphasis role="bold">
  727. </emphasis>and write this code (changing <emphasis>YN</emphasis>
  728. and <emphasis>YourName</emphasis> as before):</para>
<para><programlisting>IMPORT TutorialYourName;
// Index keyed on zip; fpos points back to the full record in the data file
EXPORT IDX_PeopleByZIP :=
  INDEX(TutorialYourName.File_TutorialPerson,{zip,fpos},'~tutorial::YN::PeopleByZipINDEX');
</programlisting></para>
  733. </listitem>
  734. <listitem>
  735. <para>Check the syntax.</para>
  736. <para>Next, we will build the index file.</para>
  737. </listitem>
  738. <listitem>
  739. <para>Insert a File into the <emphasis
  740. role="bold">Tutorial</emphasis><emphasis
  741. role="bold">YourName</emphasis><emphasis role="bold">
  742. </emphasis>Folder and name it <emphasis
  743. role="bold">BWR_BuildPeopleByZip </emphasis>and write this code
  744. (replacing <emphasis>YourName</emphasis> with your name):</para>
<para><programlisting>IMPORT TutorialYourName;
// Build the index file, replacing any existing copy
BUILDINDEX(TutorialYourName.IDX_PeopleByZIP,OVERWRITE);
</programlisting></para>
  748. </listitem>
  749. <listitem>
  750. <para>Check the syntax and if there are no errors, press the
  751. <emphasis role="bold">Submit</emphasis> button.</para>
  752. </listitem>
  753. <listitem>
  754. <para>Wait for the Workunit to complete, then close the Builder
  755. Window.</para>
  756. </listitem>
  757. </orderedlist>
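<para>Once the index is built, it can also be filtered directly, much
like a dataset. The following sketch is illustrative only and uses a
sample Zip code. It returns just the fields stored in the index (zip and
fpos), not the full records.</para>
<para><programlisting>IMPORT TutorialYourName;

// Filter the index itself; each result row holds only zip and fpos
OUTPUT(TutorialYourName.IDX_PeopleByZIP(zip = '33024'));
</programlisting></para>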
  758. </sect2>
  759. <sect2 id="Query_the_Data">
  760. <title>Build a Query</title>
  761. <para>Now that we have an index file, we will write a query that uses
  762. it.</para>
  763. <orderedlist>
  764. <listitem>
  765. <para>Insert a File into your Tutorial Folder. Name it <emphasis
  766. role="bold">BWR_FetchPeopleByZip </emphasis>and write this code
  767. (changing <emphasis>YourName</emphasis> as before):</para>
<para><programlisting>IMPORT TutorialYourName;
ZipFilter := '33024';
// Use the index to find matching Zip codes, then fetch the full records
FetchPeopleByZip :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip=ZipFilter),
        RIGHT.fpos);
OUTPUT(FetchPeopleByZip);
</programlisting></para>
  776. </listitem>
  777. <listitem>
  778. <para>Check the syntax and if there are no errors, press the
  779. <emphasis role="bold">Submit</emphasis> button.</para>
  780. </listitem>
  781. <listitem>
  782. <para>When it completes, select the Workunit<emphasis role="bold">
  783. </emphasis>tab, then select the <emphasis
  784. role="bold">Result</emphasis> tab.</para>
  785. </listitem>
  786. <listitem>
  787. <para>Examine the result, then close the Builder window.</para>
<para><emphasis role="bold">Note</emphasis>: You can change the
value of <emphasis role="bold">ZipFilter</emphasis> in the code to
get results from different Zip codes.</para>
  791. </listitem>
  792. </orderedlist>
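<para>The same pattern can be extended to several Zip codes at once. The
following is a minimal sketch, assuming the index built above. ZipSet and
its two values are hypothetical and only for illustration.</para>
<para><programlisting>IMPORT TutorialYourName;

// Hypothetical set of Zip codes to look up
ZipSet := ['33024', '33025'];

// Filter the index with IN, then fetch the matching full records
FetchPeopleByZips :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip IN ZipSet),
        RIGHT.fpos);
OUTPUT(FetchPeopleByZips);
</programlisting></para>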
  793. </sect2>
  794. </sect1>
  795. <sect1 id="Publishing_your_Query">
  796. <title>Publishing your Query</title>
  797. <para>Now that we have created an indexed query, the next step is to
  798. enable access to it through a Web interface.</para>
<para>STORED variables provide a way to pass values to a query as
parameters. In this example, the user supplies a ZIP code and the
query returns the people in that ZIP code.</para>
  802. <orderedlist>
  803. <listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> Folder and name it <emphasis
role="bold">FetchPeopleByZipService</emphasis>.</para>
  807. </listitem>
  808. <listitem>
  809. <para>Write this code (changing <emphasis>YourName</emphasis> as
  810. before):</para>
<para><programlisting>IMPORT TutorialYourName;
// STORED exposes ZipFilter as the query parameter named ZIPValue
STRING10 ZipFilter := '' : STORED('ZIPValue');
resultSet :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip=ZipFilter),
        RIGHT.fpos);
OUTPUT(resultSet);
</programlisting></para>
  819. </listitem>
  820. <listitem>
  821. <para>Check the syntax, and save the file.</para>
  822. </listitem>
  823. <listitem>
  824. <para>Press the <emphasis role="bold">Submit</emphasis><emphasis
  825. role="bold"> </emphasis>button.</para>
  826. </listitem>
  827. <listitem>
  828. <para>When the workunit completes, select the Workunit<emphasis
  829. role="bold"> </emphasis>tab, then select the ECL Watch tab.</para>
  830. </listitem>
  831. <listitem>
  832. <?dbfo keep-together="always"?>
<para>Press the <emphasis role="bold">Publish</emphasis> button (you
may need to scroll down the main window).</para>
  835. <para><figure>
  836. <title>Publish Workunit</title>
  837. <mediaobject>
  838. <imageobject>
  839. <imagedata fileref="images/DTimg12.jpg" />
  840. </imageobject>
  841. </mediaobject>
  842. </figure></para>
  843. </listitem>
  844. <listitem>
  845. <para>When the workunit is published, a notice dialog
  846. displays.</para>
  847. <para><figure>
  848. <title>Workunit Published</title>
  849. <mediaobject>
  850. <imageobject>
  851. <imagedata fileref="images/DT173-18b.png" />
  852. </imageobject>
  853. </mediaobject>
  854. </figure></para>
  855. </listitem>
  856. </orderedlist>
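<para>A query can take more than one STORED parameter. The following
sketch adds an optional last-name filter to the service above.
LastNameFilter, the parameter name LastNameValue, and the STRING25
length are assumptions for illustration and are not part of the
tutorial. It also assumes that LastName is a string field in
Layout_People.</para>
<para><programlisting>IMPORT TutorialYourName, Std;

STRING10 ZipFilter      := '' : STORED('ZIPValue');
// Hypothetical second parameter; an empty value means "no filter"
STRING25 LastNameFilter := '' : STORED('LastNameValue');

fetched :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip=ZipFilter),
        RIGHT.fpos);

// Names were stored in uppercase, so uppercase the supplied value too
resultSet := fetched(LastNameFilter = '' OR
                     LastName = Std.Str.ToUpperCase(LastNameFilter));
OUTPUT(resultSet);
</programlisting></para>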
  857. <sect2 id="Execute-using-the-Data-Delivery-Engine">
  858. <title>Execute using WsECL</title>
  859. <para>Now that the query is published, we can run it using the WsECL
  860. Web service. WsECL provides a Web-based interface to your published
  861. query. It also automatically creates an entry form to execute the
  862. query.</para>
<para>Open WsECL using the following URL:</para>
<para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp (where
nnn.nnn.nnn.nnn is your ESP Server’s IP address and pppp is the port.
The default port is 8002)</emphasis></para>
  867. <para></para>
  868. <para><figure>
  869. <title>WsECL</title>
  870. <mediaobject>
  871. <imageobject>
  872. <imagedata fileref="images/DTimg13.jpg" />
  873. </imageobject>
  874. </mediaobject>
  875. </figure></para>
  876. <para></para>
  877. <orderedlist>
  878. <listitem>
  879. <para>Click on the + sign next to <emphasis
  880. role="bold">thor</emphasis> to expand the tree.</para>
  881. </listitem>
  882. <listitem>
  883. <?dbfo keep-together="always"?>
  884. <para>Click on the <emphasis
  885. role="bold">fetchpeoplebyzipservice.1</emphasis> hyperlink.</para>
  886. <para>The form for the service displays.</para>
  887. <para><figure>
  888. <title>Service Form</title>
  889. <mediaobject>
  890. <imageobject>
  891. <imagedata fileref="images/DTimg14a.jpg" />
  892. </imageobject>
  893. </mediaobject>
  894. </figure></para>
  895. </listitem>
  896. <listitem>
<para>Provide a zip code (e.g., 33024) in the <emphasis
role="bold">zipvalue</emphasis> field, select <emphasis
role="bold">Output Tables</emphasis> from the droplist, then press
the <emphasis role="bold">Submit</emphasis> button.</para>
  901. <para>The results display.</para>
  902. <para><figure>
  903. <title>Results</title>
  904. <mediaobject>
  905. <imageobject>
  906. <imagedata fileref="images/DTimg15a.jpg" />
  907. </imageobject>
  908. </mediaobject>
  909. </figure></para>
  910. </listitem>
  911. </orderedlist>
  912. </sect2>
  913. </sect1>
  914. <sect1 id="Deploy_the_Roxie_Query">
  915. <title>Compile and Publish the Roxie Query</title>
  916. <para>The final step in this process is to publish the indexed query to
  917. a Rapid Data Delivery Engine (Roxie) Cluster.</para>
  918. <para>We will recompile the code with Roxie as the target cluster, then
  919. publish it to a Roxie cluster. <orderedlist>
  920. <listitem>
<para>In the ECL IDE, select the Builder tab on the
FetchPeopleByZipService file builder window.</para>
  923. </listitem>
  924. <listitem>
  925. <para>Using the <emphasis role="bold">Target</emphasis> droplist,
  926. select Roxie as the Target cluster.</para>
  927. <para><figure>
  928. <title>Target Roxie</title>
  929. <mediaobject>
  930. <imageobject>
  931. <imagedata fileref="images/DTimg16.jpg" />
  932. </imageobject>
  933. </mediaobject>
  934. </figure></para>
  935. </listitem>
  936. <listitem>
<para>In the upper left corner of the Builder window, the
<emphasis role="bold">Submit</emphasis> button has a drop-down
arrow next to it. Select the arrow to expose the <emphasis
role="bold">Compile</emphasis> option.</para>
  941. <figure>
  942. <title>Compile</title>
  943. <mediaobject>
  944. <imageobject>
  945. <imagedata fileref="images/DTimg17.jpg" />
  946. </imageobject>
  947. </mediaobject>
  948. </figure>
  949. </listitem>
  950. <listitem>
<para>Select <emphasis role="bold">Compile</emphasis>.</para>
  952. </listitem>
  953. <listitem>
  954. <?dbfo keep-together="always"?>
  955. <para>When the workunit finishes, it will display a green circle
  956. indicating it has compiled.</para>
  957. <para><figure>
  958. <title>Compiled</title>
  959. <mediaobject>
  960. <imageobject>
  961. <imagedata fileref="images/DTimg18.jpg" />
  962. </imageobject>
  963. </mediaobject>
  964. </figure></para>
  965. </listitem>
  966. </orderedlist></para>
  967. <sect2 id="Deploy_the_Query_to_Roxie">
  968. <title>Publish the Roxie query</title>
  969. <para>Next we will publish the query to a Roxie Cluster.</para>
  970. <orderedlist>
  971. <listitem>
  972. <para>Select the workunit tab for the FetchPeopleByZipService that
  973. you just compiled.</para>
  974. </listitem>
  975. <listitem>
  976. <para>Select the ECL Watch tab.</para>
  977. </listitem>
  978. <listitem>
  979. <?dbfo keep-together="always"?>
<para>Press the <emphasis role="bold">Publish</emphasis> button
(you may need to scroll down the main window).</para>
  982. <para><figure>
  983. <title>Publish Query</title>
  984. <mediaobject>
  985. <imageobject>
  986. <imagedata fileref="images/DTimg19.jpg" />
  987. </imageobject>
  988. </mediaobject>
  989. </figure>When it successfully publishes, you will see:</para>
  990. <para><figure>
  991. <title>Workunit Published</title>
  992. <mediaobject>
  993. <imageobject>
  994. <imagedata fileref="images/DT173-18b.png" />
  995. </imageobject>
  996. </mediaobject>
  997. </figure></para>
  998. </listitem>
  999. </orderedlist>
  1000. </sect2>
  1001. <sect2 id="Run_the_Roxie_Query" role="brk">
  1002. <title>Run the Roxie Query in WsECL</title>
<para>Now that the query is deployed to a Roxie cluster, we can run it
using the WsECL service. Open the following URL:</para>
  1005. <para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp (where
  1006. nnn.nnn.nnn.nnn is your ESP Server’s IP address and pppp is the port.
  1007. The default port is 8002)</emphasis></para>
  1008. <orderedlist>
  1009. <listitem>
  1010. <para>Click on the + sign next to <emphasis
  1011. role="bold">myroxie</emphasis> to expand the tree.</para>
  1012. </listitem>
  1013. <listitem>
  1014. <?dbfo keep-together="always"?>
  1015. <para>Click on the <emphasis
  1016. role="bold">fetchpeoplebyzipservice.1</emphasis> hyperlink.</para>
  1017. <para>The form for the service displays.</para>
  1018. <para><figure>
  1019. <title>RoxieECL</title>
  1020. <mediaobject>
  1021. <imageobject>
  1022. <imagedata fileref="images/DTimg21.jpg" />
  1023. </imageobject>
  1024. </mediaobject>
  1025. </figure></para>
  1026. </listitem>
  1027. <listitem>
  1028. <?dbfo keep-together="always"?>
  1029. <para>Provide a zip code (e.g., 33024), select <emphasis
  1030. role="bold">Output Tables</emphasis> from the droplist, and press
  1031. the Submit button.</para>
  1032. <para>The results display.</para>
  1033. <para><figure>
  1034. <title>RoxieResults</title>
  1035. <mediaobject>
  1036. <imageobject>
  1037. <imagedata fileref="images/DTimg22.jpg" />
  1038. </imageobject>
  1039. </mediaobject>
  1040. </figure></para>
  1041. </listitem>
  1042. </orderedlist>
  1043. </sect2>
  1044. </sect1>
  1045. </chapter>
  1046. <chapter id="Summary">
  1047. <title>Summary</title>
<para>Now that you have successfully sprayed raw data onto a cluster,
processed it, and deployed a query to a Rapid Data Delivery Engine
(Roxie) cluster, what’s next?</para>
  1050. <!-- -->
  1051. <para>Here is a short list of suggestions on the path you might take from
  1052. here:</para>
  1053. <itemizedlist mark="bullet">
  1054. <listitem>
  1055. <para>Create indexes on other fields and create queries using
  1056. them.</para>
  1057. </listitem>
  1058. </itemizedlist>
  1059. <itemizedlist mark="bullet">
  1060. <listitem>
  1061. <para>Write client applications to access your queries using JSON or
  1062. SOAP interfaces.</para>
  1063. </listitem>
  1064. </itemizedlist>
  1065. <itemizedlist mark="bullet">
  1066. <listitem>
<para>Look at the resources available on the Links tab.</para>
  1068. <para><figure>
  1069. <title>Links</title>
  1070. <mediaobject>
  1071. <imageobject>
  1072. <imagedata fileref="images/DTimg24.jpg" />
  1073. </imageobject>
  1074. </mediaobject>
  1075. </figure>The Links tab provides easy access to a form, a Sample
  1076. Request, a Sample Response, the WSDL, the XML Schema (XSD) and
  1077. more...</para>
  1078. </listitem>
  1079. </itemizedlist>
  1080. <itemizedlist mark="bullet">
  1081. <listitem>
  1082. <para>Follow the procedures in this tutorial using your own
  1083. data!</para>
  1084. </listitem>
  1085. </itemizedlist>
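<para>As a starting point for the first suggestion, an index on another
field follows the same pattern as IDX_PeopleByZIP. The following sketch
indexes the LastName field. The attribute name IDX_PeopleByLastName and
the logical file name are hypothetical. The index would still need to be
built with BUILDINDEX in its own Builder window before a query can use
it.</para>
<para><programlisting>IMPORT TutorialYourName;

// Hypothetical index keyed on LastName; fpos points back to the full record
EXPORT IDX_PeopleByLastName :=
  INDEX(TutorialYourName.File_TutorialPerson,{LastName,fpos},
        '~tutorial::YN::PeopleByLastNameINDEX');
</programlisting></para>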
  1086. </chapter>
  1087. </book>