  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  3. "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
  4. <book lang="en_US" xml:base="../">
  5. <title>HPCC Data Tutorial</title>
  6. <bookinfo>
  7. <title>HPCC Data Tutorial</title>
  8. <mediaobject>
  9. <imageobject>
  10. <imagedata fileref="images/redswooshWithLogo3.jpg" />
  11. </imageobject>
  12. </mediaobject>
  13. <author>
  14. <surname>Boca Raton Documentation Team</surname>
  15. </author>
  16. <legalnotice>
  17. <para>We welcome your comments and feedback about this document via
email to <email>docfeedback@hpccsystems.com</email>. Please include
  19. <emphasis role="bold">Documentation Feedback</emphasis> in the subject
  20. line and reference the document name, page numbers, and current Version
  21. Number in the text of the message.</para>
  22. <para>LexisNexis and the Knowledge Burst logo are registered trademarks
  23. of Reed Elsevier Properties Inc., used under license. Other products and
  24. services may be trademarks or registered trademarks of their respective
  25. companies. All names and example data used in this manual are
  26. fictitious. Any similarity to actual persons, living or dead, is purely
  27. coincidental.</para>
  28. <para></para>
  29. </legalnotice>
  30. <xi:include href="common/Version.xml" xpointer="FooterInfo"
  31. xmlns:xi="http://www.w3.org/2001/XInclude" />
  32. <!--Release Info makes a running page footer: now an include! -->
  33. <!--The following include statement pulls in the date_ver from version.xml-->
  34. <xi:include href="common/Version.xml" xpointer="DateVer"
  35. xmlns:xi="http://www.w3.org/2001/XInclude" />
  36. <corpname>HPCC Systems</corpname>
  37. <!--corpname never prints-->
  38. <xi:include href="common/Version.xml" xpointer="Copyright"
  39. xmlns:xi="http://www.w3.org/2001/XInclude" />
  40. <!--Copyright tag inserts the symbol automatically: Now an Include!-->
  41. <mediaobject role="logo">
  42. <imageobject>
  43. <imagedata fileref="images/LN_Rightjustified.jpg" />
  44. </imageobject>
  45. </mediaobject>
  46. </bookinfo>
  47. <chapter>
  48. <title>Introduction</title>
  49. <sect1 id="Introduction_Case-Study-and-Tutorial" role="nobrk">
  50. <title>The ECL Development Process</title>
  51. <para>This tutorial provides a walk-through of the development process,
  52. from beginning to end, and is designed to be an introduction to working
with data on any HPCC Systems HPCC<footnote>
  54. <para><emphasis role="bold">H</emphasis>igh <emphasis
  55. role="bold">P</emphasis>erformance <emphasis
  56. role="bold">C</emphasis>omputing <emphasis
  57. role="bold">C</emphasis>luster (HPCC) is a massively parallel
  58. processing computing platform that solves Big Data problems. See
  59. http://www.hpccsystems.com/Why-HPCC/How-it-works for more
  60. details.</para>
  61. </footnote>. We will write code in ECL<footnote>
  62. <para><emphasis role="bold">E</emphasis>nterprise <emphasis
  63. role="bold">C</emphasis>ontrol <emphasis
role="bold">L</emphasis>anguage (ECL) is a declarative, data-centric
  65. programming language used to manage all aspects of the massive data
  66. joins, sorts, and builds that truly differentiate HPCC (High
  67. Performance Computing Cluster) from other technologies in its
  68. ability to provide flexible data analysis on a massive scale.</para>
</footnote> to process our data and query it.</para>
  70. <para>This tutorial assumes:</para>
<itemizedlist>
<listitem>
<para>You have a running HPCC platform. This can be a VM Edition or a
single-node or multi-node HPCC platform.</para>
</listitem>
<listitem>
<para>You have the ECL IDE<footnote>
<para>The ECL IDE (Integrated Development Environment) is the tool
used to create queries into your data and ECL files with which to
build your queries.</para>
</footnote> installed and configured.</para>
</listitem>
</itemizedlist>
  82. <para>In this tutorial, we will:</para>
  83. <itemizedlist mark="bullet">
  84. <listitem>
  85. <para>Download a raw data file</para>
<para>Links to the data file are available at <ulink
  87. url="http://hpccsystems.com/community/docs/data-tutorial-guide">http://hpccsystems.com/community/docs/data-tutorial-guide</ulink></para>
  88. <para>The download is approximately 30 MB (compressed) and is
  89. available in either ZIP or .tar.gz format. Choose the appropriate
  90. link.</para>
  91. </listitem>
  92. <listitem>
<para>Spray the file to a Data Refinery (Thor) cluster.</para>
<para>HPCC clusters "spray" data into file parts on each node.</para>
  95. <para>A <emphasis>spray</emphasis> or <emphasis>import</emphasis> is
  96. the relocation of a data file from one location to an HPCC cluster.
  97. The term spray was adopted due to the nature of the file movement –
  98. the file is partitioned across all nodes within a cluster.</para>
  99. </listitem>
  100. <listitem>
  101. <para>Examine the data and determine the pre-processing we need to
  102. perform</para>
  103. </listitem>
  104. <listitem>
  105. <para>Pre-process the data to produce a new data file</para>
  106. </listitem>
  107. <listitem>
  108. <para>Determine the types of queries we want</para>
  109. </listitem>
  110. <listitem>
  111. <para>Create the queries</para>
  112. </listitem>
  113. <listitem>
  114. <para>Test the queries</para>
  115. </listitem>
  116. <listitem>
  117. <para>Deploy them to a Rapid Data Delivery Engine (RDDE) cluster,
also known as a Roxie cluster.</para>
  119. </listitem>
  120. </itemizedlist>
  121. </sect1>
  122. </chapter>
  123. <chapter id="Working_with_Data">
  124. <title>Working with Data</title>
  125. <sect1 id="The_Original_Data" role="nobrk">
  126. <title>The Original Data</title>
  127. <para>In this scenario, we receive a structured data file containing
  128. records with people's names and addresses. The HPCC also supports
  129. unstructured data, but this example is simpler. This file is documented
  130. in the following table:</para>
  131. <para></para>
  132. <para><informaltable colsep="1" frame="all" rowsep="1">
  133. <tgroup cols="3">
  134. <colspec colwidth="147.60pt" />
  135. <colspec colwidth="147.60pt" />
  136. <colspec colwidth="147.60pt" />
  137. <thead>
  138. <row>
  139. <entry align="left">Field Name</entry>
  140. <entry align="left">Type</entry>
  141. <entry align="left">Description</entry>
  142. </row>
  143. </thead>
  144. <tbody>
  145. <row>
  146. <entry>FirstName</entry>
  147. <entry>15 Character String</entry>
  148. <entry>First Name</entry>
  149. </row>
  150. <row>
  151. <entry>LastName</entry>
  152. <entry>25 Character String</entry>
  153. <entry>Last name</entry>
  154. </row>
  155. <row>
  156. <entry>MiddleName</entry>
  157. <entry>15 Character String</entry>
  158. <entry>Middle Name</entry>
  159. </row>
  160. <row>
  161. <entry>Zip</entry>
  162. <entry>5 Character String</entry>
  163. <entry>ZIP Code</entry>
  164. </row>
  165. <row>
  166. <entry>Street</entry>
  167. <entry>42 Character String</entry>
  168. <entry>Street Address</entry>
  169. </row>
  170. <row>
  171. <entry>City</entry>
  172. <entry>20 Character String</entry>
  173. <entry>City</entry>
  174. </row>
  175. <row>
  176. <entry>State</entry>
  177. <entry>2 Character String</entry>
  178. <entry>State</entry>
  179. </row>
  180. </tbody>
  181. </tgroup>
  182. </informaltable></para>
<para>This gives us a record length of 124 bytes, the total of all field
lengths (15 + 25 + 15 + 5 + 42 + 20 + 2). You will need to know this
length for the <emphasis role="bold">File Spray</emphasis> process.</para>
  186. <para></para>
  187. <sect2 id="Uploading_a_file">
  188. <title>Load the Incoming Data File to your Landing Zone</title>
  189. <para>A Landing Zone (or Drop Zone) is a physical storage location
  190. defined in your HPCC's environment. A daemon (DaFileSrv) must be
  191. running on that server to enable file sprays and desprays.</para>
<para>For smaller data files (up to 2 GB), you can use the
upload/download file utility in ECL Watch (a Web-based interface to
your HPCC platform). The sample data file is approximately 100 MB.</para>
  195. <orderedlist>
  196. <listitem>
  197. <para>Download the sample data file from the HPCC Systems
  198. portal.</para>
  199. <para>The data file is available from links found on <ulink
  200. url="http://hpccsystems.com/community/docs/data-tutorial-guide">http://hpccsystems.com/community/docs/data-tutorial-guide</ulink>.
  201. The download is approximately 30 MB (compressed) and is available
  202. in either ZIP or tar.gz format (<emphasis
  203. role="bold">OriginalPerson.tar.gz</emphasis> or <emphasis
role="bold">OriginalPerson.zip</emphasis>).</para>
  205. </listitem>
  206. <listitem>
  207. <para>Extract it to a folder on your local machine.</para>
  208. </listitem>
  209. <listitem>
  210. <para>In your browser, go to the <emphasis role="bold">ECL
  211. Watch</emphasis> URL. For example, http://nnn.nnn.nnn.nnn:8010,
  212. where nnn.nnn.nnn.nnn is your ESP<footnote>
  213. <para>The ESP (Enterprise Services Platform) Server is the
communication layer server in your HPCC environment.</para>
  215. </footnote> Server's IP address.</para>
  216. <para><informaltable colsep="1" frame="all" rowsep="1">
  217. <?dbfo keep-together="always"?>
  218. <tgroup cols="2">
  219. <colspec colwidth="49.50pt" />
  220. <colspec />
  221. <tbody>
  222. <row>
  223. <entry><inlinegraphic
  224. fileref="images/caution.png" /></entry>
  225. <entry>Your IP address could be different from the ones
  226. provided in the example images. Please use the IP
  227. address provided by <emphasis
  228. role="bold">your</emphasis> installation.</entry>
  229. </row>
  230. </tbody>
  231. </tgroup>
  232. </informaltable></para>
  233. </listitem>
  234. <listitem>
  235. <?dbfo keep-together="always"?>
<para>From the ECL Watch page, click the <emphasis
role="bold">Upload/download File</emphasis> link in the menu on
the left side.</para>
  239. <para><figure>
  240. <title>Upload/download</title>
  241. <mediaobject>
  242. <imageobject>
  243. <imagedata fileref="images/LZimg03-1.jpg"
  244. vendor="eclwatchSS" />
  245. </imageobject>
  246. </mediaobject>
  247. </figure></para>
  248. <para>Once you click on the Upload/download file link, it will
  249. take you to the Dropzones and Files page, where you can choose to
  250. <emphasis role="bold">Browse</emphasis> your machine for a file to
  251. upload:</para>
  252. <para><figure>
  253. <title>Dropzones and Files</title>
  254. <mediaobject>
  255. <imageobject>
  256. <imagedata fileref="images/LZimg04.jpg"
  257. vendor="eclwatchSS" />
  258. </imageobject>
  259. </mediaobject>
  260. </figure></para>
  261. </listitem>
  262. <listitem>
  263. <para>Press the <emphasis role="bold">Browse</emphasis> button to
  264. browse the files on your local machine, select the file to upload
  265. and then press the <emphasis role="bold">Open</emphasis>
  266. button.</para>
  267. <para>The file you selected should appear in the <emphasis
  268. role="bold">Select a file to upload:</emphasis> field. The data
file is named <emphasis role="bold">OriginalPerson</emphasis>.</para>
  271. </listitem>
  272. <listitem>
<para>Press <emphasis role="bold">Upload Now</emphasis> to
  274. complete the file upload.</para>
  275. </listitem>
  276. </orderedlist>
  277. </sect2>
  278. <sect2 id="Spray_the_Data_File_to_your_DR-THOR_Cluster">
  279. <title>Spray the Data File to your THOR Cluster</title>
  280. <para>To use the data file in our HPCC cluster, we must first “spray”
  281. it to a Thor cluster. A <emphasis>spray</emphasis> or
  282. <emphasis>import</emphasis> is the relocation of a data file from one
  283. location to a Thor cluster. The term spray was adopted due to the
  284. nature of the file movement – the file is partitioned across all nodes
  285. within a cluster.</para>
  286. <para>In this example, the file is on your Landing Zone and is named
<emphasis role="bold">OriginalPerson</emphasis>.</para>
<para>We are going to spray it to our Thor cluster and give it a
logical name of <emphasis
role="bold">tutorial::YN::OriginalPerson</emphasis>, where <emphasis
role="bold">YN</emphasis> are your initials. The Distributed File
Utility (DFU) maintains a list of logical files and their corresponding
physical file locations.</para>
  295. <orderedlist>
  296. <listitem>
  297. <para>Open ECL Watch in your browser using the following
  298. URL:</para>
  299. <para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp
  300. </emphasis><emphasis role="bold">(where nnn.nnn.nnn.nnn is your
  301. ESP Server’s IP Address and pppp is the port. The default port is
  302. 8010)</emphasis></para>
  303. </listitem>
  304. <listitem>
  305. <para>Click on the <emphasis role="bold">Spray Fixed</emphasis>
  306. hyperlink under the DFU Files menu on the left.</para>
  307. <para>The <emphasis role="bold">DFU Spray Fixed</emphasis> page
  308. displays.</para>
  309. </listitem>
  310. <listitem>
  311. <para>Using the Source <emphasis
  312. role="bold">Machine/dropzone</emphasis> drop list, select the
  313. Landing Zone where the file was placed.</para>
  314. <para>In the VM or Community Edition, there is only one Landing
  315. Zone.</para>
  316. <para>The IP Address is automatically filled and the Local Path is
  317. partially filled with the default folder on your landing
  318. zone.</para>
  319. </listitem>
  320. <listitem>
  321. <para>Complete the <emphasis role="bold">Local Path</emphasis> to
  322. include the complete file name or use the <emphasis
  323. role="bold">Choose File</emphasis> button to select the file from
  324. a list of files in the folder. (The file to choose is
  325. <emphasis>OriginalPerson</emphasis>)</para>
  326. </listitem>
  327. <listitem>
  328. <para>Fill in the <emphasis role="bold">Record Length</emphasis>
  329. (124).</para>
  330. </listitem>
  331. <listitem>
  332. <para>Fill in the <emphasis role="bold">Label</emphasis> using the
  333. naming convention described earlier: <emphasis
  334. role="bold">tutorial::</emphasis><emphasis
  335. role="bold">YN</emphasis><emphasis
  336. role="bold">::OriginalPerson</emphasis> (remember, <emphasis
  337. role="bold">YN</emphasis> are your initials).</para>
  338. </listitem>
  339. <listitem>
  340. <?dbfo keep-together="always"?>
<para>Make sure the <emphasis role="bold">Replicate</emphasis> box is
checked.</para>
  344. <para><emphasis role="bold">Note:</emphasis> This option is only
  345. available on systems where replication has been enabled.</para>
  346. <para><figure>
  347. <title>Dropzones and Files</title>
  348. <mediaobject>
  349. <imageobject>
  350. <imagedata fileref="images/DTimg01.jpg" />
  351. </imageobject>
  352. </mediaobject>
  353. </figure></para>
  354. </listitem>
  355. <listitem>
<para>Press the <emphasis role="bold">Submit</emphasis> button.</para>
  358. </listitem>
  359. <listitem>
  360. <?dbfo keep-together="always"?>
  361. <para>Click on the <emphasis role="bold">View Progress</emphasis>
hyperlink.</para>
  363. <para><figure>
  364. <title>View Progress</title>
  365. <mediaobject>
  366. <imageobject>
  367. <imagedata fileref="images/DTimg02.jpg"
  368. vendor="eclwatchSS" />
  369. </imageobject>
  370. </mediaobject>
  371. </figure>The Workunit progress page displays.</para>
  372. <para><figure>
  373. <title>Spray Complete</title>
  374. <mediaobject>
  375. <imageobject>
  376. <imagedata fileref="images/DTimg03.jpg"
  377. vendor="eclwatchSS" />
  378. </imageobject>
  379. </mediaobject>
  380. </figure> Once the spray is complete, we can proceed.</para>
  381. </listitem>
  382. </orderedlist>
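<para>The same spray can, in principle, be scripted from ECL rather than
performed in ECL Watch. The following is a minimal sketch and not one of
the original tutorial steps. It assumes the ECL Standard Library function
<emphasis role="bold">Std.File.SprayFixed</emphasis>; the landing zone IP
address, the landing zone path, and the destination group name
(<emphasis>mythor</emphasis>) are placeholders that you would replace
with the values from your own environment:</para>
<programlisting>IMPORT Std;

// Every argument value below is an illustrative placeholder, not a value
// taken from this tutorial's environment.
Std.File.SprayFixed('nnn.nnn.nnn.nnn',                                // landing zone IP address
                    '/var/lib/HPCCSystems/mydropzone/OriginalPerson', // assumed landing zone path
                    124,                                              // record length from the table above
                    'mythor',                                         // assumed Thor group name
                    '~tutorial::YN::OriginalPerson');                 // logical file name
</programlisting>
<para>The remaining parameters (timeout, ESP server address, overwrite,
replicate, and so on) are left at their defaults in this sketch.</para>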
  383. </sect2>
  384. </sect1>
  385. <sect1 id="Begin_Coding">
  386. <title>Begin Coding</title>
  387. <para>In this portion of the tutorial, we will write ECL code to define
  388. the data file and execute simple queries on it so we can evaluate it and
  389. determine any necessary pre-processing.</para>
  390. <orderedlist>
  391. <listitem>
<para>Start the ECL IDE (Start &gt;&gt; All Programs &gt;&gt; HPCC
Systems &gt;&gt; ECL IDE).</para>
  394. </listitem>
  395. <listitem>
<para>Log in to your environment.</para>
<para>For purposes of this tutorial, let’s create a folder called
<emphasis role="bold">TutorialYourName</emphasis> (where
<emphasis>YourName</emphasis> is your name).</para>
  401. </listitem>
  402. <listitem>
  403. <?dbfo keep-together="always"?>
<para>Right-click the <emphasis role="bold">My Files</emphasis>
folder in the Repository window,
and select <emphasis role="bold">Insert Folder</emphasis> from the
pop-up menu.</para>
  408. <para><figure>
  409. <title>Insert Folder</title>
  410. <mediaobject>
  411. <imageobject>
  412. <imagedata fileref="images/DTimg04.jpg" />
  413. </imageobject>
  414. </mediaobject>
  415. </figure></para>
  416. </listitem>
  417. <listitem>
  418. <?dbfo keep-together="always"?>
<para>Enter <emphasis role="bold">TutorialYourName</emphasis>
(where <emphasis>YourName</emphasis> is your name) for the label,
then press the OK button.</para>
  423. <para><figure>
  424. <title>Enter Folder Label</title>
  425. <mediaobject>
  426. <imageobject>
  427. <imagedata fileref="images/DTimg05.jpg" />
  428. </imageobject>
  429. </mediaobject>
  430. </figure></para>
  431. </listitem>
  432. <listitem>
<para>Right-click the <emphasis
role="bold">TutorialYourName</emphasis> folder, and select <emphasis
role="bold">Insert File</emphasis> from the pop-up menu.</para>
  437. </listitem>
  438. <listitem>
  439. <?dbfo keep-together="always"?>
  440. <para>Enter <emphasis role="bold">Layout_People</emphasis> for the
  441. label, then press the OK button.</para>
  442. <para><figure>
  443. <title>Insert File</title>
  444. <mediaobject>
  445. <imageobject>
  446. <imagedata fileref="images/DTimg06.jpg" />
  447. </imageobject>
  448. </mediaobject>
  449. </figure></para>
  450. <para>A Builder Window opens.</para>
  451. <para><figure>
  452. <title>Layout People in Builder</title>
  453. <mediaobject>
  454. <imageobject>
  455. <imagedata fileref="images/DTimg07.jpg" />
  456. </imageobject>
  457. </mediaobject>
  458. </figure></para>
  459. <para>Notice that some text has been written for you in the window.
  460. This helps you to remember that the name of the file (Layout_People)
  461. <emphasis>must always exactly match</emphasis> the name of the
  462. single EXPORT definition (Layout_People) contained in that file.
  463. This is a requirement -- one EXPORT definition per file, and its
  464. name must match the filename.</para>
  465. </listitem>
  466. <listitem>
  467. <?dbfo keep-together="always"?>
  468. <para>Write the following code in the Builder workspace:</para>
<para><programlisting>EXPORT Layout_People := RECORD
  STRING15 FirstName;
  STRING25 LastName;
  STRING15 MiddleName;
  STRING5  Zip;
  STRING42 Street;
  STRING20 City;
  STRING2  State;
END;</programlisting> <figure>
  478. <title>Code in Builder Window</title>
  479. <mediaobject>
  480. <imageobject>
  481. <imagedata fileref="images/DTimg08.jpg" />
  482. </imageobject>
  483. </mediaobject>
  484. </figure></para>
  485. </listitem>
  486. <listitem>
  487. <para>Press the syntax check button on the main toolbar (or press
  488. F7).</para>
  489. <para>It is always a good idea to check syntax before
  490. submitting.</para>
  491. <para><figure>
  492. <title>Check Syntax</title>
  493. <mediaobject>
  494. <imageobject>
  495. <imagedata fileref="images/DTimg23.jpg" />
  496. </imageobject>
  497. </mediaobject>
  498. </figure></para>
  499. <para>This file defines the record structure for the data file.
  500. Next, we will examine the data.</para>
  501. </listitem>
  502. </orderedlist>
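<para>As a cross-check on the record length entered during the spray,
the size of the record structure can be computed in ECL. This is a
minimal sketch, assuming the <emphasis>Layout_People</emphasis>
definition above has been saved in your
<emphasis>TutorialYourName</emphasis> folder. For a fixed-length record
such as this one, SIZEOF returns the total of the field lengths:</para>
<programlisting>IMPORT TutorialYourName;

// 15 + 25 + 15 + 5 + 42 + 20 + 2 = 124 bytes
OUTPUT(SIZEOF(TutorialYourName.Layout_People));
</programlisting>
<para>If the result differs from the record length used in the spray,
the layout and the sprayed file do not line up and the field values
would be read at the wrong offsets.</para>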
  503. <sect2 id="Examine_the_Data" role="brk">
  504. <title>Examine the Data</title>
  505. <para>In this section, we will look at the data and determine if there
  506. is any pre-processing we want to perform on the data. This is the step
  507. in the development process where we convert the raw data into a form
  508. we can use.</para>
  509. <orderedlist>
  510. <listitem>
<para>Right-click the <emphasis
role="bold">TutorialYourName</emphasis> folder, and select <emphasis
role="bold">Insert File</emphasis> from the pop-up menu.</para>
  515. </listitem>
  516. <listitem>
  517. <para>Enter <emphasis role="bold">File_OriginalPerson</emphasis>
  518. for the label, then press the OK button.</para>
  519. <para><figure>
  520. <title>Insert File</title>
  521. <mediaobject>
  522. <imageobject>
  523. <imagedata fileref="images/DTimg09.jpg" />
  524. </imageobject>
  525. </mediaobject>
  526. </figure>A Builder Window opens.</para>
  527. </listitem>
  528. <listitem>
<para>Write the following code (remember to replace
<emphasis>YN</emphasis> with your initials and
<emphasis>YourName</emphasis> with your name):</para>
<para><programlisting>IMPORT TutorialYourName;
EXPORT File_OriginalPerson :=
  DATASET('~tutorial::YN::OriginalPerson',TutorialYourName.Layout_People,THOR);
</programlisting></para>
  535. <para><figure>
  536. <title>File_OriginalPerson.ecl</title>
  537. <mediaobject>
  538. <imageobject>
  539. <imagedata fileref="images/DTimg10.jpg" />
  540. </imageobject>
  541. </mediaobject>
  542. </figure></para>
  543. </listitem>
  544. <listitem>
  545. <para>Press the syntax check button on the main toolbar (or press
  546. F7) to check the syntax.</para>
  547. <para>This defines the Dataset. Next, we will examine the
  548. data.</para>
  549. </listitem>
  550. <listitem>
<para>Open a new Builder Window (CTRL+N) and write the following
code (remember to replace <emphasis>YourName</emphasis> with your
name):</para>
<programlisting>IMPORT TutorialYourName;
COUNT(TutorialYourName.File_OriginalPerson);
</programlisting>
  557. </listitem>
  558. <listitem>
  559. <para>Press the syntax check button on the main toolbar (or press
  560. F7) to check the syntax.</para>
  561. </listitem>
  562. <listitem>
  563. <?dbfo keep-together="always"?>
  564. <para>Make sure the selected cluster is your Thor cluster, then
  565. press the <emphasis role="bold">Submit</emphasis> button. Note
  566. that your target cluster might have a different name.</para>
  567. <para><figure>
  568. <title>Target Thor</title>
  569. <mediaobject>
  570. <imageobject>
  571. <imagedata fileref="images/DTimg11.jpg" />
  572. </imageobject>
  573. </mediaobject>
  574. </figure></para>
  575. </listitem>
  576. <listitem>
  577. <para>When the Workunit completes, it displays a green checkmark
  578. <inlinegraphic fileref="images/DT173-15.jpg" />.</para>
  579. </listitem>
  580. <listitem>
  581. <para>Select the Workunit tab (the one with the number next to the
  582. checkmark) and select the <emphasis role="bold">Result
  583. 1</emphasis> tab (it may already be selected).</para>
  584. <para><figure>
  585. <title>Result tab</title>
  586. <mediaobject>
  587. <imageobject>
  588. <imagedata fileref="images/DT173-16.png" />
  589. </imageobject>
  590. </mediaobject>
  591. </figure>This shows us that there are 841,400 records in the
  592. data file.</para>
  593. </listitem>
  594. <listitem>
  595. <para>Select the Builder tab and change COUNT to OUTPUT, as shown
  596. below:</para>
<para><programlisting>IMPORT TutorialYourName;
<emphasis role="bold">OUTPUT</emphasis>(TutorialYourName.File_OriginalPerson);</programlisting></para>
  599. <para>Note: The modified portion is shown in <emphasis
  600. role="bold">bold</emphasis>.</para>
  601. </listitem>
  602. <listitem>
<para>Check the syntax and, if there are no errors, press the <emphasis
role="bold">Submit</emphasis> button.</para>
  605. </listitem>
  606. <listitem>
  607. <?dbfo keep-together="always"?>
  608. <para>When it completes, select the Workunit tab, then select the
  609. <emphasis role="bold">Result 1</emphasis> tab.</para>
  610. <para><figure>
  611. <title>Output Results</title>
  612. <mediaobject>
  613. <imageobject>
  614. <imagedata fileref="images/DT173-17.png" />
  615. </imageobject>
  616. </mediaobject>
  617. </figure></para>
  618. <para>Notice the names are in mixed case.</para>
<para>For our purposes, it will be easier to have all the names in
uppercase. This demonstrates one of the steps in the basic
process of preparing data (Extract, Transform, and Load, or ETL) using
ECL.</para>
  623. </listitem>
  624. <listitem>
  625. <para>Close the Builder Window.</para>
  626. </listitem>
  627. </orderedlist>
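<para>The examination steps above can also be narrowed so that the
workunit returns only a small sample, along with a count of the records
that would be affected by uppercasing. The following is a minimal
sketch, assuming the <emphasis>File_OriginalPerson</emphasis> definition
created earlier; the sample size of 25 and the result names are
arbitrary choices made for illustration:</para>
<programlisting>IMPORT TutorialYourName, Std;

ds := TutorialYourName.File_OriginalPerson;

// Preview only the first 25 records rather than the whole file
OUTPUT(CHOOSEN(ds, 25), NAMED('Sample'));

// Keep the records in which at least one name field is not already uppercase
NotUpper := ds(FirstName  != Std.Str.ToUpperCase(FirstName) OR
               LastName   != Std.Str.ToUpperCase(LastName) OR
               MiddleName != Std.Str.ToUpperCase(MiddleName));
OUTPUT(COUNT(NotUpper), NAMED('NotUppercaseCount'));
</programlisting>
<para>A non-zero count confirms that pre-processing the name fields is
worthwhile before building queries on them.</para>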
  628. </sect2>
  629. <sect2 id="Process_the_Data">
  630. <title>Process the Data</title>
  631. <para>In this section, we will write code to convert the original data
  632. so that all names are in uppercase. We will then write this new file
  633. to our Thor cluster.</para>
  634. <orderedlist>
  635. <listitem>
<para>Right-click the <emphasis
role="bold">TutorialYourName</emphasis> folder, and select <emphasis
role="bold">Insert File</emphasis> from the pop-up menu.</para>
  640. </listitem>
  641. <listitem>
  642. <para>Name this one <emphasis
  643. role="bold">BWR_ProcessRawData</emphasis> and write the following
  644. code (changing YN and YourName as before):</para>
<para><programlisting>IMPORT TutorialYourName, Std;

// TRANSFORM that uppercases the three name fields and copies the rest
TutorialYourName.Layout_People toUpperPlease(TutorialYourName.Layout_People pInput)
  := TRANSFORM
  SELF.FirstName  := Std.Str.ToUpperCase(pInput.FirstName);
  SELF.LastName   := Std.Str.ToUpperCase(pInput.LastName);
  SELF.MiddleName := Std.Str.ToUpperCase(pInput.MiddleName);
  SELF.Zip        := pInput.Zip;
  SELF.Street     := pInput.Street;
  SELF.City       := pInput.City;
  SELF.State      := pInput.State;
END;

OrigDataset := TutorialYourName.File_OriginalPerson;
// Apply the TRANSFORM to every record in the original dataset
UpperedDataset := PROJECT(OrigDataset,toUpperPlease(LEFT));
// Write the result to a new logical file on the Thor cluster
OUTPUT(UpperedDataset,,'~tutorial::YN::TutorialPerson',OVERWRITE);
</programlisting></para>
  660. </listitem>
  661. <listitem>
<para>Check the syntax and, if there are no errors, press the <emphasis
role="bold">Submit</emphasis> button.</para>
  664. </listitem>
  665. <listitem>
  666. <para>When it completes, select the Workunit tab, then select the
  667. Result 1 tab.</para>
  668. <para><figure>
  669. <title>Process Result</title>
  670. <mediaobject>
  671. <imageobject>
  672. <imagedata fileref="images/DT173-18.jpg" />
  673. </imageobject>
  674. </mediaobject>
  675. </figure></para>
  676. <para>The results show that the process has successfully converted
  677. the name fields to uppercase.</para>
  678. </listitem>
  679. <listitem>
  680. <para>After you examine the results, close the Builder
  681. window.</para>
  682. </listitem>
  683. </orderedlist>
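<para>As an optional variation, the TRANSFORM above can be written more
compactly. The following is a minimal sketch, assuming Layout_People
contains only the seven fields assigned in the code above. The names
toUpperCompact and UpperedPreview are hypothetical and used only for
illustration. The statement SELF := pInput assigns every field that has not
already been assigned from the same-named field of the input record. This
sketch only previews ten records and does not write a file.</para>
<para><programlisting>IMPORT TutorialYourName, Std;

// Hypothetical compact version of toUpperPlease
TutorialYourName.Layout_People toUpperCompact(TutorialYourName.Layout_People pInput) := TRANSFORM
  SELF.FirstName  := Std.Str.ToUpperCase(pInput.FirstName);
  SELF.LastName   := Std.Str.ToUpperCase(pInput.LastName);
  SELF.MiddleName := Std.Str.ToUpperCase(pInput.MiddleName);
  SELF            := pInput; // copies Zip, Street, City, and State unchanged
END;

UpperedPreview := PROJECT(TutorialYourName.File_OriginalPerson, toUpperCompact(LEFT));

// Preview the first ten converted records without writing a new file
OUTPUT(CHOOSEN(UpperedPreview, 10), NAMED('UpperedPreview'));
</programlisting></para>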
  684. </sect2>
  685. <sect2 id="Using_our_Data">
  686. <title>Using our New Data</title>
  687. <para></para>
  688. <para>Now that we have our data in a useful format and the file is in
  689. place, we can write more code to use the new data file. We will
  690. determine the indexes we will need and create them. For this tutorial,
  691. let’s assume the field we need to index is the Zip code field.</para>
  692. <para></para>
<para>In the DATASET definition, we will add a virtual field named fpos to
the RECORD structure to hold each record’s file position. The index we
build in the next section stores this value, and FETCH uses it to retrieve
the matching record.</para>
  696. <para></para>
  697. <orderedlist>
  698. <listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> folder. Name it <emphasis
role="bold">File_TutorialPerson</emphasis> and write this code (changing
<emphasis>YN</emphasis> and <emphasis>YourName</emphasis> as
before):</para>
  705. <para></para>
<para><programlisting>IMPORT TutorialYourName;
EXPORT File_TutorialPerson :=
  DATASET('~tutorial::YN::TutorialPerson',
          {TutorialYourName.Layout_People,
           UNSIGNED8 fpos {virtual(fileposition)}},
          THOR);
</programlisting></para>
  712. </listitem>
  713. <listitem>
<para>Check the syntax and, if there are no errors, press the <emphasis
role="bold">Submit</emphasis> button.</para>
  716. </listitem>
  717. <listitem>
  718. <para>When it completes, it displays a green checkmark
  719. <inlinegraphic fileref="images/DT173-15.jpg" />.</para>
  720. </listitem>
  721. </orderedlist>
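<para>To see what the virtual field contains, you can preview the new
definition from a separate builder window. This is a minimal sketch,
assuming the BWR_ProcessRawData workunit has already written the file. The
result names FirstTenPeople and PersonCount are hypothetical labels chosen
for illustration.</para>
<para><programlisting>IMPORT TutorialYourName;

// Show the first ten records, including the virtual fpos field
OUTPUT(CHOOSEN(TutorialYourName.File_TutorialPerson, 10), NAMED('FirstTenPeople'));

// Show the total number of records in the processed file
OUTPUT(COUNT(TutorialYourName.File_TutorialPerson), NAMED('PersonCount'));
</programlisting></para>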
  722. </sect2>
  723. <sect2 id="Index_the_Data">
  724. <title>Index the Data</title>
  725. <para>Next, we will define the INDEX.</para>
  726. <orderedlist>
  727. <listitem>
<para>Insert a File into your <emphasis
role="bold">TutorialYourName</emphasis> folder. Name it <emphasis
role="bold">IDX_PeopleByZip</emphasis> and write this code (changing
<emphasis>YN</emphasis> and <emphasis>YourName</emphasis> as
before):</para>
<para><programlisting>IMPORT TutorialYourName;
EXPORT IDX_PeopleByZIP :=
  INDEX(TutorialYourName.File_TutorialPerson,
        {zip, fpos},
        '~tutorial::YN::PeopleByZipINDEX');
</programlisting></para>
  736. </listitem>
  737. <listitem>
  738. <para>Check the syntax.</para>
  739. <para>Next, we will build the index file.</para>
  740. </listitem>
  741. <listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> folder, name it <emphasis
role="bold">BWR_BuildPeopleByZip</emphasis>, and write this code
(replacing <emphasis>YourName</emphasis> with your name):</para>
<para><programlisting>IMPORT TutorialYourName;
BUILDINDEX(TutorialYourName.IDX_PeopleByZIP,OVERWRITE);
</programlisting></para>
  751. </listitem>
  752. <listitem>
  753. <para>Check the syntax and if there are no errors, press the
  754. <emphasis role="bold">Submit</emphasis> button.</para>
  755. </listitem>
  756. <listitem>
  757. <para>Wait for the Workunit to complete, then close the Builder
  758. Window.</para>
  759. </listitem>
  760. </orderedlist>
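<para>Once the index is built, it can be read like a dataset, which is a
quick way to confirm that it contains entries for a given Zip code. The
following is a minimal sketch to run in a separate builder window. The
names ZipToCheck and MatchingKeys and the result labels are hypothetical,
and the Zip code shown is only an example value.</para>
<para><programlisting>IMPORT TutorialYourName;

ZipToCheck := '33024';

// Filter the index on its key field
MatchingKeys := TutorialYourName.IDX_PeopleByZIP(zip = ZipToCheck);

// How many index entries match, and a sample of their zip and fpos values
OUTPUT(COUNT(MatchingKeys), NAMED('MatchingKeyCount'));
OUTPUT(CHOOSEN(MatchingKeys, 10), NAMED('SampleKeys'));
</programlisting></para>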
  761. </sect2>
  762. <sect2 id="Query_the_Data">
  763. <title>Build a Query</title>
  764. <para>Now that we have an index file, we will write a query that uses
  765. it.</para>
  766. <orderedlist>
  767. <listitem>
  768. <para>Insert a File into your Tutorial Folder. Name it <emphasis
  769. role="bold">BWR_FetchPeopleByZip </emphasis>and write this code
  770. (changing <emphasis>YourName</emphasis> as before):</para>
<para><programlisting>IMPORT TutorialYourName;

ZipFilter := '33024';

FetchPeopleByZip :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip = ZipFilter),
        RIGHT.fpos);

OUTPUT(FetchPeopleByZip);
</programlisting></para>
  779. </listitem>
  780. <listitem>
  781. <para>Check the syntax and if there are no errors, press the
  782. <emphasis role="bold">Submit</emphasis> button.</para>
  783. </listitem>
  784. <listitem>
<para>When it completes, select the Workunit tab, then select the
<emphasis role="bold">Result</emphasis> tab.</para>
  788. </listitem>
  789. <listitem>
  790. <para>Examine the result, then close the Builder window.</para>
<para><emphasis role="bold">Note</emphasis>: You can change the
value of the <emphasis role="bold">ZipFilter</emphasis> definition to
get results from different Zip codes.</para>
  794. </listitem>
  795. </orderedlist>
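<para>The same pattern extends to more than one Zip code. The following is
a minimal sketch that filters the index with a set instead of a single
value. The names ZipSet and FetchPeopleByZipSet are hypothetical, and the
second Zip code is only an illustrative value that may not occur in your
data.</para>
<para><programlisting>IMPORT TutorialYourName;

// Hypothetical set of Zip codes to look up
ZipSet := ['33024', '33025'];

FetchPeopleByZipSet :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip IN ZipSet),
        RIGHT.fpos);

OUTPUT(FetchPeopleByZipSet);
</programlisting></para>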
  796. </sect2>
  797. </sect1>
  798. <sect1 id="Publishing_your_Query">
  799. <title>Publishing your Query</title>
  800. <para>Now that we have created an indexed query, the next step is to
  801. enable access to it through a Web interface.</para>
<para>The STORED workflow service provides a means to pass values as query
parameters. In this example, the user supplies the ZIP code, and the query
returns the people from that ZIP code.</para>
  805. <orderedlist>
  806. <listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> folder and name it <emphasis
role="bold">FetchPeopleByZipService</emphasis>.</para>
  810. </listitem>
  811. <listitem>
  812. <para>Write this code (changing <emphasis>YourName</emphasis> as
  813. before):</para>
<para><programlisting>IMPORT TutorialYourName;

STRING10 ZipFilter := '' : STORED('ZIPValue');

resultSet :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip = ZipFilter),
        RIGHT.fpos);

OUTPUT(resultSet);
</programlisting></para>
  822. </listitem>
  823. <listitem>
  824. <para>Check the syntax, and save the file.</para>
  825. </listitem>
  826. <listitem>
<para>Press the <emphasis role="bold">Submit</emphasis> button.</para>
  829. </listitem>
  830. <listitem>
<para>When the workunit completes, select the Workunit tab, then select
the ECL Watch tab.</para>
  833. </listitem>
  834. <listitem>
  835. <?dbfo keep-together="always"?>
<para>Press the <emphasis role="bold">Publish</emphasis> button (you
may need to scroll down the main window).</para>
  838. <para><figure>
  839. <title>Publish Workunit</title>
  840. <mediaobject>
  841. <imageobject>
  842. <imagedata fileref="images/DTimg12.jpg" />
  843. </imageobject>
  844. </mediaobject>
  845. </figure></para>
  846. </listitem>
  847. <listitem>
  848. <para>When the workunit is published, a notice dialog
  849. displays.</para>
  850. <para><figure>
  851. <title>Workunit Published</title>
  852. <mediaobject>
  853. <imageobject>
  854. <imagedata fileref="images/DT173-18b.png" />
  855. </imageobject>
  856. </mediaobject>
  857. </figure></para>
  858. </listitem>
  859. </orderedlist>
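<para>A query can declare more than one STORED value, and each one appears
as a separate input. The following is a minimal sketch of a variation on
the service code, not a replacement for it. The MaxRows parameter and its
default of 100 are hypothetical additions used only to illustrate a second
parameter that limits the number of rows returned.</para>
<para><programlisting>IMPORT TutorialYourName;

STRING10  ZipFilter := ''  : STORED('ZIPValue');
UNSIGNED4 MaxRows   := 100 : STORED('MaxRows'); // hypothetical second parameter

resultSet :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip = ZipFilter),
        RIGHT.fpos);

// Return at most MaxRows records
OUTPUT(CHOOSEN(resultSet, MaxRows));
</programlisting></para>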
  860. <sect2 id="Execute-using-the-Data-Delivery-Engine">
  861. <title>Execute using WsECL</title>
  862. <para>Now that the query is published, we can run it using the WsECL
  863. Web service. WsECL provides a Web-based interface to your published
  864. query. It also automatically creates an entry form to execute the
  865. query.</para>
<para>Open the following URL in a browser:</para>
<para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp</emphasis> (where
nnn.nnn.nnn.nnn is your ESP Server’s IP address and pppp is the port.
The default port is 8002.)</para>
  870. <para></para>
  871. <para><figure>
  872. <title>WsECL</title>
  873. <mediaobject>
  874. <imageobject>
  875. <imagedata fileref="images/DTimg13.jpg" />
  876. </imageobject>
  877. </mediaobject>
  878. </figure></para>
  879. <para></para>
  880. <orderedlist>
  881. <listitem>
  882. <para>Click on the + sign next to <emphasis
  883. role="bold">thor</emphasis> to expand the tree.</para>
  884. </listitem>
  885. <listitem>
  886. <?dbfo keep-together="always"?>
  887. <para>Click on the <emphasis
  888. role="bold">fetchpeoplebyzipservice.1</emphasis> hyperlink.</para>
  889. <para>The form for the service displays.</para>
  890. <para><figure>
  891. <title>Service Form</title>
  892. <mediaobject>
  893. <imageobject>
  894. <imagedata fileref="images/DTimg14a.jpg" />
  895. </imageobject>
  896. </mediaobject>
  897. </figure></para>
  898. </listitem>
  899. <listitem>
<para>Provide a zip code (e.g., 33024) in the <emphasis
role="bold">zipvalue</emphasis> field, select <emphasis
role="bold">Output Tables</emphasis> from the drop list, then
press the <emphasis role="bold">Submit</emphasis> button.</para>
  904. <para>The results display.</para>
  905. <para><figure>
  906. <title>Results</title>
  907. <mediaobject>
  908. <imageobject>
  909. <imagedata fileref="images/DTimg15a.jpg" />
  910. </imageobject>
  911. </mediaobject>
  912. </figure></para>
  913. </listitem>
  914. </orderedlist>
  915. </sect2>
  916. </sect1>
  917. <sect1 id="Deploy_the_Roxie_Query">
  918. <title>Compile and Publish the Roxie Query</title>
  919. <para>The final step in this process is to publish the indexed query to
  920. a Rapid Data Delivery Engine (Roxie) Cluster.</para>
  921. <para>We will recompile the code with Roxie as the target cluster, then
  922. publish it to a Roxie cluster. <orderedlist>
  923. <listitem>
<para>In the ECL IDE, select the Builder tab on the
FetchPeopleByZipService file builder window.</para>
  926. </listitem>
  927. <listitem>
  928. <para>Using the <emphasis role="bold">Target</emphasis> drop list,
  929. select Roxie as the Target cluster.</para>
  930. <para><figure>
  931. <title>Target Roxie</title>
  932. <mediaobject>
  933. <imageobject>
  934. <imagedata fileref="images/DTimg16.jpg" />
  935. </imageobject>
  936. </mediaobject>
  937. </figure></para>
  938. </listitem>
  939. <listitem>
<para>In the upper left corner of the Builder window, the
<emphasis role="bold">Submit</emphasis> button has a drop-down
arrow next to it. Select the arrow to expose the <emphasis
role="bold">Compile</emphasis> option.</para>
  944. <figure>
  945. <title>Compile</title>
  946. <mediaobject>
  947. <imageobject>
  948. <imagedata fileref="images/DTimg17.jpg" />
  949. </imageobject>
  950. </mediaobject>
  951. </figure>
  952. </listitem>
  953. <listitem>
<para>Select <emphasis role="bold">Compile</emphasis>.</para>
  955. </listitem>
  956. <listitem>
  957. <?dbfo keep-together="always"?>
  958. <para>When the workunit finishes, it will display a green circle
  959. indicating it has compiled.</para>
  960. <para><figure>
  961. <title>Compiled</title>
  962. <mediaobject>
  963. <imageobject>
  964. <imagedata fileref="images/DTimg18.jpg" />
  965. </imageobject>
  966. </mediaobject>
  967. </figure></para>
  968. </listitem>
  969. </orderedlist></para>
  970. <sect2 id="Deploy_the_Query_to_Roxie">
<title>Publish the Roxie Query</title>
  972. <para>Next we will publish the query to a Roxie Cluster.</para>
  973. <orderedlist>
  974. <listitem>
  975. <para>Select the workunit tab for the FetchPeopleByZipService that
  976. you just compiled.</para>
  977. </listitem>
  978. <listitem>
  979. <para>Select the ECL Watch tab.</para>
  980. </listitem>
  981. <listitem>
  982. <?dbfo keep-together="always"?>
<para>Press the <emphasis role="bold">Publish</emphasis> button
(you may need to scroll down the main window).</para>
  985. <para><figure>
  986. <title>Publish Query</title>
  987. <mediaobject>
  988. <imageobject>
  989. <imagedata fileref="images/DTimg19.jpg" />
  990. </imageobject>
  991. </mediaobject>
  992. </figure>When it successfully publishes, you will see:</para>
  993. <para><figure>
  994. <title>Workunit Published</title>
  995. <mediaobject>
  996. <imageobject>
  997. <imagedata fileref="images/DT173-18b.png" />
  998. </imageobject>
  999. </mediaobject>
  1000. </figure></para>
  1001. </listitem>
  1002. </orderedlist>
  1003. </sect2>
  1004. <sect2 id="Run_the_Roxie_Query" role="brk">
  1005. <title>Run the Roxie Query in WsECL</title>
<para>Now that the query is deployed to a Roxie cluster, we can run it
using the WsECL service. Open the following URL in a browser:</para>
<para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp</emphasis> (where
nnn.nnn.nnn.nnn is your ESP Server’s IP address and pppp is the port.
The default port is 8002.)</para>
  1011. <orderedlist>
  1012. <listitem>
  1013. <para>Click on the + sign next to <emphasis
  1014. role="bold">myroxie</emphasis> to expand the tree.</para>
  1015. </listitem>
  1016. <listitem>
  1017. <?dbfo keep-together="always"?>
  1018. <para>Click on the <emphasis
  1019. role="bold">fetchpeoplebyzipservice.1</emphasis> hyperlink.</para>
  1020. <para>The form for the service displays.</para>
  1021. <para><figure>
  1022. <title>RoxieECL</title>
  1023. <mediaobject>
  1024. <imageobject>
  1025. <imagedata fileref="images/DTimg21.jpg" />
  1026. </imageobject>
  1027. </mediaobject>
  1028. </figure></para>
  1029. </listitem>
  1030. <listitem>
  1031. <?dbfo keep-together="always"?>
  1032. <para>Provide a zip code (e.g., 33024), select <emphasis
  1033. role="bold">Output Tables</emphasis> from the drop list, and press
  1034. the Submit button.</para>
  1035. <para>The results display.</para>
  1036. <para><figure>
  1037. <title>RoxieResults</title>
  1038. <mediaobject>
  1039. <imageobject>
  1040. <imagedata fileref="images/DTimg22.jpg" />
  1041. </imageobject>
  1042. </mediaobject>
  1043. </figure></para>
  1044. </listitem>
  1045. </orderedlist>
  1046. </sect2>
  1047. </sect1>
  1048. </chapter>
  1049. <chapter id="Summary">
  1050. <title>Summary</title>
<para>Now that you have successfully sprayed raw data onto a cluster,
processed it, and deployed a query to a Rapid Data Delivery Engine (Roxie)
cluster, what’s next?</para>
  1054. <para>Here is a short list of suggestions on the path you might take from
  1055. here:</para>
  1056. <itemizedlist mark="bullet">
  1057. <listitem>
  1058. <para>Create indexes on other fields and create queries using
  1059. them.</para>
  1060. </listitem>
  1061. </itemizedlist>
  1062. <itemizedlist mark="bullet">
  1063. <listitem>
  1064. <para>Write client applications to access your queries using JSON or
  1065. SOAP interfaces.</para>
  1066. </listitem>
  1067. </itemizedlist>
  1068. <itemizedlist mark="bullet">
  1069. <listitem>
<para>Look at the resources available on the Links tab.</para>
  1071. <para><figure>
  1072. <title>Links</title>
  1073. <mediaobject>
  1074. <imageobject>
  1075. <imagedata fileref="images/DTimg24.jpg" />
  1076. </imageobject>
  1077. </mediaobject>
  1078. </figure>The Links tab provides easy access to a form, a Sample
  1079. Request, a Sample Response, the WSDL, the XML Schema (XSD) and
  1080. more...</para>
  1081. </listitem>
  1082. </itemizedlist>
  1083. <itemizedlist mark="bullet">
  1084. <listitem>
  1085. <para>Follow the procedures in this tutorial using your own
  1086. data!</para>
  1087. </listitem>
  1088. </itemizedlist>
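<para>As a starting point for the first suggestion, the following is a
minimal sketch that indexes the LastName field and queries it in a single
builder window. It assumes the File_TutorialPerson definition from this
tutorial. The index name, the logical file name, and the other definition
names are hypothetical, and the name used in the filter is only an
illustrative value that may not occur in your data. Because the tutorial
converted the name fields to uppercase, the filter value is written in
uppercase.</para>
<para><programlisting>IMPORT TutorialYourName;

// Hypothetical index on the LastName field, defined locally for this sketch
IDX_PeopleByLastName :=
  INDEX(TutorialYourName.File_TutorialPerson,
        {lastname, fpos},
        '~tutorial::YN::PeopleByLastNameINDEX');

BuildLastNameIndex := BUILDINDEX(IDX_PeopleByLastName, OVERWRITE);

NameFilter := 'SMITH';

FetchPeopleByLastName :=
  FETCH(TutorialYourName.File_TutorialPerson,
        IDX_PeopleByLastName(lastname = NameFilter),
        RIGHT.fpos);

// Build the index first, then run the query that reads it
SEQUENTIAL(BuildLastNameIndex, OUTPUT(FetchPeopleByLastName));
</programlisting></para>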
  1089. </chapter>
  1090. </book>