<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<book lang="en_US" xml:base="../">
<title>HPCC Systems<superscript>®</superscript> Data Tutorial</title>
<bookinfo>
<title>HPCC Systems<superscript>®</superscript> Data Tutorial</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/redswooshWithLogo3.jpg" />
</imageobject>
</mediaobject>
<author>
<surname>Boca Raton Documentation Team</surname>
</author>
<legalnotice>
<para>We welcome your comments and feedback about this document via
email to <email>docfeedback@hpccsystems.com</email></para>
<para>Please include <emphasis role="bold">Documentation
Feedback</emphasis> in the subject line and reference the document name,
page numbers, and current Version Number in the text of the
message.</para>
<para>LexisNexis and the Knowledge Burst logo are registered trademarks
of Reed Elsevier Properties Inc., used under license.</para>
<para>HPCC Systems<superscript>®</superscript> is a registered trademark
of LexisNexis Risk Data Management Inc.</para>
<para>Other products and services may be trademarks or registered
trademarks of their respective companies.</para>
<para>All names and example data used in this manual are fictitious. Any
similarity to actual persons, living or dead, is purely
coincidental.</para>
<para></para>
</legalnotice>
<xi:include href="common/Version.xml"
xpointer="xpointer(//*[@id='FooterInfo'])"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<!--Release Info makes a running page footer: now an include! -->
<!--The following include statement pulls in the date_ver from version.xml-->
<xi:include href="common/Version.xml"
xpointer="xpointer(//*[@id='DateVer'])"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<corpname>HPCC Systems<superscript>®</superscript></corpname>
<!--corpname never prints-->
<xi:include href="common/Version.xml"
xpointer="xpointer(//*[@id='Copyright'])"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<!--Copyright tag inserts the symbol automatically: Now an Include!-->
<mediaobject role="logo">
<imageobject>
<imagedata fileref="images/LN_Rightjustified.jpg" />
</imageobject>
</mediaobject>
</bookinfo>
<chapter id="DataTutorialIntroduction">
<title>Introduction</title>
<sect1 id="Introduction_Case-Study-and-Tutorial" role="nobrk">
<title>The ECL Development Process</title>
<para>This tutorial walks through the development process from
beginning to end. It is designed as an introduction to working with
data on any HPCC Systems<footnote>
<para><emphasis role="bold">H</emphasis>igh <emphasis
role="bold">P</emphasis>erformance <emphasis
role="bold">C</emphasis>omputing <emphasis
role="bold">C</emphasis>luster (HPCC) Systems is a massively
parallel processing computing platform that solves Big Data
problems. See http://www.hpccsystems.com/Why-HPCC/How-it-works for
more details.</para>
</footnote> platform. We will write code in ECL<footnote>
<para><emphasis role="bold">E</emphasis>nterprise <emphasis
role="bold">C</emphasis>ontrol <emphasis
role="bold">L</emphasis>anguage (ECL) is a declarative, data-centric
programming language. It manages all aspects of the massive data
joins, sorts, and builds that set HPCC Systems apart from other
technologies in its ability to provide flexible data analysis on a
massive scale.</para>
</footnote> to process our data and query it.</para>
<para>This tutorial assumes:</para>
<itemizedlist>
<listitem>
<para>You have a running HPCC Systems platform. This can be a single-node
or multinode HPCC Systems platform deployment.</para>
</listitem>
<listitem>
<para>You have the ECL IDE<footnote>
<para>The ECL IDE (Integrated Development Environment) is the tool
used to create queries into your data and ECL files with which to
build your queries.</para>
</footnote> installed and configured.</para>
</listitem>
</itemizedlist>
<para>In this tutorial, we will:</para>
<itemizedlist mark="bullet">
<listitem>
<para>Download a raw data file</para>
<para>Links to the data file are available at <ulink
url="https://hpccsystems.com/training/documentation/learning-ecl">https://hpccsystems.com/training/documentation/learning-ecl</ulink></para>
<para>The download is approximately 30 MB (compressed) and is
available in either ZIP or .tar.gz format. Choose the appropriate
link.</para>
</listitem>
<listitem>
<para>Spray the file to a Data Refinery (Thor) cluster. HPCC Systems
clusters "spray" data into file parts on each node.</para>
<para>A <emphasis>spray</emphasis> or <emphasis>import</emphasis> is
the relocation of a data file from one location to an HPCC Systems
cluster. The term spray describes how the file moves: it is
partitioned across all nodes within a cluster.</para>
</listitem>
<listitem>
<para>Examine the data and determine the pre-processing we need to
perform</para>
</listitem>
<listitem>
<para>Pre-process the data to produce a new data file</para>
</listitem>
<listitem>
<para>Determine the types of queries we want</para>
</listitem>
<listitem>
<para>Create the queries</para>
</listitem>
<listitem>
<para>Test the queries</para>
</listitem>
<listitem>
<para>Deploy them to a Rapid Data Delivery Engine (RDDE) cluster,
also known as a Roxie cluster</para>
</listitem>
</itemizedlist>
</sect1>
</chapter>
<chapter id="Working_with_Data">
<title>Working with Data</title>
<sect1 id="The_Original_Data" role="nobrk">
<title>The Original Data</title>
<para>In this scenario, we receive a structured data file containing
records with people's names and addresses. The HPCC Systems platform
also supports unstructured data, but using structured data keeps this
example simple. The following table documents the file layout:</para>
<para></para>
<para><informaltable colsep="1" frame="all" rowsep="1">
<tgroup cols="3">
<colspec colwidth="147.60pt" />
<colspec colwidth="147.60pt" />
<colspec colwidth="147.60pt" />
<thead>
<row>
<entry align="left">Field Name</entry>
<entry align="left">Type</entry>
<entry align="left">Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>FirstName</entry>
<entry>15 Character String</entry>
<entry>First Name</entry>
</row>
<row>
<entry>LastName</entry>
<entry>25 Character String</entry>
<entry>Last Name</entry>
</row>
<row>
<entry>MiddleName</entry>
<entry>15 Character String</entry>
<entry>Middle Name</entry>
</row>
<row>
<entry>Zip</entry>
<entry>5 Character String</entry>
<entry>ZIP Code</entry>
</row>
<row>
<entry>Street</entry>
<entry>42 Character String</entry>
<entry>Street Address</entry>
</row>
<row>
<entry>City</entry>
<entry>20 Character String</entry>
<entry>City</entry>
</row>
<row>
<entry>State</entry>
<entry>2 Character String</entry>
<entry>State</entry>
</row>
</tbody>
</tgroup>
</informaltable></para>
<para>This gives us a record length of 124 bytes, the total of all field
lengths (15 + 25 + 15 + 5 + 42 + 20 + 2). You will need to know this
length for the <emphasis role="bold">File Spray</emphasis> process.</para>
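<para>If you want ECL to confirm this length for you, the following
minimal sketch declares the same fields as the table above and outputs
the size of the record with SIZEOF. The definition name
<emphasis>LayoutCheck</emphasis> is hypothetical and is used only for this
check. Because every field is a fixed-length STRING, the result should
match the 124-byte total.</para>
<para><programlisting>// Hypothetical local copy of the layout in the table above
LayoutCheck := RECORD
  STRING15 FirstName;
  STRING25 LastName;
  STRING15 MiddleName;
  STRING5  Zip;
  STRING42 Street;
  STRING20 City;
  STRING2  State;
END;

// 15 + 25 + 15 + 5 + 42 + 20 + 2 = 124 bytes
OUTPUT(SIZEOF(LayoutCheck));</programlisting></para>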
<para></para>
<sect2 id="Uploading_a_file">
<title>Load the Incoming Data File to your Landing Zone</title>
<para>A Landing Zone (or Drop Zone) is a physical storage location
defined in your HPCC Systems environment. A daemon (DaFileSrv) must be
running on that server to enable file sprays and desprays.</para>
<para>For smaller data files, you can use the upload/download file
utility in ECL Watch (a Web-based interface to your HPCC Systems
platform). The sample data file is approximately 100 MB
uncompressed.</para>
<orderedlist>
<listitem>
<para>Download the sample data file from the HPCC
Systems<superscript>®</superscript> portal.</para>
<para>The data file is available from links found on <ulink
url="https://hpccsystems.com/training/documentation/tutorials">https://hpccsystems.com/training/documentation/tutorials</ulink>.
The download is approximately 30 MB (compressed) and is
available in either ZIP or tar.gz format (<emphasis
role="bold">OriginalPerson.tar.gz</emphasis> or <emphasis
role="bold">OriginalPerson.zip</emphasis>).</para>
</listitem>
<listitem>
<para>Extract it to a folder on your local machine.</para>
</listitem>
<listitem>
<para>In your browser, go to the <emphasis role="bold">ECL
Watch</emphasis> URL. For example, http://nnn.nnn.nnn.nnn:8010,
where nnn.nnn.nnn.nnn is your ESP<footnote>
<para>The ESP (Enterprise Services Platform) Server is the
communication layer server in your HPCC Systems
environment.</para>
</footnote> Server's IP address.</para>
<para><informaltable colsep="1" frame="all" rowsep="1">
<?dbfo keep-together="always"?>
<tgroup cols="2">
<colspec colwidth="49.50pt" />
<colspec />
<tbody>
<row>
<entry><inlinegraphic
fileref="images/caution.png" /></entry>
<entry>Your IP address could be different from the ones
provided in the example images. Please use the IP
address provided by <emphasis
role="bold">your</emphasis> installation.</entry>
</row>
</tbody>
</tgroup>
</informaltable></para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>From the ECL Watch home page, click on the <emphasis
role="bold">Files</emphasis> icon, then click the <emphasis
role="bold">Landing Zones</emphasis> link from the navigation
sub-menu.</para>
<para>Press the <emphasis role="bold">Upload</emphasis> action
button on the Landing Zones tab.</para>
<para><figure>
<title>Upload/download</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/LZimg03-1.jpg"
vendor="eclwatchSS" />
</imageobject>
</mediaobject>
</figure></para>
<para>Once you press the Upload button, a dialog opens where you
can choose a file to upload.</para>
</listitem>
<listitem>
<para>Browse the files on your local machine, select the file to
upload, and then press the <emphasis role="bold">Open</emphasis>
button.</para>
<para>The file you selected displays in the <emphasis
role="bold">File Uploader</emphasis> dialog.</para>
<para><figure>
<title>File Uploader</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/LZimg04.jpg"
vendor="eclwatchSS" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>Press the <emphasis role="bold">Start</emphasis> button to
complete the file upload.<figure>
<title>Upload Progress</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/LZimg06.jpg"
vendor="eclwatchSS" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
</orderedlist>
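<para>If you prefer to confirm the upload from ECL rather than ECL
Watch, the standard library can list the contents of a Landing Zone
directory. This is a sketch built on two assumptions. The IP address is a
placeholder. The directory assumes the default
<emphasis>mydropzone</emphasis> path. Replace both with the values for
your environment.</para>
<para><programlisting>IMPORT Std;

// Placeholders: substitute your Landing Zone IP address and directory
LZ_IP  := '10.0.0.1';
LZ_Dir := '/var/lib/HPCCSystems/mydropzone/';

// Returns one row (name, size, modified) per file matching the mask
OUTPUT(Std.File.RemoteDirectory(LZ_IP, LZ_Dir, 'OriginalPerson*'));</programlisting></para>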
</sect2>
<sect2 id="Spray_the_Data_File_to_your_DR-THOR_Cluster">
<title>Spray the Data File to your Thor Cluster</title>
<para>To use the data file in our HPCC Systems cluster, we must first
"spray" it to a Thor cluster. A <emphasis>spray</emphasis> or
<emphasis>import</emphasis> is the relocation of a data file from one
location to a Thor cluster. The term spray describes how the file
moves: it is partitioned across all nodes within a cluster.</para>
<para>In this example, the file is on your Landing Zone and is named
<emphasis role="bold">OriginalPerson</emphasis>.</para>
<para>We are going to spray it to our Thor cluster and give it a
logical name of <emphasis
role="bold">tutorial::YN::OriginalPerson</emphasis>, where <emphasis
role="bold">YN</emphasis> represents your initials. The Distributed File
Utility maintains a list of logical files and their corresponding
physical file locations.</para>
<orderedlist>
<listitem>
<para>Open ECL Watch in your browser using the following
URL:</para>
<para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp (where
nnn.nnn.nnn.nnn is your ESP Server's IP address and pppp is the port.
The default port is 8010.)</emphasis></para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>From the ECL Watch home page, click on the <emphasis
role="bold">Files</emphasis> icon, then click the <emphasis
role="bold">Landing Zones</emphasis> link from the navigation
sub-menu.</para>
<para>On the Landing Zones tab, click on the arrow next to your
mydropzone container to expand the list of uploaded files. <figure>
<title>mydropzone</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/LZimg000.jpg"
vendor="eclwatchSS" />
</imageobject>
</mediaobject>
</figure></para>
<para>Find the file you want to spray (OriginalPerson) in the list,
then check the box next to that file name to select it.</para>
<para>Once you select the file from the list, the <emphasis
role="bold">Spray</emphasis> action buttons become enabled.</para>
</listitem>
<listitem>
<para>Press the <emphasis role="bold">Fixed</emphasis> action
button. This indicates that you are spraying a fixed-width file.
<figure>
<title>Spray: Fixed action button</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/LZimg001.jpg"
vendor="eclwatchSS" />
</imageobject>
</mediaobject>
</figure></para>
<para>The <emphasis role="bold">Spray Fixed</emphasis> dialog
displays.</para>
</listitem>
<listitem>
<para>The <emphasis role="bold">Target Name</emphasis> field is
automatically filled in with the name of the selected file. <figure>
<title>Spray Fixed dialog</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/LZimg002.jpg"
vendor="eclwatchSS" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>Choose the mythor cluster from the <emphasis
role="bold">Group</emphasis> drop-down list.</para>
</listitem>
<listitem>
<para>If there are multiple queues, select one from the
list.</para>
</listitem>
<listitem>
<para>Fill in the <emphasis role="bold">Record Length</emphasis>
(124).</para>
</listitem>
<listitem>
<para>Fill in the <emphasis role="bold">Target Scope</emphasis>
using the naming convention described earlier: <emphasis
role="bold">tutorial::YN</emphasis> (remember, <emphasis
role="bold">YN</emphasis> represents your initials).</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Make sure the <emphasis role="bold">Replicate</emphasis> box is
checked.</para>
<para><emphasis role="bold">Note:</emphasis> This option is only
available on systems where replication has been enabled.</para>
</listitem>
<listitem>
<para>Press the <emphasis role="bold">Spray</emphasis> button.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>The workunit details page displays. You can view the
progress of the spray.</para>
<para><figure>
<title>View Progress</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg02.jpg"
vendor="eclwatchSS" />
</imageobject>
</mediaobject>
</figure></para>
<para>Once the spray is complete, we can proceed.</para>
</listitem>
</orderedlist>
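<para>You can also script the same spray in ECL with the standard
library's <emphasis>Std.File.SprayFixed</emphasis> function. The
following sketch reuses the values from the steps above: record length
124, the <emphasis>mythor</emphasis> group, and the
tutorial::YN::OriginalPerson logical name. The Landing Zone IP address and
file path are placeholders that you must replace. Setting replicate to
TRUE assumes replication is enabled on your system.</para>
<para><programlisting>IMPORT Std;

// Placeholders: substitute your Landing Zone IP address and file path
LZ_IP   := '10.0.0.1';
LZ_File := '/var/lib/HPCCSystems/mydropzone/OriginalPerson';

Std.File.SprayFixed(LZ_IP,                            // sourceIP
                    LZ_File,                          // sourcePath
                    124,                              // recordSize
                    'mythor',                         // destinationGroup
                    '~tutorial::YN::OriginalPerson',  // destinationLogicalName
                    -1,                               // timeOut (-1: wait for completion)
                    ,                                 // espServerIpPort (default)
                    ,                                 // maxConnections (default)
                    TRUE,                             // allowOverwrite
                    TRUE);                            // replicate (requires replication)</programlisting></para>
<para>Scripting the spray is convenient when you need to repeat it.
Leaving the ESP server and connection parameters empty keeps their
defaults.</para>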
</sect2>
</sect1>
<sect1 id="Begin_Coding">
<title>Begin Coding</title>
<para>In this portion of the tutorial, we will write ECL code to define
the data file and execute simple queries on it so we can evaluate it and
determine any necessary pre-processing.</para>
<orderedlist>
<listitem>
<para>Start the ECL IDE (Start &gt;&gt; All Programs &gt;&gt; HPCC
Systems &gt;&gt; ECL IDE).</para>
</listitem>
<listitem>
<para>Log in to your environment.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Right-click on the <emphasis role="bold">My Files</emphasis>
folder in the Repository window, and select <emphasis
role="bold">Insert Folder</emphasis> from the pop-up menu.</para>
<para><figure>
<title>Insert Folder</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg04.jpg" />
</imageobject>
</mediaobject>
</figure></para>
<para>For purposes of this tutorial, let's create a folder called
<emphasis role="bold">TutorialYourName</emphasis> (where
<emphasis>YourName</emphasis> is your name).</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Enter <emphasis role="bold">TutorialYourName</emphasis> (where
<emphasis>YourName</emphasis> is your name) for the label, then press
the OK button.</para>
<para><figure>
<title>Enter Folder Label</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg05.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>Right-click on the <emphasis
role="bold">TutorialYourName</emphasis> folder, and select <emphasis
role="bold">Insert File</emphasis> from the pop-up menu.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Enter <emphasis role="bold">Layout_People</emphasis> for the
label, then press the OK button.</para>
<para><figure>
<title>Insert File</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg06.jpg" />
</imageobject>
</mediaobject>
</figure></para>
<para>A Builder Window opens.</para>
<para><figure>
<title>Layout People in Builder</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg07.jpg" />
</imageobject>
</mediaobject>
</figure></para>
<para>Notice that some text has already been written for you in the
window. This reminds you that the name of the file (Layout_People)
<emphasis>must always exactly match</emphasis> the name of the
single EXPORT definition (Layout_People) contained in that file.
This is a requirement: each file holds one EXPORT definition, and
that definition's name must match the filename.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Write the following code in the Builder workspace:</para>
<para><programlisting>EXPORT Layout_People := RECORD
  STRING15 FirstName;
  STRING25 LastName;
  STRING15 MiddleName;
  STRING5  Zip;
  STRING42 Street;
  STRING20 City;
  STRING2  State;
END;</programlisting> <figure>
<title>Code in Builder Window</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg08.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>Press the syntax check button on the main toolbar (or press
F7).</para>
<para>It is always a good idea to check syntax before
submitting.</para>
<para><figure>
<title>Check Syntax</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg23.jpg" />
</imageobject>
</mediaobject>
</figure></para>
<para>This file defines the record structure for the data file.
Next, we will examine the data.</para>
</listitem>
</orderedlist>
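<para>Before you point the layout at the sprayed file, you can try it
against a couple of inline records. In the following sketch, the two
rows are made-up sample values and do not come from the tutorial data
file. Submitting the sketch should display them in the Layout_People
format.</para>
<para><programlisting>IMPORT TutorialYourName;

// Hypothetical sample rows, in the field order of Layout_People:
// FirstName, LastName, MiddleName, Zip, Street, City, State
SampleRows := DATASET([{'Jane', 'Doe',   'Ann', '33024', '123 Main St', 'Hollywood', 'FL'},
                       {'john', 'smith', '',    '33024', '45 Oak Ave',  'Hollywood', 'FL'}],
                      TutorialYourName.Layout_People);

OUTPUT(SampleRows);</programlisting></para>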
<sect2 id="Examine_the_Data" role="brk">
<title>Examine the Data</title>
<para>In this section, we will look at the data and determine whether
it needs any pre-processing. This is the step in the development
process where we convert the raw data into a form we can use.</para>
<orderedlist>
<listitem>
<para>Right-click on the <emphasis
role="bold">TutorialYourName</emphasis> folder, and select <emphasis
role="bold">Insert File</emphasis> from the pop-up menu.</para>
</listitem>
<listitem>
<para>Enter <emphasis role="bold">File_OriginalPerson</emphasis>
for the label, then press the OK button.</para>
<para><figure>
<title>Insert File</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg09.jpg" />
</imageobject>
</mediaobject>
</figure>A Builder Window opens.</para>
</listitem>
<listitem>
<para>Write the following code (remember to replace
<emphasis>YN</emphasis> with your initials and
<emphasis>YourName</emphasis> with your name):</para>
<para><programlisting>IMPORT TutorialYourName;
EXPORT File_OriginalPerson :=
  DATASET('~tutorial::YN::OriginalPerson', TutorialYourName.Layout_People, THOR);
</programlisting></para>
<para><figure>
<title>File_OriginalPerson.ecl</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg10.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>Press the syntax check button on the main toolbar (or press
F7) to check the syntax.</para>
<para>This defines the dataset. Next, we will examine the
data.</para>
</listitem>
<listitem>
<para>Open a new Builder Window (CTRL+N) and write the following
code (remember to replace <emphasis>YourName</emphasis> with your
name):</para>
<programlisting>IMPORT TutorialYourName;
COUNT(TutorialYourName.File_OriginalPerson);
</programlisting>
</listitem>
<listitem>
<para>Press the syntax check button on the main toolbar (or press
F7) to check the syntax.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Make sure the selected cluster is your Thor cluster, then
press the <emphasis role="bold">Submit</emphasis> button. Note
that your target cluster might have a different name.</para>
<para><figure>
<title>Target Thor</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg11.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>When the Workunit completes, it displays a green checkmark
<inlinegraphic fileref="images/DT173-15.jpg" />.</para>
</listitem>
<listitem>
<para>Select the Workunit tab (the one with the number next to the
checkmark) and select the <emphasis role="bold">Result
1</emphasis> tab (it may already be selected).</para>
<para><figure>
<title>Result tab</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DT173-16.png" />
</imageobject>
</mediaobject>
</figure>This shows us that there are 841,400 records in the
data file.</para>
</listitem>
<listitem>
<para>Select the Builder tab and change COUNT to OUTPUT, as shown
below:</para>
<para><programlisting>IMPORT TutorialYourName;
<emphasis role="bold">OUTPUT</emphasis>(TutorialYourName.File_OriginalPerson);</programlisting></para>
<para><emphasis role="bold">Note: </emphasis>The modified portion
is shown in <emphasis role="bold">bold</emphasis>.</para>
<para></para>
</listitem>
<listitem>
<para>Check the syntax and, if there are no errors, press the <emphasis
role="bold">Submit</emphasis> button.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>When it completes, select the Workunit tab, then select the
<emphasis role="bold">Result 1</emphasis> tab.</para>
<para><figure>
<title>Output Results</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DT173-17.png" />
</imageobject>
</mediaobject>
</figure></para>
<para>Notice that the names are in mixed case.</para>
<para>For our purposes, it will be easier to have all the names in
uppercase. This demonstrates one of the steps in the basic process of
preparing data (Extract, Transform, and Load, or ETL) using ECL.</para>
</listitem>
<listitem>
<para>Close the Builder Window.</para>
</listitem>
</orderedlist>
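<para>To measure the mixed-case problem, a query like the following
sketch counts the records whose FirstName or LastName differs from its
uppercase form. It relies only on the File_OriginalPerson definition
created above and the standard library's Std.Str.ToUpperCase function.
The result name <emphasis>MixedCaseRecords</emphasis> is an arbitrary
label.</para>
<para><programlisting>IMPORT TutorialYourName, Std;

People := TutorialYourName.File_OriginalPerson;

// Records where at least one of these name fields is not already all uppercase
NotUpper := People(FirstName != Std.Str.ToUpperCase(FirstName) OR
                   LastName  != Std.Str.ToUpperCase(LastName));

OUTPUT(COUNT(NotUpper), NAMED('MixedCaseRecords'));</programlisting></para>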
</sect2>
<sect2 id="Process_the_Data" role="brk">
<title>Process the Data</title>
<para>In this section, we will write code to convert the original data
so that all names are in uppercase. We will then write this new file
to our Thor cluster.</para>
<orderedlist>
<listitem>
<para>Right-click on the <emphasis
role="bold">TutorialYourName</emphasis> folder, and select <emphasis
role="bold">Insert File</emphasis> from the pop-up menu.</para>
</listitem>
<listitem>
<para>Name this one <emphasis
role="bold">BWR_ProcessRawData</emphasis> and write the following
code (changing YN and YourName as before):</para>
<para><programlisting>IMPORT TutorialYourName, Std;
TutorialYourName.Layout_People toUpperPlease(TutorialYourName.Layout_People pInput)
  := TRANSFORM
  SELF.FirstName  := Std.Str.ToUpperCase(pInput.FirstName);
  SELF.LastName   := Std.Str.ToUpperCase(pInput.LastName);
  SELF.MiddleName := Std.Str.ToUpperCase(pInput.MiddleName);
  SELF.Zip        := pInput.Zip;
  SELF.Street     := pInput.Street;
  SELF.City       := pInput.City;
  SELF.State      := pInput.State;
END;
OrigDataset := TutorialYourName.File_OriginalPerson;
UpperedDataset := PROJECT(OrigDataset, toUpperPlease(LEFT));
OUTPUT(UpperedDataset,,'~tutorial::YN::TutorialPerson',OVERWRITE);
</programlisting></para>
</listitem>
<listitem>
<para>Check the syntax and, if there are no errors, press the <emphasis
role="bold">Submit</emphasis> button.</para>
</listitem>
<listitem>
<para>When it completes, select the Workunit tab, then select the
<emphasis role="bold">Result 1</emphasis> tab.</para>
<para><figure>
<title>Process Result</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DT173-18.jpg" />
</imageobject>
</mediaobject>
</figure></para>
<para>The results show that the process has successfully converted
the name fields to uppercase.</para>
</listitem>
<listitem>
<para>After you examine the results, close the Builder
window.</para>
</listitem>
</orderedlist>
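<para>Only the three name fields change, so you can also write the
TRANSFORM more compactly. In the following sketch, <emphasis>SELF :=
pInput;</emphasis> assigns every field that has not already been
assigned from the matching input field. The sketch should therefore write
the same output file as the longer version in the steps above.</para>
<para><programlisting>IMPORT TutorialYourName, Std;

TutorialYourName.Layout_People toUpperPlease(TutorialYourName.Layout_People pInput) := TRANSFORM
  SELF.FirstName  := Std.Str.ToUpperCase(pInput.FirstName);
  SELF.LastName   := Std.Str.ToUpperCase(pInput.LastName);
  SELF.MiddleName := Std.Str.ToUpperCase(pInput.MiddleName);
  SELF := pInput;  // Zip, Street, City, and State are copied unchanged
END;

UpperedDataset := PROJECT(TutorialYourName.File_OriginalPerson, toUpperPlease(LEFT));
OUTPUT(UpperedDataset,,'~tutorial::YN::TutorialPerson',OVERWRITE);</programlisting></para>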
</sect2>
<sect2 id="Using_our_Data">
<title>Using our New Data</title>
<para></para>
<para>Now that we have our data in a useful format and the file is in
place, we can write more code to use the new data file. We will
determine the indexes we need and create them. For this tutorial,
let's assume the field we need to index is the Zip code field.</para>
<para></para>
<para>In the DATASET definition, we will add a virtual field to the
RECORD structure that holds each record's file position. The index
stores this value so that it can locate each record in the data
file.</para>
<para></para>
<orderedlist>
<listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> folder. Name it <emphasis
role="bold">File_TutorialPerson</emphasis> and write this code (changing
<emphasis>YN</emphasis> to your initials):</para>
<para></para>
<para><programlisting>IMPORT TutorialYourName;
EXPORT File_TutorialPerson :=
  DATASET('~tutorial::YN::TutorialPerson',
          {TutorialYourName.Layout_People,
           UNSIGNED8 fpos {virtual(fileposition)}},THOR);
</programlisting></para>
</listitem>
<listitem>
<para>Check the syntax and, if there are no errors, press the <emphasis
role="bold">Submit</emphasis> button.</para>
</listitem>
<listitem>
<para>When it completes, it displays a green checkmark
<inlinegraphic fileref="images/DT173-15.jpg" />.</para>
</listitem>
</orderedlist>
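<para>To see the virtual field in action, you can output a few records
from the new definition. In the following sketch, CHOOSEN limits the
result to the first 10 records. The fpos column shows each record's
position in the file, which is the value the index stores.</para>
<para><programlisting>IMPORT TutorialYourName;

// Show the first 10 records, including the virtual fpos field
OUTPUT(CHOOSEN(TutorialYourName.File_TutorialPerson, 10));</programlisting></para>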
</sect2>
<sect2 id="Index_the_Data">
<title>Index the Data</title>
<para>Next, we will define the INDEX.</para>
<orderedlist>
<listitem>
<para>Insert a File into your Tutorial Folder. Name it <emphasis
role="bold">IDX_PeopleByZip</emphasis> and write this code (changing
<emphasis>YN</emphasis> and <emphasis>YourName</emphasis> as
before):</para>
<para><programlisting>IMPORT TutorialYourName;
// Key the index on zip; fpos points back to the record in the base file
EXPORT IDX_PeopleByZIP :=
  INDEX(TutorialYourName.File_TutorialPerson,
        {zip, fpos},
        '~tutorial::YN::PeopleByZipINDEX');
</programlisting></para>
</listitem>
<listitem>
<para>Check the syntax.</para>
<para>Next, we will build the index file.</para>
</listitem>
<listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> Folder, name it <emphasis
role="bold">BWR_BuildPeopleByZip</emphasis>, and write this code
(replacing <emphasis>YourName</emphasis> with your name):</para>
<para><programlisting>IMPORT TutorialYourName;
// Write the index file to disk, replacing any earlier version
BUILDINDEX(TutorialYourName.IDX_PeopleByZIP, OVERWRITE);
</programlisting></para>
</listitem>
<listitem>
<para>Check the syntax and, if there are no errors, press the
<emphasis role="bold">Submit</emphasis> button.</para>
</listitem>
<listitem>
<para>Wait for the workunit to complete, then close the Builder
window.</para>
</listitem>
</orderedlist>
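<para>Once the build has finished, a quick check can confirm that the
index holds one entry per record in the base file. This is a sketch that
assumes the definitions created above; the ZIP code 33024 is only an
example value.</para>
<para><programlisting>IMPORT TutorialYourName;
// An INDEX built from a dataset holds one entry per base record
ASSERT(COUNT(TutorialYourName.IDX_PeopleByZIP) =
       COUNT(TutorialYourName.File_TutorialPerson),
       'Index and base file record counts differ', FAIL);
// Peek at the index entries (zip and fpos) for one ZIP code
OUTPUT(TutorialYourName.IDX_PeopleByZIP(zip = '33024'));
</programlisting></para>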
</sect2>
<sect2 id="Query_the_Data">
<title>Build a Query</title>
<para>Now that we have an index file, we will write a query that uses
it.</para>
<orderedlist>
<listitem>
<para>Insert a File into your Tutorial Folder. Name it <emphasis
role="bold">BWR_FetchPeopleByZip</emphasis> and write this code
(changing <emphasis>YourName</emphasis> as before):</para>
<para><programlisting>IMPORT TutorialYourName;
ZipFilter := '33024';
// Look up matching zip entries in the index, then FETCH the full
// records from the base file at each entry's fpos
FetchPeopleByZip :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip = ZipFilter),
        RIGHT.fpos);
OUTPUT(FetchPeopleByZip);
</programlisting></para>
</listitem>
<listitem>
<para>Check the syntax and, if there are no errors, press the
<emphasis role="bold">Submit</emphasis> button.</para>
</listitem>
<listitem>
<para>When it completes, select the Workunit tab, then select the
<emphasis role="bold">Result</emphasis> tab.</para>
</listitem>
<listitem>
<para>Examine the result, then close the Builder window.</para>
<para><emphasis role="bold">Note</emphasis>: You can change the
value of <emphasis role="bold">ZipFilter</emphasis> and resubmit the
code to get results for different ZIP codes.</para>
</listitem>
</orderedlist>
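<para>FETCH reads only the base-file records at the positions the index
returns, so it should produce the same rows as filtering the base file
directly, without scanning the whole file. A sketch of that comparison,
assuming the definitions created above (the names
<emphasis>viaFetch</emphasis> and <emphasis>viaFilter</emphasis> are
illustrative), is:</para>
<para><programlisting>IMPORT TutorialYourName;
ZipFilter := '33024';
// Indexed access: index lookup followed by FETCH from the base file
viaFetch := FETCH(TutorialYourName.File_TutorialPerson,
                  TutorialYourName.IDX_PeopleByZIP(zip = ZipFilter),
                  RIGHT.fpos);
// Brute-force access: filter every record in the base file
viaFilter := TutorialYourName.File_TutorialPerson(zip = ZipFilter);
ASSERT(COUNT(viaFetch) = COUNT(viaFilter),
       'FETCH and full-file filter disagree', FAIL);
OUTPUT(COUNT(viaFetch), NAMED('MatchingPeople'));
</programlisting></para>
<para>On a small tutorial file both approaches finish quickly; the benefit
of the index grows with the size of the base file.</para>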
</sect2>
</sect1>
<sect1 id="Publishing_your_Query">
<title>Publishing your Thor Query</title>
<para>Now that we have created an indexed query, the next step is to
make it accessible through a Web interface.</para>
<para>The STORED workflow service provides a way to pass values to a
query as parameters. In this example, the user supplies a ZIP code and the
query returns the people in that ZIP code.</para>
<orderedlist>
<listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> Folder and name it <emphasis
role="bold">FetchPeopleByZipService</emphasis>.</para>
</listitem>
<listitem>
<para>Write this code (changing <emphasis>YourName</emphasis> as
before):</para>
<para><programlisting>IMPORT TutorialYourName;
// ZipFilter defaults to '' unless a ZIPValue parameter is supplied
// (for example, from the WsECL form)
STRING10 ZipFilter := '' : STORED('ZIPValue');
resultSet :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip = ZipFilter),
        RIGHT.fpos);
OUTPUT(resultSet);
</programlisting></para>
</listitem>
<listitem>
<para>Check the syntax and save the file.</para>
</listitem>
<listitem>
<para>Press the <emphasis role="bold">Submit</emphasis> button.</para>
</listitem>
<listitem>
<para>When the workunit completes, select the Workunit tab, then select
the ECL Watch tab.</para>
</listitem>
<listitem>
<para>Press the <emphasis role="bold">Publish</emphasis> button on
the ECL Watch tab.</para>
<para><figure>
<title>Publish Workunit</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg12.jpg" />
</imageobject>
</mediaobject>
</figure></para>
<para>The Publish dialog displays, with the Job Name field
automatically filled in. You can add a comment in the Comment field
if you wish, then press <emphasis role="bold">Submit</emphasis>. <figure>
<title>Publish Dialog</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg12b.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>If there are no error messages, the workunit is published.
Leave the Builder window open; you will need it again later.</para>
</listitem>
</orderedlist>
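<para>To try the service in the ECL IDE before publishing it, you can
supply a test value for the STORED parameter with the #STORED template
statement, which overrides the stored value within that workunit. The
following is a sketch; the value '33024' is only an example, and you would
normally remove the #STORED line before publishing so that the ZIP code
comes from the caller.</para>
<para><programlisting>IMPORT TutorialYourName;
STRING10 ZipFilter := '' : STORED('ZIPValue');
// Simulate the form input that WsECL would otherwise supply
#STORED('ZIPValue', '33024');
resultSet :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip = ZipFilter),
        RIGHT.fpos);
OUTPUT(resultSet);
</programlisting></para>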
<sect2 id="Execute-using-the-Data-Delivery-Engine">
<title>Execute Using WsECL</title>
<para>Now that the query is published, we can run it using the WsECL
Web service. WsECL provides a Web-based interface to your published
queries and automatically creates an entry form for each one.</para>
<para>Open the following URL in your browser:</para>
<para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp</emphasis> (where
nnn.nnn.nnn.nnn is your ESP Server's IP address and pppp is the port; the
default port is 8002)</para>
<para><figure>
<title>WsECL</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg13.jpg" />
</imageobject>
</mediaobject>
</figure></para>
<orderedlist>
<listitem>
<para>Click on the + sign next to <emphasis
role="bold">thor</emphasis> to expand the tree.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Click on the <emphasis
role="bold">fetchpeoplebyzipservice</emphasis> hyperlink.</para>
<para>The form for the service displays.</para>
<para><figure>
<title>Service Form</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg14a.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>Provide a ZIP code (e.g., 33024) in the <emphasis
role="bold">zipvalue</emphasis> field. Select <emphasis
role="bold">Output Tables</emphasis> from the drop-down list, then
press the <emphasis role="bold">Submit</emphasis> button.</para>
<para>The results display.</para>
<para><figure>
<title>Results</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg15a.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
</orderedlist>
</sect2>
</sect1>
<sect1 id="Deploy_the_Roxie_Query">
<title>Compile and Publish the Roxie Query</title>
<para>The final step in this process is to publish the indexed query to
a Rapid Data Delivery Engine (Roxie) cluster.</para>
<para>We will recompile the code with Roxie as the target cluster, then
publish it. <orderedlist>
<listitem>
<para>In the ECL IDE, select the Builder tab on the
FetchPeopleByZipService file builder window.</para>
</listitem>
<listitem>
<para>Using the <emphasis role="bold">Target</emphasis> drop-down list,
select Roxie as the target cluster.</para>
<para><figure>
<title>Target Roxie</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg16.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>In the upper left corner of the Builder window, the
<emphasis role="bold">Submit</emphasis> button has a drop-down
arrow next to it. Select the arrow to expose the <emphasis
role="bold">Compile</emphasis> option.</para>
<figure>
<title>Compile</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg17.jpg" />
</imageobject>
</mediaobject>
</figure>
</listitem>
<listitem>
<para>Select <emphasis role="bold">Compile</emphasis>.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>When the workunit finishes, it displays a green circle
indicating that it has compiled.</para>
<para><figure>
<title>Compiled</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg18.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
</orderedlist></para>
<sect2 id="Deploy_the_Query_to_Roxie">
<title>Publish the Roxie Query</title>
<para>Next, we will publish the query to a Roxie cluster.</para>
<orderedlist>
<listitem>
<para>Select the workunit tab for the FetchPeopleByZipService query that
you just compiled.</para>
<para>This opens the workunit in an ECL Watch tab.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Press the <emphasis role="bold">Publish</emphasis> action
button, verify the information in the dialog, and press
<emphasis role="bold">Submit</emphasis>.</para>
<para><figure>
<title>Publish Query</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg19.jpg" />
</imageobject>
</mediaobject>
</figure>This publishes the query.</para>
</listitem>
</orderedlist>
</sect2>
<sect2 id="Run_the_Roxie_Query" role="brk">
<title>Run the Roxie Query in WsECL</title>
<para>Now that the query is deployed to a Roxie cluster, we can run it
using the WsECL service. Open the following URL in your browser:</para>
<para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp</emphasis> (where
nnn.nnn.nnn.nnn is your ESP Server's IP address and pppp is the port; the
default port is 8002)</para>
<orderedlist>
<listitem>
<para>Click on the + sign next to <emphasis
role="bold">myroxie</emphasis> to expand the tree.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Click on the <emphasis
role="bold">fetchpeoplebyzipservice</emphasis> hyperlink.</para>
<para>The form for the service displays.</para>
<para><figure>
<title>Roxie Service Form</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg21.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Provide a ZIP code (e.g., 33024), select <emphasis
role="bold">Output Tables</emphasis> from the drop-down list, and press
the <emphasis role="bold">Submit</emphasis> button.</para>
<para>The results display.</para>
<para><figure>
<title>Roxie Results</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg22.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
</orderedlist>
</sect2>
</sect1>
</chapter>
<chapter id="Summary">
<title>Summary</title>
<para>Now that you have successfully processed raw data, sprayed it onto a
cluster, and published an indexed query to a Roxie cluster, what's
next?</para>
<para>Here is a short list of suggestions on the path you might take from
here:</para>
<itemizedlist mark="bullet">
<listitem>
<para>Create indexes on other fields and create queries that use
them.</para>
</listitem>
<listitem>
<para>Write client applications that access your queries using the JSON
or SOAP interfaces.</para>
</listitem>
<listitem>
<para>Look at the resources available on the Links tab.</para>
<para><figure>
<title>Links</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg24.jpg" />
</imageobject>
</mediaobject>
</figure>The Links tab provides easy access to a form, a Sample
Request, a Sample Response, the WSDL, the XML Schema (XSD), and
more.</para>
</listitem>
<listitem>
<para>Follow the procedures in this tutorial using your own
data!</para>
</listitem>
</itemizedlist>
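<para>As a starting point for the first suggestion, here is a sketch that
defines, builds, and queries a second index in the same way as
<emphasis>IDX_PeopleByZIP</emphasis>. It assumes that
<emphasis>Layout_People</emphasis> contains a <emphasis>LastName</emphasis>
field; the index name <emphasis>IDX_PeopleByLastName</emphasis>, its file
name, and the search value 'SMITH' are hypothetical, so adjust them (along
with <emphasis>YN</emphasis> and <emphasis>YourName</emphasis>) to match
your data.</para>
<para><programlisting>IMPORT TutorialYourName;
// Hypothetical second index keyed on LastName (assumed field name)
IDX_PeopleByLastName :=
  INDEX(TutorialYourName.File_TutorialPerson,
        {LastName, fpos},
        '~tutorial::YN::PeopleByLastNameINDEX');
// Hypothetical search value
NameFilter := 'SMITH';
// SEQUENTIAL builds the index before the FETCH reads it
SEQUENTIAL(
  BUILDINDEX(IDX_PeopleByLastName, OVERWRITE),
  OUTPUT(FETCH(TutorialYourName.File_TutorialPerson,
               IDX_PeopleByLastName(LastName = NameFilter),
               RIGHT.fpos)));
</programlisting></para>
<para>Because the index is defined locally rather than exported, the whole
example can be submitted as a single file. For a reusable query, you would
split it into an exported index definition and a separate build file, as
the tutorial does for <emphasis>IDX_PeopleByZIP</emphasis>.</para>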
</chapter>
</book>