  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  3. "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
  4. <book lang="en_US">
  5. <title>HPCC Data Tutorial</title>
  6. <bookinfo>
  7. <title>HPCC Data Tutorial</title>
  8. <mediaobject>
  9. <imageobject>
  10. <imagedata fileref="images/redswooshWithLogo3.jpg" />
  11. </imageobject>
  12. </mediaobject>
  13. <author>
  14. <surname>Boca Raton Documentation Team</surname>
  15. </author>
  16. <legalnotice>
  17. <para>We welcome your comments and feedback about this document via
  18. email to <email>docfeedback@hpccsystems.com</email>. Please include
  19. <emphasis role="bold">Documentation Feedback</emphasis> in the subject
  20. line and reference the document name, page numbers, and current Version
  21. Number in the text of the message.</para>
  22. <para>LexisNexis and the Knowledge Burst logo are registered trademarks
  23. of Reed Elsevier Properties Inc., used under license. Other products and
  24. services may be trademarks or registered trademarks of their respective
  25. companies. All names and example data used in this manual are
  26. fictitious. Any similarity to actual persons, living or dead, is purely
  27. coincidental.</para>
  28. <para></para>
  29. </legalnotice>
  30. <xi:include href="Version.xml" xpointer="FooterInfo"
  31. xmlns:xi="http://www.w3.org/2001/XInclude" />
  32. <!--Release Info makes a running page footer: now an include! -->
  33. <!--The following include statement pulls in the date_ver from version.xml-->
  34. <xi:include href="Version.xml" xpointer="DateVer"
  35. xmlns:xi="http://www.w3.org/2001/XInclude" />
  36. <corpname>HPCC Systems</corpname>
  37. <!--corpname never prints-->
  38. <xi:include href="Version.xml" xpointer="Copyright"
  39. xmlns:xi="http://www.w3.org/2001/XInclude" />
  40. <!--Copyright tag inserts the symbol automatically: Now an Include!-->
  41. <mediaobject role="logo">
  42. <imageobject>
  43. <imagedata fileref="images/LN_Rightjustified.jpg" />
  44. </imageobject>
  45. </mediaobject>
  46. </bookinfo>
  47. <chapter>
  48. <title>Introduction</title>
  49. <sect1 id="Introduction_Case-Study-and-Tutorial" role="nobrk">
  50. <title>The ECL Development Process</title>
  51. <para>This tutorial provides a walk-through of the development process,
  52. from beginning to end, and is designed to be an introduction to working
  53. with data on any HPCC Systems HPCC<footnote>
  54. <para><emphasis role="bold">H</emphasis>igh <emphasis
  55. role="bold">P</emphasis>erformance <emphasis
  56. role="bold">C</emphasis>omputing <emphasis
  57. role="bold">C</emphasis>luster (HPCC) is a massively parallel
  58. processing computing platform that solves Big Data problems. See
  59. http://www.hpccsystems.com/Why-HPCC/How-it-works for more
  60. details.</para>
  61. </footnote>. We will write code in ECL<footnote>
  62. <para><emphasis role="bold">E</emphasis>nterprise <emphasis
  63. role="bold">C</emphasis>ontrol <emphasis
  64. role="bold">L</emphasis>anguage (ECL) is a declarative, data-centric
  65. programming language used to manage all aspects of the massive data
  66. joins, sorts, and builds that truly differentiate HPCC (High
  67. Performance Computing Cluster) from other technologies in its
  68. ability to provide flexible data analysis on a massive scale.</para>
  69. </footnote> to process our data and query it.</para>
  70. <para>This tutorial assumes:</para>
  71. <itemizedlist>
  72. <listitem>
  73. <para>You have a running HPCC platform. This can be a VM Edition or a
  74. single-node or multi-node HPCC platform.</para>
  75. </listitem>
  76. <listitem>
  77. <para>You have the ECL IDE<footnote>
  78. <para>The ECL IDE (Integrated Development Environment) is the tool
  79. used to create queries into your data and ECL files with which to
  80. build your queries.</para>
  81. </footnote> installed and configured.</para></listitem></itemizedlist>
  82. <para>In this tutorial, we will:</para>
  83. <itemizedlist mark="bullet">
  84. <listitem>
  85. <para>Download a raw data file</para>
  86. <para>Links to the data file are available at <ulink
  87. url="http://hpccsystems.com/community/docs"><emphasis
  88. role="bold">http://hpccsystems.com/community/docs</emphasis></ulink></para>
  89. <para>The download is approximately 30 MB (compressed) and is
  90. available in either ZIP or .tar.gz format. Choose the appropriate
  91. link.</para>
  92. </listitem>
  93. <listitem>
  94. <para>Spray the file to a Data Refinery cluster. HPCC clusters
  95. "spray" data into file parts on each node.</para>
  96. <para>A <emphasis>spray</emphasis> or <emphasis>import</emphasis> is
  97. the relocation of a data file from one location to an HPCC cluster.
  98. The term spray was adopted due to the nature of the file movement –
  99. the file is partitioned across all nodes within a cluster.</para>
  100. </listitem>
  101. <listitem>
  102. <para>Examine the data and determine the pre-processing we need to
  103. perform</para>
  104. </listitem>
  105. <listitem>
  106. <para>Pre-process the data to produce a new data file</para>
  107. </listitem>
  108. <listitem>
  109. <para>Determine the types of queries we want</para>
  110. </listitem>
  111. <listitem>
  112. <para>Create the queries</para>
  113. </listitem>
  114. <listitem>
  115. <para>Test the queries</para>
  116. </listitem>
  117. <listitem>
  118. <para>Deploy them to a Rapid Data Delivery Engine (RDDE) cluster,
  119. also known as a Roxie cluster.</para>
  120. </listitem>
  121. </itemizedlist>
  122. </sect1>
  123. </chapter>
  124. <chapter id="Working_with_Data">
  125. <title>Working with Data</title>
  126. <sect1 id="The_Original_Data" role="nobrk">
  127. <title>The Original Data</title>
  128. <para>In this scenario, we receive a structured data file containing
  129. records with people's names and addresses. The HPCC also supports
  130. unstructured data, but this example is simpler. This file is documented
  131. in the following table:</para>
  132. <para></para>
  133. <para><informaltable colsep="1" frame="all" rowsep="1">
  134. <tgroup cols="3">
  135. <colspec colwidth="147.60pt" />
  136. <colspec colwidth="147.60pt" />
  137. <colspec colwidth="147.60pt" />
  138. <thead>
  139. <row>
  140. <entry align="left">Field Name</entry>
  141. <entry align="left">Type</entry>
  142. <entry align="left">Description</entry>
  143. </row>
  144. </thead>
  145. <tbody>
  146. <row>
  147. <entry>FirstName</entry>
  148. <entry>15 Character String</entry>
  149. <entry>First Name</entry>
  150. </row>
  151. <row>
  152. <entry>LastName</entry>
  153. <entry>25 Character String</entry>
  154. <entry>Last Name</entry>
  155. </row>
  156. <row>
  157. <entry>MiddleName</entry>
  158. <entry>15 Character String</entry>
  159. <entry>Middle Name</entry>
  160. </row>
  161. <row>
  162. <entry>Zip</entry>
  163. <entry>5 Character String</entry>
  164. <entry>ZIP Code</entry>
  165. </row>
  166. <row>
  167. <entry>Street</entry>
  168. <entry>42 Character String</entry>
  169. <entry>Street Address</entry>
  170. </row>
  171. <row>
  172. <entry>City</entry>
  173. <entry>20 Character String</entry>
  174. <entry>City</entry>
  175. </row>
  176. <row>
  177. <entry>State</entry>
  178. <entry>2 Character String</entry>
  179. <entry>State</entry>
  180. </row>
  181. </tbody>
  182. </tgroup>
  183. </informaltable></para>
  184. <para>This gives us a record length of 124 bytes (the total of all field
  185. lengths). You will need to know this length for the <emphasis
  186. role="bold">File Spray</emphasis> process.</para>
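<para>The record length can be double-checked by adding up the field widths from the table. The following is a minimal Python sketch, run outside the HPCC platform. The names FIELDS and record_length are illustrative only and are not part of ECL or of the tutorial files.</para>

```python
# Field names and widths (in characters) copied from the table above.
FIELDS = [
    ("FirstName", 15),
    ("LastName", 25),
    ("MiddleName", 15),
    ("Zip", 5),
    ("Street", 42),
    ("City", 20),
    ("State", 2),
]


def record_length(fields):
    """Return the fixed record length: the sum of all field widths."""
    return sum(width for _name, width in fields)


print(record_length(FIELDS))  # 124
```

<para>If the layout ever changes, updating the widths in this list gives the new value to enter as the Record Length when spraying.</para>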
  187. <para></para>
  188. <sect2 id="Uploading_a_file">
  189. <title>Load the Incoming Data File to your Landing Zone</title>
  190. <para>A Landing Zone (or Drop Zone) is a physical storage location
  191. defined in your HPCC's environment. A daemon (DaFileSrv) must be
  192. running on that server to enable file sprays and desprays.</para>
  193. <para>For smaller data files (up to 2 GB), you can use the
  194. upload/download file utility in ECL Watch (a Web-based interface to
  195. your HPCC platform). The sample data file is approximately 100 MB.</para>
  196. <orderedlist>
  197. <listitem>
  198. <para>Download the sample data file from the HPCC Systems
  199. portal.</para>
  200. <para>The data file is available from links found on <ulink
  201. url="http://hpccsystems.com/community/docs">http://hpccsystems.com/community/docs</ulink>.
  202. The download is approximately 30 MB (compressed) and is available
  203. in either ZIP or tar.gz format (<emphasis
  204. role="bold">OriginalPerson.tar.gz</emphasis> or <emphasis
  205. role="bold">OriginalPerson.zip</emphasis>).</para>
  206. </listitem>
  207. <listitem>
  208. <para>Extract it to a folder on your local machine.</para>
  209. </listitem>
  210. <listitem>
  211. <para>In your browser, go to the <emphasis role="bold">ECL
  212. Watch</emphasis> URL. For example, http://nnn.nnn.nnn.nnn:8010,
  213. where nnn.nnn.nnn.nnn is your ESP<footnote>
  214. <para>The ESP (Enterprise Services Platform) Server is the
  215. communication layer server in your HPCC environment.</para>
  216. </footnote> Server's IP address.</para>
  217. <para><informaltable colsep="1" frame="all" rowsep="1">
  218. <?dbfo keep-together="always"?>
  219. <tgroup cols="2">
  220. <colspec colwidth="49.50pt" />
  221. <colspec />
  222. <tbody>
  223. <row>
  224. <entry><inlinegraphic
  225. fileref="images/caution.png" /></entry>
  226. <entry>Your IP address could be different from the ones
  227. provided in the example images. Please use the IP
  228. address provided by <emphasis
  229. role="bold">your</emphasis> installation.</entry>
  230. </row>
  231. </tbody>
  232. </tgroup>
  233. </informaltable></para>
  234. </listitem>
  235. <listitem>
  236. <?dbfo keep-together="always"?>
  237. <para>From the ECL Watch page, click the <emphasis
  238. role="bold">Upload/download File</emphasis> link in the menu on
  239. the left side.</para>
  240. <para><figure>
  241. <title>Upload/download</title>
  242. <mediaobject>
  243. <imageobject>
  244. <imagedata fileref="images/LZimg03-1.jpg" />
  245. </imageobject>
  246. </mediaobject>
  247. </figure></para>
  248. <para>Clicking the Upload/download File link opens
  249. the Dropzones and Files page, where you can choose to
  250. <emphasis role="bold">Browse</emphasis> your machine for a file to
  251. upload:</para>
  252. <para><figure>
  253. <title>Dropzones and Files</title>
  254. <mediaobject>
  255. <imageobject>
  256. <imagedata fileref="images/LZimg04.jpg" />
  257. </imageobject>
  258. </mediaobject>
  259. </figure></para>
  260. </listitem>
  261. <listitem>
  262. <para>Press the <emphasis role="bold">Browse</emphasis> button to
  263. browse the files on your local machine, select the file to upload
  264. and then press the <emphasis role="bold">Open</emphasis>
  265. button.</para>
  266. <para>The file you selected should appear in the <emphasis
  267. role="bold">Select a file to upload:</emphasis> field. The data
  268. file is named <emphasis role="bold">OriginalPerson</emphasis>.
  269. </para>
  270. </listitem>
  271. <listitem>
  272. <para>Press <emphasis role="bold">Upload Now</emphasis> to
  273. complete the file upload.</para>
  274. </listitem>
  275. </orderedlist>
  276. </sect2>
  277. <sect2 id="Spray_the_Data_File_to_your_DR-THOR_Cluster">
  278. <title>Spray the Data File to your Thor Cluster</title>
  279. <para>To use the data file in our HPCC cluster, we must first “spray”
  280. it to a Thor cluster. A <emphasis>spray</emphasis> or
  281. <emphasis>import</emphasis> is the relocation of a data file from one
  282. location to a Thor cluster. The term spray was adopted due to the
  283. nature of the file movement – the file is partitioned across all nodes
  284. within a cluster.</para>
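<para>To picture what partitioning means for a fixed-length file, the Python sketch below splits a file into one part per node so that every part holds only whole 124-byte records. It is an illustration under the assumption of an even split with earlier nodes taking any remainder. The actual DFU partitioning logic may differ, and the function name and the 10-record, 3-node example are hypothetical.</para>

```python
RECORD_LENGTH = 124  # bytes per record, from the field table in this tutorial


def partition(total_records, nodes):
    """Split a fixed-length file into one (byte_offset, byte_length) part per node.

    Assumption: parts are as even as possible and earlier nodes take the
    remainder. Every part holds whole records only.
    """
    base, extra = divmod(total_records, nodes)
    parts = []
    offset = 0
    for node in range(nodes):
        count = base + (1 if extra > node else 0)  # records for this node
        length = count * RECORD_LENGTH
        parts.append((offset, length))
        offset += length
    return parts


# Hypothetical example: 10 records sprayed to a 3-node cluster.
for node, (offset, length) in enumerate(partition(10, 3), start=1):
    print(f"part {node}: offset={offset} bytes={length}")
```

<para>Because every part length is a multiple of the record length, no record is ever split between two nodes, which is why the spray needs the record length up front.</para>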
  285. <para>In this example, the file is on your Landing Zone and is named
  286. <emphasis role="bold">OriginalPerson</emphasis>.</para>
  287. <para>We are going to spray it to our Thor cluster and give it a
  288. logical name of <emphasis
  289. role="bold">tutorial::YN::OriginalPerson</emphasis>,
  290. where <emphasis role="bold">YN</emphasis>
  291. are your
  292. initials. The Distributed File Utility (DFU) maintains a list of logical
  293. files and their corresponding physical file locations.</para>
  294. <orderedlist>
  295. <listitem>
  296. <para>Open ECL Watch in your browser using the following
  297. URL:</para>
  298. <para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp</emphasis>
  299. (where nnn.nnn.nnn.nnn is your
  300. ESP Server’s IP address and pppp is the port. The default port is
  301. 8010.)</para>
  302. </listitem>
  303. <listitem>
  304. <para>Click on the <emphasis role="bold">Spray Fixed</emphasis>
  305. hyperlink under the DFU Files menu on the left.</para>
  306. <para>The <emphasis role="bold">DFU Spray Fixed</emphasis> page
  307. displays.</para>
  308. </listitem>
  309. <listitem>
  310. <para>Using the Source <emphasis
  311. role="bold">Machine/dropzone</emphasis> drop-down list, select the
  312. Landing Zone where the file was placed.</para>
  313. <para>In the VM or Community Edition, there is only one Landing
  314. Zone.</para>
  315. <para>The IP Address is automatically filled and the Local Path is
  316. partially filled with the default folder on your landing
  317. zone.</para>
  318. </listitem>
  319. <listitem>
  320. <para>Complete the <emphasis role="bold">Local Path</emphasis> to
  321. include the complete file name or use the <emphasis
  322. role="bold">Choose File</emphasis> button to select the file from
  323. a list of files in the folder. (The file to choose is
  324. <emphasis>OriginalPerson</emphasis>.)</para>
  325. </listitem>
  326. <listitem>
  327. <para>Fill in the <emphasis role="bold">Record Length</emphasis>
  328. (124).</para>
  329. </listitem>
  330. <listitem>
  331. <para>Fill in the <emphasis role="bold">Label</emphasis> using the
  332. naming convention described earlier: <emphasis
  333. role="bold">tutorial::</emphasis><emphasis
  334. role="bold">YN</emphasis><emphasis
  335. role="bold">::OriginalPerson</emphasis> (remember, <emphasis
  336. role="bold">YN</emphasis> are your initials).</para>
  337. </listitem>
  338. <listitem>
  339. <?dbfo keep-together="always"?>
  340. <para>Make sure the <emphasis
  341. role="bold">Replicate</emphasis>
  342. box is checked.</para>
  343. <para><emphasis role="bold">Note:</emphasis> If replication is
  344. disabled in your Thor settings, this checkbox does not
  345. appear.</para>
  346. <para><figure>
  347. <title>Spray Fixed</title>
  348. <mediaobject>
  349. <imageobject>
  350. <imagedata fileref="images/DTimg01.jpg" scale="85" />
  351. </imageobject>
  352. </mediaobject>
  353. </figure></para>
  354. </listitem>
  355. <listitem>
  356. <para>Press the <emphasis role="bold">Submit</emphasis>
  357. button.</para>
  358. </listitem>
  359. <listitem>
  360. <?dbfo keep-together="always"?>
  361. <para>Click on the <emphasis role="bold">View Progress</emphasis>
  362. hyperlink.</para>
  363. <para><figure>
  364. <title>View Progress</title>
  365. <mediaobject>
  366. <imageobject>
  367. <imagedata fileref="images/DTimg02.jpg" />
  368. </imageobject>
  369. </mediaobject>
  370. </figure>The Workunit progress page displays.</para>
  371. <para><figure>
  372. <title>Spray Complete</title>
  373. <mediaobject>
  374. <imageobject>
  375. <imagedata fileref="images/DTimg03.jpg" />
  376. </imageobject>
  377. </mediaobject>
  378. </figure> Once the spray is complete, we can proceed.</para>
  379. </listitem>
  380. </orderedlist>
  381. </sect2>
  382. </sect1>
  383. <sect1 id="Begin_Coding">
  384. <title>Begin Coding</title>
  385. <para>In this portion of the tutorial, we will write ECL code to define
  386. the data file and execute simple queries on it so we can evaluate it and
  387. determine any necessary pre-processing.</para>
  388. <orderedlist>
  389. <listitem>
  390. <para>Start the ECL IDE (Start &gt;&gt; All Programs &gt;&gt; HPCC
  391. Systems &gt;&gt; ECL IDE).</para>
  392. </listitem>
  393. <listitem>
  394. <para>Log in to your environment.</para>
  395. <para>For purposes of this tutorial, let’s create a folder called
  396. <emphasis
  397. role="bold">TutorialYourName</emphasis> (where
  398. <emphasis>YourName</emphasis> is your name).</para>
  399. </listitem>
  400. <listitem>
  401. <?dbfo keep-together="always"?>
  402. <para>Right-click the <emphasis role="bold">My Files</emphasis>
  403. folder in the Repository window,
  404. and select <emphasis role="bold">Insert Folder</emphasis> from the
  405. pop-up menu.</para>
  406. <para><figure>
  407. <title>Insert Folder</title>
  408. <mediaobject>
  409. <imageobject>
  410. <imagedata fileref="images/DTimg04.jpg" />
  411. </imageobject>
  412. </mediaobject>
  413. </figure></para>
  414. </listitem>
  415. <listitem>
  416. <?dbfo keep-together="always"?>
  417. <para>Enter <emphasis
  418. role="bold">TutorialYourName</emphasis> (where <emphasis>YourName</emphasis>
  419. is your name) for the label, then press the OK
  420. button.</para>
  421. <para><figure>
  422. <title>Enter Folder Label</title>
  423. <mediaobject>
  424. <imageobject>
  425. <imagedata fileref="images/DTimg05.jpg" />
  426. </imageobject>
  427. </mediaobject>
  428. </figure></para>
  429. </listitem>
  430. <listitem>
  431. <para>Right-click the <emphasis
  432. role="bold">TutorialYourName</emphasis>
  433. folder, and select <emphasis
  434. role="bold">Insert File</emphasis> from the pop-up menu.</para>
  435. </listitem>
  436. <listitem>
  437. <?dbfo keep-together="always"?>
  438. <para>Enter <emphasis role="bold">Layout_People</emphasis> for the
  439. label, then press the OK button.</para>
  440. <para><figure>
  441. <title>Insert File</title>
  442. <mediaobject>
  443. <imageobject>
  444. <imagedata fileref="images/DTimg06.jpg" />
  445. </imageobject>
  446. </mediaobject>
  447. </figure></para>
  448. <para>A Builder Window opens.</para>
  449. <para><figure>
  450. <title>Layout People in Builder</title>
  451. <mediaobject>
  452. <imageobject>
  453. <imagedata fileref="images/DTimg07.jpg" />
  454. </imageobject>
  455. </mediaobject>
  456. </figure></para>
  457. <para>Notice that some text has been written for you in the window.
  458. This helps you to remember that the name of the file (Layout_People)
  459. <emphasis>must always exactly match</emphasis> the name of the
  460. single EXPORT definition (Layout_People) contained in that file.
  461. This is a requirement: one EXPORT definition per file, and its
  462. name must match the filename.</para>
  463. </listitem>
  464. <listitem>
  465. <?dbfo keep-together="always"?>
  466. <para>Write the following code in the Builder workspace:</para>
  467. <para><programlisting>EXPORT Layout_People := RECORD
  468. STRING15 FirstName;
  469. STRING25 LastName;
  470. STRING15 MiddleName;
  471. STRING5 Zip;
  472. STRING42 Street;
  473. STRING20 City;
  474. STRING2 State;
  475. END; </programlisting> <figure>
  476. <title>Code in Builder Window</title>
  477. <mediaobject>
  478. <imageobject>
  479. <imagedata fileref="images/DTimg08.jpg" />
  480. </imageobject>
  481. </mediaobject>
  482. </figure></para>
  483. </listitem>
  484. <listitem>
  485. <para>Press the syntax check button on the main toolbar (or press
  486. F7).</para>
  487. <para>It is always a good idea to check syntax before
  488. submitting.</para>
  489. <para><figure>
  490. <title>Check Syntax</title>
  491. <mediaobject>
  492. <imageobject>
  493. <imagedata fileref="images/DTimg23.jpg" />
  494. </imageobject>
  495. </mediaobject>
  496. </figure></para>
  497. <para>This file defines the record structure for the data file.
  498. Next, we will examine the data.</para>
  499. </listitem>
  500. </orderedlist>
  501. <sect2 id="Examine_the_Data" role="brk">
  502. <title>Examine the Data</title>
  503. <para>In this section, we will look at the data and determine if there
  504. is any pre-processing we want to perform on the data. This is the step
  505. in the development process where we convert the raw data into a form
  506. we can use.</para>
  507. <orderedlist>
  508. <listitem>
  509. <para>Right-click the <emphasis
  510. role="bold">TutorialYourName</emphasis>
  511. folder, and select <emphasis role="bold">Insert
  512. File</emphasis> from the pop-up menu.</para>
  513. </listitem>
  514. <listitem>
  515. <para>Enter <emphasis role="bold">File_OriginalPerson</emphasis>
  516. for the label, then press the OK button.</para>
  517. <para><figure>
  518. <title>Insert File</title>
  519. <mediaobject>
  520. <imageobject>
  521. <imagedata fileref="images/DTimg09.jpg" />
  522. </imageobject>
  523. </mediaobject>
  524. </figure>A Builder Window opens.</para>
  525. </listitem>
  526. <listitem>
  527. <para>Write the following code (remember to replace
  528. <emphasis>YN</emphasis> with your initials and <emphasis>YourName</emphasis> with your name):</para>
  529. <para><programlisting>IMPORT TutorialYourName;
  530. EXPORT File_OriginalPerson :=
  531. DATASET('~tutorial::YN::OriginalPerson',TutorialYourName.Layout_People,THOR);
  532. </programlisting></para>
  533. <para><figure>
  534. <title>File_OriginalPerson.ecl</title>
  535. <mediaobject>
  536. <imageobject>
  537. <imagedata fileref="images/DTimg10.jpg" />
  538. </imageobject>
  539. </mediaobject>
  540. </figure></para>
  541. </listitem>
  542. <listitem>
  543. <para>Press the syntax check button on the main toolbar (or press
  544. F7) to check the syntax.</para>
  545. <para>This defines the Dataset. Next, we will examine the
  546. data.</para>
  547. </listitem>
  548. <listitem>
  549. <para>Open a new Builder Window (CTRL+N) and write the following
  550. code (remember to replace <emphasis>YourName</emphasis> with your
  551. name):</para>
  552. <programlisting>IMPORT TutorialYourName;
  553. COUNT(TutorialYourName.File_OriginalPerson);
  554. </programlisting>
  555. </listitem>
  556. <listitem>
  557. <para>Press the syntax check button on the main toolbar (or press
  558. F7) to check the syntax.</para>
  559. </listitem>
  560. <listitem>
  561. <?dbfo keep-together="always"?>
  562. <para>Make sure the selected cluster is your Thor cluster, then
  563. press the <emphasis role="bold">Submit</emphasis> button. Note
  564. that your target cluster might have a different name.</para>
  565. <para><figure>
  566. <title>Target Thor</title>
  567. <mediaobject>
  568. <imageobject>
  569. <imagedata fileref="images/DTimg11.jpg" />
  570. </imageobject>
  571. </mediaobject>
  572. </figure></para>
  573. </listitem>
  574. <listitem>
  575. <para>When the Workunit completes, it displays a green checkmark
  576. <inlinegraphic fileref="images/DT173-15.jpg" />.</para>
  577. </listitem>
  578. <listitem>
  579. <para>Select the Workunit tab (the one with the number next to the
  580. checkmark) and select the <emphasis role="bold">Result
  581. 1</emphasis> tab (it may already be selected).</para>
  582. <para><figure>
  583. <title>Result tab</title>
  584. <mediaobject>
  585. <imageobject>
  586. <imagedata fileref="images/DT173-16.png" />
  587. </imageobject>
  588. </mediaobject>
  589. </figure>This shows us that there are 841,400 records in the
  590. data file.</para>
  591. </listitem>
  592. <listitem>
  593. <para>Select the Builder tab and change COUNT to OUTPUT, as shown
  594. below:</para>
  595. <para><programlisting>IMPORT TutorialYourName;
  596. <emphasis role="bold">OUTPUT</emphasis>(TutorialYourName.File_OriginalPerson);</programlisting></para>
  597. <para>Note: The modified portion is shown in <emphasis
  598. role="bold">bold</emphasis>.</para>
  599. </listitem>
  600. <listitem>
  601. <para>Check the syntax and, if there are no errors, press the <emphasis
  602. role="bold">Submit</emphasis> button.</para>
  603. </listitem>
  604. <listitem>
  605. <?dbfo keep-together="always"?>
  606. <para>When it completes, select the Workunit tab, then select the
  607. <emphasis role="bold">Result 1</emphasis> tab.</para>
  608. <para><figure>
  609. <title>Output Results</title>
  610. <mediaobject>
  611. <imageobject>
  612. <imagedata fileref="images/DT173-17.png" />
  613. </imageobject>
  614. </mediaobject>
  615. </figure></para>
  616. <para>Notice the names are in mixed case.</para>
  617. <para>For our purposes, it will be easier to have all the names in
  618. uppercase. This demonstrates one of the steps in the basic
  619. process of preparing data (Extract, Transform, and Load, or ETL) using
  620. ECL.</para>
  621. </listitem>
  622. <listitem>
  623. <para>Close the Builder Window.</para>
  624. </listitem>
  625. </orderedlist>
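<para>The record count also gives a rough cross-check against the file size. As a sketch in Python (run outside HPCC, with illustrative names), multiplying the 841,400 records reported by COUNT by the 124-byte record length gives a size close to 100 MB, in line with the approximate size of the sample file mentioned earlier. This assumes the file holds only fixed-length records with no separators between them.</para>

```python
RECORD_LENGTH = 124     # bytes per record (sum of the field widths)
RECORD_COUNT = 841_400  # value reported by COUNT in this tutorial


def file_size_bytes(record_count, record_length):
    """Size of a fixed-length file that holds record_count whole records."""
    return record_count * record_length


size = file_size_bytes(RECORD_COUNT, RECORD_LENGTH)
print(size)                        # 104333600
print(round(size / 1024 ** 2, 1))  # 99.5 (size in MiB)
```

<para>If COUNT returned a value that did not match the file size divided by the record length, a wrong Record Length at spray time would be one thing to check.</para>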
  626. </sect2>
  627. <sect2 id="Process_the_Data">
  628. <title>Process the Data</title>
  629. <para>In this section, we will write code to convert the original data
  630. so that all names are in uppercase. We will then write this new file
  631. to our Thor cluster.</para>
  632. <orderedlist>
  633. <listitem>
  634. <para>Right-click the <emphasis
  635. role="bold">TutorialYourName</emphasis>
  636. folder, and select <emphasis role="bold">Insert File</emphasis> from the pop-up
  637. menu.</para>
  638. </listitem>
  639. <listitem>
  640. <para>Name this one <emphasis
  641. role="bold">BWR_ProcessRawData</emphasis> and write the following
  642. code (changing YN and YourName as before):</para>
  643. <para><programlisting>IMPORT TutorialYourName, Std;
  644. TutorialYourName.Layout_People toUpperPlease(TutorialYourName.Layout_People pInput)
  645. := TRANSFORM
  646. SELF.FirstName := Std.Str.ToUpperCase(pInput.FirstName);
  647. SELF.LastName := Std.Str.ToUpperCase(pInput.LastName);
  648. SELF.MiddleName := Std.Str.ToUpperCase(pInput.MiddleName);
  649. SELF.Zip := pInput.Zip;
  650. SELF.Street := pInput.Street;
  651. SELF.City := pInput.City;
  652. SELF.State := pInput.State;
  653. END ;
  654. OrigDataset := TutorialYourName.File_OriginalPerson;
  655. UpperedDataset := PROJECT(OrigDataset,toUpperPlease(LEFT));
  656. OUTPUT(UpperedDataset,,'~tutorial::YN::TutorialPerson',OVERWRITE);
  657. </programlisting></para>
  658. </listitem>
  659. <listitem>
  660. <para>Check the syntax and, if there are no errors, press the <emphasis
  661. role="bold">Submit</emphasis> button.</para>
  662. </listitem>
  663. <listitem>
  664. <para>When it completes, select the Workunit tab, then select the
  665. Result 1 tab.</para>
  666. <para><figure>
  667. <title>Process Result</title>
  668. <mediaobject>
  669. <imageobject>
  670. <imagedata fileref="images/DT173-18.jpg" />
  671. </imageobject>
  672. </mediaobject>
  673. </figure></para>
  674. <para>The results show that the process has successfully converted
  675. the name fields to uppercase.</para>
  676. </listitem>
  677. <listitem>
  678. <para>After you examine the results, close the Builder
  679. window.</para>
  680. </listitem>
  681. </orderedlist>
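<para>As a variation on the TRANSFORM above, the fields that are copied
unchanged can be handled by a single <emphasis>SELF := pInput</emphasis>
assignment. The following is a minimal sketch, assuming the same
Layout_People fields used in the code above. The name
<emphasis>toUpperShort</emphasis> is only illustrative and is not part of
the tutorial files:</para>
<para><programlisting>IMPORT TutorialYourName, Std;
// Hypothetical shorter variant of toUpperPlease
TutorialYourName.Layout_People toUpperShort(TutorialYourName.Layout_People pInput)
  := TRANSFORM
  SELF.FirstName := Std.Str.ToUpperCase(pInput.FirstName);
  SELF.LastName := Std.Str.ToUpperCase(pInput.LastName);
  SELF.MiddleName := Std.Str.ToUpperCase(pInput.MiddleName);
  SELF := pInput; // copies every remaining field (Zip, Street, City, State) as-is
END;
// Preview the first 10 converted records without writing a file
OUTPUT(CHOOSEN(PROJECT(TutorialYourName.File_OriginalPerson,toUpperShort(LEFT)),10));
</programlisting></para>
<para>Because this sketch only previews records, it does not replace
BWR_ProcessRawData, which still writes the file used in the rest of the
tutorial.</para>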
  682. </sect2>
  683. <sect2 id="Using_our_Data">
  684. <title>Using our New Data</title>
  686. <para>Now that we have our data in a useful format and the file is in
  687. place, we can write more code to use the new data file. We will
  688. determine the indexes we will need and create them. For this tutorial,
  689. let’s assume the field we need to index is the Zip code field.</para>
  691. <para>In the DATASET definition, we will add a virtual field to the
  692. RECORD structure for the fileposition. This is required for
  693. indexes.</para>
  695. <orderedlist>
  696. <listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> folder. Name it <emphasis
role="bold">File_TutorialPerson</emphasis> and write this code (changing
<emphasis>YN</emphasis> and <emphasis>YourName</emphasis> as
before):</para>
<para><programlisting>IMPORT TutorialYourName;
EXPORT File_TutorialPerson :=
  DATASET('~tutorial::YN::TutorialPerson',
          {TutorialYourName.Layout_People,
           UNSIGNED8 fpos {virtual(fileposition)}},THOR);
</programlisting></para>
  710. </listitem>
  711. <listitem>
<para>Check the syntax and, if there are no errors, press the <emphasis
role="bold">Submit</emphasis> button.</para>
  714. </listitem>
  715. <listitem>
  716. <para>When it completes, it displays a green checkmark
  717. <inlinegraphic fileref="images/DT173-15.jpg" />.</para>
  718. </listitem>
  719. </orderedlist>
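<para>To confirm that the new definition points at the processed file,
you could run a short check in a separate Builder window. This is a
sketch, assuming BWR_ProcessRawData has already completed. The result
names <emphasis>RecordCount</emphasis> and <emphasis>FirstTen</emphasis>
are arbitrary labels chosen for illustration:</para>
<para><programlisting>IMPORT TutorialYourName;
// Count the records in the processed file
OUTPUT(COUNT(TutorialYourName.File_TutorialPerson),NAMED('RecordCount'));
// Show the first 10 records
OUTPUT(CHOOSEN(TutorialYourName.File_TutorialPerson,10),NAMED('FirstTen'));
</programlisting></para>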
  720. </sect2>
  721. <sect2 id="Index_the_Data">
  722. <title>Index the Data</title>
  723. <para>Next, we will define the INDEX.</para>
  724. <orderedlist>
  725. <listitem>
<para>Insert a File into your Tutorial Folder. Name it <emphasis
role="bold">IDX_PeopleByZip</emphasis> and write this code (changing
<emphasis>YN</emphasis> and <emphasis>YourName</emphasis> as
before):</para>
<para><programlisting>IMPORT TutorialYourName;
EXPORT IDX_PeopleByZIP :=
  INDEX(TutorialYourName.File_TutorialPerson,{zip,fpos},'~tutorial::YN::PeopleByZipINDEX');
</programlisting></para>
  734. </listitem>
  735. <listitem>
  736. <para>Check the syntax.</para>
  737. <para>Next, we will build the index file.</para>
  738. </listitem>
  739. <listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> folder, name it <emphasis
role="bold">BWR_BuildPeopleByZip</emphasis>, and write this code
(replacing <emphasis>YourName</emphasis> with your name):</para>
<para><programlisting>IMPORT TutorialYourName;
BUILDINDEX(TutorialYourName.IDX_PeopleByZIP,OVERWRITE);
</programlisting></para>
  749. </listitem>
  750. <listitem>
  751. <para>Check the syntax and if there are no errors, press the
  752. <emphasis role="bold">Submit</emphasis> button.</para>
  753. </listitem>
  754. <listitem>
  755. <para>Wait for the Workunit to complete, then close the Builder
  756. Window.</para>
  757. </listitem>
  758. </orderedlist>
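<para>Once the index is built, it can also be read directly, which is a
quick way to check that it contains entries for a given Zip code. The
following is a sketch only. The Zip value and the result names are
examples, and the output contains just the two index fields (zip and
fpos), not the full person records:</para>
<para><programlisting>IMPORT TutorialYourName;
// Filter on the first field of the index
ZipEntries := TutorialYourName.IDX_PeopleByZIP(zip = '33024');
OUTPUT(COUNT(ZipEntries),NAMED('EntriesForZip'));
OUTPUT(CHOOSEN(ZipEntries,10),NAMED('SampleEntries'));
</programlisting></para>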
  759. </sect2>
  760. <sect2 id="Query_the_Data">
  761. <title>Build a Query</title>
  762. <para>Now that we have an index file, we will write a query that uses
  763. it.</para>
  764. <orderedlist>
  765. <listitem>
  766. <para>Insert a File into your Tutorial Folder. Name it <emphasis
  767. role="bold">BWR_FetchPeopleByZip </emphasis>and write this code
  768. (changing <emphasis>YourName</emphasis> as before):</para>
<para><programlisting>IMPORT TutorialYourName;
ZipFilter := '33024';
FetchPeopleByZip :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip=ZipFilter),
        RIGHT.fpos);
OUTPUT(FetchPeopleByZip);
</programlisting></para>
  777. </listitem>
  778. <listitem>
  779. <para>Check the syntax and if there are no errors, press the
  780. <emphasis role="bold">Submit</emphasis> button.</para>
  781. </listitem>
  782. <listitem>
<para>When it completes, select the Workunit tab, then select the <emphasis
role="bold">Result</emphasis> tab.</para>
  786. </listitem>
  787. <listitem>
  788. <para>Examine the result, then close the Builder window.</para>
<para><emphasis role="bold">Note</emphasis>: You can change the
value of the <emphasis role="bold">ZipFilter</emphasis> definition to
get results from different Zip codes.</para>
  792. </listitem>
  793. </orderedlist>
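<para>The same FETCH pattern can be extended to several Zip codes at once
by filtering the index with a set. This is a minimal sketch, assuming Zip
is declared as STRING5 in Layout_People. The set name
<emphasis>ZipSet</emphasis> and the second Zip value are hypothetical
examples:</para>
<para><programlisting>IMPORT TutorialYourName;
// Hypothetical list of Zip codes to look up
SET OF STRING5 ZipSet := ['33024','33025'];
FetchPeopleByZips :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip IN ZipSet),
        RIGHT.fpos);
OUTPUT(FetchPeopleByZips);
</programlisting></para>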
  794. </sect2>
  795. </sect1>
  796. <sect1 id="Publishing_your_Query">
  797. <title>Publishing your Query</title>
  798. <para>Now that we have created an indexed query, the next step is to
  799. enable access to it through a Web interface.</para>
<para>STORED variables provide a means to pass values as query
parameters. In this example, the user supplies a ZIP code, and the query
returns the people in that ZIP code.</para>
  803. <orderedlist>
  804. <listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> folder and name it <emphasis
role="bold">FetchPeopleByZipService</emphasis>.</para>
  808. </listitem>
  809. <listitem>
  810. <para>Write this code (changing <emphasis>YourName</emphasis> as
  811. before):</para>
<para><programlisting>IMPORT TutorialYourName;
STRING10 ZipFilter := '' : STORED('ZIPValue');
resultSet :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip=ZipFilter),
        RIGHT.fpos);
OUTPUT(resultSet);
</programlisting></para>
  820. </listitem>
  821. <listitem>
  822. <para>Check the syntax, and save the file.</para>
  823. </listitem>
  824. <listitem>
<para>Press the <emphasis role="bold">Submit</emphasis> button.</para>
  827. </listitem>
  828. <listitem>
<para>When the workunit completes, select the Workunit tab, then select
the ECL Watch tab.</para>
  831. </listitem>
  832. <listitem>
  833. <?dbfo keep-together="always"?>
<para>Press the <emphasis role="bold">Publish</emphasis> button (you
may need to scroll down the main window).</para>
  836. <para><figure>
  837. <title>Publish Workunit</title>
  838. <mediaobject>
  839. <imageobject>
  840. <imagedata fileref="images/DTimg12.jpg" />
  841. </imageobject>
  842. </mediaobject>
  843. </figure></para>
  844. </listitem>
  845. <listitem>
  846. <para>When the workunit is published, a notice dialog
  847. displays.</para>
  848. <para><figure>
  849. <title>Workunit Published</title>
  850. <mediaobject>
  851. <imageobject>
  852. <imagedata fileref="images/DT173-18b.png" />
  853. </imageobject>
  854. </mediaobject>
  855. </figure></para>
  856. </listitem>
  857. </orderedlist>
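<para>To see how a STORED value changes the result without going through
a Web form, the value can also be supplied from within ECL. The following
is a sketch, assuming it is run as a separate Builder window file rather
than as the published service, and using the #STORED statement to stand
in for the value a caller would provide:</para>
<para><programlisting>IMPORT TutorialYourName;
STRING10 ZipFilter := '' : STORED('ZIPValue');
// Supplies a value for the stored name, taking the place of the empty default
#STORED('ZIPValue','33024');
resultSet :=
  FETCH(TutorialYourName.File_TutorialPerson,
        TutorialYourName.IDX_PeopleByZIP(zip=ZipFilter),
        RIGHT.fpos);
OUTPUT(resultSet);
</programlisting></para>
<para>In the published service itself, the #STORED line is not needed,
because the value arrives as a query parameter.</para>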
  858. <sect2 id="Execute-using-the-Data-Delivery-Engine">
  859. <title>Execute using WsECL</title>
  860. <para>Now that the query is published, we can run it using the WsECL
  861. Web service. WsECL provides a Web-based interface to your published
  862. query. It also automatically creates an entry form to execute the
  863. query.</para>
<para>Open the following URL in a browser:</para>
<para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp</emphasis> (where
nnn.nnn.nnn.nnn is your ESP Server’s IP address and pppp is the port. The
default port is 8002.)</para>
  869. <para><figure>
  870. <title>WsECL</title>
  871. <mediaobject>
  872. <imageobject>
  873. <imagedata fileref="images/DTimg13.jpg" />
  874. </imageobject>
  875. </mediaobject>
  876. </figure></para>
  878. <orderedlist>
  879. <listitem>
  880. <para>Click on the + sign next to <emphasis
  881. role="bold">thor</emphasis> to expand the tree.</para>
  882. </listitem>
  883. <listitem>
  884. <?dbfo keep-together="always"?>
  885. <para>Click on the <emphasis
  886. role="bold">fetchpeoplebyzipservice.1</emphasis> hyperlink.</para>
  887. <para>The form for the service displays.</para>
  888. <para><figure>
  889. <title>Service Form</title>
  890. <mediaobject>
  891. <imageobject>
  892. <imagedata fileref="images/DTimg14a.jpg" />
  893. </imageobject>
  894. </mediaobject>
  895. </figure></para>
  896. </listitem>
  897. <listitem>
<para>Provide a zip code (e.g., 33024) in the <emphasis
role="bold">zipvalue</emphasis> field, select <emphasis
role="bold">Output Tables</emphasis> from the droplist, then press
the <emphasis role="bold">Submit</emphasis> button.</para>
  902. <para>The results display.</para>
  903. <para><figure>
  904. <title>Results</title>
  905. <mediaobject>
  906. <imageobject>
  907. <imagedata fileref="images/DTimg15a.jpg" />
  908. </imageobject>
  909. </mediaobject>
  910. </figure></para>
  911. </listitem>
  912. </orderedlist>
  913. </sect2>
  914. </sect1>
  915. <sect1 id="Deploy_the_Roxie_Query">
  916. <title>Compile and Publish the Roxie Query</title>
  917. <para>The final step in this process is to publish the indexed query to
  918. a Rapid Data Delivery Engine (Roxie) Cluster.</para>
  919. <para>We will recompile the code with Roxie as the target cluster, then
  920. publish it to a Roxie cluster. <orderedlist>
  921. <listitem>
<para>In the ECL IDE, select the Builder tab on the
FetchPeopleByZipService file builder window.</para>
  924. </listitem>
  925. <listitem>
  926. <para>Using the <emphasis role="bold">Target</emphasis> droplist,
  927. select Roxie as the Target cluster.</para>
  928. <para><figure>
  929. <title>Target Roxie</title>
  930. <mediaobject>
  931. <imageobject>
  932. <imagedata fileref="images/DTimg16.jpg" />
  933. </imageobject>
  934. </mediaobject>
  935. </figure></para>
  936. </listitem>
  937. <listitem>
<para>In the upper left corner of the Builder window, the
<emphasis role="bold">Submit</emphasis> button has a drop-down
arrow next to it. Select the arrow to expose the <emphasis
role="bold">Compile</emphasis> option.</para>
  942. <figure>
  943. <title>Compile</title>
  944. <mediaobject>
  945. <imageobject>
  946. <imagedata fileref="images/DTimg17.jpg" />
  947. </imageobject>
  948. </mediaobject>
  949. </figure>
  950. </listitem>
  951. <listitem>
<para>Select <emphasis role="bold">Compile</emphasis>.</para>
  953. </listitem>
  954. <listitem>
  955. <?dbfo keep-together="always"?>
  956. <para>When the workunit finishes, it will display a green circle
  957. indicating it has compiled.</para>
  958. <para><figure>
  959. <title>Compiled</title>
  960. <mediaobject>
  961. <imageobject>
  962. <imagedata fileref="images/DTimg18.jpg" />
  963. </imageobject>
  964. </mediaobject>
  965. </figure></para>
  966. </listitem>
  967. </orderedlist></para>
  968. <sect2 id="Deploy_the_Query_to_Roxie">
<title>Publish the Roxie Query</title>
<para>Next, we will publish the query to a Roxie cluster.</para>
  971. <orderedlist>
  972. <listitem>
  973. <para>Select the workunit tab for the FetchPeopleByZipService that
  974. you just compiled.</para>
  975. </listitem>
  976. <listitem>
  977. <para>Select the ECL Watch tab.</para>
  978. </listitem>
  979. <listitem>
  980. <?dbfo keep-together="always"?>
<para>Press the <emphasis role="bold">Publish</emphasis> button
(you may need to scroll down the main window).</para>
  983. <para><figure>
  984. <title>Publish Query</title>
  985. <mediaobject>
  986. <imageobject>
  987. <imagedata fileref="images/DTimg19.jpg" />
  988. </imageobject>
  989. </mediaobject>
  990. </figure>When it successfully publishes, you will see:</para>
  991. <para><figure>
  992. <title>Workunit Published</title>
  993. <mediaobject>
  994. <imageobject>
  995. <imagedata fileref="images/DT173-18b.png" />
  996. </imageobject>
  997. </mediaobject>
  998. </figure></para>
  999. </listitem>
  1000. </orderedlist>
  1001. </sect2>
  1002. <sect2 id="Run_the_Roxie_Query" role="brk">
  1003. <title>Run the Roxie Query in WsECL</title>
<para>Now that the query is deployed to a Roxie cluster, we can run it
using the WsECL service. Open the following URL in a browser:</para>
<para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp</emphasis> (where
nnn.nnn.nnn.nnn is your ESP Server’s IP address and pppp is the port.
The default port is 8002.)</para>
  1009. <orderedlist>
  1010. <listitem>
  1011. <para>Click on the + sign next to <emphasis
  1012. role="bold">myroxie</emphasis> to expand the tree.</para>
  1013. </listitem>
  1014. <listitem>
  1015. <?dbfo keep-together="always"?>
  1016. <para>Click on the <emphasis
  1017. role="bold">fetchpeoplebyzipservice.1</emphasis> hyperlink.</para>
  1018. <para>The form for the service displays.</para>
  1019. <para><figure>
  1020. <title>RoxieECL</title>
  1021. <mediaobject>
  1022. <imageobject>
  1023. <imagedata fileref="images/DTimg21.jpg" />
  1024. </imageobject>
  1025. </mediaobject>
  1026. </figure></para>
  1027. </listitem>
  1028. <listitem>
  1029. <?dbfo keep-together="always"?>
  1030. <para>Provide a zip code (e.g., 33024), select <emphasis
  1031. role="bold">Output Tables</emphasis> from the droplist, and press
  1032. the Submit button.</para>
  1033. <para>The results display.</para>
  1034. <para><figure>
<title>Roxie Results</title>
  1036. <mediaobject>
  1037. <imageobject>
  1038. <imagedata fileref="images/DTimg22.jpg" />
  1039. </imageobject>
  1040. </mediaobject>
  1041. </figure></para>
  1042. </listitem>
  1043. </orderedlist>
  1044. </sect2>
  1045. </sect1>
  1046. </chapter>
  1047. <chapter id="Summary">
  1048. <title>Summary</title>
<para>Now that you have successfully sprayed raw data onto a cluster,
processed it, and published a query to a Rapid Data Delivery Engine
(Roxie) cluster, what’s next?</para>
  1052. <para>Here is a short list of suggestions on the path you might take from
  1053. here:</para>
  1054. <itemizedlist mark="bullet">
  1055. <listitem>
  1056. <para>Create indexes on other fields and create queries using
  1057. them.</para>
  1058. </listitem>
  1059. </itemizedlist>
  1060. <itemizedlist mark="bullet">
  1061. <listitem>
  1062. <para>Write client applications to access your queries using JSON or
  1063. SOAP interfaces.</para>
  1064. </listitem>
  1065. </itemizedlist>
  1066. <itemizedlist mark="bullet">
  1067. <listitem>
<para>Look at the resources available on the Links tab.</para>
  1069. <para><figure>
  1070. <title>Links</title>
  1071. <mediaobject>
  1072. <imageobject>
  1073. <imagedata fileref="images/DTimg24.jpg" />
  1074. </imageobject>
  1075. </mediaobject>
  1076. </figure>The Links tab provides easy access to a form, a Sample
  1077. Request, a Sample Response, the WSDL, the XML Schema (XSD) and
  1078. more...</para>
  1079. </listitem>
  1080. </itemizedlist>
  1081. <itemizedlist mark="bullet">
  1082. <listitem>
  1083. <para>Follow the procedures in this tutorial using your own
  1084. data!</para>
  1085. </listitem>
  1086. </itemizedlist>
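<para>As a starting point for the first suggestion above, an index on
another field follows the same pattern as the Zip code index. This is a
sketch under stated assumptions: the index name
<emphasis>IDX_PeopleByLastName</emphasis>, the logical file name, and the
search value <emphasis>SMITH</emphasis> are all hypothetical, and the
definitions are kept in one Builder window file for brevity rather than
in separate EXPORT files:</para>
<para><programlisting>IMPORT TutorialYourName;
// Hypothetical index on LastName; replace YN with your initials
IDX_PeopleByLastName :=
  INDEX(TutorialYourName.File_TutorialPerson,
        {LastName,fpos},
        '~tutorial::YN::PeopleByLastNameINDEX');
// Names were converted to uppercase earlier, so the search value is uppercase too
FetchPeopleByLastName :=
  FETCH(TutorialYourName.File_TutorialPerson,
        IDX_PeopleByLastName(LastName = 'SMITH'),
        RIGHT.fpos);
// Build the index first, then run the query that reads it
SEQUENTIAL(BUILDINDEX(IDX_PeopleByLastName,OVERWRITE),
           OUTPUT(FetchPeopleByLastName));
</programlisting></para>
<para>SEQUENTIAL is used here so that the index build finishes before the
FETCH reads it. In the tutorial itself, the same ordering is achieved by
building the index and running the query as separate workunits.</para>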
  1087. </chapter>
  1088. </book>