<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<book lang="en_US" xml:base="../">
<title>HPCC Data Tutorial</title>
<bookinfo>
<title>HPCC Data Tutorial</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/redswooshWithLogo3.jpg" />
</imageobject>
</mediaobject>
<author>
<surname>Boca Raton Documentation Team</surname>
</author>
<legalnotice>
<para>We welcome your comments and feedback about this document via
email to <email>docfeedback@hpccsystems.com</email>. Please include
<emphasis role="bold">Documentation Feedback</emphasis> in the subject
line and reference the document name, page numbers, and current Version
Number in the text of the message.</para>
<para>LexisNexis and the Knowledge Burst logo are registered trademarks
of Reed Elsevier Properties Inc., used under license. Other products and
services may be trademarks or registered trademarks of their respective
companies. All names and example data used in this manual are
fictitious. Any similarity to actual persons, living or dead, is purely
coincidental.</para>
<para></para>
</legalnotice>
<xi:include href="common/Version.xml" xpointer="FooterInfo"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<!--Release Info makes a running page footer: now an include! -->
<!--The following include statement pulls in the date_ver from version.xml-->
<xi:include href="common/Version.xml" xpointer="DateVer"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<corpname>HPCC Systems</corpname>
<!--corpname never prints-->
<xi:include href="common/Version.xml" xpointer="Copyright"
xmlns:xi="http://www.w3.org/2001/XInclude" />
<!--Copyright tag inserts the symbol automatically: Now an Include!-->
<mediaobject role="logo">
<imageobject>
<imagedata fileref="images/LN_Rightjustified.jpg" />
</imageobject>
</mediaobject>
</bookinfo>
<chapter>
<title>Introduction</title>
<sect1 id="Introduction_Case-Study-and-Tutorial" role="nobrk">
<title>The ECL Development Process</title>
<para>This tutorial walks through the development process from
beginning to end, and is designed to introduce you to working with data
on any HPCC Systems HPCC<footnote>
<para><emphasis role="bold">H</emphasis>igh <emphasis
role="bold">P</emphasis>erformance <emphasis
role="bold">C</emphasis>omputing <emphasis
role="bold">C</emphasis>luster (HPCC) is a massively parallel
processing computing platform that solves Big Data problems. See
http://www.hpccsystems.com/Why-HPCC/How-it-works for more
details.</para>
</footnote> platform. We will write code in ECL<footnote>
<para><emphasis role="bold">E</emphasis>nterprise <emphasis
role="bold">C</emphasis>ontrol <emphasis
role="bold">L</emphasis>anguage (ECL) is a declarative, data-centric
programming language used to manage all aspects of the massive data
joins, sorts, and builds that differentiate HPCC from other
technologies in its ability to provide flexible data analysis on a
massive scale.</para>
</footnote> to process our data and query it.</para>
<para>This tutorial assumes:</para>
<itemizedlist>
<listitem>
<para>You have a running HPCC. This can be a VM Edition or a
single-node or multi-node HPCC platform.</para>
</listitem>
<listitem>
<para>You have the ECL IDE<footnote>
<para>The ECL IDE (Integrated Development Environment) is the tool
used to create queries into your data and the ECL files with which
you build those queries.</para>
</footnote> installed and configured.</para>
</listitem>
</itemizedlist>
<para>In this tutorial, we will:</para>
<itemizedlist mark="bullet">
<listitem>
<para>Download a raw data file</para>
<para>Links to the data file are available at <ulink
url="http://hpccsystems.com/community/docs/data-tutorial-guide">http://hpccsystems.com/community/docs/data-tutorial-guide</ulink>.</para>
<para>The download is approximately 30 MB (compressed) and is
available in either ZIP or .tar.gz format. Choose the appropriate
link.</para>
</listitem>
<listitem>
<para>Spray the file to a Data Refinery (Thor) cluster. HPCC clusters
"spray" data into file parts on each node.</para>
<para>A <emphasis>spray</emphasis> or <emphasis>import</emphasis> is
the relocation of a data file from one location to an HPCC cluster.
The term spray was adopted due to the nature of the file movement: the
file is partitioned across all nodes within a cluster.</para>
</listitem>
<listitem>
<para>Examine the data and determine the pre-processing we need to
perform</para>
</listitem>
<listitem>
<para>Pre-process the data to produce a new data file</para>
</listitem>
<listitem>
<para>Determine the types of queries we want</para>
</listitem>
<listitem>
<para>Create the queries</para>
</listitem>
<listitem>
<para>Test the queries</para>
</listitem>
<listitem>
<para>Deploy them to a Rapid Data Delivery Engine (RDDE) cluster,
also known as a Roxie cluster.</para>
</listitem>
</itemizedlist>
</sect1>
</chapter>
<chapter id="Working_with_Data">
<title>Working with Data</title>
<sect1 id="The_Original_Data" role="nobrk">
<title>The Original Data</title>
<para>In this scenario, we receive a structured data file containing
records with people's names and addresses. The HPCC also supports
unstructured data, but structured data keeps this example simple. The
file layout is documented in the following table:</para>
<para></para>
<para><informaltable colsep="1" frame="all" rowsep="1">
<tgroup cols="3">
<colspec colwidth="147.60pt" />
<colspec colwidth="147.60pt" />
<colspec colwidth="147.60pt" />
<thead>
<row>
<entry align="left">Field Name</entry>
<entry align="left">Type</entry>
<entry align="left">Description</entry>
</row>
</thead>
<tbody>
<row>
<entry>FirstName</entry>
<entry>15 Character String</entry>
<entry>First Name</entry>
</row>
<row>
<entry>LastName</entry>
<entry>25 Character String</entry>
<entry>Last Name</entry>
</row>
<row>
<entry>MiddleName</entry>
<entry>15 Character String</entry>
<entry>Middle Name</entry>
</row>
<row>
<entry>Zip</entry>
<entry>5 Character String</entry>
<entry>ZIP Code</entry>
</row>
<row>
<entry>Street</entry>
<entry>42 Character String</entry>
<entry>Street Address</entry>
</row>
<row>
<entry>City</entry>
<entry>20 Character String</entry>
<entry>City</entry>
</row>
<row>
<entry>State</entry>
<entry>2 Character String</entry>
<entry>State</entry>
</row>
</tbody>
</tgroup>
</informaltable></para>
<para>This gives us a record length of 124 bytes (the total of all field
lengths: 15 + 25 + 15 + 5 + 42 + 20 + 2). You will need to know this
length for the <emphasis role="bold">File Spray</emphasis> process.</para>
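<para>Once the ECL IDE is running, the following optional ECL sketch lets the compiler confirm this figure. It repeats the layout from the table above (the same RECORD structure you will later save as Layout_People) and outputs its size using SIZEOF. Every field is a fixed-length STRING, so the result should be 124.</para>
<programlisting>// Sketch: let the compiler add up the fixed field lengths.
Layout_People := RECORD
  STRING15 FirstName;
  STRING25 LastName;
  STRING15 MiddleName;
  STRING5  Zip;
  STRING42 Street;
  STRING20 City;
  STRING2  State;
END;

// 15 + 25 + 15 + 5 + 42 + 20 + 2 = 124 bytes per record
OUTPUT(SIZEOF(Layout_People));</programlisting>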
<para></para>
<sect2 id="Uploading_a_file">
<title>Load the Incoming Data File to your Landing Zone</title>
<para>A Landing Zone (or Drop Zone) is a physical storage location
defined in your HPCC's environment. A daemon (DaFileSrv) must be
running on that server to enable file sprays and desprays.</para>
<para>For smaller data files (up to 2 GB), you can use the
upload/download file utility in ECL Watch (a Web-based interface to
your HPCC platform). The uncompressed sample data file is approximately
100 MB.</para>
<orderedlist>
<listitem>
<para>Download the sample data file from the HPCC Systems
portal.</para>
<para>The data file is available from links found on <ulink
url="http://hpccsystems.com/community/docs/data-tutorial-guide">http://hpccsystems.com/community/docs/data-tutorial-guide</ulink>.
The download is approximately 30 MB (compressed) and is available
in either ZIP or tar.gz format (<emphasis
role="bold">OriginalPerson.tar.gz</emphasis> or <emphasis
role="bold">OriginalPerson.zip</emphasis>).</para>
</listitem>
<listitem>
<para>Extract it to a folder on your local machine.</para>
</listitem>
<listitem>
<para>In your browser, go to the <emphasis role="bold">ECL
Watch</emphasis> URL. For example, http://nnn.nnn.nnn.nnn:8010,
where nnn.nnn.nnn.nnn is your ESP<footnote>
<para>The ESP (Enterprise Services Platform) Server is the
communication layer server in your HPCC environment.</para>
</footnote> Server's IP address.</para>
<para><informaltable colsep="1" frame="all" rowsep="1">
<?dbfo keep-together="always"?>
<tgroup cols="2">
<colspec colwidth="49.50pt" />
<colspec />
<tbody>
<row>
<entry><inlinegraphic
fileref="images/caution.png" /></entry>
<entry>Your IP address could be different from the ones
provided in the example images. Please use the IP
address provided by <emphasis
role="bold">your</emphasis> installation.</entry>
</row>
</tbody>
</tgroup>
</informaltable></para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>From the ECL Watch page, click the <emphasis
role="bold">Upload/download File</emphasis> link in the menu on
the left side.</para>
<para><figure>
<title>Upload/download</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/LZimg03-1.jpg"
vendor="eclwatchSS" />
</imageobject>
</mediaobject>
</figure></para>
<para>Clicking the Upload/download File link takes you to the
Dropzones and Files page, where you can <emphasis
role="bold">Browse</emphasis> your machine for a file to
upload:</para>
<para><figure>
<title>Dropzones and Files</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/LZimg04.jpg"
vendor="eclwatchSS" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>Press the <emphasis role="bold">Browse</emphasis> button to
browse the files on your local machine, select the file to upload,
and then press the <emphasis role="bold">Open</emphasis>
button.</para>
<para>The file you selected should appear in the <emphasis
role="bold">Select a file to upload:</emphasis> field. The data
file is named <emphasis role="bold">OriginalPerson</emphasis>.</para>
</listitem>
<listitem>
<para>Press <emphasis role="bold">Upload Now</emphasis> to
complete the file upload.</para>
</listitem>
</orderedlist>
</sect2>
<sect2 id="Spray_the_Data_File_to_your_DR-THOR_Cluster">
<title>Spray the Data File to your THOR Cluster</title>
<para>To use the data file in our HPCC cluster, we must first “spray”
it to a Thor cluster. A <emphasis>spray</emphasis> or
<emphasis>import</emphasis> is the relocation of a data file from one
location to a Thor cluster. The term spray was adopted due to the
nature of the file movement: the file is partitioned across all nodes
within a cluster.</para>
<para>In this example, the file is on your Landing Zone and is named
<emphasis role="bold">OriginalPerson</emphasis>.</para>
<para>We are going to spray it to our Thor cluster and give it a
logical name of <emphasis
role="bold">tutorial::YN::OriginalPerson</emphasis>, where <emphasis
role="bold">YN</emphasis> are your initials. The Distributed File
Utility maintains a list of logical files and their corresponding
physical file locations.</para>
<orderedlist>
<listitem>
<para>Open ECL Watch in your browser using the following
URL:</para>
<para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp</emphasis>
(where nnn.nnn.nnn.nnn is your ESP Server’s IP address and pppp is
the port. The default port is 8010.)</para>
</listitem>
<listitem>
<para>Click the <emphasis role="bold">Spray Fixed</emphasis>
hyperlink under the DFU Files menu on the left.</para>
<para>The <emphasis role="bold">DFU Spray Fixed</emphasis> page
displays.</para>
</listitem>
<listitem>
<para>Using the Source <emphasis
role="bold">Machine/dropzone</emphasis> drop-down list, select the
Landing Zone where the file was placed.</para>
<para>In the VM or Community Edition, there is only one Landing
Zone.</para>
<para>The IP Address is filled in automatically, and the Local Path
is partially filled in with the default folder on your landing
zone.</para>
</listitem>
<listitem>
<para>Complete the <emphasis role="bold">Local Path</emphasis> to
include the complete file name, or use the <emphasis
role="bold">Choose File</emphasis> button to select the file from
a list of files in the folder. (The file to choose is
<emphasis>OriginalPerson</emphasis>.)</para>
</listitem>
<listitem>
<para>Fill in the <emphasis role="bold">Record Length</emphasis>
(124).</para>
</listitem>
<listitem>
<para>Fill in the <emphasis role="bold">Label</emphasis> using the
naming convention described earlier: <emphasis
role="bold">tutorial::YN::OriginalPerson</emphasis> (remember,
<emphasis role="bold">YN</emphasis> are your initials).</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Make sure the <emphasis role="bold">Replicate</emphasis> box
is checked.</para>
<para><emphasis role="bold">Note:</emphasis> If replication is
disabled in your Thor settings, this checkbox does not
appear.</para>
<para><figure>
<title>Dropzones and Files</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg01.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>Press the <emphasis role="bold">Submit</emphasis>
button.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Click the <emphasis role="bold">View Progress</emphasis>
hyperlink.</para>
<para><figure>
<title>View Progress</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg02.jpg"
vendor="eclwatchSS" />
</imageobject>
</mediaobject>
</figure>The Workunit progress page displays.</para>
<para><figure>
<title>Spray Complete</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg03.jpg"
vendor="eclwatchSS" />
</imageobject>
</mediaobject>
</figure> Once the spray is complete, we can proceed.</para>
</listitem>
</orderedlist>
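<para>The same spray can also be scripted in ECL instead of using ECL Watch. The sketch below is an untested alternative that assumes the Standard Library action STD.File.SprayFixed, passing only its first five positional parameters: source IP, source path, record length, destination group, and logical name. The IP address, the landing zone path, and the cluster group name mythor are hypothetical placeholders. Depending on your configuration, you may also need to supply the optional ESP server address. Check the Standard Library Reference for your platform version before relying on this.</para>
<programlisting>IMPORT Std;

// Hypothetical values: replace them with your own landing zone IP,
// file path, and Thor cluster group name.
LandingZoneIP := '192.168.1.10';
SourcePath    := '/var/lib/HPCCSystems/mydropzone/OriginalPerson';
ThorGroup     := 'mythor';

// 124 is the record length from the layout table; YN are your initials.
Std.File.SprayFixed(LandingZoneIP, SourcePath, 124, ThorGroup,
                    'tutorial::YN::OriginalPerson');</programlisting>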
</sect2>
</sect1>
<sect1 id="Begin_Coding">
<title>Begin Coding</title>
<para>In this portion of the tutorial, we will write ECL code to define
the data file and execute simple queries on it so we can evaluate it and
determine any necessary pre-processing.</para>
<orderedlist>
<listitem>
<para>Start the ECL IDE (Start &gt;&gt; All Programs &gt;&gt; HPCC
Systems &gt;&gt; ECL IDE).</para>
</listitem>
<listitem>
<para>Log in to your environment.</para>
<para>For purposes of this tutorial, let’s create a folder called
<emphasis role="bold">TutorialYourName</emphasis> (where
<emphasis>YourName</emphasis> is your name).</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Right-click the <emphasis role="bold">My Files</emphasis>
folder in the Repository window, and select <emphasis
role="bold">Insert Folder</emphasis> from the pop-up menu.</para>
<para><figure>
<title>Insert Folder</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg04.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Enter <emphasis role="bold">TutorialYourName</emphasis> (where
<emphasis>YourName</emphasis> is your name) for the label, then press
the OK button.</para>
<para><figure>
<title>Enter Folder Label</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg05.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>Right-click the <emphasis role="bold">TutorialYourName</emphasis>
folder, and select <emphasis role="bold">Insert File</emphasis> from
the pop-up menu.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Enter <emphasis role="bold">Layout_People</emphasis> for the
label, then press the OK button.</para>
<para><figure>
<title>Insert File</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg06.jpg" />
</imageobject>
</mediaobject>
</figure></para>
<para>A Builder Window opens.</para>
<para><figure>
<title>Layout People in Builder</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg07.jpg" />
</imageobject>
</mediaobject>
</figure></para>
<para>Notice that some text has already been written for you in the
window. It reminds you that the name of the file (Layout_People)
<emphasis>must always exactly match</emphasis> the name of the
single EXPORT definition (Layout_People) contained in that file.
This is a requirement: one EXPORT definition per file, and its
name must match the filename.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Write the following code in the Builder workspace:</para>
<para><programlisting>EXPORT Layout_People := RECORD
  STRING15 FirstName;
  STRING25 LastName;
  STRING15 MiddleName;
  STRING5  Zip;
  STRING42 Street;
  STRING20 City;
  STRING2  State;
END;</programlisting> <figure>
<title>Code in Builder Window</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg08.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>Press the syntax check button on the main toolbar (or press
F7).</para>
<para>It is always a good idea to check syntax before
submitting.</para>
<para><figure>
<title>Check Syntax</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg23.jpg" />
</imageobject>
</mediaobject>
</figure></para>
<para>This file defines the record structure for the data file.
Next, we will examine the data.</para>
</listitem>
</orderedlist>
<sect2 id="Examine_the_Data" role="brk">
<title>Examine the Data</title>
<para>In this section, we will look at the data and determine whether
there is any pre-processing we want to perform on it. This is the step
in the development process where we convert the raw data into a form
we can use.</para>
<orderedlist>
<listitem>
<para>Right-click the <emphasis role="bold">TutorialYourName</emphasis>
folder, and select <emphasis role="bold">Insert File</emphasis> from
the pop-up menu.</para>
</listitem>
<listitem>
<para>Enter <emphasis role="bold">File_OriginalPerson</emphasis>
for the label, then press the OK button.</para>
<para><figure>
<title>Insert File</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg09.jpg" />
</imageobject>
</mediaobject>
</figure>A Builder Window opens.</para>
</listitem>
<listitem>
<para>Write the following code (remember to replace
<emphasis>YN</emphasis> with your initials and
<emphasis>YourName</emphasis> with your name):</para>
<para><programlisting>IMPORT TutorialYourName;
EXPORT File_OriginalPerson :=
  DATASET('~tutorial::YN::OriginalPerson',
          TutorialYourName.Layout_People,
          THOR);
</programlisting></para>
<para><figure>
<title>File_OriginalPerson.ecl</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg10.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>Press the syntax check button on the main toolbar (or press
F7) to check the syntax.</para>
<para>This defines the dataset. Next, we will examine the
data.</para>
</listitem>
<listitem>
<para>Open a new Builder Window (CTRL+N) and write the following
code (remember to replace <emphasis>YourName</emphasis> with your
name):</para>
<programlisting>IMPORT TutorialYourName;
COUNT(TutorialYourName.File_OriginalPerson);
</programlisting>
</listitem>
<listitem>
<para>Press the syntax check button on the main toolbar (or press
F7) to check the syntax.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Make sure the selected cluster is your Thor cluster, then
press the <emphasis role="bold">Submit</emphasis> button. Note
that your target cluster might have a different name.</para>
<para><figure>
<title>Target Thor</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg11.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>When the Workunit completes, it displays a green checkmark
<inlinegraphic fileref="images/DT173-15.jpg" />.</para>
</listitem>
<listitem>
<para>Select the Workunit tab (the one with the number next to the
checkmark) and select the <emphasis role="bold">Result
1</emphasis> tab (it may already be selected).</para>
<para><figure>
<title>Result tab</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DT173-16.png" />
</imageobject>
</mediaobject>
</figure>This shows us that there are 841,400 records in the
data file.</para>
</listitem>
<listitem>
<para>Select the Builder tab and change COUNT to OUTPUT, as shown
below:</para>
<para><programlisting>IMPORT TutorialYourName;
<emphasis role="bold">OUTPUT</emphasis>(TutorialYourName.File_OriginalPerson);</programlisting></para>
<para>Note: The modified portion is shown in <emphasis
role="bold">bold</emphasis>.</para>
</listitem>
<listitem>
<para>Check the syntax; if there are no errors, press the <emphasis
role="bold">Submit</emphasis> button.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>When it completes, select the Workunit tab, then select the
<emphasis role="bold">Result 1</emphasis> tab.</para>
<para><figure>
<title>Output Results</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DT173-17.png" />
</imageobject>
</mediaobject>
</figure></para>
<para>Notice that the names are in mixed case.</para>
<para>For our purposes, it will be easier to have all names in
uppercase. This demonstrates one of the steps in the basic process of
preparing data (Extract, Transform, and Load, or ETL) using
ECL.</para>
</listitem>
<listitem>
<para>Close the Builder Window.</para>
</listitem>
</orderedlist>
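<para>Before changing anything, it can help to quantify the problem. The following optional sketch can be submitted from a new Builder window. It outputs a 100-record sample and counts how many last names are not already uppercase. It uses CHOOSEN, COUNT, and the Std.Str.ToUpperCase function that the next section also relies on. The result names Sample and MixedCaseLastNames are arbitrary choices.</para>
<programlisting>IMPORT TutorialYourName, Std;

People := TutorialYourName.File_OriginalPerson;

// A small sample is easier to browse than all 841,400 records.
OUTPUT(CHOOSEN(People, 100), NAMED('Sample'));

// Records whose last name changes when converted to uppercase.
MixedCase := People(LastName &lt;&gt; Std.Str.ToUpperCase(LastName));
OUTPUT(COUNT(MixedCase), NAMED('MixedCaseLastNames'));</programlisting>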
  629. </sect2>
  630. <sect2 id="Process_the_Data">
  631. <title>Process the Data</title>
  632. <para>In this section, we will write code to convert the original data
  633. so that all names are in uppercase. We will then write this new file
  634. to our Thor cluster.</para>
  635. <orderedlist>
  636. <listitem>
  637. <para>Rt-Click on the <emphasis
  638. role="bold">Tutorial</emphasis><emphasis role="bold">YourName
  639. </emphasis>Folder, and select Insert File from the pop-up
  640. menu.</para>
  641. </listitem>
  642. <listitem>
  643. <para>Name this one <emphasis
  644. role="bold">BWR_ProcessRawData</emphasis> and write the following
  645. code (changing YN and YourName as before):</para>
  646. <para><programlisting>IMPORT TutorialYourName, Std;
  647. TutorialYourName.Layout_People toUpperPlease(TutorialYourName.Layout_People pInput)
  648. := TRANSFORM
  649. SELF.FirstName := Std.Str.ToUpperCase(pInput.FirstName);
  650. SELF.LastName := Std.Str.ToUpperCase(pInput.LastName);
  651. SELF.MiddleName := Std.Str.ToUpperCase(pInput.MiddleName);
  652. SELF.Zip := pInput.Zip;
  653. SELF.Street := pInput.Street;
  654. SELF.City := pInput.City;
  655. SELF.State := pInput.State;
  656. END ;
  657. OrigDataset := TutorialYourName.File_OriginalPerson;
  658. UpperedDataset := PROJECT(OrigDataset,toUpperPlease(LEFT));
  659. OUTPUT(UpperedDataset,,'~tutorial::YN::TutorialPerson',OVERWRITE);
  660. </programlisting></para>
</listitem>
<listitem>
<para>Check the syntax and, if there are no errors, press the <emphasis
role="bold">Submit</emphasis> button.</para>
</listitem>
<listitem>
<para>When it completes, select the Workunit tab, then select the
Result 1 tab.</para>
<para><figure>
<title>Process Result</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DT173-18.jpg" />
</imageobject>
</mediaobject>
</figure></para>
<para>The results show that the process has successfully converted
the name fields to uppercase.</para>
</listitem>
<listitem>
<para>After you examine the results, close the Builder
window.</para>
</listitem>
</orderedlist>
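<para>If you want to try the TRANSFORM logic before running it against
the full file, the following is a minimal sketch that runs the same kind
of PROJECT over a small inline dataset. The layout SampleLayout, its field
sizes, the transform name toUpperSample, and the sample names are
hypothetical and are not part of the tutorial files. The sketch also
shows an alternative to listing every unchanged field: <emphasis
role="bold">SELF := pInput;</emphasis> copies any fields that the
TRANSFORM has not already assigned.</para>
<para><programlisting>IMPORT Std;

// Hypothetical, reduced layout used only for this sketch
SampleLayout := RECORD
  STRING15 FirstName;
  STRING25 LastName;
  STRING5  Zip;
END;

// Two hypothetical records defined inline
SampleData := DATASET([{'john', 'smith', '33024'},
                       {'mary', 'jones', '33025'}], SampleLayout);

SampleLayout toUpperSample(SampleLayout pInput) := TRANSFORM
  SELF.FirstName := Std.Str.ToUpperCase(pInput.FirstName);
  SELF.LastName  := Std.Str.ToUpperCase(pInput.LastName);
  SELF := pInput; // copy every field not assigned above (here, Zip)
END;

UpperedSample := PROJECT(SampleData, toUpperSample(LEFT));
OUTPUT(UpperedSample);
</programlisting></para>
<para>Submitting this file should show both names in uppercase with the
Zip values unchanged.</para>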
</sect2>
<sect2 id="Using_our_Data">
<title>Using our New Data</title>
<para>Now that our data is in a useful format and the file is in place,
we can write more code that uses the new data file. We will decide which
indexes we need and then create them. For this tutorial, let’s assume
the field we need to index is the Zip code field.</para>
<para>In the new DATASET definition, we add a virtual field to the
RECORD structure that holds each record’s file position
(fileposition). The index stores this value so that a query can use it
to locate the full record in the data file.</para>
<orderedlist>
<listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> Folder. Name it <emphasis
role="bold">File_TutorialPerson</emphasis> and write this code (changing
<emphasis>YN</emphasis> and <emphasis>YourName</emphasis> as
before):</para>
<para><programlisting>IMPORT TutorialYourName;

// The same layout as before, plus a virtual field that holds each
// record's position in the file (used later by the index and FETCH)
EXPORT File_TutorialPerson :=
DATASET('~tutorial::YN::TutorialPerson',
{TutorialYourName.Layout_People,
UNSIGNED8 fpos {VIRTUAL(fileposition)}},THOR);
</programlisting></para>
</listitem>
<listitem>
<para>Check the syntax and, if there are no errors, press the <emphasis
role="bold">Submit</emphasis> button.</para>
</listitem>
<listitem>
<para>When it completes, it displays a green checkmark
<inlinegraphic fileref="images/DT173-15.jpg" />.</para>
</listitem>
</orderedlist>
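<para>As an optional check, a short builder window file (for example, a
hypothetical BWR_CheckTutorialPerson) can confirm that the new definition
reads the file and that the virtual fpos field is populated. This is a
minimal sketch that assumes the processing step has already written
~tutorial::YN::TutorialPerson to the cluster.</para>
<para><programlisting>IMPORT TutorialYourName;

// Show the first five records, including the virtual fpos field
OUTPUT(CHOOSEN(TutorialYourName.File_TutorialPerson, 5));

// Report how many records the new file contains
OUTPUT(COUNT(TutorialYourName.File_TutorialPerson));
</programlisting></para>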
</sect2>
<sect2 id="Index_the_Data">
<title>Index the Data</title>
<para>Next, we will define the INDEX.</para>
<orderedlist>
<listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> Folder. Name it <emphasis
role="bold">IDX_PeopleByZip</emphasis> and write this code (changing
<emphasis>YN</emphasis> and <emphasis>YourName</emphasis> as
before):</para>
<para><programlisting>IMPORT TutorialYourName;

// Key on zip and keep fpos so each index entry points back to its record
EXPORT IDX_PeopleByZIP :=
INDEX(TutorialYourName.File_TutorialPerson,{zip,fpos},'~tutorial::YN::PeopleByZipINDEX');
</programlisting></para>
</listitem>
<listitem>
<para>Check the syntax.</para>
<para>Next, we will build the index file.</para>
</listitem>
<listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> Folder, name it <emphasis
role="bold">BWR_BuildPeopleByZip</emphasis>, and write this code
(replacing <emphasis>YourName</emphasis> with your name):</para>
<para><programlisting>IMPORT TutorialYourName;

// Build the index file on disk, replacing any earlier version
BUILDINDEX(TutorialYourName.IDX_PeopleByZIP,OVERWRITE);
</programlisting></para>
</listitem>
<listitem>
<para>Check the syntax and, if there are no errors, press the
<emphasis role="bold">Submit</emphasis> button.</para>
</listitem>
<listitem>
<para>Wait for the Workunit to complete, then close the Builder
Window.</para>
</listitem>
</orderedlist>
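<para>To confirm that the build worked, you can read the index directly,
since an INDEX can be filtered and output much like a dataset. This
minimal sketch (in a hypothetical BWR file) assumes the index has been
built and that ZIP code 33024, the example value used later in this
tutorial, appears in the data. Each row it returns holds only the zip key
and the fpos pointer, not the full person record.</para>
<para><programlisting>IMPORT TutorialYourName;

// Filter the index on its key field; only zip and fpos are returned
ZipKeys := TutorialYourName.IDX_PeopleByZIP(zip='33024');
OUTPUT(ZipKeys);

// Number of index entries for this ZIP code
OUTPUT(COUNT(ZipKeys));
</programlisting></para>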
</sect2>
<sect2 id="Query_the_Data">
<title>Build a Query</title>
<para>Now that we have an index file, we will write a query that uses
it.</para>
<orderedlist>
<listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> Folder. Name it <emphasis
role="bold">BWR_FetchPeopleByZip</emphasis> and write this code
(changing <emphasis>YourName</emphasis> as before):</para>
<para><programlisting>IMPORT TutorialYourName;

// The ZIP code to look up
ZipFilter := '33024';

// Find matching keys in the index, then use each key's fpos
// to fetch the full record from the data file
FetchPeopleByZip :=
FETCH(TutorialYourName.File_TutorialPerson,
TutorialYourName.IDX_PeopleByZIP(zip=ZipFilter),
RIGHT.fpos);
OUTPUT(FetchPeopleByZip);
</programlisting></para>
</listitem>
<listitem>
<para>Check the syntax and, if there are no errors, press the
<emphasis role="bold">Submit</emphasis> button.</para>
</listitem>
<listitem>
<para>When it completes, select the Workunit tab, then select the
<emphasis role="bold">Result</emphasis> tab.</para>
</listitem>
<listitem>
<para>Examine the result, then close the Builder window.</para>
<para><emphasis role="bold">Note</emphasis>: You can change the
value of the <emphasis role="bold">ZipFilter</emphasis> definition to
get results from different Zip codes.</para>
</listitem>
</orderedlist>
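<para>The filter passed to the index does not have to be a single
equality test. As a sketch, the same FETCH can retrieve people from
several ZIP codes at once by filtering the index with IN and a set of
values. The definition names ZipSet and FetchPeopleInZips, and the second
ZIP code, 33025, are hypothetical and chosen only for illustration.</para>
<para><programlisting>IMPORT TutorialYourName;

// Hypothetical set of ZIP codes to look up
ZipSet := ['33024', '33025'];

// Find index entries for any ZIP in the set, then fetch the full records
FetchPeopleInZips :=
FETCH(TutorialYourName.File_TutorialPerson,
TutorialYourName.IDX_PeopleByZIP(zip IN ZipSet),
RIGHT.fpos);
OUTPUT(FetchPeopleInZips);
</programlisting></para>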
</sect2>
</sect1>
<sect1 id="Publishing_your_Query">
<title>Publishing your Query</title>
<para>Now that we have created an indexed query, the next step is to
enable access to it through a Web interface.</para>
<para>A STORED definition provides a way to pass a value into the query
as a parameter. In this example, the user supplies a ZIP code and the
query returns the people who live in that ZIP code.</para>
<orderedlist>
<listitem>
<para>Insert a File into the <emphasis
role="bold">TutorialYourName</emphasis> Folder and name it <emphasis
role="bold">FetchPeopleByZipService</emphasis>.</para>
</listitem>
<listitem>
<para>Write this code (changing <emphasis>YourName</emphasis> as
before):</para>
<para><programlisting>IMPORT TutorialYourName;

// ZIPValue becomes an input parameter of the published query;
// '' is the default when no value is supplied
STRING10 ZipFilter := '' :STORED('ZIPValue');

// Look up matching keys in the index, then fetch the full records
resultSet :=
FETCH(TutorialYourName.File_TutorialPerson,
TutorialYourName.IDX_PeopleByZIP(zip=ZipFilter),
RIGHT.fpos);
OUTPUT(resultSet);
</programlisting></para>
</listitem>
<listitem>
<para>Check the syntax and save the file.</para>
</listitem>
<listitem>
<para>Press the <emphasis role="bold">Submit</emphasis> button.</para>
</listitem>
<listitem>
<para>When the workunit completes, select the Workunit tab, then select
the ECL Watch tab.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Press the <emphasis role="bold">Publish</emphasis> button (you
may need to scroll down the main window).</para>
<para><figure>
<title>Publish Workunit</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg12.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>When the workunit is published, a notice dialog
displays.</para>
<para><figure>
<title>Workunit Published</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DT173-18b.png" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
</orderedlist>
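<para>A published query can return more than one result. The following
is a minimal sketch of a hypothetical variant of FetchPeopleByZipService
that names its results and supplies a non-empty default ZIP code, so that
submitting it from the ECL IDE without a parameter still returns rows. The
result names People and PeopleCount and the default value '33024' are
illustrative choices, not part of the tutorial.</para>
<para><programlisting>IMPORT TutorialYourName;

// Default '33024' is used when no ZIPValue parameter is supplied
STRING10 ZipFilter := '33024' : STORED('ZIPValue');

resultSet :=
FETCH(TutorialYourName.File_TutorialPerson,
TutorialYourName.IDX_PeopleByZIP(zip=ZipFilter),
RIGHT.fpos);

// Each NAMED result is reported under its own name in the workunit
OUTPUT(resultSet, NAMED('People'));
OUTPUT(COUNT(resultSet), NAMED('PeopleCount'));
</programlisting></para>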
<sect2 id="Execute-using-the-Data-Delivery-Engine">
<title>Execute using WsECL</title>
<para>Now that the query is published, we can run it using the WsECL
Web service. WsECL provides a Web-based interface to your published
query and automatically creates an entry form for running it.</para>
<para>Open the following URL in your browser:</para>
<para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp</emphasis></para>
<para>where nnn.nnn.nnn.nnn is your ESP Server’s IP address and pppp is
the port. The default port is 8002.</para>
<para><figure>
<title>WsECL</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg13.jpg" />
</imageobject>
</mediaobject>
</figure></para>
<orderedlist>
<listitem>
<para>Click the + sign next to <emphasis
role="bold">thor</emphasis> to expand the tree.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Click the <emphasis
role="bold">fetchpeoplebyzipservice.1</emphasis> hyperlink.</para>
<para>The form for the service displays.</para>
<para><figure>
<title>Service Form</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg14a.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>Provide a zip code (e.g., 33024) in the <emphasis
role="bold">zipvalue</emphasis> field, select <emphasis
role="bold">Output Tables</emphasis> from the drop list, then
press the <emphasis role="bold">Submit</emphasis> button.</para>
<para>The results display.</para>
<para><figure>
<title>Results</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg15a.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
</orderedlist>
</sect2>
</sect1>
<sect1 id="Deploy_the_Roxie_Query">
<title>Compile and Publish the Roxie Query</title>
<para>The final step in this process is to publish the indexed query to
a Rapid Data Delivery Engine (Roxie) cluster.</para>
<para>We will recompile the code with Roxie as the target cluster, then
publish it to a Roxie cluster. <orderedlist>
<listitem>
<para>In the ECL IDE, select the Builder tab in the
FetchPeopleByZipService builder window.</para>
</listitem>
<listitem>
<para>Using the <emphasis role="bold">Target</emphasis> drop list,
select Roxie as the target cluster.</para>
<para><figure>
<title>Target Roxie</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg16.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<para>In the upper-left corner of the Builder window, the <emphasis
role="bold">Submit</emphasis> button has a drop-down arrow next to it.
Select the arrow to expose the <emphasis
role="bold">Compile</emphasis> option.</para>
<figure>
<title>Compile</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg17.jpg" />
</imageobject>
</mediaobject>
</figure>
</listitem>
<listitem>
<para>Select <emphasis role="bold">Compile</emphasis>.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>When the workunit finishes, it displays a green circle
indicating that it has compiled.</para>
<para><figure>
<title>Compiled</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg18.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
</orderedlist></para>
<sect2 id="Deploy_the_Query_to_Roxie">
<title>Publish the Roxie Query</title>
<para>Next, we will publish the query to a Roxie cluster.</para>
<orderedlist>
<listitem>
<para>Select the Workunit tab for the FetchPeopleByZipService that
you just compiled.</para>
</listitem>
<listitem>
<para>Select the ECL Watch tab.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Press the <emphasis role="bold">Publish</emphasis> button
(you may need to scroll down the main window).</para>
<para><figure>
<title>Publish Query</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg19.jpg" />
</imageobject>
</mediaobject>
</figure>When it successfully publishes, you will see:</para>
<para><figure>
<title>Workunit Published</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DT173-18b.png" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
</orderedlist>
</sect2>
<sect2 id="Run_the_Roxie_Query" role="brk">
<title>Run the Roxie Query in WsECL</title>
<para>Now that the query is deployed to a Roxie cluster, we can run it
using the WsECL service. Open the following URL in your browser:</para>
<para><emphasis role="bold">http://nnn.nnn.nnn.nnn:pppp</emphasis></para>
<para>where nnn.nnn.nnn.nnn is your ESP Server’s IP address and pppp is
the port. The default port is 8002.</para>
<orderedlist>
<listitem>
<para>Click the + sign next to <emphasis
role="bold">myroxie</emphasis> to expand the tree.</para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Click the <emphasis
role="bold">fetchpeoplebyzipservice.1</emphasis> hyperlink.</para>
<para>The form for the service displays.</para>
<para><figure>
<title>RoxieECL</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg21.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
<listitem>
<?dbfo keep-together="always"?>
<para>Provide a zip code (e.g., 33024) in the <emphasis
role="bold">zipvalue</emphasis> field, select <emphasis
role="bold">Output Tables</emphasis> from the drop list, and press
the <emphasis role="bold">Submit</emphasis> button.</para>
<para>The results display.</para>
<para><figure>
<title>RoxieResults</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg22.jpg" />
</imageobject>
</mediaobject>
</figure></para>
</listitem>
</orderedlist>
</sect2>
</sect1>
</chapter>
<chapter id="Summary">
<title>Summary</title>
<para>Now that you have successfully sprayed raw data onto a cluster,
processed it, and published a query to a Roxie cluster, what’s
next?</para>
<para>Here is a short list of suggestions on the path you might take from
here:</para>
<itemizedlist mark="bullet">
<listitem>
<para>Create indexes on other fields and create queries that use
them.</para>
</listitem>
<listitem>
<para>Write client applications to access your queries using JSON or
SOAP interfaces.</para>
</listitem>
<listitem>
<para>Look at the resources available on the Links tab.</para>
<para><figure>
<title>Links</title>
<mediaobject>
<imageobject>
<imagedata fileref="images/DTimg24.jpg" />
</imageobject>
</mediaobject>
</figure>The Links tab provides easy access to a form, a Sample
Request, a Sample Response, the WSDL, the XML Schema (XSD), and
more.</para>
</listitem>
<listitem>
<para>Follow the procedures in this tutorial using your own
data!</para>
</listitem>
</itemizedlist>
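<para>As a starting point for the first suggestion, here is a minimal
sketch of a builder window file that defines and builds a second index,
this time keyed on LastName. The definition name IDX_PeopleByLastName and
the logical file name ~tutorial::YN::PeopleByLastNameINDEX are
hypothetical. For brevity the INDEX is defined locally in this file,
whereas the tutorial puts each EXPORTed INDEX in its own file.</para>
<para><programlisting>IMPORT TutorialYourName;

// Hypothetical second index, keyed on LastName; fpos again points
// back to the full record in File_TutorialPerson
IDX_PeopleByLastName :=
INDEX(TutorialYourName.File_TutorialPerson,{LastName,fpos},
'~tutorial::YN::PeopleByLastNameINDEX');

// Write the index file, replacing any earlier version
BUILDINDEX(IDX_PeopleByLastName,OVERWRITE);
</programlisting></para>
<para>Once it is built, a query could use it the same way the tutorial
uses IDX_PeopleByZIP, for example by passing
IDX_PeopleByLastName(LastName='SMITH') to FETCH. Because the processing
step converted all names to uppercase, filter values should be uppercase
too.</para>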
</chapter>
</book>