<?xml version="1.0"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<sect1 id="Roxie_Overview">
<title><emphasis role="bold">Roxie Overview</emphasis></title>
<para><emphasis>Let’s start with some definitions:</emphasis></para>
<para><informaltable colsep="0" frame="none" rowsep="0">
<tgroup cols="2">
<colspec align="left" colwidth="87.45pt"/>
<colspec/>
<tbody>
<row>
<entry>Data Refinery</entry>
<entry>A supercomputer cluster specifically designed to perform
massive data manipulation (ETL) processes. This is a back-office
data preparation tool and is not meant for end-user
production-level queries. See the Data Refinery Reference for
complete documentation.</entry>
</row>
<row>
<entry>Rapid Data Delivery Engine</entry>
<entry>A supercomputer cluster specifically designed to service
standard queries, providing a throughput rate of a thousand-plus
responses per second (the actual response rate for any given query
depends, of course, on its complexity). This is a
production-level tool designed for mission-critical applications.
See the Rapid Data Delivery Engine Reference for complete
documentation.</entry>
</row>
<row>
<entry>Data Delivery Engine</entry>
<entry>An R&amp;D platform designed for iterative, interactive
development and testing of Roxie queries. This is not a separate
supercomputer cluster, but a “piggyback” implementation of ECL
Agent and Thor. See the Data Delivery Engine Reference for
complete documentation.</entry>
</row>
<row>
<entry>Thor</entry>
<entry>The commonly used name for an instance of Seisint’s Data
Refinery.</entry>
</row>
<row>
<entry>Roxie</entry>
<entry>The commonly used name for an instance of Seisint’s Rapid
Data Delivery Engine.</entry>
</row>
<row>
<entry>Doxie</entry>
<entry>The commonly used name for an instance of Seisint’s Data
Delivery Engine.</entry>
</row>
</tbody>
</tgroup>
</informaltable></para>
<sect2 id="Thor">
<title>Thor</title>
<para>Thor clusters do all the “heavy lifting” data preparation work
of processing raw data into standard formats. Once that process is
complete, end-users can query the standardized data to glean real
information. However, end-users typically want to see their results
“immediately or sooner,” and usually more than one end-user wants a
result at the same time. The Thor platform works on only one query at a
time, which makes it impractical for use by end-users, and that is why
the Roxie platform was created.</para>
</sect2>
<sect2 id="Roxie">
<title>Roxie</title>
<para>Roxie clusters can handle thousands of simultaneous end-users and
provide them all with the perception of “immediately or sooner” results.
They do this by allowing end-users to run only standard, pre-compiled
queries that have been developed specifically for end-user use on the
Roxie cluster. Typically, these queries use indexes and thus provide
extremely fast performance. However, the Roxie cluster is impractical
for use as a development tool, since all its queries must be
pre-compiled and the data they use must have been previously deployed.
Therefore, the iterative query development and testing process is
performed using Doxie.</para>
</sect2>
<sect2 id="Doxie">
<title>Doxie</title>
<para>Doxie is not a separate cluster on its own; it is an instance of
ECL Agent (which operates on a single server) that emulates the
operation of a Roxie cluster. Just as with Thor queries, Doxie queries
are compiled each time they are run. Doxie queries access data directly
from an associated Thor cluster’s disk drives without interfering with
any Thor operations. This makes it an appropriate tool for developing
queries that are destined for use on a Roxie cluster.</para>
</sect2>
<sect2 id="How_to_Structure_Roxie_Queries">
<title>How to Structure Roxie Queries</title>
<para>To begin developing queries for use on Roxie clusters, you must
start by deciding what data to query and how to index that data so that
end-users see their results in minimum time. The process of putting the
data into its most useful form and indexing it is accomplished on a Thor
cluster. The previous articles on indexing and superfiles should guide
you in the right direction for that.</para>
<para>Once the data is ready to use, you can then write the query.
Queries for Roxie clusters are always contained in MACRO structures, and
those MACROs always contain at least one action, usually a simple OUTPUT
to return the result set.</para>
<para>Unlike MACROs used to generate ECL code for standard Thor
processes, the MACROs for Roxie queries do not receive parameters.
Instead, a SOAP (Simple Object Access Protocol) interface is used to
“pass in” data values (the <emphasis>SOAP-enabling MACROs</emphasis>
article discusses the specifics of this interface). The values passed
through the SOAP interface wind up in attributes that have been defined
with the STORED workflow service. Your ECL code can then use those
attributes to determine the passed values and return the appropriate
result to the end-user.</para>
<para>Here is a simple example of the structure of a Roxie query
(contained in RoxieOverview1.ECL):</para>
<programlisting>/*--SOAP--
&lt;message name="PeopleSearchService"&gt;
  &lt;part name="LastName" type="xsd:string" required="1"/&gt;
  &lt;part name="FirstName" type="xsd:string"/&gt;
&lt;/message&gt;
*/
EXPORT PeopleSearchService() := MACRO
  // Values passed in through the SOAP interface
  STRING30 lname_value := '' : STORED('LastName');
  STRING30 fname_value := '' : STORED('FirstName');
  IDX  := ProgGuide.IDX__Person_LastName_FirstName;
  Base := ProgGuide.Person.FilePlus;
  // Search on last name alone when no first name was passed
  Fetched := IF(fname_value = '',
                FETCH(Base,IDX(LastName=lname_value),RIGHT.RecPos),
                FETCH(Base,IDX(LastName=lname_value,
                               FirstName=fname_value),RIGHT.RecPos));
  // Cap the number of records returned to the end-user
  OUTPUT(CHOOSEN(Fetched,2000))
ENDMACRO;</programlisting>
<para>The comment block contains XML that defines the SOAP interface for
the service. Because it is a comment block, it doesn’t affect the ECL
code at all, but it is required.</para>
<para>Notice that the MACRO does not receive any parameters. Instead,
the lname_value and fname_value attributes both have the STORED workflow
service, and their storage names exactly match the part names in the XML.
The SOAP interface requires every part name to exactly match the storage
name of its STORED attribute, because STORED opens up a storage space in
the workunit where the SOAP interface can place the values to pass to
the service.</para>
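<para>As a sketch of that naming rule, the variation below adds a third,
optional value to the same service. The <emphasis>State</emphasis> part,
the state_value attribute, and the presence of a State field in the
fetched records are assumptions made purely for illustration; they are
not part of RoxieOverview1.ECL. The point is only that the new part name
and the new storage name are spelled identically:</para>
<programlisting>/*--SOAP--
&lt;message name="PeopleSearchService"&gt;
  &lt;part name="LastName" type="xsd:string" required="1"/&gt;
  &lt;part name="FirstName" type="xsd:string"/&gt;
  &lt;part name="State" type="xsd:string"/&gt;
&lt;/message&gt;
*/
EXPORT PeopleSearchService() := MACRO
  STRING30 lname_value := '' : STORED('LastName');
  STRING30 fname_value := '' : STORED('FirstName');
  // Hypothetical third value: storage name 'State' matches the part name
  STRING2  state_value := '' : STORED('State');
  IDX  := ProgGuide.IDX__Person_LastName_FirstName;
  Base := ProgGuide.Person.FilePlus;
  Fetched := IF(fname_value = '',
                FETCH(Base,IDX(LastName=lname_value),RIGHT.RecPos),
                FETCH(Base,IDX(LastName=lname_value,
                               FirstName=fname_value),RIGHT.RecPos));
  // Assumes the fetched records carry a State field;
  // an empty state_value means "do not filter on state"
  Filtered := Fetched(state_value = '' OR State = state_value);
  OUTPUT(CHOOSEN(Filtered,2000))
ENDMACRO;</programlisting>
<para>If the part were named <emphasis>State</emphasis> but the storage
name were spelled differently, the passed value would have no matching
storage space to land in, and state_value would simply keep its default
of an empty string.</para>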
<para>This code uses FETCH because it is the simplest example of using
an INDEX in ECL. More typically, Roxie queries use half-keyed JOIN
operations with payload keys (the <emphasis>Complex Roxie
Queries</emphasis> article addresses this issue). Note that the OUTPUT
contains a CHOOSEN as a simple example of how to limit the maximum
amount of data that the query can return to some “reasonable” amount.
It doesn’t make much sense to have a Roxie query that could possibly
return 10 billion records to an end-user’s PC (anybody actually needing
that much data should be working in Thor, not Roxie).</para>
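<para>For comparison with the FETCH version, here is a minimal sketch of
the half-keyed JOIN form mentioned above. Every name in it (PersonRec,
PersonDS, PayloadIDX, SearchDS, GetPayload, and both logical file names)
is hypothetical and is not taken from the
<emphasis>Programmer’s Guide</emphasis> sample data. It assumes a payload
key whose search fields are LastName and FirstName and whose payload is
State, so that the JOIN can return its result from the key alone:</para>
<programlisting>// All attribute names and logical file names below are hypothetical
PersonRec := RECORD
  STRING30 LastName;
  STRING30 FirstName;
  STRING2  State;
END;
PersonDS := DATASET('~hypothetical::person', PersonRec, THOR);

// Payload key: search fields in the first list, payload in the second
PayloadIDX := INDEX(PersonDS, {LastName, FirstName}, {State},
                    '~hypothetical::key::person.lastname.firstname');

STRING30 lname_value := '' : STORED('LastName');

// One-record dataset that carries the search value into the JOIN
SearchRec := RECORD
  STRING30 LastName;
END;
SearchDS := DATASET([{lname_value}], SearchRec);

PersonRec GetPayload(RECORDOF(PayloadIDX) R) := TRANSFORM
  SELF := R;   // every result field comes straight from the key
END;

// Half-keyed JOIN: the right-hand recordset is the INDEX itself
Matches := JOIN(SearchDS, PayloadIDX,
                LEFT.LastName = RIGHT.LastName,
                GetPayload(RIGHT));

// Same CHOOSEN cap as the FETCH example
OUTPUT(CHOOSEN(Matches, 2000));</programlisting>
<para>Because every field in the result is taken from the key, this form
needs no separate read of the base file, which is the usual reason for
preferring payload keys.</para>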
</sect2>
<sect2 id="Testing_Queries_with_Doxie">
<title>Testing Queries with Doxie</title>
<para>Once you have written your query, you naturally need to test it.
That’s where Doxie comes into play. Doxie is the interactive test system
that you can use before deploying your queries to Roxie. The easiest way
to describe the process is to walk through it using this simple example
query.</para>
<para>► In your Repository, create a module named <emphasis role="bold">Training_ProgGuide</emphasis></para>
<para>► Insert an attribute named <emphasis role="bold">PeopleSearchService</emphasis></para>
<para>► Copy all the code from the RoxieOverview1.ECL file into that
attribute, overwriting the default starting-point text</para>
<para>► Save the attribute definition</para>
<para>Now that you’ve saved this query, you’re ready to run it in
Doxie.</para>
<para>You can find the IP and port for your environment’s Doxie by
opening the ECL Watch web page (open it in Internet Explorer, not in
QueryBuilder).</para>
<para>► Click on <emphasis role="underline">System Servers</emphasis>
(it’s in the Topology section)</para>
<para>► Find the <emphasis role="bold">ESP Servers</emphasis>
section</para>
<para>There may be several listed there, so locate the one that applies
to your Thor (not the one for your Roxie; we’ll get to that one
later).</para>
<para>► Note the IP listed beside it (this is frequently the same IP as
ECL Watch)</para>
<para>► Click on the ESP server’s name link to display its list of
services and their ports</para>
<para>► Note the port number beside the <emphasis role="bold">wsecl</emphasis> Service Type (this is usually 8002, but it
could be set to something else)</para>
<para>Once you’ve obtained the IP and port for your Doxie, you can go
there and run the query.</para>
<para>► Edit Internet Explorer’s address bar to point to the Doxie
IP:port</para>
<para>► Press the Enter key</para>
<para>A login dialog should appear. Your login ID and password are the
same as the ones you use for the QueryBuilder program. After you’ve
logged in, you’ll see a list of modules on the left.</para>
<para>► Click on the <emphasis role="underline">Training_ProgGuide</emphasis> link</para>
<para>A list of all the SOAP-enabled attributes (aka Roxie Queries) that
are in that module appears in the frame to the right. In this case,
there’s only the one.</para>
<para>► Click on the <emphasis role="underline">PeopleSearchService</emphasis> link</para>
<para>A web page containing two entry controls and a <emphasis role="bold">Submit</emphasis> button appears.</para>
<para>► Type in any last name from the set of last names that were used
by the code in GenData.ECL to generate the data files for this
<emphasis>Programmer’s Guide</emphasis></para>
<para>COOLING is a good example to use. Note that, since this is an
extremely simple example, you’ll need to type it in ALL CAPS; otherwise,
the FETCH will fail to find any matching records (this is due only to
the simplicity of this ECL code, not to any inherent limitation of the
system).</para>
<para>► Press the <emphasis role="bold">Submit</emphasis> button</para>
<para>Doxie queries are re-compiled every time they run, so after a few
seconds you should see a result set with 1,000 records in it.</para>
<para>The SOAP comment block at the top of the ECL code contains XML
that defines the data values that can be passed to the query. That XML
is processed through standard XSLT templates to format the data entry
page (and the result page) for this service. These standard XSLT
templates can be overridden, but they work quite well for testing and
debugging purposes.</para>
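<para>The ALL CAPS requirement noted in the walkthrough comes from
comparing the passed value to the key exactly as typed. One way to remove
it, sketched here, is to upper-case the passed value before searching.
The sketch assumes that the last names in the key are stored in upper
case (as the walkthrough implies) and that the ECL Standard Library
(<emphasis>Std</emphasis>) is available in your environment; the
lname_upper attribute is a hypothetical addition. It searches on last
name only, to keep the example short:</para>
<programlisting>IMPORT Std;

STRING30 lname_value := '' : STORED('LastName');

// Hypothetical addition: normalize the passed value so that
// "cooling", "Cooling", and "COOLING" all search the key the same way
STRING30 lname_upper := Std.Str.ToUpperCase(lname_value);

IDX  := ProgGuide.IDX__Person_LastName_FirstName;
Base := ProgGuide.Person.FilePlus;

Fetched := FETCH(Base, IDX(LastName = lname_upper), RIGHT.RecPos);
OUTPUT(CHOOSEN(Fetched, 2000));</programlisting>
<para>The same treatment would apply to fname_value if the first name is
also part of the search.</para>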
</sect2>
<sect2 id="Deploying_Queries_to_Roxie">
<title>Deploying Queries to Roxie</title>
<para>Once you’ve done enough testing on Doxie to be sure the query does
what you expect it to do, the only remaining step is to deploy it to
Roxie and test it there, too (just to be completely certain that
everything operates the way it should). Once you’ve tested it on Roxie,
you can inform the users that the query is available for their
use.</para>
<para>The Roxie deployment process is fully documented in the
<emphasis>Rapid Data Delivery Engine Reference</emphasis>. Interactive
deployment is done through a web page similar to ECL Watch. You can find
the IP and port for this page on ECL Watch’s System Servers page, listed
under your Roxie’s ESP Server. The service to look for is <emphasis role="bold">ws_roxieconfig</emphasis>. Go to that IP and port, log in,
and follow the deployment process outlined in the <emphasis>Rapid Data
Delivery Engine Reference</emphasis>.</para>
<para>Once you’ve deployed the query, you can test it the same way you
tested it on Doxie, by finding the IP and port for your Roxie ESP
Server’s <emphasis role="bold">wsecl</emphasis> service. Once you’ve
gone to that web address (following the same process as listed above for
Doxie) and logged in, you will see Roxie’s version of the same page you
just used for Doxie testing. The only differences between the two are
the IP and port and, of course, the execution speed of the Roxie query
as opposed to the Doxie version (Roxie’s response time should be much
faster).</para>
</sect2>
<sect2 id="Restrictions">
<title>Restrictions</title>
<para>Roxie queries may <emphasis role="underline">not</emphasis>
contain any code that would write a file to disk, such as:</para>
<para>OUTPUT actions that write to disk files</para>
<para>BUILD (or BUILDINDEX) actions</para>
<para>PERSISTed attributes</para>
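<para>As a rough illustration of that list, the sketch below shows one
instance of each disallowed form, commented out, next to an OUTPUT that
is acceptable because it only returns a result. All attribute names and
logical file names here are hypothetical:</para>
<programlisting>// All attribute names and logical file names below are hypothetical
Rec := RECORD
  STRING30 LastName;
  STRING30 FirstName;
END;
ds  := DATASET('~hypothetical::person', Rec, THOR);
idx := INDEX(ds, {LastName}, {FirstName},
             '~hypothetical::key::person.lastname');

// Not allowed in a Roxie query: each of these would write a file to disk
// OUTPUT(ds,,'~hypothetical::person_copy');    // OUTPUT to a disk file
// BUILD(idx);                                  // BUILD (or BUILDINDEX)
// Sorted := SORT(ds, LastName)
//           : PERSIST('~hypothetical::persist::sorted'); // PERSISTed attribute

// Allowed: an OUTPUT that simply returns its result set
OUTPUT(CHOOSEN(idx(LastName = 'COOLING'), 2000));</programlisting>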
<para>A SuperFile used in Roxie may not contain more than a single
sub-file (a SuperKey, however, may contain multiple payload indexes).
This restriction makes using SuperFiles in Roxie just an exercise in
file redirection. By writing queries that use SuperFiles (even though
they contain only a single sub-file), you can update your Roxie by
simply deploying new data, without re-compiling the queries that use
that data just because the sub-file name changed. This saves
compilation time, and in a production environment (which a Roxie always
is) where a data file is used by many queries, these savings can be
significant.</para>
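<para>A minimal sketch of that redirection, using hypothetical names
throughout: the query code below refers only to a SuperFile name and a
SuperKey name, never to the names of the sub-files behind them. If the
sub-file behind <emphasis>~hypothetical::super::person</emphasis> is
later replaced with a newer one, nothing in this code changes:</para>
<programlisting>// All attribute names and logical file names below are hypothetical
PersonRec := RECORD
  STRING30 LastName;
  STRING30 FirstName;
  STRING2  State;
END;

// The query names the SuperFile, not the sub-file it currently contains
PersonDS := DATASET('~hypothetical::super::person', PersonRec, THOR);

// Likewise, the key is declared against a SuperKey name
PersonKey := INDEX(PersonDS, {LastName, FirstName}, {State},
                   '~hypothetical::superkey::person.lastname.firstname');

OUTPUT(CHOOSEN(PersonKey(LastName = 'COOLING'), 2000));</programlisting>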
</sect2>
</sect1>