123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320 |
- <?xml version="1.0"?>
- <!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
- <sect1 id="Roxie_Overview">
- <title><emphasis role="bold">Roxie Overview</emphasis></title>
- <para><emphasis>Let’s start with some definitions:</emphasis></para>
- <para><informaltable colsep="0" frame="none" rowsep="0">
- <tgroup cols="2">
- <colspec align="left" colwidth="87.45pt"/>
- <colspec/>
- <tbody>
- <row>
- <entry>Data Refinery</entry>
- <entry>A supercomputer cluster specifically designed to perform
- massive data manipulation (ETL) processes. This is a back-office
- data preparation tool and not meant for end-user
- production-level queries. See the Data Refinery Reference for
- complete documentation.</entry>
- </row>
- <row>
- <entry>Rapid Data Delivery Engine</entry>
- <entry>A supercomputer cluster specifically designed to service
- standard queries, providing a throughput rate of a thousand-plus
- respones per second (actual response rate for any given query
- is, of course, dependent on its complexity). This is a
- production-level tool designed for mission-critical application.
- See the Rapid Data Delivery Engine Reference for complete
- documentation.</entry>
- </row>
- <row>
- <entry>Data Delivery Engine</entry>
- <entry>An R&D platform designed for iterative, interactive
- development and testing of Roxie queries. This is not a separate
- supercomputer cluster, but a “piggyback” implementation of ECL
- Agent and Thor. See the Data Delivery Engine Reference for
- complete documentation.</entry>
- </row>
- <row>
- <entry>Thor</entry>
- <entry>The commonly used name for an instance of Seisint’s Data
- Refinery.</entry>
- </row>
- <row>
- <entry>Roxie</entry>
- <entry>The commonly used name for an instance of Seisint’s Rapid
- Data Delivery Engine.</entry>
- </row>
- <row>
- <entry>Doxie</entry>
- <entry>The commonly used name for an instance of Seisint’s Data
- Delivery Engine.</entry>
- </row>
- </tbody>
- </tgroup>
- </informaltable></para>
- <sect2 id="Thor">
- <title>Thor</title>
- <para>Thor clusters are used to do all the “heavy lifting” data
- preparation work to process raw data into standard formats. Once that
- process is complete, end-users can query that standardized data to glean
- real information. However, end-users typically want to see their results
- “immediately or sooner”—and usually more than one end-user wants their
- result at the same time. The Thor platform only works on one query at a
- time, which makes it impractical for use by end-users, and that is why
- the Roxie platform was created.</para>
- </sect2>
- <sect2 id="Roxie">
- <title>Roxie</title>
- <para>Roxie clusters can handle thousands of simultaneous end-users and
- provide them all with the perception of “immediately or sooner” results.
- It does this by only allowing end-users to run standard, pre-compiled
- queries that have been developed specifically for end-user use on the
- Roxie cluster. Typically, these queries use indexes and thus, provide
- extremely fast performance. However, the Roxie cluster is impractical
- for use as a development tool, since all its queries must be
- pre-compiled and the data they use must have been previously deployed.
- Therefore, the iterative query development and testing process is
- performed using Doxie.</para>
- </sect2>
- <sect2 id="Doxie">
- <title>Doxie</title>
- <para>Doxie is not a separate cluster on its own; it is an instance of
- ECL Agent (which operates on a single server) that emulates the
- operation of a Roxie cluster. Just as with Thor queries, Doxie queries
- are compiled each time they are run. Doxie queries access data directly
- from an associated Thor cluster’s disk drives without interfering with
- any Thor operations. This makes it an appropriate tool for developing
- queries that are destined for use on a Roxie cluster.</para>
- </sect2>
- <sect2 id="How_to_Structure_Roxie_Queries">
- <title>How to Structure Roxie Queries</title>
- <para>To begin developing queries for use on Roxie clusters you must
- start by deciding what data to query and how to index that data so that
- end-users see their result in minimum time. The process of putting the
- data into its most useful form and indexing it is accomplished on a Thor
- cluster. The previous articles on indexing and superfiles should guide
- you in the right direction for that.</para>
- <para>Once the data is ready to use, you can then write the query.
- Queries for Roxie clusters are always contained in MACRO structures, and
- those MACROs always contain at least one action—usually a simple OUTPUT
- to return the result set.</para>
- <para>Unlike MACROs used to generate ECL code for standard Thor
- processes, the MACROs for Roxie queries do not receive parameters.
- Instead, a SOAP (Simple Object Access Protocol) interface is used to
- “pass in” data values (the <emphasis>SOAP-enabling MACROs</emphasis>
- article discusses the specifics of this interface). The values passed
- through the SOAP interface wind up in attributes that have been defined
- with the STORED workflow service. Your ECL code then can use those
- attributes to determine the passed values and return the appropriate
- result to the end-user.</para>
- <para>Here is a simple example of the structure of a Roxie query
- (contained in RoxieOverview1.ECL):</para>
- <programlisting>/*--SOAP--
- <message name="PeopleSearchService">
- <part name="LastName" type="xsd:string" required="1"/>
- <part name="FirstName" type="xsd:string"/>
- </message>
- */
- EXPORT PeopleSearchService() := MACRO
- STRING30 lname_value := '' : STORED('LastName');
- STRING30 fname_value := '' : STORED('FirstName');
- IDX := ProgGuide.IDX__Person_LastName_FirstName;
- Base := ProgGuide.Person.FilePlus;
- Fetched := IF(fname_value = '',
- FETCH(Base,IDX(LastName=lname_value),RIGHT.RecPos),
- FETCH(Base,IDX(LastName=lname_value,
- FirstName=fname_value),RIGHT.RecPos));
- OUTPUT(CHOOSEN(Fetched,2000))
- ENDMACRO;</programlisting>
- <para>The comment block contains XML that defines the SOAP interface for
- the service. As a comment block, this doesn’t affect the ECL code at
- all, but it is required.</para>
- <para>Notice that the MACRO does not receive any parameters. Instead,
- the lname_value and fname_value attributes both have the STORED workflow
- service, and their storage names exactly match the part name in the XML.
- The SOAP interface requires that all the part names must exactly match
- the storage names for the STORED attributes, because the STORED option
- opens up a storage space in the workunit where the SOAP interface can
- place the values to pass to the service.</para>
- <para>This code uses FETCH because it is the simplest example of using
- an INDEX in ECL. More typically, Roxie queries use half-keyed JOIN
- operations with payload keys (the <emphasis>Complex Roxie
- Queries</emphasis> article addresses this issue). Note that the OUTPUT
- contains a CHOOSEN as a simple example of how to ensure you should limit
- the maximum amount of data that can be returned from the query to some
- “reasonable” amount—it doesn’t make much sense to have a Roxie query
- that could possibly return 10 billion records to an end-user’s PC
- (anybody actually needing that much data should be working in Thor, not
- Roxie).</para>
- </sect2>
- <sect2 id="Testing_Queries_with_Doxie">
- <title>Testing Queries with Doxie</title>
- <para>Once you have written your query you naturally need to test it.
- That’s where Doxie comes into play. Doxie is the interactive test system
- that you can use before deploying your queries to Roxie. The easiest way
- to describe the process is to walk through it using this simple example
- query.</para>
- <para>► In your Repository, create a module named <emphasis role="bold">Training_ProgGuide</emphasis></para>
- <para>► Insert an attribute named <emphasis role="bold">PeopleSearchService</emphasis></para>
- <para>► Copy all the code from the RoxieOverview.ECL file into that
- attribute, overwriting the default starting-point text</para>
- <para>► Save the attribute definition</para>
- <para>Now that you’ve saved this query, you’re ready to run it in
- Doxie.</para>
- <para>You can find the IP and port for your environment’s Doxie by
- opening the ECL Watch web page (not using QueryBuilder—open it in
- Internet Explorer).</para>
- <para>► Click on <emphasis role="underline">System Servers</emphasis>
- (it’s in the Topology section)</para>
- <para>► Find the <emphasis role="bold">ESP Servers</emphasis>
- section</para>
- <para>There may be several listed there, so locate the one that applies
- to your Thor (not the one for your Roxie—we’ll get to that one
- later).</para>
- <para>► Note the IP listed beside it (this is frequently the same IP as
- ECL Watch)</para>
- <para>► Click on the ESP server’s name link to display its list of
- services and their ports</para>
- <para>► Note the port number beside the <emphasis role="bold">wsecl</emphasis> Service Type (this is usually 8002, but it
- could be set to something else)</para>
- <para>Once you’ve obtained the IP and port for your Doxie, you can go
- there and run the query.</para>
- <para>► Edit Internet Explorer’s address bar to point to the Doxie
- IP:port</para>
- <para>► Press the Enter key</para>
- <para>A login dialog should appear—your login ID and password are the
- same as the ones you use for the QueryBuilder program. After you’ve
- logged in, you’ll see a list of modules on the left.</para>
- <para>► Click on the <emphasis role="underline">Training_ProgGuide</emphasis> link</para>
- <para>A list of all the SOAP-enabled attributes (aka Roxie Queries) that
- are in that module appears in the frame to the right. in this case,
- there’s only the one.</para>
- <para>► Click on the <emphasis role="underline">PeopleSearchService</emphasis> link</para>
- <para>A web page containing two entry controls and a <emphasis role="bold">Submit</emphasis> button appears.</para>
- <para>► Type in any last name from the set of last names that were used
- by the code in GenData.ECL to generate the data files for this
- <emphasis>Programmer’s Guide</emphasis></para>
- <para>COOLING is a good example to use. Note that, since this is an
- extremely simple example, you’ll need to type it in ALL CAPS, otherwise
- the FETCH will fail to find any matching records (this is only due to
- the simplicity of this ECL code and not any inherent lack in the
- system).</para>
- <para>► Press the <emphasis role="bold">Submit</emphasis> button</para>
- <para>Doxie queries are re-compiled every time they run, so after a few
- seconds you should see a result set with 1,000 records in it.</para>
- <para>The SOAP comment block at the top of the ECL code contains XML
- that defines the data values that can be passed to the query. That XML
- is processed through standard XSLT templates to format the data entry
- page (and the result page) for this service. These standard XSLT
- templates can be overridden, but they work quite well for testing and
- debug purposes.</para>
- </sect2>
- <sect2 id="Deploying_Queries_to_Roxie">
- <title>Deploying Queries to Roxie</title>
- <para>Once you’ve done enough testing on Doxie to be sure the query does
- what you expect it to do, the only step then required is to deploy it to
- Roxie and test it there, too (just to be completely certain that
- everything operates the way it should). Once you’ve tested it on Roxie,
- you can inform the users that the query is available for their
- use.</para>
- <para>The Roxie deployment process is fully documented in the
- <emphasis>Rapid Data Delivery Engine Reference</emphasis>. Interactive
- deployment is done through a web page similar to ECL Watch. You can find
- the IP and port for this page on ECL Watch’s System Servers page, listed
- under your Roxie’s ESP Server. The service to look for is <emphasis role="bold">ws_roxieconfig</emphasis>. Go to that IP and port, log in,
- and follow the deployment process outlined in the <emphasis>Rapid Data
- Delivery Engine Reference</emphasis>.</para>
- <para>Once you’ve deployed the query, you can test it the same way you
- tested it on Doxie, by finding the IP and port for your Roxie ESP
- Server’s <emphasis role="bold">wsecl</emphasis> service. Once you’ve
- gone to that web address (following the same process as listed above for
- Doxie) and logged in, you will see Roxie’s version of the same page you
- just used for Doxie testing. The only difference between the two are the
- IP and port, and, of course, the execution speed of the Roxie query as
- opposed to the Doxie version (Roxie’s response time should be MUCH
- faster).</para>
- </sect2>
- <sect2 id="Restrictions">
- <title>Restrictions</title>
- <para>Roxie queries may <emphasis role="underline">not</emphasis>
- contain any code that would write a file to disk, such as:</para>
- <para>OUTPUT actions that write to disk files BUILD (or BUILDINDEX)
- actions PERSISTed attributes</para>
- <para>A SuperFile used in Roxie may not contain more than a single
- sub-file (a SuperKey, however, may contain multiple payload indexes).
- This restriction makes using SuperFiles in Roxie just an exercise in
- file redirection. By writing queries that use SuperFiles (even though
- they contain only a single sub-file) you have the advantage of being
- able to update your Roxie by simply deploying new data without needing
- to re-compile the queries that use that data simply because the sub-file
- name changed. This saves compilation time, and in a production
- environment (which a Roxie always is) where a data file is used by many
- queries, this savings can be a significant amount.</para>
- </sect2>
- </sect1>
|