RONCC
/
Big-Data-HPC-Platform


			
				
					
						
						
							123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318
							<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<sect1 id="SOAPCALL_from_Thor_to_Roxie">
  <title><emphasis role="bold">SOAPCALL from Thor to Roxie</emphasis></title>

  <para>Once you have your SOAP-enabled queries tested and deployed to Roxie,
  you need to be able to use them. Many Roxie queries can be launched through
  some specially-designed user interface that allow end-users to enter search
  criteria and get results, one at a time. However, sometimes you need to
  retrieve data in a batch mode, where the same query is run once against each
  record from a dataset. That makes Thor a candidate to be the requesting
  platform, by using SOAPCALL.</para>

  <sect2 id="One_Record_Input_Record_Set_Return">
    <title>One Record Input, Record Set Return</title>

    <para>This example code (contained in Soapcall1.ECL) calls the service
    previously deployed in the <emphasis role="bold">Roxie Overview
    </emphasis>article (you will need to change the IP attribute in this code
    to the appropriate IP and port for the Roxie to which it has been
    deployed):</para>

    <programlisting>IMPORT $;

OutRec1 := $.DeclareData.Layout_Person;
RoxieIP := 'http://127.0.0.1:8002/WsEcl/soap/query/roxie/roxieoverview1.1';
svc     := 'RoxieOverview1.1';
     
InputRec := RECORD
  STRING30 LastName  := 'KLYDE';
  STRING30 FirstName := '';
END;
//1 rec in, recordset out
ManyRec1 := SOAPCALL(RoxieIP,
                     svc,
                     InputRec,
                     DATASET(OutRec1));
OUTPUT(ManyRec1);</programlisting>

    <para>This example shows how you would make a SOAPCALL to the service
    passing it a single set of parameters to retrieve only those records that
    relate to the set of passed parameters. The service receives a single set
    of input data and returns only those records that meet that criteria. The
    expected result from this query is a returned set of the 1000 records
    whose LastName field contains “KLYDE.”</para>
  </sect2>

  <sect2 id="Record_Set_Input_Record_Set_Return">
    <title>Record Set Input, Record Set Return</title>

    <para>This next example code (contained in Soapcall2.ECL) also calls the
    same service as the previous example (remember, you will need to change
    the IP attribute in this code to the appropriate IP and port for the Roxie
    to which it has been deployed):</para>

    <programlisting>IMPORT $;

OutRec1 := $.DeclareData.Layout_Person;
RoxieIP := 'http://127.0.0.1:8002/WsEcl/soap/query/roxie/roxieoverview1.1';
svc     := 'RoxieOverview1.1';
//recordset in, recordset out
InRec := RECORD
  STRING30 LastName {XPATH('LastName')};
  STRING30 FirstName{XPATH('FirstName')};
END;

InputDataset := DATASET([{'TRAYLOR','CISSY'},
                         {'KLYDE','CLYDE'},
                         {'SMITH','DAR'},
                         {'BOWEN','PERCIVAL'},
                         {'ROMNEY','GEORGE'}],Inrec);
     
ManyRec2 := SOAPCALL(InputDataset,
      RoxieIP,
      svc,
      Inrec,
      TRANSFORM(LEFT),
      DATASET(OutRec1),
      ONFAIL(SKIP));
OUTPUT(ManyRec2);</programlisting>

    <para>This example passes a dataset containing multiple sets of parameters
    on which the service will operate, returning a single recordset of all
    records returned by each set of parameters. In this form, the TRANSFORM
    function allows the SOAPCALL to operate like a PROJECT to produce the
    input records that provide the input parameters for the service.</para>

    <para>The service operates on each record in the input dataset in turn,
    combining the results from each into a single return result set. The
    ONFAIL option indicates that if there is any type of error, then the
    record should simply by skipped. The expected result from this query is a
    returned set of three records for the only three records that match the
    input criteria (CISSY TRAYLOR, CLYDE KLYDE, and PERCIVAL BOWEN).</para>
  </sect2>

  <sect2 id="Performance_Considerations_PARALLEL">
    <title>Performance Considerations: PARALLEL</title>

    <para>The form of the first example takes a single row as its input. When
    a single URL is specified, SOAPCALL sends the request to that one URL and
    waits for a response. If multiple URLs are specified, SOAPCALL sends a
    request to the first URL in the list, waits for a response, sends a
    request to the second URL, and on down the list. The PARALLEL option
    controls concurrency, so if PARALLEL(<emphasis>n</emphasis>) is specified,
    requests are sent concurrently from each Thor node, with up to
    <emphasis>n</emphasis> requests in flight at once from each node.</para>

    <para>The form of the second example takes a dataset as its input. When a
    single URL specified, the default behaviour is to send two requests with
    the first and second rows concurrently, wait for a response, send the
    third rows, and so on down the dataset, with up to two requests in flight
    at once. If PARALLEL(<emphasis>n</emphasis>) is specified, it sends
    <emphasis>n</emphasis> requests with the first <emphasis>n</emphasis> rows
    concurrently from each Thor node, and so on, with up to
    <emphasis>n</emphasis> requests in flight at once from each node.</para>

    <para>In an ideal world you would specify a PARALLEL value that multiplies
    out to at least the number of Roxie URLs, so that every available host can
    work simultaneously. Also, if you're using a dataset as input, you might
    want to try a value several times the number of URLs. However, this could
    cause network contention (timeouts and dropped connections) if set too
    high.</para>

    <para>You should add the PARALLEL option to the code from both previous
    examples to see what effect differing values may have in your
    environment.</para>
  </sect2>

  <sect2 id="Performance_Considerations_MERGE">
    <title>Performance Considerations: MERGE</title>

    <para>The MERGE option controls the number of rows per request for the
    form that takes a dataset (MERGE does not apply to the forms of SOAPCALL
    that take a single row as input). If MERGE(<emphasis>m</emphasis>) is
    specified, each request contains up to <emphasis>m</emphasis> rows, rather
    than a single row.</para>

    <para>If the concurrency (PARALLEL option setting) is less than or equal
    to the number of URLs then each URL will normally only see one request at
    a time (assuming all hosts operate at about the same speed). In that case,
    you might choose a value of MERGE as high as the host and the network can
    take: too high a value and a massive request might kill or swamp the
    service, but too low a value needlessly increases overhead by sending many
    small requests in place of fewer larger ones. If the concurrency is
    greater than the number of URLs then each URL will see several requests at
    a time and these considerations still apply.</para>

    <para>Assuming that the host processes a single request serially, there is
    one additional consideration. You should ensure that the MERGE value is
    smaller than the number of rows in the dataset so as to ensure that you
    are making use of the parallelization on the hosts. If the value of MERGE
    is greater than or equal to the number of input rows, then you send the
    entire input dataset in one request and the host processes the rows
    serially.</para>

    <para>You should add the MERGE option to the code from the second example
    to see what effect differing values may have in your environment.</para>
  </sect2>

  <sect2 id="A_Real_World_Example">
    <title>A Real World Example</title>

    <para>A customer asked for help with a problem—how to compare two strings
    and determine if the first contains every word in the second, in any
    order, when there are an indeterminate number of words in each string.
    This is a fairly straight-forward problem in ECL. Using JOIN and ROLLUP
    would be one approach, or nested child dataset queries (not supported in
    Thor at the time of the request for help, though they may be by the time
    you read this). All the following code is contained in the Soapcall3.ECL
    file.</para>

    <para>The first need was to create a function that would extract all the
    discrete words from a string. This is the kind of job that the PARSE
    function excels at, so that's exactly what this code does:</para>

    <programlisting>ParseWords(STRING LineIn) := FUNCTION
  PATTERN Ltrs := PATTERN('[A-Za-z]');
  PATTERN Char := Ltrs | '-' | '\'';
  TOKEN   Word := Char+;
          ds   := DATASET([{LineIn}],{STRING line});
  RETURN PARSE(ds,line,Word,{STRING Pword := MATCHTEXT(Word)});
END;</programlisting>

    <para>This FUNCTION (contained in Soapcall3.ECL) receives an input string
    and produces a record set result of all the words contained in that
    string. It defines a PATTERN attribute (Char) of allowable characters in a
    word as the set of all upper and lower case letters (defined by the Ltrs
    PATTERN), the hyphen, and the apostrophe. Any other character than these
    will be ignored.</para>

    <para>Next, it defines a Word as one or more allowable Char pattern
    characters. This pattern is defined as a TOKEN so that only the full word
    match is returned and not all the possible alternative matches (i.e.
    returning just SOAP, instead of SOAP, SOA, SO, and S—all the possible
    alternative matches that a PATTERN would generate).</para>

    <para>The one record in-line DATASET attribute (ds) creates the input
    “file” for the PARSE function to work on, producing the result record set
    of all the discrete words from the input string.</para>

    <para>Next, we need a Roxie query to compare the two strings (also
    contained in Soapcall3.ECL):</para>

    <programlisting>EXPORT Soapcall3() := FUNCTION
  STRING UID     := '' : STORED('UIDstr');
  STRING LeftIn  := '' : STORED('LeftInStr');
  STRING RightIn := '' : STORED('RightInStr');
  BOOLEAN TokenMatch := FUNCTION
    P1 := ParseWords(LeftIn);
    P2 := ParseWords(RightIn);
    SetSrch := SET(P1,Pword);
    ProjRes := PROJECT(P2,
                       TRANSFORM({BOOLEAN Fnd},
                                 SELF.Fnd := LEFT.Pword IN SetSrch));
    AllRes  := DEDUP(SORT(ProjRes,Fnd));
    RETURN COUNT(AllRes) = 1 AND AllRes[1].Fnd = TRUE;
  END;
  RETURN OUTPUT(DATASET([{UID,TokenMatch}],{STRING UID,BOOLEAN res}));
END;</programlisting>

    <para>There are three pieces of data this query expects to receive: a
    string containing an identifier for the comparison (for context purposes
    in the result), and the two strings whose words to compare.</para>

    <para>The FUNCTION passes the input strings to the ParseWords function to
    create two recordsets of words from those strings. The SET function then
    re-defines the first recordset as a SET so the the IN operator may be
    used.</para>

    <para>The PROJECT operation does all the real work. It passes each word in
    turn from the second input string to its inline TRANSFORM function, which
    produces a Boolean result for that word—TRUE or FALSE, is it present in
    the set of words from the first input string or not?</para>

    <para>To determine if all the words in the second string were contained in
    the first, the SORT/DEDUP sorts all the resulting Boolean values then
    removes all the duplicate entries. There will only be one or two records
    left: either a TRUE and a FALSE, or a single TRUE or FALSE record.</para>

    <para>The RETURN expression detects which of the three scenarios has
    occurred. Two records left indicates some, but not all, of the words were
    present. One record indicates either all or none of the words were
    present, and if the value of that record is TRUE, then all words were
    present and the FUNCTION returns TRUE. All other cases return
    FALSE.</para>

    <para>The OUTPUT uses a one-record inline DATASET to format the result.
    The identifier that was passed in is passed back along with the Boolean
    result of the compare. The identifier becomes important when the query is
    called multiple times in Roxie to process through a dataset of strings to
    compare in a batch mode because the results may not be returned in the
    same order as the input records. If it were only ever used interactively,
    this identifier would not be necessary.</para>

    <para>Once you've saved the query to the Repository, you can test it with
    hThor and/or deploy it to Roxie (hThor will work for testing, but Roxie is
    much faster for production). Either way, you can use SOAPCALL to access it
    like this (the only difference would be the IP and port you target for the
    query (contained in Soapcall4.ECL)):</para>

    <programlisting>RoxieIP := 'http://127.0.0.1:8002/WsEcl/soap/query/roxie/soapcall3.1'; //Roxie
svc     := 'soapcall3.1';

InRec := RECORD
  STRING UIDstr{XPATH('UIDstr')}; 
  STRING LeftInStr{XPATH('LeftInStr')};
  STRING RightInStr{XPATH('RightInStr')};
END;
InDS := DATASET([
   {'1','the quick brown fox jumped over the lazy red dog','quick fox red dog'},
   {'2','the quick brown fox jumped over the lazy red dog','quick fox black dog'},
   {'3','george of the jungle lives here','fox black dog'},
   {'4','fred and wilma flintstone','fred flintstone'},
   {'5','yomama comeonah','brake chill'} ],InRec);

RS := SOAPCALL(InDS,
               RoxieIP,
               svc,
               InRec,
               TRANSFORM(LEFT),
               DATASET({STRING UIDval{XPATH('uid')},BOOLEAN CompareResult{XPATH('res')}}));

OUTPUT(RS);
</programlisting>

    <para>Of course, <emphasis role="bold">you must first change the IP and
    port in this code to the correct values for your environment</emphasis>.
    You can find the proper IP and port to use by looking at the System
    Servers page of your ECL Watch. To target Doxie (aka ECL Agent or hthor),
    use the IP of your Thor's ESP Server and the port for its wsecl service.
    To target Roxie, use the IP of your Roxie's ESP Server and the port for
    its wsecl service. It's possible that both ESP servers could be on the
    same box. If so, then the difference will only be in the port assignment
    for each.</para>

    <para>The key to this SOAPCALL query is the InRec RECORD structure with
    its XPATH definitions. These must exactly match the part names and the
    STORED names of the query's parameter receiving attributes (NB that these
    are case sensitive, since XPATH is XML and XML is always case sensitive).
    This is what maps the input data fields through the SOAP interface to the
    query's attributes.</para>

    <para>This SOAPCALL receives a recordset as input and produces a recordset
    as its result, making it very similar to the second example above. One
    small change from that previous example of this type is the use of the
    shorthand TRANSFORM instead of an inline TRANSFORM function. Also, note
    that the XPATH for the first field in the DATASET parameter's inline
    RECORD structure contains lower case “uid” while it is obviously
    referencing the query's OUTPUT field named “UID”—the XML returned from the
    SOAP service uses lower case tag names for the returned data
    fields.</para>

    <para>When you run this you'll get a TRUE result for records one and four,
    and FALSE for all others.</para>
  </sect2>
</sect1>