Complex Roxie Query Techniques
The ECL coding techniques used in Roxie queries can be quite complex,
making use of multiple keys, payload keys, half-keyed JOINs, the KEYDIFF
function, and various other ECL language features. All of these techniques
share a single goal, however: to maximize the performance of the query so
that its result is delivered as efficiently as possible, which in turn
maximizes the total transaction throughput of the Roxie cluster that
services the query.
Key Selection Based on Input
It all starts with the architecture of your data and the keys you
build from it. Typically, a single dataset has multiple indexes built on
it to provide multiple access paths into the data. Therefore, one of the
key techniques used in Roxie queries is to detect which of the possible
parameter values have been passed to the query and, based on those values,
choose the correct INDEX to use.
Detecting which values have been passed to the query relies on the
STORED attributes defined to receive them. The SOAP interface automatically
populates these attributes with whatever values are passed to the query,
so the query code need only check those attributes for values other than
their defaults.
This example demonstrates the technique:
IMPORT $;
EXPORT PeopleSearchService() := FUNCTION
  STRING30 lname_value := '' : STORED('LastName');
  STRING30 fname_value := '' : STORED('FirstName');
  IDX := $.IDX__Person_LastName_FirstName;
  Base := $.Person.FilePlus;
  Fetched := IF(fname_value = '',
                FETCH(Base,IDX(LastName=lname_value),RIGHT.RecPos),
                FETCH(Base,IDX(LastName=lname_value,FirstName=fname_value),RIGHT.RecPos));
  RETURN OUTPUT(CHOOSEN(Fetched,2000));
END;
This query assumes that the LastName parameter will always be passed,
so the IF needs only to detect whether the user also entered a FirstName.
If so, the filter on the index passed to FETCH must include that value;
otherwise, the FETCH filters the index on the LastName value alone.
There are several ways this code could have been written. Here's an
alternative:
IMPORT $;
EXPORT PeopleSearchService() := FUNCTION
  STRING30 lname_value := '' : STORED('LastName');
  STRING30 fname_value := '' : STORED('FirstName');
  IDX := $.IDX__Person_LastName_FirstName;
  Base := $.Person.FilePlus;
  IndxFilter := IF(fname_value = '',
                   IDX.LastName=lname_value,
                   IDX.LastName=lname_value AND IDX.FirstName=fname_value);
  Fetched := FETCH(Base,IDX(IndxFilter),RIGHT.RecPos);
  RETURN OUTPUT(CHOOSEN(Fetched,2000));
END;
In this example, the IF simply builds the correct filter expression
for the FETCH to use. This form is easier to read and maintain because it
separates the alternative forms of the filter logic from the function that
uses it.
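The examples above vary the filter on a single index. When different parameters are served by different indexes, the same idea extends to choosing the INDEX itself. The following is a minimal sketch under stated assumptions: the record layout, the three-row inline dataset, the `~demo::` logical file names, and the `LastName`/`Zip` STORED parameters are all hypothetical. The inline data and BUILD actions exist only so the sketch can run on its own; a deployed Roxie query would read indexes that were built beforehand.

```ecl
// Hypothetical layout, data, file names, and parameters, for illustration only.
PersonRec := RECORD
  STRING30 LastName;
  STRING30 FirstName;
  STRING5  Zip;
END;

People := DATASET([{'SMITH','JOHN','33101'},
                   {'SMITH','MARY','10001'},
                   {'JONES','ANN','33101'}], PersonRec);

// Two payload indexes over the same data, each keyed for a different access path.
IdxName := INDEX(People, {LastName, FirstName}, {Zip},
                 '~demo::key::people_name');
IdxZip  := INDEX(People, {Zip}, {LastName, FirstName},
                 '~demo::key::people_zip');

// Parameters populated by the SOAP interface; empty string means "not passed".
STRING30 lname_value := '' : STORED('LastName');
STRING5  zip_value   := '' : STORED('Zip');

// One candidate read per index, projected to a common layout.
ByName := PROJECT(IdxName(KEYED(LastName = lname_value)), PersonRec);
ByZip  := PROJECT(IdxZip(KEYED(Zip = zip_value)), PersonRec);

// Pick the index whose leading key field was actually supplied.
Result := MAP(lname_value <> '' => ByName,
              zip_value   <> '' => ByZip,
              DATASET([], PersonRec));

SEQUENTIAL(BUILD(IdxName, OVERWRITE),
           BUILD(IdxZip, OVERWRITE),
           OUTPUT(CHOOSEN(Result, 2000)));
```

MAP is used here rather than nested IFs because it reads as a priority list: the first parameter found determines the access path, and adding a third index means adding one more line.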
Keyed Joins
Although the FETCH function was specifically designed for indexed
access to data, in practice the half-keyed JOIN is more commonly used in
Roxie queries, largely because of the flexibility that JOIN provides.
The advantages of using keyed JOIN operations in any query are fully
discussed in the Using ECL Keys (INDEX Files)
article, and Roxie queries benefit from them tremendously. Because of the
nature of Roxie, the greatest benefit comes from half-keyed JOINs that use
payload keys: the fields the query needs are read directly from the index,
eliminating the need for additional FETCH operations against the base file.
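A half-keyed JOIN against a payload key can be sketched as follows. The layouts, inline data, and logical file name are hypothetical, and the BUILD action is there only so the sketch runs on its own. The right-hand input of the JOIN is the INDEX itself, so each LEFT row probes the key directly; because FirstName and City are stored as payload, the TRANSFORM takes them from RIGHT without a FETCH back to a base file.

```ecl
// Hypothetical layouts, data, and logical file name, for illustration only.
InRec := RECORD
  STRING30 LastName;
END;

PersonRec := RECORD
  STRING30 LastName;
  STRING30 FirstName;
  STRING20 City;
END;

People := DATASET([{'SMITH','JOHN','MIAMI'},
                   {'SMITH','MARY','BOSTON'},
                   {'JONES','ANN','MIAMI'}], PersonRec);

// Payload key: LastName is the search field; FirstName and City ride along.
PeopleKey := INDEX(People, {LastName}, {FirstName, City},
                   '~demo::key::people_lname');

Requests := DATASET([{'SMITH'}], InRec);

// Half-keyed JOIN: the INDEX is the right-hand input, and every output
// field comes from the key's search and payload fields, so no FETCH is needed.
Matches := JOIN(Requests, PeopleKey,
                KEYED(LEFT.LastName = RIGHT.LastName),
                TRANSFORM(PersonRec, SELF := RIGHT),
                LIMIT(1000));

SEQUENTIAL(BUILD(PeopleKey, OVERWRITE),
           OUTPUT(Matches));
```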
Limiting Output
One major consideration when developing a Roxie query is the amount
of data the query may return. Since JOIN operations can produce very large
result sets, take care to limit the number of records any given query may
return to a number that is “reasonable” for that specific type of query.
Here are some techniques to help accomplish that goal:
* The CHOOSEN and LIMIT functions should be used to limit
index reads to some maximum number.
* Keyed JOINs should use the ATMOST, KEEP, or LIMIT
option.
* When a nested child dataset is defined, it should have a
MAXCOUNT option defined on the child DATASET field in the RECORD
structure, and the code that builds the nested child dataset
should use CHOOSEN with a value that exactly matches the
MAXCOUNT.
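The first two techniques can be sketched as follows, assuming a hypothetical layout, inline data, logical file name, and a hypothetical default of 'SMITH' for the STORED parameter so the sketch returns rows when run without input. The thresholds (10000, 100, 1000, 1) are arbitrary placeholders; each query should use values that are reasonable for its own purpose. Each JOIN uses a single option so that the effect of each one is easy to see.

```ecl
// Hypothetical layout, data, file name, default parameter, and thresholds.
PersonRec := RECORD
  STRING30 LastName;
  STRING30 FirstName;
END;

People := DATASET([{'SMITH','JOHN'}, {'SMITH','MARY'}, {'JONES','ANN'}],
                  PersonRec);
PeopleKey := INDEX(People, {LastName}, {FirstName},
                   '~demo::key::people_limits');

STRING30 lname_value := 'SMITH' : STORED('LastName');

// Index read: LIMIT(..., SKIP) returns nothing if more than 10000 entries
// match, and CHOOSEN returns at most the first 100 of those that do.
KeyRead := CHOOSEN(LIMIT(PeopleKey(KEYED(LastName = lname_value)),
                         10000, SKIP),
                   100);

Requests := DATASET([{'SMITH'}], {STRING30 LastName});

// ATMOST: a request matching more than 1000 index entries gets no matches.
Guarded := JOIN(Requests, PeopleKey,
                KEYED(LEFT.LastName = RIGHT.LastName),
                TRANSFORM(PersonRec, SELF := RIGHT),
                ATMOST(1000));

// KEEP: at most one match is kept for each request row.
FirstOnly := JOIN(Requests, PeopleKey,
                  KEYED(LEFT.LastName = RIGHT.LastName),
                  TRANSFORM(PersonRec, SELF := RIGHT),
                  KEEP(1));

SEQUENTIAL(BUILD(PeopleKey, OVERWRITE),
           OUTPUT(KeyRead, NAMED('KeyRead')),
           OUTPUT(Guarded, NAMED('Guarded')),
           OUTPUT(FirstOnly, NAMED('FirstOnly')));
```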
All of these techniques help ensure that an end user who expects
around ten results doesn't end up with ten million.
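The MAXCOUNT technique can be sketched as follows. The layouts, the phone data, and the `MaxPhones` constant are hypothetical; defining a single constant that feeds both MAXCOUNT and CHOOSEN is one way to keep the two values matching exactly, since a later change to one then cannot drift from the other. DENORMALIZE with GROUP is just one way to build the nested child dataset; the CHOOSEN call is the part that matters.

```ecl
// Hypothetical layouts and data, for illustration only.
MaxPhones := 5;   // one constant drives both MAXCOUNT and CHOOSEN

PhoneRec := RECORD
  STRING10 Phone;
END;

PersonRec := RECORD
  STRING30 LastName;
  DATASET(PhoneRec) Phones{MAXCOUNT(MaxPhones)};
END;

FlatRec := RECORD
  STRING30 LastName;
  STRING10 Phone;
END;

Flat := DATASET([{'SMITH','5550000001'}, {'SMITH','5550000002'},
                 {'SMITH','5550000003'}, {'SMITH','5550000004'},
                 {'SMITH','5550000005'}, {'SMITH','5550000006'},
                 {'JONES','5550000007'}], FlatRec);

// One parent row per LastName, starting with an empty child dataset.
Parents := PROJECT(DEDUP(SORT(Flat, LastName), LastName),
                   TRANSFORM(PersonRec, SELF.Phones := [], SELF := LEFT));

// Attach each parent's phones, capped at exactly MaxPhones to match MAXCOUNT.
Nested := DENORMALIZE(Parents, Flat,
                      LEFT.LastName = RIGHT.LastName,
                      GROUP,
                      TRANSFORM(PersonRec,
                                SELF.Phones := CHOOSEN(PROJECT(ROWS(RIGHT), PhoneRec),
                                                       MaxPhones),
                                SELF := LEFT));

OUTPUT(Nested);
```

With this data, the SMITH row carries five phones even though six exist in the flat input, so the nested record can never exceed the size its MAXCOUNT declares.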