<emphasis role="bold">Controlling Roxie Queries</emphasis>

There are several ECL functions that are designed specifically to help optimize queries for execution on Roxie. These include PRELOAD, ALLNODES, THISNODE, LOCAL, and NOLOCAL. Understanding how all these functions work together can make a big difference in the performance of your Roxie queries.

How Graphs Execute

Writing efficient queries for Roxie or Thor can require an understanding of how the different clusters operate. This brings up three questions:

How does the graph execute: on a single node, or on all nodes in parallel?
How are datasets accessed by each node executing the graph: only the parts that are local to the node, or all parts on all nodes?
Does an operation coordinate with the same operation on other nodes, or does each node operate independently?

Here's how queries “normally” execute on each type of cluster:

Thor
Graphs execute on multiple slave nodes in parallel.
Index/disk reads are done locally by each slave node.
All other disk access (FETCH, keyed JOIN, etc.) is effectively performed across all nodes.
Coordination with operations on other nodes is controlled by the presence or absence of the LOCAL option on the operation.
Child queries are not supported (this may change in future releases).

hthor
Graphs execute on the single ECL Agent node.
All parts of the dataset/index are accessed by directly accessing the disk drive of the node holding the data, with no other interaction with the other nodes.
Child queries always execute on the same node as the parent.

Roxie
Graphs execute on a single (Roxie server) node.
All parts of the dataset/index are accessed by directly accessing the disk drive of the node holding the data, with no other interaction with the other nodes.
Child queries might execute on a single agent node instead of a Roxie server node.

ALLNODES vs. THISNODE

In Roxie, graphs execute on a single Roxie server node unless the ALLNODES() function is used.
ALLNODES() causes the portion of the query it encloses to execute on all agent nodes in parallel. The results are calculated independently on each node and then merged together, without ordering the records. It is generally used for complex remote processing that requires only local index access, which substantially reduces the network traffic between the nodes.

By default, everything within the ALLNODES() is executed on all the nodes. Sometimes, however, the ALLNODES() query requires some input or arguments that shouldn't be executed on all the nodes: for example, the previous best guess at the results, or some information controlling the parallel query. The THISNODE() function can be used to surround elements that are to be evaluated by the current node instead.

A typical usage would look like this:

bestSearchResults := ALLNODES(doRemoteSearch(THISNODE(searchWords),THISNODE(previousResults)))

Here, 'searchWords' and 'previousResults' are effectively calculated on the current node, and then passed as parameters to each instance of doRemoteSearch() executing in parallel on all nodes.

LOCAL vs. NOLOCAL

The LOCAL option available on many functions (like JOIN, SORT, etc.) and the LOCAL() and NOLOCAL() functions control whether the graphs running on a particular node access all parts of a file/index or only those associated with the particular node (LOCAL). Often within an ALLNODES() context you only want to access local index parts from a single node, because each node is independently processing its associated parts. Specifying that an index read or a keyed JOIN is LOCAL means that only the local part is used on each node. A local read of a single-part INDEX will only be evaluated on the first agent node (or on the farmer node if not within an ALLNODES()).

Local evaluation can be specified in two ways:
1) As a dataset operation:
LOCAL(MyIndex)(myField = searchField)
2) As an option on the operation:
JOIN(... ,LOCAL)
FETCH(... ,LOCAL)
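The two forms can be sketched side by side. This is a minimal illustration, not a definitive pattern: the index layout, the logical file name, the inline search dataset, and the stored name 'searchName' are all hypothetical and chosen only to make the example self-contained.

```ecl
// Hypothetical index layout and logical file name, for illustration only.
nameKey := INDEX({STRING20 lastName, STRING20 firstName},
                 {UNSIGNED8 personId},
                 '~example::key::person_by_name');

// Hypothetical search inputs.
InRec := {STRING20 lastName};
searches := DATASET([{'SMITH'}, {'JONES'}], InRec);
STRING20 searchName := 'SMITH' : STORED('searchName');

// Hypothetical output layout and transform for the keyed JOIN.
OutRec := {STRING20 lastName, UNSIGNED8 personId};
OutRec takeMatch(RECORDOF(nameKey) r) := TRANSFORM
  SELF.lastName := r.lastName;
  SELF.personId := r.personId;
END;

// 1) As a dataset operation: the filtered index read uses only local parts.
localRead := LOCAL(nameKey)(lastName = searchName);

// 2) As an option on the operation: the keyed JOIN probes only local parts.
localJoin := JOIN(searches, nameKey,
                  KEYED(LEFT.lastName = RIGHT.lastName),
                  takeMatch(RIGHT),
                  LOCAL);

OUTPUT(localRead, NAMED('LocalRead'));
OUTPUT(localJoin, NAMED('LocalJoin'));
```

Outside an ALLNODES() context, a local read like this touches only the parts on the node running the graph, so this form is normally meaningful only when each node is expected to process its own parts.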
The LOCAL(dataset) function causes every operation on the dataset to access the file/key locally. For example, LOCAL(JOIN(index1, index2,...)) will read index1 and index2 locally. This rule is recursively applied until you reach one of the following:
Use of the NOLOCAL() function
A non-local attribute: the operation stays non-local, but children are still marked as local as necessary
A GLOBAL() or THISNODE() or workflow operation, since they will be evaluated in a different context
Use of the ALLNODES() function (as in a nested child query)
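As a rough sketch of how this recursion and the NOLOCAL() stopping rule might look in practice, the following contrasts a fully local keyed JOIN with one whose right-hand index is excluded from local access. Both index declarations, the file names, and the stored name are hypothetical, and the comments describe the behavior as the rules above state it rather than a verified result.

```ecl
// Hypothetical indexes and logical file names, for illustration only.
nameKey := INDEX({STRING20 lastName}, {UNSIGNED8 personId},
                 '~example::key::person_by_name');
addrKey := INDEX({UNSIGNED8 personId}, {STRING40 street},
                 '~example::key::address_by_id');

STRING20 searchName := '' : STORED('searchName');
byName := nameKey(lastName = searchName);

// Hypothetical output layout: copy the matching address row.
OutRec := {UNSIGNED8 personId, STRING40 street};
OutRec addAddr(RECORDOF(addrKey) r) := TRANSFORM
  SELF := r;
END;

// LOCAL() applies recursively: both nameKey and addrKey are read locally.
allLocal := LOCAL(JOIN(byName, addrKey,
                       KEYED(LEFT.personId = RIGHT.personId),
                       addAddr(RIGHT)));

// NOLOCAL() stops the recursion: nameKey is still read locally,
// while addrKey is accessed across all of its parts.
mixed := LOCAL(JOIN(byName, NOLOCAL(addrKey),
                    KEYED(LEFT.personId = RIGHT.personId),
                    addAddr(RIGHT)));

OUTPUT(allLocal, NAMED('AllLocal'));
OUTPUT(mixed, NAMED('Mixed'));
```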
Note that:

JOIN(x, LOCAL(index1)...) is treated the same as JOIN(x, index1, ..., LOCAL).
LOCAL is also supported as an option on an INDEX, but the LOCAL() function is preferred, because whether access to an index should be local generally depends on the context in which the index is used.
A non-local attribute is supported everywhere that a LOCAL attribute is allowed, to override an enclosing LOCAL() function.
The use of LOCAL to indicate that dataset/key access is local does not conflict with its use to control coordination of an operation with other nodes, because there is no operation that potentially coordinates with other nodes and also accesses indexes or datasets.
NOROOT Indexes

The ALLNODES() function is particularly useful if there is more than one index co-distributed on a particular value, so that all information relating to a particular key field value is associated with the same node. However, indexes are generally globally sorted. Adding a NOROOT option to a BUILD action or INDEX declaration indicates that the index is not globally sorted, and there is no root index to indicate which part of the index will contain a particular entry.
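A minimal sketch of how a NOROOT index might be queried follows. It assumes the index was built so that each node holds the entries for its own key values (for example with BUILD(idKey, NOROOT), as described above); the layout, the logical file name, and the stored name 'wantedId' are hypothetical.

```ecl
// Hypothetical layout and logical file name, for illustration only.
// NOROOT on the declaration marks the index as not globally sorted.
idKey := INDEX({UNSIGNED8 personId},
               {STRING20 lastName, STRING20 firstName},
               '~example::key::person_by_id',
               NOROOT);

// Hypothetical query input.
UNSIGNED8 wantedId := 0 : STORED('wantedId');

// With no root index to say which part holds a given entry, every agent
// node checks its own local part and the results are merged.
matches := ALLNODES(LOCAL(idKey)(personId = wantedId));

OUTPUT(matches, NAMED('Matches'));
```

Because the merged results are not ordered, a query that needs a particular order would have to sort them after the ALLNODES() call.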