<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<sect1 id="Controlling_Roxie_Queries">
<title><emphasis role="bold">Controlling Roxie Queries</emphasis></title>
<para>Several ECL functions are designed specifically to help optimize
queries for execution on Roxie. These include PRELOAD, ALLNODES, THISNODE,
LOCAL, and NOLOCAL. Understanding how these functions work together can make
a big difference in the performance of your Roxie queries.</para>
<sect2 id="How_Graphs_Execute">
<title>How Graphs Execute</title>
<para>Writing efficient queries for Roxie or Thor can require an
understanding of how the different clusters operate. This brings up three
questions:</para>
<para>How does the graph execute: on a single node, or on all nodes in
parallel?</para>
<para>How does each node executing the graph access datasets: only the
parts that are local to the node, or all parts on all nodes?</para>
<para>Does an operation coordinate with the same operation on other nodes,
or does each node operate independently?</para>
<para>Here's how queries “normally” execute on each type of
cluster:</para>
<para><informaltable colsep="0" frame="none" rowsep="0">
<tgroup cols="2">
<colspec colwidth="77.40pt" />
<colspec />
<tbody>
<row>
<entry><emphasis role="bold">Thor</emphasis></entry>
<entry>Graphs execute on multiple slave nodes in
parallel.</entry>
</row>
<row>
<entry></entry>
<entry>Index/disk reads are done locally by each slave
node.</entry>
</row>
<row>
<entry></entry>
<entry>All other disk accesses (FETCH, keyed JOIN, etc.) are
effectively performed across all nodes.</entry>
</row>
<row>
<entry></entry>
<entry>Coordination with operations on other nodes is controlled
by the presence or absence of the LOCAL option on the
operation.</entry>
</row>
<row>
<entry></entry>
<entry>No support for child queries (this may change in future
releases).</entry>
</row>
<row>
<entry><emphasis role="bold">hthor</emphasis></entry>
<entry>Graphs execute on the single ECL Agent node.</entry>
</row>
<row>
<entry></entry>
<entry>All parts of the dataset/index are accessed by directly
accessing the disk drive of the node that holds the data, with no other
interaction with the other nodes.</entry>
</row>
<row>
<entry></entry>
<entry>Child queries always execute on the same node as the
parent.</entry>
</row>
<row>
<entry><emphasis role="bold">Roxie</emphasis></entry>
<entry>Graphs execute on a single (Roxie server) node.</entry>
</row>
<row>
<entry></entry>
<entry>All parts of the dataset/index are accessed by directly
accessing the disk drive of the node that holds the data, with no other
interaction with the other nodes.</entry>
</row>
<row>
<entry></entry>
<entry>Child queries might execute on a single agent node
instead of a Roxie server node.</entry>
</row>
</tbody>
</tgroup>
</informaltable></para>
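<para>As a minimal sketch of the Thor case, the LOCAL option on SORT
decides whether the nodes coordinate with each other. The record layout,
the inline data, and all definition names below are illustrative
assumptions, not part of any real job:</para>
<programlisting>// Hypothetical layout and data, used only for illustration
PersonRec := RECORD
  UNSIGNED4 id;
  STRING10  name;
END;
people := DATASET([{1, 'Baker'}, {2, 'Adams'}, {3, 'Clark'}], PersonRec);

// Spread the records across the nodes by a hash of id
spreadPeople := DISTRIBUTE(people, HASH32(id));

// Without LOCAL: the nodes coordinate to produce one global ordering
globallySorted := SORT(spreadPeople, name);

// With LOCAL: each node sorts only its own records, independently
locallySorted := SORT(spreadPeople, name, LOCAL);

OUTPUT(globallySorted, NAMED('GloballySorted'));
OUTPUT(locallySorted, NAMED('LocallySorted'));</programlisting>
<para>On a multi-node Thor, the second result is ordered only within each
node's portion of the data, so the two outputs can differ.</para>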
</sect2>
<sect2 id="ALLNODES_vs_THISNODE">
<title>ALLNODES vs. THISNODE</title>
<para>In Roxie, graphs execute on a single Roxie server node unless the
ALLNODES() function is used. ALLNODES() causes the portion of the query it
encloses to execute on all agent nodes in parallel. The results are
calculated independently on each node and then merged, without ordering the
records. It is generally used for complex remote processing that requires
only local index access, which substantially reduces the network traffic
between the nodes.</para>
<para>By default, everything within the ALLNODES() is executed on all the
nodes, but sometimes the ALLNODES() query requires some input or arguments
that shouldn't be executed on all the nodes: for example, the previous best
guess at the results, or some information controlling the parallel query.
The THISNODE() function can be used to surround elements that are to be
evaluated by the current node instead.</para>
<para>A typical usage would look like this:</para>
<programlisting>bestSearchResults := ALLNODES(doRemoteSearch(THISNODE(searchWords),THISNODE(previousResults)));
</programlisting>
<para>Here, searchWords and previousResults are effectively calculated on
the current node and then passed as parameters to each instance of
doRemoteSearch() executing in parallel on all nodes.</para>
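<para>A fuller sketch of the same pattern might look like the following.
Everything here is an assumption made for illustration: the record layouts,
the logical file names, the stored input name, and the body of
doRemoteSearch() (reduced to a single argument and a simple keyed lookup).
It has not been run against a live Roxie:</para>
<programlisting>// Hypothetical layouts and logical file names; all are assumptions
TermRec := RECORD
  STRING20 term;
END;
PostingRec := RECORD
  STRING20  term;
  UNSIGNED8 docId;
END;
postings := DATASET('~example::postings', PostingRec, THOR);
termKey  := INDEX(postings, {term}, {docId}, '~example::key::term_doc');

// Query input, supplied when the query is called
searchTerms := DATASET([], TermRec) : STORED('searchTerms');

// Hypothetical search: a keyed JOIN that, because of LOCAL, uses only
// the index part held by the node it is running on
doRemoteSearch(DATASET(TermRec) terms) :=
  JOIN(terms, termKey, LEFT.term = RIGHT.term,
       TRANSFORM(PostingRec, SELF := RIGHT), LOCAL);

// searchTerms is evaluated on the current node; the search itself runs
// on all agent nodes in parallel and the results are merged unordered
bestSearchResults := ALLNODES(doRemoteSearch(THISNODE(searchTerms)));
OUTPUT(bestSearchResults);</programlisting>
<para>The LOCAL option on the JOIN reflects the usual reason for using
ALLNODES(): each node works only on the index part it holds, so the merged
result covers the whole index without every node reading every part.</para>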
</sect2>
<sect2 id="LOCAL_vs_NOLOCAL">
<title>LOCAL vs. NOLOCAL</title>
<para>The LOCAL option available on many functions (like JOIN, SORT, etc.)
and the LOCAL() and NOLOCAL() functions control whether the graphs running
on a particular node access all parts of a file/index or only those
associated with that node (LOCAL). Often, within an ALLNODES() context, you
want each node to access only its local index parts, because each node is
independently processing its associated parts. Specifying that an index read
or a keyed JOIN is LOCAL means that only the local part is used on each
node. A local read of a single-part INDEX will only be evaluated on the
first agent node (or the farmer node if not within an ALLNODES()).</para>
<para>Local evaluation can be specified in two ways:</para>
<blockquote>
<para>1) As a dataset operation:</para>
<programlisting>LOCAL(MyIndex)(myField = searchField)</programlisting>
<para>2) As an option on the operation:</para>
<programlisting>JOIN(... ,LOCAL)
FETCH(... ,LOCAL)</programlisting>
</blockquote>
<para>The LOCAL(<emphasis>dataset</emphasis>) function causes every
operation on the <emphasis>dataset</emphasis> to access the file/key
locally. For example,</para>
<programlisting>LOCAL(JOIN(index1, index2,...))</programlisting>
<para>will read index1 and index2 locally. This rule is applied
recursively until it reaches one of the following:</para>
<blockquote>
<para>Use of the NOLOCAL() function</para>
<para>A non-local attribute (the operation stays non-local, but children
are still marked as local as necessary)</para>
<para>A GLOBAL() or THISNODE() or workflow operation (since they will be
evaluated in a different context)</para>
<para>Use of the ALLNODES() function (as in a nested child query)</para>
</blockquote>
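<para>The scoping rule above can be sketched as follows. The record
layouts, logical file names, and the makePair transform are hypothetical,
and the comments describe the behavior expected from the rule as stated
here rather than a verified result:</para>
<programlisting>// Hypothetical layout, indexes, and logical file names; all are assumptions
CustRec := RECORD
  UNSIGNED4 id;
  STRING20  name;
END;
file1  := DATASET('~example::file1', CustRec, THOR);
file2  := DATASET('~example::file2', CustRec, THOR);
index1 := INDEX(file1, {id}, {name}, '~example::key::file1_id');
index2 := INDEX(file2, {id}, {name}, '~example::key::file2_id');

PairRec := RECORD
  UNSIGNED4 id;
  STRING20  name1;
  STRING20  name2;
END;

PairRec makePair(RECORDOF(index1) l, RECORDOF(index2) r) := TRANSFORM
  SELF.id    := l.id;
  SELF.name1 := l.name;
  SELF.name2 := r.name;
END;

// LOCAL() applies to everything it encloses: both indexes are read locally
bothLocal := LOCAL(JOIN(index1, index2, LEFT.id = RIGHT.id,
                        makePair(LEFT, RIGHT)));

// NOLOCAL() stops the rule: index1 is read locally, index2 is not
mixedAccess := LOCAL(JOIN(index1, NOLOCAL(index2), LEFT.id = RIGHT.id,
                          makePair(LEFT, RIGHT)));</programlisting>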
<para>Note that:</para>
<para>JOIN(x, LOCAL(index1)...) is treated the same as JOIN(x, index1,
..., LOCAL).</para>
<para>LOCAL is also supported as an option on an INDEX, but the LOCAL()
function is preferred, because whether access to an index should be local
generally depends on the context in which the index is used.</para>
<para>A non-local attribute is supported everywhere that a LOCAL attribute
is allowed, so that it can override an enclosing LOCAL() function.</para>
<para>The use of LOCAL to indicate that dataset/key access is local does
not conflict with its use to control coordination of an operation with
other nodes, because there is no operation that both potentially
coordinates with other nodes and accesses indexes or datasets.</para>
</sect2>
<sect2 id="NOROOT_Indexes">
<title>NOROOT Indexes</title>
<para>The ALLNODES() function is particularly useful if there is more than
one index co-distributed on a particular value so that all information
that relates to a particular key field value is associated with the same
node. However, indexes are generally globally sorted. <emphasis
role="bold">Adding a NOROOT option to a BUILD action or INDEX declaration
indicates that the index is </emphasis><emphasis
role="bold">not</emphasis><emphasis role="bold"> globally sorted, and
there is no root index to indicate which part of the index will contain a
particular entry.</emphasis></para>
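<para>A minimal sketch of building such an index is shown below, assuming
a hypothetical account file that is first distributed on the key field so
that all entries for one value land on the same node. The layout, the
logical file names, and the choice of HASH32 for distribution are
assumptions made for illustration:</para>
<programlisting>// Hypothetical layout and logical file names; all are assumptions
AcctRec := RECORD
  UNSIGNED4 custId;
  STRING20  name;
END;
accounts := DATASET('~example::accounts', AcctRec, THOR);

// Place all records for a given custId on the same node
accountsByCust := DISTRIBUTE(accounts, HASH32(custId));

acctKey := INDEX(accountsByCust, {custId}, {name},
                 '~example::key::acct_cust');

// NOROOT: the index is not globally sorted and has no root index
BUILD(acctKey, NOROOT, OVERWRITE);</programlisting>
<para>A second index built the same way on the same distribution would be
co-distributed with this one, which is the situation in which ALLNODES()
with local index access is most useful.</para>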
</sect2>
</sect1>