File overview

Merge remote-tracking branch 'origin/candidate-6.2.x' into candidate-6.2.8

Signed-off-by: Richard Chapman <rchapman@hpccsystems.com>
Richard Chapman 8 years ago
parent
revision
900d83f924

+ 32 - 8
docs/DynamicESDL/DynamicESDL_Includer.xml

@@ -224,9 +224,10 @@
         </listitem>
 
         <listitem>
-          <para><?dbfo keep-together="always"?>Select <emphasis
-          role="bold">Advanced View</emphasis> and then select the source
-          environment XML file to edit.</para>
+          <?dbfo keep-together="always"?>
+
+          <para>Select <emphasis role="bold">Advanced View</emphasis> and then
+          select the source environment XML file to edit.</para>
 
           <para><graphic fileref="images/desdl-openconfig.jpg" /></para>
         </listitem>
@@ -240,18 +241,20 @@
           and select <emphasis role="bold">New Esp Services</emphasis> &gt;
           <emphasis role="bold">Dynamic ESDL</emphasis>.</para>
 
-          <para></para>
-
           <para><graphic fileref="images/desdl-addDESDL.jpg" /></para>
         </listitem>
 
         <listitem>
+          <?dbfo keep-together="always"?>
+
           <para>Provide a name for the service.</para>
 
-          <para><graphic fileref="images/dsdl-NametheServoice.jpg" /></para>
+          <para><graphic fileref="images/dsdl-NametheService.jpg" /></para>
         </listitem>
 
         <listitem>
+          <?dbfo keep-together="always"?>
+
          <para>Select your ESP, then select the ESP Service Bindings
          tab.</para>
 
@@ -259,6 +262,8 @@
         </listitem>
 
         <listitem>
+          <?dbfo keep-together="always"?>
+
           <para>Right-click in the list of bindings and select <emphasis
           role="bold">Add</emphasis></para>
 
@@ -319,6 +324,8 @@ sudo cp /etc/HPCCSystems/source/NewEnvironment.xml /etc/HPCCSystems/environment.
         </listitem>
 
         <listitem>
+          <?dbfo keep-together="always"?>
+
           <para>Restart the HPCC system on <emphasis
           role="bold">every</emphasis> node. The following command starts the
           HPCC system on an individual node:</para>
@@ -492,13 +499,29 @@ OUTPUT(ds_out, NAMED('AddThisResponse')); </programlisting>
             <para>Create the configuration XML file</para>
 
             <programlisting>&lt;Methods&gt;
-    &lt;Method name="AddThis" url="&lt;RoxieIP&gt;:9876" querytype="roxie" queryname="AddThis"/&gt;
+  &lt;Method name="AddThis" url="&lt;RoxieIP&gt;:9876" querytype="roxie" queryname="AddThis"&gt;
+    &lt;!--Optional Method Context Information start--&gt;
+    &lt;Gateways&gt;
+       &lt;Gateway name="mygateway" url="1.1.1.1:2222/someservice/somemethod/"/&gt;
+       &lt;Gateway name="anothergateway" url="2.2.2.2:9999/someservice/somemethod/"/&gt;
+    &lt;/Gateways&gt;
+    &lt;!--Optional Method Context Information end--&gt;
+  &lt;/Method&gt;
 &lt;/Methods&gt;</programlisting>
 
            <para>Where name is the name of your method(s), url is the
            Roxie server's IP and port, and queryname is the published name
            (alias) of the query. For a multi-node Roxie, you can use a range
            in the form of nnn.nnn.nnn.n-nnn.</para>
+
+            <para>Optionally, your method could include context information
+            as illustrated in the above example. The context information
+            should be formatted so that it is consumable by the target ECL
+            query. The HPCC DESDL ESP does not impose any restrictions on the
+            context information passed in the configuration file other than
+            that it must be valid XML.</para>
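+
+            <para>For the multi-node range form mentioned above, a purely
+            illustrative binding might look like the following (the addresses
+            shown here are placeholders, not values from your
+            environment):</para>
+
+            <programlisting>&lt;Methods&gt;
+  &lt;Method name="AddThis" url="10.0.0.1-4:9876" querytype="roxie" queryname="AddThis"/&gt;
+&lt;/Methods&gt;</programlisting>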
+
+            <!-- /* Refer to DESDL XML Configurations?  */  Ex. For more information on XXXX see XXXX.-->
           </listitem>
 
           <listitem>
@@ -509,7 +532,8 @@ OUTPUT(ds_out, NAMED('AddThisResponse')); </programlisting>
            <para>Bind the service methods to the queries using the
            configuration XML.</para>
 
-            <programlisting>esdl bind-service myesp 8003 MathService.1 MathService --config MathSvcCfg.xml -s nnn.nnn.nnn.nnn -p 8010
+            <programlisting>esdl bind-service myesp 8003 MathService.1 MathService --config MathSvcCfg.xml \
+                  -s nnn.nnn.nnn.nnn -p 8010
 </programlisting>
 
             <para>Where myesp is the name of your ESP process, 8003 is the

+ 1 - 3
docs/ECLProgrammersGuide/PrGd-Includer.xml

@@ -178,13 +178,11 @@
                 xmlns:xi="http://www.w3.org/2001/XInclude" />
   </chapter>
 
-  <chapter>
+  <chapter id="EmbeddedLanguages_DataStores">
     <title>Embedded Languages and Data stores</title>
 
     <xi:include href="ECLProgrammersGuide/PRG_Mods/CodeSign.xml"
                 xpointer="element(/1)"
                 xmlns:xi="http://www.w3.org/2001/XInclude" />
-
-   
   </chapter>
 </book>

+ 89 - 3
docs/HPCCClientTools/CT_Mods/CT_ESDL_CLI.xml

@@ -741,13 +741,99 @@
         syntax:</para>
 
         <programlisting format="linespecific">&lt;Methods&gt;
-  &lt;Method name="myMthd1" url="http://&lt;RoxieIP&gt;:9876/somepath?someparam=value" user="me" password="mypw"/&gt;
-  &lt;Method name="myMthd2" url="http://&lt;RoxieIP&gt;:9876/somepath?someparam=value" user="me" password="mypw"/&gt;
+  &lt;Method name="myMthd1" url="&lt;RoxieIP&gt;:9876/path?param=value" user="me" password="mypw"/&gt;
+  &lt;Method name="myMthd2" url="&lt;RoxieIP&gt;:9876/path?param=value" user="me" password="mypw"/&gt;
 &lt;/Methods&gt;</programlisting>
 
         <para><emphasis role="bold">Example:</emphasis></para>
 
-        <programlisting format="linespecific">esdl bind-service myesp 8003 MathSvc.1 MathSvc --config MathSvcCfg.xml -s nnn.nnn.nnn.nnn -p 8010</programlisting>
+        <programlisting format="linespecific">esdl bind-service myesp 8003 MathSvc.1 MathSvc --config MathSvcCfg.xml \
+                  -s nnn.nnn.nnn.nnn -p 8010</programlisting>
+
+        <sect3>
+          <title>Configuring ESDL binding methods</title>
+
+          <para>The DESDL binding methods can optionally provide context
+          information to the target ECL query. This information is configured
+          by appending child elements to the Method
+          (&lt;Method&gt;...&lt;/Method&gt;) portion of the ESDL
+          Binding.</para>
+
+          <para>For example, the following XML provides a sample ESDL
+          Binding.</para>
+
+          <programlisting>&lt;Methods&gt; 
+  &lt;Method name="AddThis" url="&lt;RoxieIP&gt;:9876" querytype="roxie" queryname="AddThis"/&gt; 
+&lt;/Methods&gt;</programlisting>
+
+          <para>If this Method requires context information, for example about
+          gateways, then you could include the Gateways Structure
+          (&lt;Gateways&gt;...&lt;/Gateways&gt;) depicted as follows.</para>
+
+          <programlisting>&lt;Methods&gt;
+  &lt;Method name="AddThis" url="&lt;RoxieIP&gt;:9876" querytype="roxie" queryname="AddThis"&gt;
+    &lt;!--Optional Method Context Information start--&gt;
+    &lt;Gateways&gt;
+      &lt;Gateway name="mygateway" url="1.1.1.1:2222/someservice/somemethod/"/&gt;
+      &lt;Gateway name="anothergateway" url="2.2.2.2:9999/someservice/somemethod/"/&gt;
+    &lt;/Gateways&gt;
+    &lt;!--Optional Method Context Information end--&gt;
+  &lt;/Method&gt;
+&lt;/Methods&gt;</programlisting>
+
+          <para>The DESDL ESP does not impose any restrictions on the layout
+          of this information other than that it must be valid XML. This
+          provides the flexibility to include context information in any
+          valid XML format.</para>
+
+          <para>Roxie (query) ECL developers need to decide what information
+          they will need from the ESP request and design how that information
+          is laid out in the ESP request and ESDL binding
+          configuration.</para>
+
+          <para>In the following example, every "AddThis" request processed by
+          the ESP and sent to Roxie would contain the sample gateway
+          information in the request context.</para>
+
+          <programlisting>&lt;?xml version="1.0" encoding="utf-8"?&gt;
+&lt;soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"&gt;
+&lt;soap:Body&gt;
+  &lt;roxie.AddThis&gt;
+   &lt;Context&gt;
+    &lt;Row&gt;
+     &lt;Common&gt;
+      &lt;ESP&gt;
+       &lt;ServiceName&gt;wsmath&lt;/ServiceName&gt;
+       &lt;Config&gt;
+        &lt;Method name="AddThis" url="&lt;RoxieIP&gt;:9876" querytype="roxie" queryname="AddThis"&gt;
+          &lt;Gateways&gt;
+            &lt;Gateway name="mygateway" url="1.1.1.1:2222/someservice/somemethod/"/&gt;
+            &lt;Gateway name="anothergateway" url="2.2.2.2:9999/someservice/somemethod/"/&gt;
+          &lt;/Gateways&gt;
+        &lt;/Method&gt;
+       &lt;/Config&gt;
+      &lt;/ESP&gt;
+        &lt;TransactionId&gt;sometrxid&lt;/TransactionId&gt;
+     &lt;/Common&gt;
+    &lt;/Row&gt;
+   &lt;/Context&gt;
+   &lt;AddThisRequest&gt;
+    &lt;Row&gt;
+     &lt;Number1&gt;34&lt;/Number1&gt;
+     &lt;Number2&gt;232&lt;/Number2&gt;
+    &lt;/Row&gt;
+   &lt;/AddThisRequest&gt;
+  &lt;/roxie.AddThis&gt;
+&lt;/soap:Body&gt;
+&lt;/soap:Envelope&gt;</programlisting>
+
+          <para>The ECL query consumes this information and is free to do
+          whatever it needs to with it. In some instances, the query needs to
+          send a request to a gateway in order to properly process the current
+          request. It can interrogate the context information for the
+          appropriate gateway's connection information, then use that
+          information to create the actual gateway request connection.</para>
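+
+          <para>As a purely illustrative sketch (the STORED name, record
+          layout, and gateway method parameters below are assumptions for
+          this example, not something defined by the binding itself), an ECL
+          query might call such a gateway like this:</para>
+
+          <programlisting>// Assume the gateway URL has already been extracted from the request
+// context (for example, into a STORED string); how it is extracted depends
+// on how your query reads its request.
+STRING gatewayUrl := '' : STORED('mygatewayurl');
+
+OutRec := RECORD
+  STRING answer {XPATH('Answer')};
+END;
+
+// Call the gateway named in the binding before answering the current request
+gwResult := SOAPCALL(gatewayUrl, 'somemethod',
+                     {STRING someparam := 'somevalue'},
+                     DATASET(OutRec));
+
+OUTPUT(gwResult);</programlisting>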
+        </sect3>
       </sect2>
 
       <sect2 id="CT_ESDL_CLI_esdl_list-bindings" role="brk">

+ 201 - 96
docs/HPCCSystemAdmin/HPCCSystemAdministratorsGuide.xml

@@ -175,7 +175,7 @@
         <title>System Servers</title>
 
         <para>The System Servers are integral middleware components of an HPCC
-        system. They are used to control workflow and intercomponent
+        system. They are used to control workflow and inter-component
         communication.</para>
 
         <sect3 id="SysAdm_Dali">
@@ -183,26 +183,43 @@
 
           <para>Dali is also known as the system data store. It manages
           workunit records, logical file directory, and shared object
-          services.</para>
+          services. It maintains the message queues that drive job execution
+          and scheduling.</para>
 
-          <para>It maintains the message queues that drive job execution and
-          scheduling. It also enforces the all LDAP security
-          restrictions.</para>
+          <para>Dali also performs session management. It tracks all active
+          Dali client sessions registered in the environment, so that you can
+          list all clients and their roles (see <emphasis>dalidiag
+          -clients</emphasis>).</para>
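+
+          <para>For example, following the <emphasis>dalidiag</emphasis> usage
+          shown later in this document (where "." addresses the local Dali),
+          the registered client sessions could be listed with:</para>
+
+          <programlisting> dalidiag . -clients </programlisting>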
+
+          <para>Another task Dali performs is to act as the locking manager.
+          HPCC uses Dali's locking manager to control shared and exclusive
+          locks on metadata.</para>
         </sect3>
 
         <sect3 id="SysAdm_Sahsa">
           <title>Sasha</title>
 
           <para>The Sasha server is a companion “housekeeping” server to the
-          Dali server. It works independently of all other components. Sasha’s
-          main function is to reduce the stress on the Dali server. Whenever
-          possible, Sasha reduces the resource utilization on Dali.</para>
+          Dali server. Sasha works independently of, yet in conjunction
+          with, Dali. Sasha’s main function is to reduce the stress on the
+          Dali server. Wherever possible, Sasha reduces the resource
+          utilization on Dali. A very important Sasha function is coalescing:
+          saving the in-memory store to a new store edition.</para>
 
-          <para>Sasha archives workunits (including DFU Workunits) which are
-          stored in a series of folders.</para>
+          <para>Sasha archives workunits (including DFU Workunits) that are
+          then stored in folders on a disk.</para>
 
           <para>Sasha also performs routine housekeeping such as removing
           cached workunits and DFU recovery files.</para>
+
+          <para>Sasha can also run XREF to cross-reference physical files
+          with logical metadata and determine whether there are lost, found,
+          or orphaned files. It then presents options (via ECL Watch) for
+          their recovery or deletion.</para>
+
+          <para>Sasha is the component responsible for removing expired files
+          once their expiry criteria are met. The EXPIRE option on ECL's
+          OUTPUT or PERSIST sets that condition.</para>
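+
+          <para>As a minimal ECL illustration (the filename and day count are
+          placeholders), a file written with the EXPIRE option becomes a
+          candidate for this processing:</para>
+
+          <programlisting>ds := DATASET([{1}, {2}], {UNSIGNED n});
+// After 7 days this file becomes eligible for Sasha's expiry processing
+OUTPUT(ds, , '~examples::expiringfile', EXPIRE(7));</programlisting>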
         </sect3>
 
         <sect3 id="SysAdm_DFU">
@@ -303,21 +320,6 @@
           Those credentials are then used to authenticate any requests from
           those tools.</para>
         </sect3>
-
-        <!-- *** COMMENTING OUT WHOLE Of MONITORING SECTION
-       <sect3>
-          <title>HPCC Reporting</title>
-
-          <para>HPCC leverages the use of Ganglia reporting and monitoring
-          components to monitor several aspects of the HPCC System.</para>
-
-          <para>See <emphasis>HPCC Monitoring and Reporting</emphasis> for
-          more information on how to add monitoring and reporting to your HPCC
-          System.</para>
-
-          <para>More to come***</para>          
-        </sect3>
-        END COMMENT ***-->
       </sect2>
 
       <sect2 id="SysAdm_ClienInterfaces">
@@ -442,7 +444,7 @@
   </chapter>
 
   <chapter id="SysAdm_HWSizing">
-    <title>Hardware and Component Sizing</title>
+    <title>Hardware and Components</title>
 
    <para>This section provides some insight into the sort of hardware and
    infrastructure that HPCC works well on. This is not an exclusive
@@ -527,17 +529,85 @@
       <para>HPCC Dali processes store cluster metadata in RAM. For optimal
       efficiency, provide at least 48GB of RAM, 6 or more CPU cores, 1Gb/sec
       network interface and a high availability disk for a single HPCC Dali.
-      HPCC's Dali processes are one of the few active/passive components.
-      Using standard “swinging disk” clustering is recommended for a high
-      availability setup. For a single HPCC Dali process, any suitable High
-      Availability (HA) RAID level is fine.</para>
-
-      <para>Sasha does not store any data. Sasha reads data from Dali then
-      processes it. Sasha does store archived workunits (WUs) on a disk.
-      Allocating a larger disk for Sasha reduces the amount of housekeeping
-      needed. Since Sasha assists Dali by performing housekeeping, it works
-      best when on its own node. You should avoid putting Sasha and Dali on
-      the same node.</para>
+      HPCC's Dali processes are one of the few native active/passive
+      components. Using standard “swinging disk” clustering is recommended for
+      a high availability setup. For a single HPCC Dali process, any suitable
+      High Availability (HA) RAID level is fine.</para>
+
+      <para>Sasha stores data only to locally available disks; it reads data
+      from Dali and processes it by archiving workunits (WUs) to disk. It is
+      beneficial to configure Sasha to archive aggressively so that Dali does
+      not keep too many workunits in memory, which in turn requires a larger
+      amount of disk space.</para>
+
+      <para>Allocating greater disk space for Sasha is sound practice, since
+      the more archiving Sasha can do, the more it benefits Dali. Because
+      Sasha assists Dali by performing housekeeping, it works best on its own
+      node. Ideally, you should avoid putting Sasha and Dali on the same
+      node, because a node that runs these components is extremely critical,
+      particularly when it comes to recovering from losses. Therefore, it
+      should be as robust as possible: RAID drives, fault tolerant,
+      etc.</para>
+
+      <sect2>
+        <title>Sasha/Dali Interactions</title>
+
+        <para>A critical role of Sasha is coalescing. When Dali shuts down,
+        it saves its in-memory store to a new store edition by creating a new
+        <emphasis>dalisdsXXXX.xml</emphasis>, where XXXX is incremented to the
+        new edition. The current edition is recorded in the filename
+        <emphasis>store.XXXX</emphasis>.</para>
+
+        <para>An explicit save can also be requested using
+        <emphasis>dalidiag</emphasis>:</para>
+
+        <programlisting> dalidiag . -save </programlisting>
+
+        <para>New editions requested this way, as in the above example, are
+        created in the same manner. During an explicit save, all changes to
+        SDS are blocked. Therefore, any client that tries to make an
+        alteration will block until the save is complete.</para>
+
+        <para>There are some options (though not commonly used) that can
+        configure Dali to detect quiet/idle time and force a save in exactly
+        the same way an explicit save request does, meaning that it will block
+        any write transactions while saving.</para>
+
+        <para>All Dali SDS changes are recorded in a delta transaction log (in
+        XML format) with a naming convention of
+        <emphasis>daliincXXXX.xml</emphasis>, where XXXX is the current store
+        edition. They are also optionally mirrored to a backup location. This
+        transaction log grows indefinitely until the store is saved.</para>
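+
+        <para>For illustration only, a Dali data directory (the edition
+        number shown here is an example and depends on your system) might
+        therefore contain files such as:</para>
+
+        <programlisting>dalisds42.xml    saved store edition 42
+store.42         marker recording the current edition
+daliinc42.xml    delta transaction log for edition 42</programlisting>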
+
+        <para>In the normal/recommended setup, Sasha is the primary creator of
+        new SDS store editions. It does so on a schedule and according to
+        other configuration options (for example, you could configure for a
+        minimum delta transaction log size). Sasha reads the last saved store
+        and the current transaction log and replays the transaction log over
+        the last saved store to form a new in-memory version, and then saves
+        it. Unlike the Dali saving process, this does not block or interfere
+        with Dali. In the event of abrupt termination of the Dali process
+        (such as being killed or a power loss), Dali uses the same delta
+        transaction log at restart to replay the changes over the last saved
+        store and return to its last operational state.</para>
+
+
+        <!-- *** COMMENTING OUT WHOLE Of MONITORING SECTION
+       <sect3>
+          <title>HPCC Reporting</title>
+
+          <para>HPCC leverages the use of Ganglia reporting and monitoring
+          components to monitor several aspects of the HPCC System.</para>
+
+          <para>See <emphasis>HPCC Monitoring and Reporting</emphasis> for
+          more information on how to add monitoring and reporting to your HPCC
+          System.</para>
+
+          <para>More to come***</para>          
+        </sect3>
+        END COMMENT ***-->
+      </sect2>
     </sect1>
 
     <sect1 id="SysAdm_OtherHPCCcomponents">
@@ -602,46 +672,59 @@
     <sect1 id="SysAdm_BackUpData" role="nobrk">
       <title>Back Up Data</title>
 
-      <para>An integral part of routine maintenance is the back up of
-      essential data. Devise a back up strategy to meet the needs of your
-      organization. This section is not meant to replace your current back up
-      strategy, instead this section supplements it by outlining special
-      considerations for HPCC Systems<superscript>®</superscript>.</para>
+      <para>An integral part of routine maintenance is the backup of essential
+      data. Devise a backup strategy to meet the needs of your organization.
+      This section is not meant to replace your current backup strategy;
+      rather, it supplements it by outlining special considerations for HPCC
+      Systems<superscript>®</superscript>.</para>
 
       <sect2 id="SysAdm_BackUpConsider">
-        <title>Back Up Considerations</title>
+        <title>Backup Considerations</title>
 
-        <para>You probably already have some sort of a back up strategy in
+        <para>You probably already have some sort of a backup strategy in
        place. By adding HPCC Systems<superscript>®</superscript> into your
        operating environment, you introduce some additional considerations to be
-        aware of. The following sections discuss back up considerations for
-        the individual HPCC system components.</para>
+        aware of. The following sections discuss backup considerations for the
+        individual HPCC system components.</para>
 
         <sect3 id="SysAdm_BkU_Dali">
           <title>Dali</title>
 
-          <para>Dali can be configured to create its own back up, ideally you
-          would want that back up kept on a different server or node. You can
-          specify the Dali back up folder location using the Configuration
-          Manager. You may want to keep multiple copies that back up, to be
-          able to restore to a certain point in time. For example, you may
-          want to do daily snapshots, or weekly.</para>
-
-          <para>You may want to keep back up copies at a system level using
-          traditional back up methods.</para>
+          <para>Dali can be configured to create its own backup. It is
+          strongly recommended that the backup be kept on a different server
+          or node for disaster recovery purposes. You can specify the Dali
+          backup folder location using the Configuration Manager. You may want
+          to keep multiple generations of backups, to be able to restore to a
+          certain point in time. For example, you may want to take daily or
+          weekly snapshots.</para>
+
+          <para>You may want to keep backup copies at a system level using
+          traditional methods. Regardless of method or scheme, you would be
+          well advised to back up your Dali.</para>
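+
+          <para>As one illustration only (the schedule, paths, and host name
+          here are placeholders, and rsync is just one traditional option), a
+          nightly cron entry could copy the Dali backup folder to another
+          server:</para>
+
+          <programlisting>  0 2 * * * rsync -a /var/lib/HPCCSystems/hpcc-mirror/dali/ backuphost:/backups/dali/ </programlisting>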
+
+          <para>You should try to avoid putting Dali, Sasha, and even your
+          Thor master on the same node. Ideally, you want each of these
+          components on a separate node, not only to reduce the stress on the
+          system hardware (allowing the system to operate better) but also to
+          enable you to recover your entire environment, files, and workunits
+          in the event of a loss. In addition, losing this node would affect
+          every other Thor/Roxie cluster in the same environment.</para>
         </sect3>
 
         <sect3 id="SysAdm_BkUp_Sasha">
           <title>Sasha</title>
 
-          <para>Sasha itself generates no original data but archives workunits
-          to disks. Be aware that Sasha can create quite a bit of archive
-          data. Once the workunits are archived they are no longer available
-          in the Dali data store. The archives can still be retrieved, but
-          that archive now becomes the only copy of these workunits.</para>
+          <para>Sasha is the component that does the SDS coalescing. It is
+          normally the sole component that creates new store editions. It is
+          also the component that creates the XREF metadata that ECL Watch
+          uses. Be aware that Sasha can create quite a bit of archive data.
+          Once the workunits are archived, they are no longer available in
+          the Dali data store. The archives can still be accessed through ECL
+          Watch by restoring them to Dali.</para>
 
-          <para>If you need high availability for these archived workunits,
-          you should back them up at a system level using traditional back up
+          <para>If you need high availability for archived workunits, you
+          should back them up at a system level using traditional backup
           methods.</para>
         </sect3>
 
@@ -688,18 +771,18 @@
           <title>Thor</title>
 
           <para>Thor, the data refinery, as one of the critical components of
-          HPCC Systems<superscript>®</superscript> needs to be backed up. Back
-          up Thor by configuring replication and setting up a nightly back up
-          cron task. Back up Thor on demand before and/or after any node swap
-          or drive swap if you do not have a RAID configured.</para>
+          HPCC Systems<superscript>®</superscript> needs to be backed up.
+          Back up Thor by configuring replication and setting up a nightly
+          backup cron task. Back up Thor on demand before and/or after any
+          node swap or drive swap if you do not have a RAID configured.</para>
 
           <para>A very important part of administering Thor is to check the
-          logs to ensure the previous back ups completed successfully.</para>
+          logs to ensure the previous backups completed successfully.</para>
 
           <para><emphasis role="bold">Backupnode</emphasis></para>
 
           <para>Backupnode is a tool that is packaged with HPCC. Backupnode
-          allows you to back up Thor nodes on demand or in a script. You can
+          allows you to back up Thor nodes on demand or in a script. You can
           also use backupnode regularly in a crontab. You would always want to
           run it on the Thor master of that cluster.</para>
 
@@ -718,7 +801,7 @@
           <programlisting>  /bin/su - hpcc -c "/opt/HPCCSystems/bin/start_backupnode thor400_7s" &amp; </programlisting>
 
           <para>To run backupnode regularly you could use cron. For example,
-          you may want a crontab entry (to back up thor400_7s) set to run at
+          you may want a crontab entry (to back up thor400_7s) set to run at
           1am daily:</para>
 
           <programlisting>  0 1 * * * /bin/su - hpcc -c "/opt/HPCCSystems/bin/start_backupnode thor400_7s" &amp; </programlisting>
@@ -729,7 +812,7 @@
           <para>/var/log/HPCCSystems/backupnode/MM_DD_YYYY_HH_MM_SS.log</para>
 
           <para>The (MM) Month, (DD) Day, (YYYY) 4-digit Year, (HH) Hour, (MM)
-          Minutes, and (SS) Seconds of the back up comprising the log file
+          Minutes, and (SS) Seconds of the backup make up the log file
          name.</para>
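+
+          <para>For example, a backup run at 12:01:08 on February 19, 2014
+          would produce a log file named:</para>
+
+          <para>/var/log/HPCCSystems/backupnode/02_19_2014_12_01_08.log</para>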
 
           <para>The main log file exists on the Thor master node. It shows
@@ -737,9 +820,9 @@
           backupnode logs on each of the Thor nodes showing what files, if
           any, it needed to restore.</para>
 
-          <para>It is important to check the logs to ensure the previous back
-          ups completed successfully. The following entry is from the
-          backupnode log showing that back up completed successfully:</para>
+          <para>It is important to check the logs to ensure the previous
+          backups completed successfully. The following entry is from the
+          backupnode log showing that the backup completed successfully:</para>
 
           <programlisting>00000028 2014-02-19 12:01:08 26457 26457 "Completed in 0m 0s with 0 errors" 
 00000029 2014-02-19 12:01:08 26457 26457 "backupnode finished" </programlisting>
@@ -755,9 +838,9 @@
               <para><emphasis role="bold">Original Source Data File
               Retention:</emphasis> When a query is published, the data is
               typically copied from a remote site, either a Thor or a Roxie.
-              The Thor data can serve as back up, provided it is not removed
-              or altered on Thor. Thor data is typically retained for a period
-              of time sufficient to serve as a back up copy.</para>
+              The Thor data can serve as backup, provided it is not removed or
+              altered on Thor. Thor data is typically retained for a period of
+              time sufficient to serve as a backup copy.</para>
             </listitem>
 
             <listitem>
@@ -777,7 +860,7 @@
               copies of data files. With three sibling Roxie clusters that
               have peer node redundancy, there are always six copies of each
               file part at any given time; eliminating the need to use
-              traditional back up procedures for Roxie data files.</para>
+              traditional backup procedures for Roxie data files.</para>
             </listitem>
           </itemizedlist>
         </sect3>
@@ -787,15 +870,15 @@
 
           <para>The Landing Zone is used to host incoming and outgoing files.
           This should be treated similarly to an FTP server. Use traditional
-          system level back ups.</para>
+          system level backups.</para>
         </sect3>
 
         <sect3 id="SysAdm_BkUp_Misc">
           <title>Misc</title>
 
-          <para>Back up of any additional component add-ons, your environment
+          <para>Backup of any additional component add-ons, your environment
           files (environment.xml), or other custom configurations should be
-          done according to traditional back up methods.</para>
+          done according to traditional backup methods.</para>
         </sect3>
       </sect2>
     </sect1>
@@ -834,7 +917,7 @@
         <para>Understanding the log files, and what is normally reported in
         the log files, helps in troubleshooting the HPCC system.</para>
 
-        <para>As part of routine maintenance you may want to back up, archive,
+        <para>As part of routine maintenance you may want to back up, archive,
         and remove the older log files.</para>
       </sect2>
 
@@ -1116,11 +1199,11 @@ lock=/var/lock/HPCCSystems</programlisting>
         <para>Configuring your system for remote file access over Transport
         Layer Security (TLS) requires modifying the <emphasis
         role="bold">dafilesrv</emphasis> setting in the
-        <emphasis>environment.conf</emphasis> file. </para>
+        <emphasis>environment.conf</emphasis> file.</para>
 
         <para>To do this either uncomment (if they are already there), or add
         the following lines to the <emphasis>environment.conf</emphasis> file.
-        Then set the values as appropriate for your system. </para>
+        Then set the values as appropriate for your system.</para>
 
         <para><programlisting>#enable SSL for dafilesrv remote file access
 dfsUseSSL=true
@@ -1129,7 +1212,7 @@ dfsSSLPrivateKeyFile=/keyfilepath/keyfile</programlisting>Set the <emphasis
         role="blue">dfsUseSSL=true</emphasis> and set the value for the paths
         to point to the certificate and key file paths on your system. Then
         deploy the <emphasis>environment.conf</emphasis> file (and cert/key
-        files) to all nodes as appropriate. </para>
+        files) to all nodes as appropriate.</para>
 
         <para>When dafilesrv is enabled for TLS (port 7600), it can still
         connect over a non-TLS connection (port 7100) to allow legacy clients
@@ -1326,7 +1409,7 @@ dfsSSLPrivateKeyFile=/keyfilepath/keyfile</programlisting>Set the <emphasis
         Active/passive meaning you would have two Dalis running, one primary,
         or active, and the other passive. In this scenario all actions are run
         on the active Dali, but duplicated on the passive one. If the active
-        Dali fails, then you can fail over to the passive Dali.</para>
+        Dali fails, then you can fail over to the passive Dali.<!--NOTE: Add steps for how to configure an Active/Passive Dali--></para>
 
         <para>Another suggested best practice is to use standard clustering
         with a quorum and a takeover VIP (a kind of load balancer). If the
@@ -1397,8 +1480,7 @@ dfsSSLPrivateKeyFile=/keyfilepath/keyfile</programlisting>Set the <emphasis
         meaning you would have two instances running, one primary (active),
         and the other passive. No load balancer needed. If the active instance
         fails, then you can fail over to the passive. Failover then uses the
-        VIP (a kind of load balancer) to distribute any incoming
-        requests.</para>
+        VIP (a kind of load balancer) to distribute any incoming requests.<!--NOTE: Add steps for how to configure the Active/Passive Thor--></para>
       </sect2>
 
       <sect2 id="SysAdm_BestPrac_DropZone">
@@ -1427,17 +1509,23 @@ dfsSSLPrivateKeyFile=/keyfilepath/keyfile</programlisting>Set the <emphasis
 
         <para>When designing a Thor cluster for high availability, consider
         how it actually works -- a Thor cluster accepts jobs from a job queue.
-        If there are two Thor clusters handling the queue, one will continue
-        accepting jobs if the other one fails.</para>
+        If there are two Thor clusters servicing the job queue, one will
+        continue accepting jobs if the other one fails.</para>
 
-        <para>If a single component (thorslave or thormaster) fails, the other
-        will continue to process requests. With replication enabled, it will
-        be able to read data from the back up location of the broken Thor.
-        Other components (such as ECL Server, or ESP) can also have multiple
+        <para>With replication enabled, the still-functioning Thor will be
+        able to read data from the backup location of the broken Thor. Other
+        components (such as ECL Server or ESP) can also have multiple
         instances. The remaining components, such as Dali, or DFU Server, work
-        in a traditional shared storage high availability fail over
+        in a traditional shared storage high availability failover
         model.</para>
 
+        <para>Another important consideration is to keep your ESP and Dali on
+        separate nodes from your Thor master. This way, if your Thor master
+        fails, you can replace it and bring up the replacement with the same
+        IP address. Since Thor stores no workunit data, Dali and ESP can
+        provide the file metadata needed to recover your workunits.</para>
+
         <sect3 id="Thor_HA_Downside">
           <title>The Downside</title>
 
@@ -1522,7 +1610,7 @@ dfsSSLPrivateKeyFile=/keyfilepath/keyfile</programlisting>Set the <emphasis
         <para>Replication of some components (ECL Agent, ESP/Eclwatch, DFU
         Server, etc.) are pretty straight forward as they really don’t have
         anything to replicate. Dali is the biggest consideration when it comes
-        to replication. In the case of Dali, you have Sasha as the back up
+        to replication. In the case of Dali, you have Sasha as the backup
         locally. The Dali files can be replicated using rsync. A better
         approach could be to use a synchronizing device (cluster WAN sync, SAN
         block replication, etc.), and just put the Dali stores on that and
@@ -1571,7 +1659,24 @@ dfsSSLPrivateKeyFile=/keyfilepath/keyfile</programlisting>Set the <emphasis
         number of cores divided by two is the maximum number of Thor clusters
         to use.</para>
 
-        <para></para>
+        <sect3>
+          <title>Multiple Nodes</title>
+
+          <para>If possible, keep resources running on their own nodes,
+          whether you have one or multiple Thor clusters. If running some
+          kind of active/passive high availability, don't keep your active
+          and passive master on the same node. Try to keep Dali and ESP on
+          separate nodes. Even if you don't have the luxury of very many
+          nodes, you want the Thor master and Dali (at minimum) to be on
+          separate nodes. The best practice is to keep as many components as
+          possible on their own nodes.</para>
+
+          <para>Another consideration for a multiple-node system is to avoid
+          putting any of the components on nodes that host Thor slaves. Doing
+          so leads to an unbalanced cluster: the slaves left with less
+          memory/CPU take longer than the rest, dragging down the performance
+          of the whole cluster.</para>
+        </sect3>
       </sect2>
 
       <sect2 id="virtual-thor-slaves">

BIN
docs/images/DHSMC8508T.jpg


BIN
docs/images/E1200-lg1.jpg


BIN
docs/images/ECLWUserDetails.jpg


BIN
docs/images/Exa_E1200i.jpg


BIN
docs/images/Force10_ExaScaleE6001200.jpg


BIN
docs/images/Perm007.jpg


BIN
docs/images/c150-lg.jpg


BIN
docs/images/dsdl-NametheService.jpg


BIN
docs/images/s25v_s50v.jpg


BIN
docs/images/s55.jpg