
Merge pull request #1089 from rengolin/docs

Doc additions: testing, style, patterns

Reviewed-By: Richard Chapman <rchapman@hpccsystems.com>
Richard Chapman committed 13 years ago
Commit 965a89b830
1 changed file with 407 additions and 14 deletions

sourcedoc.xml

@@ -193,9 +193,96 @@
                 Ubuntu). If the operating system is not one of the above, or is not recognized,
                 make package will create a tarball.
             </para>
+            <para>
+                Installing the package does not start the services, so if you want to try
+                the platform or test it (see below), start the services manually and wait
+                until they are all up (in particular, wait for EclWatch to come up on port 8010).
+            </para>
         </sect1>
         <sect1>
-            <title>Debugging a system</title>
+            <title>Testing the system</title>
+            <para>
+                After compiling, installing the package and starting the services, you can test
+                the HPCC platform on a single-node setup.
+            </para>
+            <sect2>
+                <title>Unit Tests</title>
+                <para>
+                    Some components have their own unit tests. You can run them as soon as the
+                    build finishes; there is no need to start the services. Assuming you built a
+                    Debug version, run the following from the build directory:
+                    <programlisting>./Debug/bin/roxie -selftest</programlisting>
+                    and
+                    <programlisting>./Debug/bin/eclagent -selftest</programlisting>
+                </para>
+                <para>
+                    You can also run the Dali regression self-tests:
+                    <programlisting>./Debug/bin/daregress localhost</programlisting>
+                </para>
+            </sect2>
+            <sect2>
+                <title>Regression Tests</title>
+                <para>
+                    After the initial batch of unit tests, which are quick and catch only the most
+                    basic errors in the system, you can run the more complete regression tests.
+                    These tests are located in the source directory 'testing/ecl' and you'll need
+                    the HPCC platform up and running to execute them.
+                </para>
+                <para>
+                    Step 1: Configure your regression suites. This only needs to be done once.
+                    <programlisting>./runregress -ini=environment.xml</programlisting>
+                    The file 'environment.xml' is normally located in your '/etc/HPCCPlatform'
+                    directory and describes how your cluster is set up, so that the
+                    regression engine can reach it. You should then see a new file, 'regress.ini'.
+                    Edit it to suit your setup.
+                </para>
+                <para>
+                    Note: There is currently an issue with the Roxie tests, so you should comment
+                    out 'roxie' in 'setup_clusters'. That leaves about 650 tests to run.
+                </para>
+                <para>
+                    Note 2: eclplus currently has to be present in the testing directory. For now,
+                    copy or symlink 'eclplus' into that directory; you can find it in your build
+                    directory.
+                </para>
+                <para>
+                    Step 2: Create the test files. Some tests depend on files that this step
+                    creates. Like Step 1, it only needs to be run once, unless you have removed
+                    the files for any reason.
+                    <programlisting>./runregress -setup</programlisting>
+                    This step should not fail: all queries should execute successfully.
+                </para>
+                <para>
+                    Step 3: Run the regression tests. This takes about 5-10 minutes on a machine
+                    with multiple CPUs/cores. There is an optimum number of parallel queries, and
+                    more is not necessarily faster. Start with 50 and adjust up or down until you
+                    find the best value for your machine.
+                    <programlisting>./runregress -pq 50 hthor_suite</programlisting>
+                    If some of the queries hang, pressing CTRL+C won't stop them. You need to
+                    abort them from the EclWatch interface, or restart the services.
+                </para>
+                <para>
+                    To see the report again after the run has finished, run:
+                    <programlisting>./runregress -n -report Summary hthor_suite</programlisting>
+                </para>
+                <para>
+                    To re-run a single test, run:
+                    <programlisting>./runregress -n -query anytest.ecl hthor_suite</programlisting>
+                </para>
+                <para>
+                    All test results and their expected outputs are in the suite's directory (for
+                    example, hthor_suite), under 'out' and 'key' respectively.
+                </para>
+            </sect2>
+            <sect2>
+                <title>Compiler Tests</title>
+                <para>
+                    TODO: Describe compiler tests at 'ecl/regress'.
+                </para>
+            </sect2>
+        </sect1>
+        <sect1>
+            <title>Debugging the system</title>
             <para>
                 On linux systems, the makefile generated by cmake will build a specific
                 version (debug or release) of the system depending on the options selected 
@@ -234,9 +321,46 @@
         <sect1>
             <title>C++ coding conventions</title>
             <para>
-                For the most part out coding style conventions match those described at 
-                http://geosoft.no/development/cppstyle.html, with a few exceptions or extensions as 
-                noted below.
+                Unlike most software projects, HPCC has some very specific
+                constraints that make even basic design decisions difficult, and the
+                results often look odd to developers getting acquainted with its code base.
+                For example, when HPCC was initially developed, most of the commonplace
+                libraries we have today (like the STL and Boost) were not available or
+                not stable enough.
+            </para>
+            <para>
+                Also, at the beginning, both C++ and Java were being considered as
+                the language of choice, but development started in C++. So a C++
+                library that copied most of the behaviour of the Java standard library
+                (at the time, Java 1.4) was created (see jlib below) to make an eventual
+                transition easier. The transition never happened, but the decisions had
+                already been made, and the whole platform is designed around them.
+            </para>
+            <para>
+                Most importantly, HPCC's performance constraints can make decisions
+                that are no-brainers elsewhere look impossible. One example is the use of
+                traditional smart pointer implementations (such as boost::shared_ptr or
+                C++'s auto_ptr), which can cause a performance hit of up to 20% compared
+                with our internal shared pointer implementation.
+            </para>
+            <para>
+                The last important point to consider is that some
+                libraries/systems were designed to replace older ones but have not
+                replaced them yet. There is a slow movement to deprecate old systems in
+                favour of consolidating on a few official ways to use HPCC (Thor,
+                Roxie), but the old systems may still be used for years in tests or
+                legacy sub-systems.
+            </para>
+            <para>
+                In a nutshell, expect re-implementations of well-known containers
+                and algorithms, duplicated functionality across sub-systems, and
+                being required to use less friendly libraries for the sake of
+                performance, stability and longevity.
+            </para>
+            <para>
+                For the most part our coding style conventions match those
+                described at http://geosoft.no/development/cppstyle.html, with a few
+                exceptions or extensions as noted below.
             </para>
             <sect2>
                 <title>Source files</title>
@@ -255,21 +379,186 @@
                 </para>
             </sect2>
             <sect2>
+                <title>Java-style</title>
+                <para>
+                    We adopted a Java-like inheritance model, with macro
+                    substitution for the basic Java keywords. This changes nothing in the
+                    compiled code, but makes it clearer to the reader what the inheriting
+                    class is doing with its base.
+                </para>
+                <para>
+                    <itemizedlist>
+                        <listitem>
+                            <para>
+                                interface (struct): declares an interface (pure virtual class)
+                            </para>
+                        </listitem>
+
+                        <listitem>
+                            <para>
+                                extends (public): One interface extending another, both are pure virtual
+                            </para>
+                        </listitem>
+
+                        <listitem>
+                            <para>
+                                implements (public): Concrete class implementing an interface
+                            </para>
+                        </listitem>
+                    </itemizedlist>
+                </para>
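As a minimal sketch, the three macros could be defined and used as below. jlib provides its own definitions; the ones written out here are assumptions made so the example compiles on its own, and the class names (IShape, INamedShape, CSquare) are hypothetical.

```cpp
#include <cassert>
#include <string>

// Assumed definitions, mirroring the list above (jlib supplies the real ones).
#define interface struct   // interface: a struct holding only pure virtuals
#define extends public     // extends: one interface inheriting another
#define implements public  // implements: a concrete class realising an interface

// Hypothetical interface: pure virtual only.
interface IShape {
    virtual ~IShape() {}
    virtual double area() const = 0;
};

// One interface extending another; both remain pure virtual.
interface INamedShape : extends IShape {
    virtual std::string name() const = 0;
};

// Hypothetical concrete class implementing the derived interface.
class CSquare : implements INamedShape {
    double side;
public:
    explicit CSquare(double _side) : side(_side) {}
    double area() const override { return side * side; }
    std::string name() const override { return "square"; }
};
```

After preprocessing, `interface INamedShape : extends IShape` is just `struct INamedShape : public IShape`, which is why the compiler cannot tell whether the macros were used as intended.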
+                <para>
+                    The macros are plain text substitutions, so the compiler cannot check
+                    that they are used correctly. This makes the scheme difficult to enforce,
+                    and has led to code that does not use it being intermixed with code that
+                    does. Use it whenever possible, and especially in code that already uses it.
+                </para>
+                <para>
+                    We also tend to write methods inline, which fits well with the
+                    requirements of C++ templates. However, we do not enforce the
+                    one-class-per-file rule.
+                </para>
+                <para>
+                    See chapter 3.2 for more information on our implementation of
+                    interfaces.
+                </para>
+            </sect2>
+            <sect2>
                 <title>Identifiers</title>
                 <para>
-                    We generally follow the java conventions for identifier naming and formatting.
+                    Class and interface names are in CamelCase with a leading
+                    capital letter. Interface names should be prefixed with a capital I
+                    followed by another capital letter. Class names may be prefixed with a C
+                    if there is a corresponding I-prefixed interface name, but need not be
+                    otherwise.
                 </para>
-                
                 <para>
-                    Class and interface names are in CamelCase with a leading capital letter. 
-                    Interface names should be prefixed capital I followed by another capital.
-                    Class names may be prefixed with a C if there is a corresponding I-prefixed
-                    interface name, but need not be otherwise.
+                    Variables, function and method names, and parameters use
+                    camelCase starting with a lower case letter. Parameters may be
+                    prefixed with an underscore, normally when a local variable or member
+                    with the plain name receives their value.
+                </para>
+                <para>Example:</para>
+                <para>
+                  <programlisting>    class MySQLSuperClass {
+    public:
+        void mySQLFunctionIsCool(bool _hasLocalCopy, bool enableWrite) {
+            bool hasLocalCopy = false;
+            if (enableWrite)
+                hasLocalCopy = _hasLocalCopy;
+        }
+    };
+                  </programlisting>
+                </para>
+            </sect2>
+            <sect2>
+                <title>Pointers</title>
+                <para>
+                    Use real pointers when you can, and smart pointers when you have
+                    to. Take extra care to understand the needs of your pointers and
+                    their scope. Most programs can afford a few dangling pointers, but a
+                    high-performance clustering platform cannot.
+                </para>
+                <para>
+                    Most importantly, use common sense and a lot of thought. Here
+                    are a few guidelines:
+                </para>
+                <para>
+                    <itemizedlist>
+                        <listitem>
+                            <para>
+                                Use real pointers for return values and parameter passing.
+                            </para>
+                        </listitem>
+                        <listitem>
+                          <para>
+                              For local variables, use real pointers if the object's lifetime is
+                              guaranteed to be longer than the function's (and no exception
+                              is thrown from functions you call); use shared pointers otherwise.
+                          </para>
+                        </listitem>
+                        <listitem>
+                            <para>
+                                Use shared pointers for member variables, unless there is
+                                a strong guarantee that the pointed-to object outlives the member.
+                            </para>
+                        </listitem>
+                        <listitem>
+                            <para>
+                                Create Shared&lt;&gt; with either:
+                            </para>
+                            <itemizedlist>
+                                <listitem>
+                                    <para>
+                                        Owned&lt;&gt;: if your new pointer will own the
+                                        pointer alone (transfer)
+                                    </para>
+                                </listitem>
+                                <listitem>
+                                    <para>
+                                        Linked&lt;&gt;: if you still want to share the
+                                        ownership (shared)
+                                    </para>
+                                </listitem>
+                            </itemizedlist>
+                        </listitem>
+                        <listitem>
+                            <para>
+                                Consider whether your code is performance-critical, and call
+                                Link()/Release() directly when necessary.
+                            </para>
+                        </listitem>
+                    </itemizedlist>
+                </para>
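The sketch below illustrates these guidelines with hypothetical, simplified stand-ins for the Shared, Owned and Linked pointers; CRefCounted, Widget, Holder and countRefs are invented for illustration and are not the jlib classes. A parameter is passed as a real pointer, a local uses Owned because it adopts the reference created by new, and a member uses Linked because it shares ownership with the caller.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in, NOT jlib's class: just enough intrusive counting.
class CRefCounted {
    mutable std::atomic<int> count{1};  // a new object starts with one reference
public:
    virtual ~CRefCounted() {}
    void Link() const { ++count; }
    void Release() const { if (--count == 0) delete this; }
    int refs() const { return count.load(); }  // for illustration only
};

// Base smart pointer: releases its reference on destruction.
template <class T> class Shared {
protected:
    T *ptr;
    explicit Shared(T *p) : ptr(p) {}
public:
    Shared(const Shared &) = delete;
    Shared &operator=(const Shared &) = delete;
    ~Shared() { if (ptr) ptr->Release(); }
    T *operator->() const { return ptr; }
    T *get() const { return ptr; }
};

// Owned: adopts the reference the caller already holds (transfer).
template <class T> class Owned : public Shared<T> {
public:
    explicit Owned(T *p) : Shared<T>(p) {}
};

// Linked: adds a reference of its own (shared ownership).
template <class T> class Linked : public Shared<T> {
public:
    explicit Linked(T *p) : Shared<T>(p) { if (p) p->Link(); }
};

class Widget : public CRefCounted {};  // hypothetical payload class

// Parameter passing: a real pointer; the caller keeps its ownership.
int countRefs(const Widget *w) { return w->refs(); }

// Member variable: Linked, because the holder may outlive the caller's reference.
class Holder {
    Linked<Widget> widget;
public:
    explicit Holder(Widget *w) : widget(w) {}
    Widget *queryWidget() const { return widget.get(); }  // real pointer return
};
```

Passing `w.get()` to `countRefs` adds no link, while constructing a `Holder` raises the count to 2 until the holder goes out of scope.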
+                <para>
+                    Warning: Manipulating ownership directly can leave a Shared&lt;&gt;
+                    pointer empty, so any subsequent call through it (like o2-&gt;doIt()
+                    after o3 gets ownership) *will* cause a segmentation fault.
+                </para>
+                <para>
+                    Refer to chapter 5.3 for more information on our smart pointer
+                    implementation, Shared&lt;&gt;.
+                </para>
+                <para>
+                    Methods that return Shared&lt;&gt; pointers, or that use them,
+                    should follow a common naming standard:
                 </para>
-
                 <para>
-                    Variables, function and method names, and parameters use camelCase starting with a
-                    lower case letter. Parameters may be prefixed with underscore.
+                    <itemizedlist>
+                        <listitem>
+                            <para>
+                                Foo * queryFoo(): does not return a linked pointer, since the
+                                object's lifetime is guaranteed for a set period. The caller
+                                should link it if it needs to retain it for longer.
+                            </para>
+                        </listitem>
+                        <listitem>
+                            <para>
+                                Foo * getFoo(): the returned value is linked; it should be
+                                assigned to an Owned&lt;&gt;, or returned directly.
+                            </para>
+                        </listitem>
+                        <listitem>
+                            <para>
+                                void setFoo(Foo * x): parameters to functions are generally
+                                assumed not to be linked; the callee needs to link them if it
+                                retains them.
+                            </para>
+                        </listitem>
+                        <listitem>
+                            <para>
+                                void setownFoo(Foo * ownedX): some functions take pointers
+                                that are already linked, implicitly transferring ownership to
+                                the callee.
+                            </para>
+                        </listitem>
+                    </itemizedlist>
                 </para>
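A minimal sketch of these four conventions, assuming a hypothetical Foo class built on an invented intrusive counter (CCounted) rather than jlib's CInterface; the Container class is also hypothetical. It uses raw pointers and explicit Link()/Release() calls so that who owns each reference is visible at every call.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive counter (not jlib's CInterface).
class CCounted {
    std::atomic<int> count{1};   // the creator holds the first reference
public:
    virtual ~CCounted() {}
    void Link() { ++count; }
    void Release() { if (--count == 0) delete this; }
    int refs() const { return count.load(); }
};

class Foo : public CCounted {};  // hypothetical

class Container {
    Foo *foo = nullptr;          // Container owns one reference to foo
public:
    Container() = default;
    Container(const Container &) = delete;
    Container &operator=(const Container &) = delete;
    ~Container() { if (foo) foo->Release(); }

    // queryFoo(): not linked; valid only while the Container keeps it.
    Foo *queryFoo() const { return foo; }

    // getFoo(): linked; the caller must Release() it or hand it to an owner.
    Foo *getFoo() const {
        if (foo) foo->Link();
        return foo;
    }

    // setFoo(): the argument is not linked, so the callee links what it keeps.
    void setFoo(Foo *x) {
        if (x) x->Link();        // link first, in case x == foo
        if (foo) foo->Release();
        foo = x;
    }

    // setownFoo(): the argument is already linked; ownership moves in.
    void setownFoo(Foo *ownedX) {
        if (foo) foo->Release();
        foo = ownedX;
    }
};
```

Note that `setownFoo(new Foo)` needs no extra link: the single reference created by `new` is handed straight to the container.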
             </sect2>
             <sect2>
@@ -338,8 +627,112 @@
                 abstract class with no data members and all functions pure virtual can be used
                 in the same way.
             </para>
+            <para>
+                Interfaces are pure virtual classes. They are similar in concept to
+                Java's interfaces and should be used in public APIs. If you need common
+                code, use policies (see below).
+            </para>
+            <para>
+                An interface's name must start with an 'I', and the base class for
+                its concrete implementations should have the same name with a 'C'
+                in place of the 'I', for example:
+            </para>
+            <programlisting>    class CFoo : implements IFoo { };</programlisting>
+            <para>
+                When an interface has multiple implementations, try to stay as
+                close as possible to this rule. For example:
+            </para>
+            <programlisting>    class CFooCool : implements IFoo { };
+    class CFooWarm : implements IFoo { };
+    class CFooALot : implements IFoo { };
+            </programlisting>
+            <para>
+                Or, for partial implementation, use something like this:
+            </para>
+            <programlisting>    class CFoo : implements IFoo { };
+    class CFooCool : public CFoo { };
+    class CFooWarm : public CFoo { };
+            </programlisting>
+            <para>
+                Extend existing interfaces only for an 'is-a' relationship, not to
+                aggregate functionality. Avoid polluting public interfaces: keep only
+                the public methods on the most-base interface in the header, and the
+                internal implementation in the source file. Prefer the pImpl idiom
+                (pointer-to-implementation) for functionality-only requirements, and
+                policy-based design for interface requirements.
+            </para>
+            <para>
+                Example 1: You want to decouple part of the implementation from
+                your class, and this part does not implement the interface your
+                contract requires.
+            </para>
+            <programlisting>    interface IFoo {
+        virtual void foo()=0;
+    };
+    class MyImpl; // Only declared here; defined in the source file
+    class CFoo : implements IFoo {
+        MyImpl *pImpl;
+    public:
+        void foo(); // In the source file: pImpl-&gt;doSomething();
+    };
+            </programlisting>
+            <para>
+                Example 2: You want to implement the common part of one (or more)
+                interface(s) in a range of sub-classes.
+            </para>
+            <programlisting>    interface ICommon {
+        virtual void common()=0;
+    };
+    interface IFoo : extends ICommon {
+        virtual void foo()=0;
+    };
+    interface IBar : extends ICommon {
+        virtual void bar()=0;
+    };
+
+    template &lt;class IFACE&gt;
+    class Base : implements IFACE {
+    public:
+        void common() { /* common code */ }
+    }; // Still abstract: foo()/bar() are left to the subclasses
+
+    class CFoo : public Base&lt;IFoo&gt; {
+    public:
+        void foo() { /* ... */ }
+    };
+    class CBar : public Base&lt;IBar&gt; {
+    public:
+        void bar() { /* ... */ }
+    };
+            </programlisting>
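A compilable version of Example 2, with hypothetical return values added so that the shared and interface-specific parts can be checked. The keyword macros are defined locally as an assumption, since jlib supplies the real ones.

```cpp
#include <cassert>
#include <string>

// Assumed local definitions of the Java-style keyword macros.
#define interface struct
#define extends public
#define implements public

interface ICommon {
    virtual ~ICommon() {}
    virtual std::string common() const = 0;
};
interface IFoo : extends ICommon {
    virtual int foo() const = 0;
};
interface IBar : extends ICommon {
    virtual int bar() const = 0;
};

// The common part is written once and reused for any interface that
// extends ICommon; Base<IFACE> stays abstract because the
// interface-specific methods are left to the subclasses.
template <class IFACE>
class Base : implements IFACE {
public:
    std::string common() const override { return "shared"; }
};

class CFoo : public Base<IFoo> {
public:
    int foo() const override { return 1 + 1; }
};

class CBar : public Base<IBar> {
public:
    int bar() const override { return 2 + 2; }
};
```

Because `Base` is parameterised on the interface rather than deriving from a fixed one, `CFoo` is an `IFoo` and `CBar` is an `IBar` without either interface inheriting the other's methods.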
+        </sect1>
+        <sect1>
+            <title>Reference counted objects</title>
+            <para>
+                Shared&lt;&gt; is an in-house smart pointer implementation, similar to
+                boost's intrusive_ptr. It has two derived classes, Linked&lt;&gt; and
+                Owned&lt;&gt;, which control whether or not, respectively, the pointer
+                is linked when a shared pointer is created from a real pointer. For example:
+            </para>
+            <programlisting>    Owned&lt;Foo&gt; foo = new Foo; // Owns the pointer
+    Linked&lt;Foo&gt; bar = myFooParameter; // Shared ownership
+            </programlisting>
+            <para>
+                Shared&lt;&gt; is thread-safe and uses an atomic reference count
+                held by each object, rather than by the smart pointer itself (as
+                boost's shared_ptr does).
+            </para>
+            <para>
+                This means that, to use Shared&lt;&gt;, your class must implement
+                the IInterface interface, most commonly by extending the CInterface
+                class (and using the IMPLEMENT_IINTERFACE macro in the public section of
+                your class declaration).
+            </para>
+            <para>
+                This interface lets you Link() and Release() the pointer explicitly.
+                This is necessary because, in some inner parts of HPCC, a "really
+                smart" smart pointer would add too many links and releases (on
+                temporaries, local variables, members, etc.), which could add up to a
+                significant performance hit.
+            </para>
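The following is a simplified, hypothetical sketch of how these pieces fit together: the count lives inside the object (CInterface), IInterface exposes Link() and Release(), an IMPLEMENT_IINTERFACE-style macro forwards one to the other, and Owned and Linked differ only in whether construction adds a link. The names mirror the text above, but the definitions are assumptions; the real jlib classes and macro differ in detail.

```cpp
#include <atomic>
#include <cassert>

#define interface struct
#define extends public
#define implements public

// Simplified, hypothetical version of jlib's IInterface.
interface IInterface {
    virtual ~IInterface() {}
    virtual void Link() const = 0;
    virtual bool Release() const = 0;   // true if the object was destroyed
};

// The count lives inside the object (intrusive), not in the smart pointer.
class CInterface {
    mutable std::atomic<unsigned> linkCount{1};
public:
    virtual ~CInterface() {}
    void Link() const { linkCount.fetch_add(1, std::memory_order_relaxed); }
    bool Release() const {
        if (linkCount.fetch_sub(1, std::memory_order_acq_rel) == 1) {
            delete this;                 // last reference gone
            return true;
        }
        return false;
    }
    unsigned getLinkCount() const { return linkCount.load(); }
};

// Hypothetical macro: routes IInterface's pure virtuals to CInterface.
#define IMPLEMENT_IINTERFACE \
    void Link() const override { CInterface::Link(); } \
    bool Release() const override { return CInterface::Release(); }

template <class CLASS> class Shared {
protected:
    CLASS *ptr;
    explicit Shared(CLASS *p) : ptr(p) {}
public:
    Shared(const Shared &) = delete;
    Shared &operator=(const Shared &) = delete;
    ~Shared() { if (ptr) ptr->Release(); }
    CLASS *operator->() const { return ptr; }
    CLASS *get() const { return ptr; }
};

// Owned: adopts the reference returned by new (no extra link).
template <class CLASS> class Owned : public Shared<CLASS> {
public:
    explicit Owned(CLASS *p) : Shared<CLASS>(p) {}
};

// Linked: links on construction, so ownership is shared.
template <class CLASS> class Linked : public Shared<CLASS> {
public:
    explicit Linked(CLASS *p) : Shared<CLASS>(p) { if (p) p->Link(); }
};

interface IFoo : extends IInterface {
    virtual int value() const = 0;
};

// Hypothetical concrete class: count from CInterface, API from IFoo.
class CFoo : public CInterface, implements IFoo {
public:
    IMPLEMENT_IINTERFACE
    int value() const override { return 42; }
};
```

The atomic decrement uses acquire-release ordering so that the thread which drops the last reference sees all writes made by other owners before it deletes the object.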
         </sect1>
-        <sect1><title>Reference counted objects</title><para/></sect1>
         <sect1><title>STL</title><para/></sect1>
     </chapter>