
Doc additions: testing, style, patterns

Adding a section on how to test HPCC: unit-tests and regression tests.
Also adding description on code style and design decisions, as well
as describing the design patterns of interfaces and smart pointers.
Renato Golin 13 years ago
parent
commit
aa9eb3a541
1 changed files with 407 additions and 14 deletions
sourcedoc.xml

@@ -193,9 +193,96 @@
                 Ubuntu). If the operating system is not one of the above, or is not recognized,
                 make package will create a tarball.
             </para>
+            <para>
+                The package installation does not start the service on the machine, so if you
+                want to give it a go or test it (see below), make sure to start the service manually
+                and wait until all services are up (mainly wait for EclWatch to come up on port 8010).
+            </para>
         </sect1>
         <sect1>
-            <title>Debugging a system</title>
+            <title>Testing the system</title>
+            <para>
+                After compiling, installing the package and starting the services, you can test
+                the HPCC platform on a single-node setup.
+            </para>
+            <sect2>
+                <title>Unit Tests</title>
+                <para>
+                    Some components have their own unit-tests. Once you have compiled (no need to
+                    start the services), you can already run them. Assuming you built a Debug
+                    version, from the build directory you can run:
+                    <programlisting>./Debug/bin/roxie -selftest</programlisting>
+                    and
+                    <programlisting>./Debug/bin/eclagent -selftest</programlisting>
+                </para>
+                <para>
+                    You can also run the Dali regression self-tests:
+                    <programlisting>./Debug/bin/daregress localhost</programlisting>
+                </para>
+            </sect2>
+            <sect2>
+                <title>Regression Tests</title>
+                <para>
+                    After the initial batch of unit-tests, which are quick and show only the most
+                    basic errors in the system, you can run the more complete regression tests.
+                    These tests are located in the source directory 'testing/ecl' and you'll need
+                    the HPCC platform up and running to execute them.
+                </para>
+                <para>
+                    Step 1: Configure your regression suites. This only needs to be done once.
+                    <programlisting>./runregress -ini=environment.xml</programlisting>
+                    The file 'environment.xml' is normally located in your '/etc/HPCCPlatform'
+                    directory and contains information on how your cluster is set-up, so the
+                    regression engine can reach it. You should see a new file, 'regress.ini'.
+                    Edit it to suit your preferred setup.
+                </para>
+                <para>
+                    Note: There is a current issue with Roxie tests, so you should comment out
+                    the 'roxie' from 'setup_clusters'. That will leave you about 650 tests to run.
+                </para>
+                <para>
+                    Note 2: There is another issue with eclplus having to live in the current
+                    testing directory. For now, you have to copy or symlink 'eclplus' into that
+                    directory. You can get it from your build directory.
+                </para>
+                <para>
+                    Step 2: Create test files. You'll need some files created as part of the
+                    tests. This step also needs to be run only once, unless you have cleaned
+                    the files for any reason.
+                    <programlisting>./runregress -setup</programlisting>
+                    There is no reason for this to fail; all queries should execute
+                    successfully.
+                </para>
+                <para>
+                    Step 3: Run the regression tests. This takes about 5-10 minutes on a machine
+                    with multiple CPUs/cores. There is an optimum number of parallel queries,
+                    and more is not necessarily faster. Start with 50 and adjust up or down
+                    to find a better number for your machine.
+                    <programlisting>./runregress -pq 50 hthor_suite</programlisting>
+                    If some of the queries get locked, pressing CTRL+C won't help. You need to abort
+                    them from the EclWatch interface, or restart the service.
+                </para>
+                <para>
+                    If, after it finishes, you want to see the report again, just run:
+                    <programlisting>./runregress -n -report Summary hthor_suite</programlisting>
+                </para>
+                <para>
+                    If you want to re-run a single test, just run:
+                    <programlisting>./runregress -n -query anytest.ecl hthor_suite</programlisting>
+                </para>
+                <para>
+                    All test results and their expected files are in the suite's directory (like
+                    hthor_suite), in 'out' and 'key' respectively.
+                </para>
+            </sect2>
+            <sect2>
+                <title>Compiler Tests</title>
+                <para>
+                    TODO: Describe compiler tests at 'ecl/regress'.
+                </para>
+            </sect2>
+        </sect1>
+        <sect1>
+            <title>Debugging the system</title>
             <para>
                 On linux systems, the makefile generated by cmake will build a specific
                 version (debug or release) of the system depending on the options selected 
@@ -234,9 +321,46 @@
         <sect1>
             <title>C++ coding conventions</title>
             <para>
-                For the most part out coding style conventions match those described at 
-                http://geosoft.no/development/cppstyle.html, with a few exceptions or extensions as 
-                noted below.
+                Unlike most software projects, HPCC has some very specific
+                constraints that make even basic design decisions difficult, and
+                the results often look odd to developers getting acquainted with its
+                code base. For example, when HPCC was initially developed, most of the
+                libraries that are commonplace today (like STL and Boost) were either
+                not available or not stable enough.
+            </para>
+            <para>
+                Also, at the beginning, both C++ and Java were being considered as
+                the language of choice, but development started with C++. So a C++
+                library that copied most of the behaviour of the Java standard library
+                (at the time, Java 1.4) was created (see jlib below) to make the
+                transition, if ever undertaken, easier. The transition never happened,
+                but the decisions had been made and the whole platform is designed on those terms.
+            </para>
+            <para>
+                Most importantly, the performance constraints in HPCC can make
+                seemingly obvious choices unworkable. One example is the use of
+                traditional smart pointer implementations (such as boost::shared_ptr or
+                C++'s auto_ptr), which can cause a performance hit of up to 20% if used
+                instead of our internal shared pointer implementation.
+            </para>
+            <para>
+                The last important point to consider is that some
+                libraries/systems were designed to replace older ones, but the older
+                ones have not been replaced yet. There is a slow movement to deprecate
+                old systems and consolidate on a few official ways to use
+                HPCC (Thor, Roxie), but old systems could still be in use for years in
+                tests or legacy sub-systems.
+            </para>
+            <para>
+                In a nutshell, expect re-implementation of well-known containers
+                and algorithms, expect duplicated functionality of sub-systems and
+                expect to be required to use less-friendly libraries for the sake of
+                performance, stability and longevity.
+            </para>
+            <para>
+                For the most part our coding style conventions match those
+                described at http://geosoft.no/development/cppstyle.html, with a few
+                exceptions or extensions as noted below.
             </para>
             <sect2>
                 <title>Source files</title>
@@ -255,21 +379,186 @@
                 </para>
             </sect2>
             <sect2>
+                <title>Java-style</title>
+                <para>
+                    We adopted a Java-like inheritance model, with macro
+                    substitution for the basic Java keywords. This changes nothing in the
+                    compiled code, but makes it clearer to the reader what the derived
+                    type is doing with its base.
+                </para>
+                <para>
+                    <itemizedlist>
+                        <listitem>
+                            <para>
+                                interface (struct): declares an interface (pure virtual class)
+                            </para>
+                        </listitem>
+
+                        <listitem>
+                            <para>
+                                extends (public): one interface extending another; both are pure virtual
+                            </para>
+                        </listitem>
+
+                        <listitem>
+                            <para>
+                                implements (public): Concrete class implementing an interface
+                            </para>
+                        </listitem>
+                    </itemizedlist>
+                </para>
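+                <para>
+                    The mapping above can be sketched as follows. This is only an
+                    illustration: the macro definitions are assumed from the list above
+                    rather than copied from the HPCC headers, and IShape, INamedShape and
+                    CSquare are hypothetical names.
+                </para>
+                <programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>

// Assumed macro definitions, following the keyword-to-C++ mapping listed above.
// Standard headers are included first so the macros cannot affect them.
#define interface struct
#define extends public
#define implements public

// Hypothetical interface: only pure virtual methods, no data members.
interface IShape {
    virtual ~IShape() {}
    virtual int area() const = 0;
};

// One interface extending another; both remain pure virtual.
interface INamedShape : extends IShape {
    virtual std::string name() const = 0;
};

// Concrete class implementing the interface.
class CSquare : implements INamedShape {
public:
    explicit CSquare(int _side) : side(_side) {}
    virtual int area() const { return side * side; }
    virtual std::string name() const { return "square"; }
private:
    int side;
};

// Callers depend on the interface only, never on the concrete class.
inline int areaViaInterface(const IShape &shape) { return shape.area(); }
```
+                ]]></programlisting>
+                <para>
+                    Because the macros expand to plain struct and public, the compiler
+                    sees ordinary C++ inheritance; the keywords only document intent.
+                </para>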
+                <para>
+                    There is no semantic check, which makes the scheme difficult to
+                    enforce, and this has led to code that uses it being intermixed with
+                    code that does not. You should use it when possible, most importantly
+                    in code that already uses it.
+                </para>
+                <para>
+                    We also tend to write methods inline, which fits well with
+                    the requirements of C++ templates. We do not, however, enforce the
+                    one-class-per-file rule.
+                </para>
+                <para>
+                    See chapter 3.2 for more information on our implementation of
+                    interfaces.
+                </para>
+            </sect2>
+            <sect2>
                 <title>Identifiers</title>
                 <para>
-                    We generally follow the java conventions for identifier naming and formatting.
+                    Class and interface names are in CamelCase with a leading
+                    capital letter. Interface names should be prefixed capital I followed
+                    by another capital. Class names may be prefixed with a C if there is a
+                    corresponding I-prefixed interface name, but need not be
+                    otherwise.
                 </para>
-                
                 <para>
-                    Class and interface names are in CamelCase with a leading capital letter. 
-                    Interface names should be prefixed capital I followed by another capital.
-                    Class names may be prefixed with a C if there is a corresponding I-prefixed
-                    interface name, but need not be otherwise.
+                    Variables, function and method names, and parameters use
+                    camelCase starting with a lower case letter. Parameters may be
+                    prefixed with underscore, normally when overwritten by local
+                    variables.
+                </para>
+                <para>Example:</para>
+                <para>
+                  <programlisting>    class MySQLSuperClass {
+        void mySQLFunctionIsCool(int _hasLocalCopy, bool enableWrite) {
+            bool hasLocalCopy = false;
+            if (enableWrite)
+                hasLocalCopy = _hasLocalCopy;
+        }
+    };
+                  </programlisting>
+                </para>
+            </sect2>
+            <sect2>
+                <title>Pointers</title>
+                <para>
+                    Use real pointers when you can, and smart pointers when you have
+                    to. Take extra care to understand the needs of your pointers and
+                    their scope. Most programs can afford a few dangling pointers, but a
+                    high-performance clustering platform cannot.
+                </para>
+                <para>
+                    Most importantly, use common sense and a lot of thought. Here
+                    are a few guidelines:
+                </para>
+                <para>
+                    <itemizedlist>
+                        <listitem>
+                            <para>
+                                Use real pointers for return values and parameter passing
+                            </para>
+                        </listitem>
+                        <listitem>
+                          <para>
+                              For local variables use real pointers if their lifetime is
+                              guaranteed to be longer than the function (and no exception
+                              is thrown from functions you call), shared pointers otherwise.
+                          </para>
+                        </listitem>
+                        <listitem>
+                            <para>
+                                Use shared pointers for member variables, unless there is
+                                a strong guarantee the object has a longer lifetime.
+                            </para>
+                        </listitem>
+                        <listitem>
+                            <para>
+                                Create Shared&lt;&gt; with either:
+                            </para>
+                            <itemizedlist>
+                                <listitem>
+                                    <para>
+                                        Owned&lt;&gt;: if your new pointer will own the
+                                        pointer alone (transfer)
+                                    </para>
+                                </listitem>
+                                <listitem>
+                                    <para>
+                                        Linked&lt;&gt;: if you still want to share the
+                                        ownership (shared)
+                                    </para>
+                                </listitem>
+                            </itemizedlist>
+                        </listitem>
+                        <listitem>
+                            <para>
+                                Consider whether your code is critical and use
+                                link/release when necessary
+                            </para>
+                        </listitem>
+                    </itemizedlist>
+                </para>
+                <para>
+                    Warning: Direct manipulation of the ownership might
+                    cause Shared&lt;&gt; pointers to lose their pointers, so subsequent
+                    calls through them (like o2-&gt;doIt() after o3 gets ownership) *will* cause
+                    segmentation faults.
+                  </para>
+                <para>
+                    Refer to chapter 5.3 for more information on our smart pointer
+                    implementation, Shared&lt;&gt;.
+                </para>
+                <para>
+                    Methods that return Shared&lt;&gt; pointers, or that use them,
+                    should have a common naming standard.
                 </para>
-
                 <para>
-                    Variables, function and method names, and parameters use camelCase starting with a
-                    lower case letter. Parameters may be prefixed with underscore.
+                    <itemizedlist>
+                        <listitem>
+                            <para>
+                                Foo * queryFoo(): does not return a linked pointer since
+                                lifetime is guaranteed for a set period. Caller should link if it
+                                needs to retain it for longer.
+                            </para>
+                        </listitem>
+                    </itemizedlist>
+                    <itemizedlist>
+                        <listitem>
+                            <para>
+                                Foo * getFoo(): the returned value is linked - should be
+                                assigned to an owned, or returned directly.
+                            </para>
+                        </listitem>
+                    </itemizedlist>
+                    <itemizedlist>
+                        <listitem>
+                            <para>
+                                void setFoo(Foo * x): generally parameters to functions are
+                                assumed to not be linked, the callee needs to link them if they
+                                are retained.
+                            </para>
+                        </listitem>
+                    </itemizedlist>
+                    <itemizedlist>
+                        <listitem>
+                            <para>
+                                void setownFoo(Foo * ownedX): Some functions do take
+                                pointers that are linked - where you are implicitly transferring
+                                ownership.
+                            </para>
+                        </listitem>
+                    </itemizedlist>
                 </para>
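+                <para>
+                    The naming rules above can be sketched with a toy reference-counted
+                    class. This is an illustration under stated assumptions, not HPCC
+                    code: Foo and Holder are hypothetical stand-ins, linkCount() exists
+                    only for the demonstration, and std::atomic is used for brevity.
+                </para>
+                <programlisting><![CDATA[
```cpp
#include <atomic>
#include <cassert>

// Hypothetical reference-counted object; not the jlib implementation.
class Foo {
public:
    Foo() : refs(1) {}                          // creator holds the first reference
    void Link() const { refs.fetch_add(1); }
    bool Release() const {                      // returns true when the object is destroyed
        if (refs.fetch_sub(1) == 1) { delete this; return true; }
        return false;
    }
    int linkCount() const { return refs.load(); } // demonstration only
private:
    ~Foo() {}                                   // destroyed only through Release()
    mutable std::atomic<int> refs;
};

// Hypothetical holder showing the four naming conventions.
class Holder {
public:
    Holder() : foo(nullptr) {}
    ~Holder() { if (foo) foo->Release(); }

    // queryFoo: not linked; valid only while the Holder keeps its reference.
    Foo *queryFoo() const { return foo; }

    // getFoo: returns a linked pointer; the caller must Release() it.
    Foo *getFoo() const { if (foo) foo->Link(); return foo; }

    // setFoo: the parameter is not linked, so the callee links before retaining.
    void setFoo(Foo *x) {
        if (x) x->Link();
        if (foo) foo->Release();
        foo = x;
    }

    // setownFoo: the caller hands over its reference; no extra link is taken.
    void setownFoo(Foo *ownedX) {
        if (foo) foo->Release();
        foo = ownedX;
    }
private:
    Foo *foo;
};
```
+                ]]></programlisting>
+                <para>
+                    In this sketch, a caller that receives a pointer from getFoo() is
+                    responsible for one Release(), whereas a pointer from queryFoo()
+                    must not be released unless the caller linked it first.
+                </para>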
             </sect2>
             <sect2>
@@ -338,8 +627,112 @@
                 abstract class with no data members and all functions pure virtual can be used
                 in the same way.
             </para>
+            <para>
+                Interfaces are pure virtual classes. They are similar in concept to
+                Java's interfaces and should be used on public APIs. If you need common
+                code, use policies (see below).
+            </para>
+            <para>
+                An interface's name must start with an 'I', and the base class for
+                its concrete implementations should start with a 'C' and have the same
+                name, e.g.:
+            </para>
+            <programlisting>    class CFoo : implements IFoo { };</programlisting>
+            <para>
+                When an interface has multiple implementations, try to stay as
+                close as possible to this rule, e.g.:
+            </para>
+            <programlisting>    class CFooCool : implements IFoo { };
+    class CFooWarm : implements IFoo { };
+    class CFooALot : implements IFoo { };
+            </programlisting>
+            <para>
+                Or, for partial implementation, use something like this:
+            </para>
+            <programlisting>    class CFoo : implements IFoo { };
+    class CFooCool : public CFoo { };
+    class CFooWarm : public CFoo { };
+            </programlisting>
+            <para>
+                Extend existing interfaces only for an 'is-a' relationship, not to
+                aggregate functionality. Avoid polluting public interfaces: keep
+                only the public methods of the most-base interface in the header, and
+                the internal implementation in the source file. Prefer the pImpl idiom
+                (pointer-to-implementation) for functionality-only requirements and
+                policy-based design for interface requirements.
+            </para>
+            <para>
+                Example 1: You want to decouple part of the implementation from
+                your class, and this part does not implement the interface your
+                contract requires.
+            </para>
+            <programlisting>    interface IFoo {
+        virtual void foo()=0;
+    };
+    class CFoo : implements IFoo {
+        MyImpl *pImpl;
+    public:
+        void foo() { pImpl-&gt;doSomething(); }
+    };
+            </programlisting>
+            <para>
+                Example 2: You want to implement the common part of one (or more)
+                interface(s) in a range of sub-classes.
+            </para>
+            <programlisting>    interface ICommon {
+        virtual void common()=0;
+    };
+    interface IFoo : extends ICommon {
+        virtual void foo()=0;
+    };
+    interface IBar : extends ICommon {
+        virtual void bar()=0;
+    };
+
+    template &lt;class IFACE&gt;
+    class Base : implements IFACE {
+    public:
+        void common() { ... }
+    }; // Still abstract: foo() or bar() remains pure virtual
+
+    class CFoo : public Base&lt;IFoo&gt; {
+    public:
+        void foo() { 1+1; }
+    };
+    class CBar : public Base&lt;IBar&gt; {
+    public:
+        void bar() { 2+2; }
+    };
+            </programlisting>
+        </sect1>
+        <sect1>
+            <title>Reference counted objects</title>
+            <para>
+                Shared&lt;&gt; is an in-house smart pointer implementation. It is
+                close to boost's intrusive_ptr. It has two derived implementations,
+                Linked and Owned, which control what happens when a shared pointer is
+                created from a real pointer: Linked links the pointer (sharing
+                ownership), whereas Owned does not (taking over ownership). E.g.:
+            </para>
+            <programlisting>    Owned&lt;Foo&gt; foo = new Foo; // Owns the pointer
+    Linked&lt;Foo&gt; sharedFoo = myFooParameter; // Shared ownership
+            </programlisting>
+            <para>
+                Shared&lt;&gt; is thread-safe and uses an atomic reference count
+                held by each object, rather than by the smart pointer itself
+                (as boost's shared_ptr does).
+            </para>
+            <para>
+                This means that, to use Shared&lt;&gt;, your class must implement
+                the IInterface interface, most commonly by extending the CInterface
+                class (and using the IMPLEMENT_IINTERFACE macro in the public section of
+                your class declaration).
+            </para>
+            <para>
+                This interface controls how you Link() and Release() the pointer.
+                This is necessary because in some inner parts of HPCC, the use of a
+                "really smart" smart pointer would add too many links and releases (on
+                temporaries, local variables, members, etc.) that could add up to a
+                significant performance hit.
+            </para>
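+            <para>
+                The difference between the two construction policies can be
+                sketched as below. This is a simplified illustration, not the jlib
+                source: Counted, SharedSketch, OwnedSketch and LinkedSketch are
+                hypothetical names, copying is disabled to keep the sketch short,
+                and std::atomic stands in for whatever atomic primitive the
+                platform actually uses.
+            </para>
+            <programlisting><![CDATA[
```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive base: the reference count lives inside the object.
class Counted {
public:
    Counted() : refs(1) {}                      // creator holds the first reference
    virtual ~Counted() {}
    void Link() const { refs.fetch_add(1); }
    bool Release() const {                      // returns true when the object is destroyed
        if (refs.fetch_sub(1) == 1) { delete this; return true; }
        return false;
    }
    int linkCount() const { return refs.load(); } // demonstration only
private:
    mutable std::atomic<int> refs;
};

// Common part: releases its single reference when it goes out of scope.
template <class T>
class SharedSketch {
public:
    ~SharedSketch() { if (ptr) ptr->Release(); }
    T *operator->() const { return ptr; }
    T *get() const { return ptr; }
    SharedSketch(const SharedSketch &) = delete;            // copying omitted
    SharedSketch &operator=(const SharedSketch &) = delete;
protected:
    explicit SharedSketch(T *_ptr) : ptr(_ptr) {}
    T *ptr;
};

// Owned-style: takes over the caller's reference, so no Link() on construction.
template <class T>
class OwnedSketch : public SharedSketch<T> {
public:
    explicit OwnedSketch(T *_ptr) : SharedSketch<T>(_ptr) {}
};

// Linked-style: shares ownership, so it adds its own reference on construction.
template <class T>
class LinkedSketch : public SharedSketch<T> {
public:
    explicit LinkedSketch(T *_ptr) : SharedSketch<T>(_ptr) { if (_ptr) _ptr->Link(); }
};
```
+            ]]></programlisting>
+            <para>
+                In this sketch, wrapping a freshly created object in the Owned-style
+                pointer leaves the count at one, while wrapping a borrowed pointer in
+                the Linked-style pointer raises it by one for the wrapper's lifetime.
+            </para>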
         </sect1>
-        <sect1><title>Reference counted objects</title><para/></sect1>
         <sect1><title>STL</title><para/></sect1>
     </chapter>