|
@@ -193,9 +193,96 @@
|
|
|
Ubuntu). If the operating system is not one of the above, or is not recognized,
|
|
|
make package will create a tarball.
|
|
|
</para>
|
|
|
+ <para>
|
|
|
+ The package installation does not start the service on the machine, so if you
|
|
|
+ want to give it a go or test it (see below), make sure to start the service manually
|
|
|
+ and wait until all services are up (mainly wait for EclWatch to come up on port 8010).
|
|
|
+ </para>
|
|
|
</sect1>
|
|
|
<sect1>
|
|
|
- <title>Debugging a system</title>
|
|
|
+ <title>Testing the system</title>
|
|
|
+ <para>
|
|
|
+ After compiling, installing the package and starting the services, you can test
|
|
|
+ the HPCC platform on a single-node setup.
|
|
|
+ </para>
|
|
|
+ <sect2>
|
|
|
+ <title>Unit Tests</title>
|
|
|
+ <para>
|
|
|
+ Some components have their own unit-tests. Once you have compiled (no need to
|
|
|
+ start the services), you can already run them. Supposing you build a Debug
|
|
|
+ version, from the build directory you can run:
|
|
|
+ <programlisting>./Debug/bin/roxie -selftest</programlisting>
|
|
|
+ and
|
|
|
+ <programlisting>./Debug/bin/eclagent -selftest</programlisting>
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ You can also run the Dali regression self-tests:
|
|
|
+ <programlisting>./Debug/bin/daregress localhost</programlisting>
|
|
|
+ </para>
|
|
|
+ </sect2>
|
|
|
+ <sect2>
|
|
|
+ <title>Regression Tests</title>
|
|
|
+ <para>
|
|
|
+ After the initial batch of unit-tests, which are quick and show only the most
|
|
|
+ basic errors in the system, you can run the more complete regression tests.
|
|
|
+ These tests are located in the source directory 'testing/ecl' and you'll need
|
|
|
+ the HPCC platform up and running to execute them.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ Step 1: Configure your regression suites. This only needs to be done once.
|
|
|
+ <programlisting>./runregress -ini=environment.xml</programlisting>
|
|
|
+ The file 'environment.xml' is normally located in your '/etc/HPCCPlatform'
|
|
|
+ directory and contains information on how your cluster is set-up, so the
|
|
|
+ regression engine can reach it. You should see a new file, 'regress.ini'.
|
|
|
+ Edit it to suit your preferred setup.
|
|
|
+ </para>
|
|
|
+ <para>
+ Note: There is currently an issue with the Roxie tests, so you should comment out
|
|
|
+ the 'roxie' from 'setup_clusters'. That will leave you about 650 tests to run.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ Note 2: There is another issue with eclplus having to live in the current
|
|
|
+ testing directory. For now, you have to copy or symlink 'eclplus' into that
|
|
|
+ directory. You can get it from your build directory.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ Step 2: Create test files. You'll need some files created as part of the
|
|
|
+ tests. This step should also only need to be run once, unless you have cleaned
|
|
|
+ the files for any reason.
|
|
|
+ <programlisting>./runregress -setup</programlisting>
|
|
|
+ There is no reason for this to fail; you should get all queries executed
|
|
|
+ successfully.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ Step 3: Run the regression tests. This takes about 5-10 minutes on a machine
|
|
|
+ with multiple CPUs/cores. There is an optimum number of parallel
|
|
|
+ queries; more is not necessarily faster. Start with 50 and work your way up
|
|
|
+ and down to a better number for your machine.
|
|
|
+ <programlisting>./runregress -pq 50 hthor_suite</programlisting>
|
|
|
+ If some of the queries get locked, pressing CTRL+C won't help. You need to abort
|
|
|
+ them from the EclWatch interface, or restart the service.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ If, after it finishes, you want to see the report again, just run:
|
|
|
+ <programlisting>./runregress -n -report Summary hthor_suite</programlisting>
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ If you want to re-run a single test, just run:
|
|
|
+ <programlisting>./runregress -n -query anytest.ecl hthor_suite</programlisting>
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ All test results and their expected files are in the suite's directory (like
|
|
|
+ hthor_suite), in 'out' and 'key' respectively.
|
|
|
+ </para>
|
|
|
+ </sect2>
|
|
|
+ <sect2>
|
|
|
+ <title>Compiler Tests</title>
|
|
|
+ <para>
|
|
|
+ TODO: Describe compiler tests at 'ecl/regress'.
|
|
|
+ </para>
|
|
|
+ </sect2>
|
|
|
+ </sect1>
|
|
|
+ <sect1>
|
|
|
+ <title>Debugging the system</title>
|
|
|
<para>
|
|
|
On linux systems, the makefile generated by cmake will build a specific
|
|
|
version (debug or release) of the system depending on the options selected
|
|
@@ -234,9 +321,46 @@
|
|
|
<sect1>
|
|
|
<title>C++ coding conventions</title>
|
|
|
<para>
|
|
|
- For the most part out coding style conventions match those described at
|
|
|
- http://geosoft.no/development/cppstyle.html, with a few exceptions or extensions as
|
|
|
- noted below.
|
|
|
+ Unlike most software projects around, HPCC has some very specific
|
|
|
+ constraints that make most basic design decisions difficult, and often
|
|
|
+ the results are odd to developers getting acquainted with its code base.
|
|
|
+ For example, when HPCC was initially developed, most common-place
|
|
|
+ libraries we have today (like STL and Boost) weren't available or stable
|
|
|
+ enough at the time.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ Also, at the beginning, both C++ and Java were being considered as
|
|
|
+ the language of choice, but development started with C++. So a C++
|
|
|
+ library that copied most behaviour of the Java standard library (at the
|
|
|
+ time, Java 1.4) was created (see jlib below) to make the transition, if
|
|
|
+ ever taken, easier. The transition never happened, but the decisions
|
|
|
+ were taken and the whole platform is designed on those terms.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ Most importantly, the performance constraints in HPCC can make
|
|
|
+ no-brainer decisions look impossible. One example is the use of
|
|
|
+ traditional smart pointer implementations (such as boost::shared_ptr or
|
|
|
+ C++'s auto_ptr), which can lead to a performance hit of up to 20% if used
|
|
|
+ instead of our internal shared pointer implementation.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ The last important point to consider is that some
|
|
|
+ libraries/systems were designed to replace older ones but haven't been
|
|
|
+ replaced yet. There is a slow movement to deprecate old systems in
|
|
|
+ favour of consolidating a few ones as the elected official ways to use
|
|
|
+ HPCC (Thor, Roxie) but old systems could still be used for years in
|
|
|
+ tests or legacy sub-systems.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ In a nutshell, expect re-implementation of well-known containers
|
|
|
+ and algorithms, expect duplicated functionality of sub-systems and
|
|
|
+ expect to be required to use less-friendly libraries for the sake of
|
|
|
+ performance, stability and longevity.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ For the most part our coding style conventions match those
|
|
|
+ described at http://geosoft.no/development/cppstyle.html, with a few
|
|
|
+ exceptions or extensions as noted below.
|
|
|
</para>
|
|
|
<sect2>
|
|
|
<title>Source files</title>
|
|
@@ -255,21 +379,186 @@
|
|
|
</para>
|
|
|
</sect2>
|
|
|
<sect2>
|
|
|
+ <title>Java-style</title>
|
|
|
+ <para>
|
|
|
+ We adopted a Java-like inheritance model, with macro
|
|
|
+ substitution for the basic Java keywords. This changes nothing in the
|
|
|
+ code, but makes it clearer to the reader what the recipient of
|
|
|
+ the inheritance is doing with its base.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ <itemizedlist>
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ interface (struct): declares an interface (pure virtual class)
|
|
|
+ </para>
|
|
|
+ </listitem>
|
|
|
+
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ extends (public): One interface extending another, both are pure virtual
|
|
|
+ </para>
|
|
|
+ </listitem>
|
|
|
+
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ implements (public): Concrete class implementing an interface
|
|
|
+ </para>
|
|
|
+ </listitem>
|
|
|
+ </itemizedlist>
|
|
|
+ </para>
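The keyword mapping listed above can be sketched as follows. This is a minimal illustration that assumes the three macros expand exactly as the list describes; the names IShape, INamedShape and CSquare are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <string>

// Assumed macro definitions, mirroring the keyword mapping in the list above.
// Standard headers are included first so the macros cannot affect them.
#define interface struct
#define extends public
#define implements public

// Hypothetical interface: every method is pure virtual.
interface IShape
{
    virtual ~IShape() {}
    virtual double area() const = 0;
};

// One interface extending another; both remain pure virtual.
interface INamedShape : extends IShape
{
    virtual std::string name() const = 0;
};

// Concrete class implementing the derived interface.
class CSquare : implements INamedShape
{
public:
    explicit CSquare(double _side) : side(_side) {}
    virtual double area() const { return side * side; }
    virtual std::string name() const { return "square"; }
private:
    double side;
};

// Callers depend only on the interface, never on the concrete class.
inline double doubledArea(const IShape & shape)
{
    return 2.0 * shape.area();
}
```

After preprocessing, the code is ordinary C++ public inheritance from structs, which is why the compiler cannot check that the keywords are used as intended.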
|
|
|
+ <para>
|
|
|
+ There is no semantic check, which makes it difficult to enforce
|
|
|
+ such a scheme, and this has led to code not using it intermixed with code
|
|
|
+ using it. You should use it when possible, most importantly on code
|
|
|
+ that already uses it.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ We also tend to write methods inline, which matches well with
|
|
|
+ C++ Templates requirements. We, however, do not enforce the
|
|
|
+ one-class-per-file rule.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ See chapter 3.2 for more information on our implementation of
|
|
|
+ interfaces.
|
|
|
+ </para>
|
|
|
+ </sect2>
|
|
|
+ <sect2>
|
|
|
<title>Identifiers</title>
|
|
|
<para>
|
|
|
- We generally follow the java conventions for identifier naming and formatting.
|
|
|
+ Class and interface names are in CamelCase with a leading
|
|
|
+ capital letter. Interface names should be prefixed with a capital I followed
|
|
|
+ by another capital. Class names may be prefixed with a C if there is a
|
|
|
+ corresponding I-prefixed interface name, but need not be
|
|
|
+ otherwise.
|
|
|
</para>
|
|
|
-
|
|
|
<para>
|
|
|
- Class and interface names are in CamelCase with a leading capital letter.
|
|
|
- Interface names should be prefixed capital I followed by another capital.
|
|
|
- Class names may be prefixed with a C if there is a corresponding I-prefixed
|
|
|
- interface name, but need not be otherwise.
|
|
|
+ Variables, function and method names, and parameters use
|
|
|
+ camelCase starting with a lower case letter. Parameters may be
|
|
|
+ prefixed with underscore, normally when they would otherwise clash with local
|
|
|
+ variables.
|
|
|
+ </para>
|
|
|
+ <para>Example:</para>
|
|
|
+ <para>
|
|
|
+ <programlisting> class MySQLSuperClass {
|
|
|
+ void mySQLFunctionIsCool(int _hasLocalCopy, bool enableWrite) {
|
|
|
+ bool hasLocalCopy = false;
|
|
|
+ if (enableWrite)
|
|
|
+ hasLocalCopy = _hasLocalCopy;
|
|
|
+ }
|
|
|
+ };
|
|
|
+ </programlisting>
|
|
|
+ </para>
|
|
|
+ </sect2>
|
|
|
+ <sect2>
|
|
|
+ <title>Pointers</title>
|
|
|
+ <para>
|
|
|
+ Use real pointers when you can, and smart pointers when you have
|
|
|
+ to. Take extra care to understand the needs of your pointers and
|
|
|
+ their scope. Most programs can afford a few dangling pointers, but a
|
|
|
+ high-performance clustering platform cannot.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ Most importantly, use common sense and a lot of thought. Here
|
|
|
+ are a few guidelines:
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ <itemizedlist>
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ Use real pointers for return values and parameter passing.
|
|
|
+ </para>
|
|
|
+ </listitem>
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ For local variables use real pointers if their lifetime is
|
|
|
+ guaranteed to be longer than the function (and no exception
|
|
|
+ is thrown from functions you call), shared pointers otherwise.
|
|
|
+ </para>
|
|
|
+ </listitem>
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ Use Shared pointers for member variables - unless there is
|
|
|
+ a strong guarantee the object has a longer lifetime.
|
|
|
+ </para>
|
|
|
+ </listitem>
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ Create Shared<> with either:
|
|
|
+ </para>
|
|
|
+ <itemizedlist>
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ Owned<>: if your new pointer will take sole
|
|
|
+ ownership of the object (transfer)
|
|
|
+ </para>
|
|
|
+ </listitem>
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ Linked<>: if you still want to share the
|
|
|
+ ownership (shared)
|
|
|
+ </para>
|
|
|
+ </listitem>
|
|
|
+ </itemizedlist>
|
|
|
+ </listitem>
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ Consider whether your code is critical and use
|
|
|
+ link/release when necessary
|
|
|
+ </para>
|
|
|
+ </listitem>
|
|
|
+ </itemizedlist>
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ Warning: Direct manipulation of the ownership might
|
|
|
+ cause Shared<> pointers to lose the objects they point to, so subsequent
|
|
|
+ calls through them (like o2->doIt() after o3 gets ownership) *will* cause
|
|
|
+ segmentation faults.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ Refer to chapter 5.3 for more information on our smart pointer
|
|
|
+ implementation, Shared<>.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ Methods that return Shared<> pointers, or that use them,
|
|
|
+ should have a common naming standard.
|
|
|
</para>
|
|
|
-
|
|
|
<para>
|
|
|
- Variables, function and method names, and parameters use camelCase starting with a
|
|
|
- lower case letter. Parameters may be prefixed with underscore.
|
|
|
+ <itemizedlist>
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ Foo * queryFoo(): does not return a linked pointer since
|
|
|
+ lifetime is guaranteed for a set period. Caller should link if it
|
|
|
+ needs to retain it for longer.
|
|
|
+ </para>
|
|
|
+ </listitem>
|
|
|
+ </itemizedlist>
|
|
|
+ <itemizedlist>
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ Foo * getFoo(): returned value is linked - should be
|
|
|
+ assigned to an owned, or returned directly.
|
|
|
+ </para>
|
|
|
+ </listitem>
|
|
|
+ </itemizedlist>
|
|
|
+ <itemizedlist>
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ void setFoo(Foo * x): generally parameters to functions are
|
|
|
+ assumed to not be linked, the callee needs to link them if they
|
|
|
+ are retained.
|
|
|
+ </para>
|
|
|
+ </listitem>
|
|
|
+ </itemizedlist>
|
|
|
+ <itemizedlist>
|
|
|
+ <listitem>
|
|
|
+ <para>
|
|
|
+ void setownFoo(Foo * ownedX): Some functions do take
|
|
|
+ pointers that are linked - where you are implicitly transferring
|
|
|
+ ownership.
|
|
|
+ </para>
|
|
|
+ </listitem>
|
|
|
+ </itemizedlist>
|
|
|
</para>
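The naming standard above can be sketched with a small single-threaded example. Foo here is a hypothetical stand-in with a plain (non-atomic) counter and is not the jlib implementation; Holder is likewise invented for illustration. The method names follow the list above as written.

```cpp
#include <cassert>

// Hypothetical reference-counted object; a new object starts with one link.
class Foo
{
public:
    Foo() : count(1) {}
    void Link() { ++count; }
    bool Release()
    {
        if (--count == 0)
        {
            delete this;
            return true;
        }
        return false;
    }
    int links() const { return count; }
private:
    ~Foo() {}    // destroyed only through Release()
    int count;   // not atomic: single-threaded sketch only
};

// Hypothetical class that retains one Foo at a time (not copyable in practice).
class Holder
{
public:
    Holder() : foo(new Foo) {}
    ~Holder() { if (foo) foo->Release(); }

    // queryFoo(): no link added; valid only while the Holder keeps the object.
    Foo * queryFoo() { return foo; }

    // getFoo(): returns a linked pointer; the caller must Release() it.
    Foo * getFoo() { foo->Link(); return foo; }

    // setFoo(): the parameter is not linked, so the callee links it to retain it.
    void setFoo(Foo * x)
    {
        if (x) x->Link();
        if (foo) foo->Release();
        foo = x;
    }

    // setownFoo(): the caller hands over its own link; no extra Link() here.
    void setownFoo(Foo * ownedX)
    {
        if (foo) foo->Release();
        foo = ownedX;
    }
private:
    Foo * foo;
};
```

The difference between the two setters is only who ends up holding the caller's link: after setFoo() the caller still has to release its own reference, while after setownFoo() it must not.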
|
|
|
</sect2>
|
|
|
<sect2>
|
|
@@ -338,8 +627,112 @@
|
|
|
abstract class with no data members and all functions pure virtual can be used
|
|
|
in the same way.
|
|
|
</para>
|
|
|
+ <para>
|
|
|
+ Interfaces are pure virtual classes. They are similar in concept to
|
|
|
+ Java's interfaces and should be used on public APIs. If you need common
|
|
|
+ code, use policies (see below).
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ An interface's name must start with an 'I' and the base class for
|
|
|
+ its concrete implementations should start with a 'C' and have the same
|
|
|
+ name, ex:
|
|
|
+ </para>
|
|
|
+ <programlisting> class CFoo : implements IFoo { };</programlisting>
|
|
|
+ <para>
|
|
|
+ When an interface has multiple implementations, try to stay as
|
|
|
+ close as possible to this rule. For example:
|
|
|
+ </para>
|
|
|
+ <programlisting> class CFooCool : implements IFoo { };
|
|
|
+ class CFooWarm : implements IFoo { };
|
|
|
+ class CFooALot : implements IFoo { };
|
|
|
+ </programlisting>
|
|
|
+ <para>
|
|
|
+ Or, for partial implementation, use something like this:
|
|
|
+ </para>
|
|
|
+ <programlisting> class CFoo : implements IFoo { };
|
|
|
+ class CFooCool : public CFoo { };
|
|
|
+ class CFooWarm : public CFoo { };
|
|
|
+ </programlisting>
|
|
|
+ <para>
|
|
|
+ Extend current interfaces only on an 'is-a' basis, not to
|
|
|
+ aggregate functionality. Avoid pollution of public interfaces by having
|
|
|
+ only the public methods on the most-base interface in the header, and
|
|
|
+ internal implementation in the source file. Prefer pImpl idiom
|
|
|
+ (pointer-to-implementation) for functionality-only requirements and
|
|
|
+ policy based design for interface requirements.
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ Example 1: You want to decouple part of the implementation from
|
|
|
+ your class, and this part does not implement the interface your
|
|
|
+ contract requires.
|
|
|
+ </para>
|
|
|
+ <programlisting> interface IFoo {
|
|
|
+ virtual void foo()=0;
|
|
|
+ };
|
|
|
+ class CFoo : implements IFoo {
|
|
|
+ MyImpl *pImpl;
|
|
|
+ public:
|
|
|
+ void foo() { pImpl->doSomething(); }
|
|
|
+ };
|
|
|
+ </programlisting>
|
|
|
+ <para>
|
|
|
+ Example 2: You want to implement the common part of one (or more)
|
|
|
+ interface(s) in a range of sub-classes.
|
|
|
+ </para>
|
|
|
+ <programlisting> interface ICommon {
|
|
|
+ virtual void common()=0;
|
|
|
+ };
|
|
|
+ interface IFoo : extends ICommon {
|
|
|
+ virtual void foo()=0;
|
|
|
+ };
|
|
|
+ interface IBar : extends ICommon {
|
|
|
+ virtual void bar()=0;
|
|
|
+ };
|
|
|
+
|
|
|
+ template <class IFACE>
|
|
|
+ class Base : implements IFACE {
|
|
|
+ void common() { ... };
|
|
|
+ }; // Still virtual
|
|
|
+
|
|
|
+ class CFoo : Base<IFoo> {
|
|
|
+ void foo() { 1+1; };
|
|
|
+ };
|
|
|
+ class CBar : Base<IBar> {
|
|
|
+ void bar() { 2+2; };
|
|
|
+ };
|
|
|
+ </programlisting>
|
|
|
+ </sect1>
|
|
|
+ <sect1>
|
|
|
+ <title>Reference counted objects</title>
|
|
|
+ <para>
|
|
|
+ Shared<> is an in-house smart pointer implementation. It's
|
|
|
+ close to boost's intrusive_ptr. It has two derived implementations:
|
|
|
+ Linked and Owned, which are used to control whether the pointer is
|
|
|
+ linked when a shared pointer is created from a real pointer or not,
|
|
|
+ respectively. Ex:
|
|
|
+ </para>
|
|
|
+ <programlisting> Owned<Foo> myFoo = new Foo; // Takes ownership of the pointer
|
|
|
+ Linked<Foo> sharedFoo = myFooParameter; // Shared ownership
|
|
|
+ </programlisting>
|
|
|
+ <para>
|
|
|
+ Shared<> is thread-safe and uses an atomic reference count
|
|
|
+ handled by each object (rather than by the smart pointer itself, like
|
|
|
+ boost's shared_ptr).
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ This means that, to use Shared<>, your class must implement
|
|
|
+ the IInterface interface, most commonly by extending the CInterface
|
|
|
+ class (and using the IMPLEMENT_IINTERFACE macro in the public section of
|
|
|
+ your class declaration).
|
|
|
+ </para>
|
|
|
+ <para>
|
|
|
+ This interface controls how you Link() and Release() the pointer.
|
|
|
+ This is necessary because in some inner parts of HPCC, the use of a
|
|
|
+ "really smart" smart pointer would add too many links and releases (on
|
|
|
+ temporaries, local variables, members, etc) that could add up to a
|
|
|
+ significant performance hit.
|
|
|
+ </para>
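The intrusive scheme described in this section can be sketched as below. This is an illustration under stated assumptions, not the jlib code: CCounted, SharedPtr, OwnedPtr and LinkedPtr are hypothetical stand-ins for CInterface, Shared, Owned and Linked, and the real classes differ in detail.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical base class: the object itself carries the atomic link count.
class CCounted
{
public:
    CCounted() : count(1) {}      // creation counts as the first link
    virtual ~CCounted() {}
    void Link() const { count.fetch_add(1, std::memory_order_relaxed); }
    bool Release() const
    {
        // fetch_sub returns the previous value, so 1 means this was the last link.
        if (count.fetch_sub(1, std::memory_order_acq_rel) == 1)
        {
            delete this;
            return true;
        }
        return false;
    }
    int links() const { return count.load(); }
private:
    mutable std::atomic<int> count;
};

// Hypothetical smart pointer base: releases its link on destruction.
template <class T>
class SharedPtr
{
public:
    ~SharedPtr() { if (ptr) ptr->Release(); }
    SharedPtr(const SharedPtr &) = delete;              // copying omitted in this sketch
    SharedPtr & operator=(const SharedPtr &) = delete;
    T * operator->() const { return ptr; }
    T * get() const { return ptr; }
protected:
    explicit SharedPtr(T * _ptr) : ptr(_ptr) {}
private:
    T * ptr;
};

// Takes over the caller's existing link: no Link() call on construction.
template <class T>
class OwnedPtr : public SharedPtr<T>
{
public:
    explicit OwnedPtr(T * _ptr) : SharedPtr<T>(_ptr) {}
};

// Adds its own link, so the original holder keeps its share.
template <class T>
class LinkedPtr : public SharedPtr<T>
{
public:
    explicit LinkedPtr(T * _ptr) : SharedPtr<T>(_ptr)
    {
        if (_ptr) _ptr->Link();
    }
};
```

Because the count lives in the object, a raw pointer can be passed around freely and wrapped again later without a separate control block; the cost is that every counted class has to derive from the counting base.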
|
|
|
</sect1>
|
|
|
- <sect1><title>Reference counted objects</title><para/></sect1>
|
|
|
<sect1><title>STL</title><para/></sect1>
|
|
|
</chapter>
|
|
|
|