9 years ago · 4dd0628be8
--- a/ecl/eclcc/DOCUMENTATION.rst
+++ b/ecl/eclcc/DOCUMENTATION.rst
@@ -69,6 +69,129 @@ In practice the progression is not so clear cut.  There tends to be some overlap
 
				 different stages, and some of them may occur in slightly different orders.  However the order broadly
			
 
				 holds.
			
 
				 
			
 
				+*****************************
			
 
				+Working on the code generator
			
 
				+*****************************
			
 
				+
			
 
				+The regression suite
			
 
				+====================
			
 
				+
			
 
				+Before any change is accepted for the code generator it is always run against several regression suites to ensure that
			
 
				+it doesn't introduce any problems, and that the change has the desired effect.  There are several different regression suites:
			
 
				+
			
 
				+* testing/regress/ecl  - The run time regression suite.
			
 
				+* ecl/regress          - A compiler regression suite.  This contains tests that cannot be run, and error tests.
			
 
				+* LN private suite     - This contains a large selection (>10GB) of archived queries.  They contain proprietary code so unfortunately cannot be released as open source.
			
 
				+
			
 
				+The ecl/regress directory contains a script 'regress.sh' that is used for running the regression tests.  It should be
			
 
				+executed in the directory containing the ecl files.  The script generates the C++ code (and workunits) for each of the source
			
 
				+files to a target directory, and then executes a comparison program to compare the new results with a previous "golden"
			
 
				+reference set.
			
 
				+
			
 
				+Before making any changes to the compiler, a reference set should be created by running the regression script and copying the
			
 
				+generated files to the reference directory.
			
 
				+
			
 
				+Here is a sample command line:
			
 
				+
			
 
				+``~/dev/hpcc/ecl/regress/regress.sh -t /regress/hpcc -e /home/<user>/buildr/Release/bin/eclcc -I /home/<user>/dev/hpcc/ecl/regress/modules -I /home/<user>/dev/hpcc/plugins/javaembed -I /home/<user>/dev/hpcc/plugins/v8embed -c /regress/hpcc.master -d bcompare``
			
 
				+
			
 
				+(A version of this command resides in a shell script in each of my regression suite directories, with the -t and -c options adapted for each suite.)
			
 
				+
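The per-suite wrapper mentioned above might look something like the following minimal sketch. It only assembles and prints the command so it can be inspected before running. The ``SUITE``, ``SRC`` and ``BUILD`` variables, the function name and every path are hypothetical placeholders; the flags are limited to those in the sample command line, and the meaning attached to ``-t`` and ``-c`` in the comments is an assumption based on the description of the script.

```shell
#!/bin/sh
# Hypothetical per-suite wrapper around regress.sh (a sketch, not the project's script).
# Only flags that appear in the sample command line are used; all paths are placeholders.

SUITE="${SUITE:-hpcc}"                    # hypothetical suite name
SRC="${SRC:-$HOME/dev/hpcc}"              # assumed location of the source checkout
BUILD="${BUILD:-$HOME/buildr/Release}"    # assumed location of the build directory

build_regress_cmd() {
    # -t and -c are the two options adapted per suite; they are assumed here to be
    # the target directory and the reference directory respectively.
    printf '%s' "$SRC/ecl/regress/regress.sh"
    printf ' -t %s' "/regress/$SUITE"
    printf ' -e %s' "$BUILD/bin/eclcc"
    printf ' -I %s' "$SRC/ecl/regress/modules"
    printf ' -c %s' "/regress/$SUITE.master"
    printf ' -d %s\n' "bcompare"
}

# Print the assembled command for inspection rather than executing it.
build_regress_cmd
```

Once the printed command looks right, it can be pasted into a shell or the final line can be changed to run it.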
			
 
				+For a full list of options execute the script with no parameters, or take a look at the script itself. A couple of useful options are:
			
 
				+
			
 
				+* The script can be run on a single file by using the -q option.
			
 
				+
			
 
				+* The (-e) option selects the path of the eclcc.  This is particularly useful when running from the build
			
 
				+  directory (see below), or using multiple build directories to compare behaviour between different versions.
			
 
				+
			
 
				+We strongly recommend using a comparison program which allows rules to be defined to ignore certain differences (e.g., Beyond Compare).
			
 
				+
			
 
				+Running directly from the build directory
			
 
				+=========================================
			
 
				+
			
 
				+It is much quicker to run eclcc directly from the build directory, rather than deploying a system and running eclcc
			
 
				+from there.  To do this you need to configure some options that eclcc requires, e.g. where the include files are found.  The
			
 
				+options can be set either by setting environment variables or by specifying options in an eclcc.ini
			
 
				+file.   The following are the names of the different options:
			
 
				+
			
 
				++-----------------------+-------------------+
			
 
				+| Environment variable  | Ini file option   |
			
 
				++=======================+===================+
			
 
				+| CL_PATH               | compilerPath      |
			
 
				++-----------------------+-------------------+
			
 
				+| ECLCC_LIBRARY_PATH    | libraryPath       |
			
 
				++-----------------------+-------------------+
			
 
				+| ECLCC_INCLUDE_PATH    | includePath       |
			
 
				++-----------------------+-------------------+
			
 
				+| ECLCC_PLUGIN_PATH     | plugins           |
			
 
				++-----------------------+-------------------+
			
 
				+| HPCC_FILEHOOKS_PATH   | filehooks         |
			
 
				++-----------------------+-------------------+
			
 
				+| ECLCC_TPL_PATH        | templatePath      |
			
 
				++-----------------------+-------------------+
			
 
				+| ECLCC_ECLLIBRARY_PATH | eclLibrariesPath  |
			
 
				++-----------------------+-------------------+
			
 
				+| ECLCC_ECLBUNDLE_PATH  | eclBundlesPath    |
			
 
				++-----------------------+-------------------+
			
 
				+
			
 
				+The eclcc.ini can either be a file in the local directory, or specified on the eclcc command line with -specs.
			
 
				+Including the settings in a local eclcc.ini file also makes it easy to debug eclcc directly from the build directory
			
 
				+within the Eclipse environment.
			
 
				+
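As an illustration of the environment-variable route, the sketch below sets a few of the variables from the table before running eclcc from a build tree. The variable names come from the table; ``BUILD``, ``SRC`` and every sub-path are hypothetical placeholders that will need adjusting to the actual layout.

```shell
#!/bin/sh
# Sketch: configure eclcc through environment variables named in the table above.
# BUILD, SRC and all sub-paths are hypothetical placeholders, not known-good values.
BUILD="${BUILD:-$HOME/buildr/Release}"
SRC="${SRC:-$HOME/dev/hpcc}"

export ECLCC_LIBRARY_PATH="$BUILD/libs"
export ECLCC_INCLUDE_PATH="$SRC/rtl/include"
export ECLCC_PLUGIN_PATH="$BUILD/libs"
export ECLCC_ECLLIBRARY_PATH="$SRC/ecllibrary"

# List the settings that a subsequently launched eclcc would inherit.
env | grep '^ECLCC_' | sort
```

Sourcing a file like this from the shell used for development keeps the settings out of the build directory itself; the eclcc.ini route is the alternative when a debugger launches eclcc.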
			
 
				+Hints and tips
			
 
				+==============
			
 
				+
			
 
				+* Logging
			
 
				+
			
 
				+  There is an option for eclcc to output a logging file, and another to specify the level of detail in that logging
			
 
				+  file.  If the detail level is above 500 then the expression tree for the query is output to the logging file after
			
 
				+  each of the code transformations.  The tracing is very useful for tracking down at which stage inconsistencies are
			
 
				+  introduced in the expression graph, and also for learning how each transformation affects the query.
			
 
				+
			
 
				+  The output format defaults to ECL - which is regenerated from the expression tree.  (This ECL cannot generally be
			
 
				+  compiled without editing - partly because it contains extra annotations.)   Use either of the following:
			
 
				+
			
 
				+  ``eclcc myfile.ecl --logfile myfile.log --logdetail 999``
			
 
				+
			
 
				+  ``regress.sh -q myfile.ecl -l myfile.log``
			
 
				+
			
 
				+* -ftraceIR
			
 
				+
			
 
				+  There is a debug option (-ftraceIR) that generates an intermediate representation of the expression graph rather than
			
 
				+  regenerating ECL.  The output tends to be less compact and harder to read quickly, but has the advantage of being
			
 
				+  better structured, and contains more details of the internal representation.  ecl/hql/hqlir.cpp contains
			
 
				+  more details of the format.
			
 
				+
			
 
				+* Adding extra logging into the source code
			
 
				+
			
 
				+  If you want to add tracing of expressions at any point in the code generation then adding either of the following
			
 
				+  calls will include the expression details in the log file:
			
 
				+
			
 
				+  ``dbglogExpr(expr); // regenerate the ecl for an expression.  See other functions in ecl/hql/hqlthql.hpp``
			
 
				+
			
 
				+  ``EclIR::dbglogIR(expr);  // regenerate the IR for an expression.  See other functions in ecl/hql/hqlir.hpp``
			
 
				+
			
 
				+* Logging while debugging
			
 
				+
			
 
				+  If you are debugging inside gdb it is often useful to be able to dump out details of an expression.  Calling
			
 
				+  ``EclIR::dump_ir(expr);`` will generate the IR to stdout.
			
 
				+
			
 
				+  ``p EclIR::dump_ir(expr)``
			
 
				+
			
 
				+  The function can also be used with multiple parameters.  Each expression will be dumped out, but common child nodes
			
 
				+  will only be generated once.  This can be very useful when trying to determine the difference between two expressions.
			
 
				+  The quickest way is to call ``EclIR::dump_ir(expr1, expr2)``.  The first difference between the expressions will
			
 
				+  be the expression that follows the first "return".
			
 
				+
			
 
				+* Expression sequence ids
			
 
				+
			
 
				+  Sometimes it can be hard to determine where a particular IHqlExpression node was created.  If that is the case, then
			
 
				+  defining ``DEBUG_TRACK_INSTANCEID`` (in ecl/hql/hqlexpr.ipp) will add a unique sequence number to each IHqlExpression
			
 
				+  that is created.  There is also a function checkSeqId() at the start of ecl/hql/hqlexpr.cpp which is called whenever
			
 
				+  an expression is created, linked, released, etc.  Setting a breakpoint in that function can allow you to trace back
			
 
				+  exactly when and why a particular node was created.
			
 
				+
			
 
				+
			
 
				 ***********
			
 
				 Expressions
			
 
				 ***********