File Layout Resolution at Compile Time
When reading a disk file in ECL, the layout of the file is specified
in the ECL code. This allows the code to be compiled to access the data very
efficiently, but can cause issues if the file on disk is actually using a
different layout.
In particular, it can present a challenge to the version control
process if you have ECL queries that are being changed to add
functionality, but which need to be applied without modification to data
files whose layout is changing on a different timeline.
A partial solution to this dilemma has been available in Roxie for index
files: the ability to apply runtime translation from the fields in the
physical index file to the fields specified in the index. However, that
approach has significant potential overhead and is not available for flat
files or on Thor.
A feature added in the HPCC Systems 6.4.0 release allows file resolution
to be performed at compile time. It supports flat files and Thor files,
and provides the following advantages:
Code changes can be insulated from file layout changes - you only
need to declare the fields you actually want to use from a
datafile.
File layout mismatches can be detected sooner.
The compiler can use information about file sizes to guide code
optimization decisions.
There are two language constructs associated with this feature:
Using a LOOKUP attribute on DATASET or INDEX declarations.
Using a LOOKUP attribute in a RECORDOF function.
Using LOOKUP on a DATASET
Adding the LOOKUP attribute to a DATASET declaration indicates that
the file layout should be looked up at compile time:
myrecord := RECORD
  STRING field1;
  STRING field2;
END;
f := DATASET('myfilename', myrecord, FLAT);
// This will fail at runtime if file layout does not match myrecord
f := DATASET('myfilename', myrecord, FLAT, LOOKUP);
// This will automatically project from the actual to the requested layout
If we assume that the actual layout of the file on disk is:
myactualrecord := RECORD
  STRING field1;
  STRING field2;
  STRING field3;
END;
Then the effect of the LOOKUP attribute will be as if your code
was:
actualfile := DATASET('myfilename', myactualrecord, FLAT);
f := PROJECT(actualfile, TRANSFORM(myrecord, SELF := LEFT; SELF := []));
Fields that are present in both record structures are assigned
across. Fields that are present only in the disk version are dropped, and
fields that are present only in the ECL version receive their default
value (a warning is issued in this latter case).
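The reverse case, where the ECL record declares a field that the file on
disk lacks, can be sketched as follows. The record name widerrecord and
the field field4 are hypothetical and chosen only for illustration;
myfilename and its on-disk layout are the ones assumed above.

```ecl
// Hypothetical layout: field4 is assumed NOT to exist in the file on disk
widerrecord := RECORD
  STRING field1;
  STRING field4;
END;

// Compile-time lookup: field1 is read from disk, field2 and field3 are
// dropped, and field4 receives its default value (with a warning)
g := DATASET('myfilename', widerrecord, FLAT, LOOKUP);
OUTPUT(g);
```

Following the equivalence shown above, this should behave like a PROJECT
from myactualrecord to widerrecord in which SELF := [] fills field4.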
There is also a compiler directive that can be used to specify
translation for all files:
#OPTION('translateDFSlayouts',TRUE);
The LOOKUP attribute accepts a parameter (TRUE or FALSE) to allow
easier control of where and when you want translation to occur. Any
Boolean expression that can be evaluated at compile time can be
supplied.
When using the #OPTION for translateDFSlayouts,
you may want to use LOOKUP(FALSE) to override the default on some specific
datasets.
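As a sketch of combining the two controls, the following turns translation
on for every file and then opts one dataset out. The filename myotherfile
and the definition names are assumptions made for illustration.

```ecl
// Request layout translation for all files in this query
#OPTION('translateDFSlayouts', TRUE);

myrecord := RECORD
  STRING field1;
  STRING field2;
END;

// Picks up the default set by the #OPTION: layout is looked up and translated
translated := DATASET('myfilename', myrecord, FLAT);

// Overrides the default: no lookup for this dataset, so the file on disk
// is expected to match myrecord already (hypothetical filename)
untranslated := DATASET('myotherfile', myrecord, FLAT, LOOKUP(FALSE));

OUTPUT(translated);
OUTPUT(untranslated);
```

Because the LOOKUP parameter only needs to be a Boolean expression that can
be evaluated at compile time, the literal FALSE could presumably be
replaced by a constant definition shared across several declarations.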
Using LOOKUP in a RECORDOF function
Using a LOOKUP attribute in a RECORDOF function is useful when
fields were present in the original and later dropped, or when you want to
write to a file that matches the layout of an existing file but you don't
know that layout.
The LOOKUP attribute in the RECORDOF function takes a filename
rather than a dataset. The result is expanded at compile time to the
record layout stored in the named file’s metadata. There are several forms
of this construct:
RECORDOF('myfile', LOOKUP);
RECORDOF('myfile', defaultstructure, LOOKUP);
RECORDOF('myfile', defaultstructure, LOOKUP, OPT);
You can also specify a DATASET as the first parameter instead of a
filename (a syntactic convenience) and the filename specified on the
dataset will be used for the lookup.
The defaultstructure is useful for situations
where the file layout information may not be available (for example, when
syntax-checking locally or creating an archive). It is also useful when
the file being looked up may not exist--this is where OPT should be
used.
The compiler checks that the actual record structure retrieved from
the distributed file system lookup contains all the fields specified, and
that the types are compatible.
For example, to read a file whose structure is unknown other than
that it contains an ID field, and create an output file containing all
records that matched a supplied value, you could write:
myfile := DATASET('myinputfile', RECORDOF('myinputfile', { STRING id },
                  LOOKUP), FLAT);
filtered := myfile(id='123');
OUTPUT(filtered,,'myfilteredfile');
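If the input file might not exist when the query is compiled, the OPT form
can be combined with a default layout. This is only a sketch: the filename
mymaybefile and the name field are hypothetical, and it assumes the DATASET
itself is also declared with OPT so that a missing file is read as empty.

```ecl
// Default layout used when the file (or its metadata) cannot be resolved
defaultlayout := { STRING id, STRING name };

// OPT: do not fail the lookup if 'mymaybefile' does not exist
layout := RECORDOF('mymaybefile', defaultlayout, LOOKUP, OPT);

// OPT on the DATASET treats a missing file as an empty dataset
maybefile := DATASET('mymaybefile', layout, FLAT, OPT);
OUTPUT(maybefile(id = '123'));
```

When the file does exist, the compiler check described above still applies:
the looked-up layout is expected to contain every field named in
defaultlayout, with compatible types.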
Additional Details
The syntax is designed so that it is not necessary to perform
file resolution to be able to syntax-check or create archives. This is
important for local-repository mode to work.
Foreign file resolution works the same way - just use the
standard filename syntax for foreign filename resolution.
The LOOKUP attribute can be used on INDEX declarations as well as
on DATASET declarations.
When using the RECORDOF form and supplying a default layout, you
may need to use the => form of the record layout syntax to specify
both keyed and payload fields in the same record.
Files that have been sprayed rather than created by ECL jobs may
not have record information (metadata) available in the distributed
file system.
There are some new parameters to eclcc that can be used if you
want to use this functionality for local compiles:
-dfs=ip         Use specified Dali IP for filename resolution.
-scope=prefix   Use specified scope prefix in filename resolution.
-user=id        Use specified username in filename resolution.
-password=xxx   Use specified password in filename resolution (leave
                blank to prompt).
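For the INDEX points in the list above, a declaration might look like the
sketch below. The index name myindexfile, the field names, and their types
are hypothetical; the default layout uses the => form to separate the keyed
field from the payload field, and the INDEX form shown (record layout plus
filename) is an assumption about how the looked-up layout would be used.

```ecl
// Resolve the index layout at compile time. The default layout is
// hypothetical: id is the keyed field, name is the payload field.
keylayout := RECORDOF('myindexfile', { STRING10 id => STRING20 name }, LOOKUP);

// Declare the index using the resolved layout
myindex := INDEX(keylayout, 'myindexfile');
OUTPUT(myindex(id = '123'));
```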