File Layout Resolution at Compile Time

When reading a disk file in ECL, the layout of the file is specified in the ECL code. This allows the code to be compiled to access the data very efficiently, but can cause issues if the file on disk actually uses a different layout.

In particular, it can present a challenge to the version control process if you have ECL queries that are being changed to add functionality, but which need to be applied without modification to data files whose layout is changing on a different timeline.

A partial solution to this dilemma has been available in Roxie for index files: runtime translation from the fields in the physical index file to the fields specified in the index. However, that has significant potential overhead and is not available for flat files or on Thor.

A new feature, added in the HPCC Systems 6.4.0 release, allows file resolution to be performed at compile time. It supports flat files and Thor files, and provides the following advantages:

- Code changes can be insulated from file layout changes. You only need to declare the fields you actually want to use from a datafile.
- File layout mismatches can be detected sooner.
- The compiler can use information about file sizes to guide code optimization decisions.

There are two language constructs associated with this feature:

- Using a LOOKUP attribute on DATASET or INDEX declarations.
- Using a LOOKUP attribute in a RECORDOF function.

Using LOOKUP on a DATASET

Adding the LOOKUP attribute to a DATASET declaration indicates that the file layout should be looked up at compile time. Given this record definition:

    myrecord := RECORD
      STRING field1;
      STRING field2;
    END;

Without LOOKUP, the declaration fails at runtime if the file layout does not match myrecord:

    f := DATASET('myfilename', myrecord, FLAT);

With LOOKUP, the declaration automatically projects from the actual layout to the requested layout:

    f := DATASET('myfilename', myrecord, FLAT, LOOKUP);

If we assume that the actual layout of the file on disk is:

    myactualrecord := RECORD
      STRING field1;
      STRING field2;
      STRING field3;
    END;

Then the effect of the LOOKUP attribute will be as if your code was:

    actualfile := DATASET('myfilename', myactualrecord, FLAT);
    f := PROJECT(actualfile, TRANSFORM(myrecord, SELF := LEFT; SELF := []));

Fields that are present in both record structures are assigned across. Fields that are present only in the disk version are dropped. Fields that are present only in the ECL version receive their default value, and a warning is issued in this case.

There is also a compiler directive that can be used to specify translation for all files:

    #OPTION('translateDFSlayouts', TRUE);

The LOOKUP attribute accepts a parameter (TRUE or FALSE) to allow easier control of where and when you want translation to occur. Any Boolean expression that can be evaluated at compile time can be supplied.

When using the #OPTION for translateDFSlayouts, you may want to use LOOKUP(FALSE) to override the default on some specific datasets.

Using LOOKUP in a RECORDOF function

Using a LOOKUP attribute in a RECORDOF function is useful when fields were present in the original and later dropped, or when you want to write to a file that matches the layout of an existing file but you don't know the layout.

The LOOKUP attribute in the RECORDOF function takes a filename rather than a dataset. The result is expanded at compile time to the record layout stored in the named file's metadata.
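As a minimal sketch of the "write a file that matches an existing layout" case, the following reads a file using its looked-up layout and writes a copy with the same layout. The logical filenames and the attribute names srcName, dstName, srcLayout, and src are hypothetical, chosen only for illustration, and the sketch assumes the source file has layout metadata available in the distributed file system.

```ecl
// Hypothetical logical filenames, used for illustration only.
srcName := '~example::in::people';
dstName := '~example::out::people_copy';

// Expanded at compile time to the layout stored in the file's metadata.
srcLayout := RECORDOF(srcName, LOOKUP);

// Read the file using the looked-up layout.
src := DATASET(srcName, srcLayout, FLAT);

// The output file is written with the same looked-up layout.
OUTPUT(src, , dstName, OVERWRITE);
```

Because no default structure is supplied here, this form depends on the lookup succeeding at compile time.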

There are several forms of this construct:

    RECORDOF('myfile', LOOKUP);
    RECORDOF('myfile', defaultstructure, LOOKUP);
    RECORDOF('myfile', defaultstructure, LOOKUP, OPT);

You can also specify a DATASET as the first parameter instead of a filename (a syntactic convenience), and the filename specified on the dataset will be used for the lookup.

The defaultstructure is useful for situations where the file layout information may not be available (for example, when syntax-checking locally or creating an archive). It is also useful when the file being looked up may not exist; this is where OPT should be used.

The compiler checks that the actual record structure retrieved from the distributed file system lookup contains all the fields specified, and that the types are compatible.

For example, to read a file whose structure is unknown other than that it contains an ID field, and create an output file containing all records that matched a supplied value, you could write:

    myfile := DATASET('myinputfile', RECORDOF('myinputfile', { STRING id }, LOOKUP), FLAT);
    filtered := myfile(id = '123');
    OUTPUT(filtered, , 'myfilteredfile');

Additional Details

- The syntax is designed so that it is not necessary to perform file resolution to be able to syntax-check or create archives. This is important for local-repository mode to work.
- Foreign file resolution works the same way. Just use the standard filename syntax for foreign filename resolution.
- You can also use the LOOKUP attribute on INDEX declarations as well as DATASET.
- When using the RECORDOF form and supplying a default layout, you may need to use the => form of the record layout syntax to specify both keyed and payload fields in the same record.
- Files that have been sprayed rather than created by ECL jobs may not have record information (metadata) available in the distributed file system.
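To illustrate the point about the => form of the record layout, here is a rough sketch of a RECORDOF lookup for an index with a default layout. The index filename, the field names id and lastname, and the attribute names are all hypothetical, and the sketch assumes an index with one keyed field and one payload field. Treat it as an illustration of the syntax rather than a tested recipe.

```ecl
// Hypothetical index filename, used for illustration only.
keyName := '~example::key::person_by_id';

// Default layout: fields before => are keyed, fields after => are payload.
defaultKeyLayout := { STRING10 id => STRING25 lastname };

// Resolved at compile time; the default layout is used when the file
// layout information is not available (for example, a local syntax check).
keyLayout := RECORDOF(keyName, defaultKeyLayout, LOOKUP);

// Declare the index using the resolved layout.
personKey := INDEX(keyLayout, keyName);

OUTPUT(personKey(id = '0000000123'));
```

The fields named in the default layout are the ones the compiler checks against the actual index layout, so it only needs to list the fields the query uses.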

There are some new parameters to eclcc that can be used if you want to use this functionality for local compiles:

    -dfs=ip         Use specified Dali IP for filename resolution.
    -scope=prefix   Use specified scope prefix in filename resolution.
    -user=id        Use specified username in filename resolution.
    -password=xxx   Use specified password in filename resolution (leave blank to prompt).