|
@@ -0,0 +1,232 @@
|
|
|
|
+<?xml version="1.0" encoding="UTF-8"?>
|
|
|
|
+<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
|
|
|
|
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
|
|
|
|
+<sect1 id="File_layout_resolution">
|
|
|
|
+ <title>File Layout Resolution at Compile Time</title>
|
|
|
|
+
|
|
|
|
+ <para>When reading a disk file in ECL, the layout of the file is specified
|
|
|
|
+ in the ECL code. This allows the code to be compiled to access the data very
|
|
|
|
+ efficiently, but can cause issues if the file on disk is actually using a
|
|
|
|
+ different layout.</para>
|
|
|
|
+
|
|
|
|
+ <para>In particular, it can present a challenge to the version control
|
|
|
|
+ process, if you have ECL queries that are being changed to add
|
|
|
|
+ functionality, but which need to be applied without modification to data
|
|
|
|
+ files whose layout is changing on a different timeline.</para>
|
|
|
|
+
|
|
|
|
+ <para>There has been a partial solution to this dilemma available in Roxie
|
|
|
|
+ for index files--the ability to apply runtime translation from the fields in
|
|
|
|
+ the physical index file to the fields specified in the index. However, that
|
|
|
|
+ has significant potential overhead and is not available for flat files or on
|
|
|
|
+ Thor. The compile-time resolution feature described in this section supports both flat files and Thor.</para>
|
|
|
|
+
|
|
|
|
+ <para>A new feature, added in the HPCC Systems 6.4.0 release, allows file
|
|
|
|
+ resolution to be performed at compile time, which provides the following
|
|
|
|
+ advantages:</para>
|
|
|
|
+
|
|
|
|
+ <itemizedlist>
|
|
|
|
+ <listitem>
|
|
|
|
+ <para>Code changes can be insulated from file layout changes: you only
|
|
|
|
+ need to declare the fields you actually want to use from a
|
|
|
|
+ datafile.</para>
|
|
|
|
+ </listitem>
|
|
|
|
+ </itemizedlist>
|
|
|
|
+
|
|
|
|
+ <itemizedlist>
|
|
|
|
+ <listitem>
|
|
|
|
+ <para>File layout mismatches can be detected sooner.</para>
|
|
|
|
+ </listitem>
|
|
|
|
+ </itemizedlist>
|
|
|
|
+
|
|
|
|
+ <itemizedlist>
|
|
|
|
+ <listitem>
|
|
|
|
+ <para>The compiler can use information about file sizes to guide code
|
|
|
|
+ optimization decisions.</para>
|
|
|
|
+ </listitem>
|
|
|
|
+ </itemizedlist>
|
|
|
|
+
|
|
|
|
+ <para>There are two language constructs associated with this feature:</para>
|
|
|
|
+
|
|
|
|
+ <itemizedlist>
|
|
|
|
+ <listitem>
|
|
|
|
+ <para>Using a LOOKUP attribute on DATASET or INDEX declarations.</para>
|
|
|
|
+ </listitem>
|
|
|
|
+ </itemizedlist>
|
|
|
|
+
|
|
|
|
+ <itemizedlist>
|
|
|
|
+ <listitem>
|
|
|
|
+ <para>Using a LOOKUP attribute in a RECORDOF function.</para>
|
|
|
|
+ </listitem>
|
|
|
|
+ </itemizedlist>
|
|
|
|
+
|
|
|
|
+ <sect2>
|
|
|
|
+ <title>Using LOOKUP on a DATASET</title>
|
|
|
|
+
|
|
|
|
+ <para>Adding the LOOKUP attribute to a DATASET declaration indicates that
|
|
|
|
+ the file layout should be looked up at compile time:</para>
|
|
|
|
+
|
|
|
|
+ <programlisting>myrecord := RECORD
|
|
|
|
+ STRING field1;
|
|
|
|
+ STRING field2;
|
|
|
|
+END;
|
|
|
|
+
|
|
|
|
+f := DATASET('myfilename', myrecord, FLAT);
|
|
|
|
+ // This will fail at runtime if file layout does not match myrecord
|
|
|
|
+f := DATASET('myfilename', myrecord, FLAT, LOOKUP);
|
|
|
|
+ // This will automatically project from the actual to the requested layout
|
|
|
|
+</programlisting>
|
|
|
|
+
|
|
|
|
+ <para>If we assume that the actual layout of the file on disk is:</para>
|
|
|
|
+
|
|
|
|
+ <programlisting>myactualrecord := RECORD
|
|
|
|
+ STRING field1;
|
|
|
|
+ STRING field2;
|
|
|
|
+ STRING field3;
|
|
|
|
+END;</programlisting>
|
|
|
|
+
|
|
|
|
+ <?hard-pagebreak ?>
|
|
|
|
+
|
|
|
|
+ <para>Then the effect of the LOOKUP attribute will be as if your code
|
|
|
|
+ were:</para>
|
|
|
|
+
|
|
|
|
+ <programlisting>actualfile := DATASET('myfilename', myactualrecord, FLAT);
|
|
|
|
+f := PROJECT(actualfile, TRANSFORM(myrecord, SELF := LEFT; SELF := []));
|
|
|
|
+</programlisting>
|
|
|
|
+
|
|
|
|
+ <para>Fields that are present in both record structures are assigned
|
|
|
|
+ across; fields that are present only in the disk version are dropped; and
|
|
|
|
+ fields that are present only in the ECL version receive their default
|
|
|
|
+ value (a warning is issued in this last case).</para>
|
|
|
|
+
|
|
|
|
+ <para>There is also a compiler directive that can be used to specify
|
|
|
|
+ translation for all files:</para>
|
|
|
|
+
|
|
|
|
+ <para><programlisting>#OPTION('translateDFSlayouts',TRUE);</programlisting></para>
|
|
|
|
+
|
|
|
|
+ <para>The LOOKUP attribute accepts a parameter (TRUE or FALSE) to allow
|
|
|
|
+ easier control of where and when you want translation to occur. Any
|
|
|
|
+ Boolean expression that can be evaluated at compile time can be
|
|
|
|
+ supplied.</para>
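+
+ <para>As a minimal sketch of this, assuming a hypothetical constant named
+ <emphasis>useTranslation</emphasis> (not part of the language or the
+ original example), a single definition could switch translation on or off
+ for a dataset:</para>
+
+ <programlisting>myrecord := RECORD
+  STRING field1;
+  STRING field2;
+END;
+
+// Hypothetical compile-time constant; any constant Boolean expression should work here
+useTranslation := TRUE;
+
+// Translation is requested only when useTranslation evaluates to TRUE
+f := DATASET('myfilename', myrecord, FLAT, LOOKUP(useTranslation));
+</programlisting>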
|
|
|
|
+
|
|
|
|
+ <para>When using the #OPTION for <emphasis>translateDFSlayouts</emphasis>,
|
|
|
|
+ you may want to use LOOKUP(FALSE) to override the default on some specific
|
|
|
|
+ datasets.</para>
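+
+ <para>For illustration, the following sketch combines the directive with a
+ per-dataset override. The second filename is a hypothetical placeholder,
+ and the comments describe the behavior expected from the description above
+ rather than verified output:</para>
+
+ <programlisting>#OPTION('translateDFSlayouts', TRUE);
+
+myrecord := RECORD
+  STRING field1;
+  STRING field2;
+END;
+
+// Translated by default because of the #OPTION above
+translated := DATASET('myfilename', myrecord, FLAT);
+
+// LOOKUP(FALSE) opts this one dataset out of translation
+untranslated := DATASET('myotherfilename', myrecord, FLAT, LOOKUP(FALSE));
+</programlisting>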
|
|
|
|
+ </sect2>
|
|
|
|
+
|
|
|
|
+ <sect2>
|
|
|
|
+ <title>Using LOOKUP in a RECORDOF function</title>
|
|
|
|
+
|
|
|
|
+ <para>Using a LOOKUP attribute in a RECORDOF function is useful when
|
|
|
|
+ fields were present in the original and later dropped or when you want to
|
|
|
|
+ write to a file that matches the layout of an existing file, but you don't
|
|
|
|
+ know the layout.</para>
|
|
|
|
+
|
|
|
|
+ <para>The LOOKUP attribute in the RECORDOF function takes a filename
|
|
|
|
+ rather than a dataset. The result is expanded at compile time to the
|
|
|
|
+ record layout stored in the named file’s metadata. There are several forms
|
|
|
|
+ of this construct:</para>
|
|
|
|
+
|
|
|
|
+ <para><programlisting>RECORDOF('myfile', LOOKUP);
|
|
|
|
+RECORDOF('myfile', defaultstructure, LOOKUP);
|
|
|
|
+RECORDOF('myfile', defaultstructure, LOOKUP, OPT);</programlisting>You can
|
|
|
|
+ also specify a DATASET as the first parameter instead of a filename (a
|
|
|
|
+ syntactic convenience) and the filename specified on the dataset will be
|
|
|
|
+ used for the lookup.</para>
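+
+ <para>A short sketch of the dataset form, assuming a file named
+ <emphasis>myfile</emphasis> and illustrative definition names
+ (<emphasis>ds</emphasis> and <emphasis>actualrecord</emphasis> are
+ placeholders, not names from the original examples):</para>
+
+ <programlisting>myrecord := RECORD
+  STRING field1;
+  STRING field2;
+END;
+
+ds := DATASET('myfile', myrecord, FLAT);
+
+// The filename declared on ds ('myfile') is the one used for the lookup
+actualrecord := RECORDOF(ds, LOOKUP);
+</programlisting>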
|
|
|
|
+
|
|
|
|
+ <para>The <emphasis>defaultstructure</emphasis> is useful for situations
|
|
|
|
+ where the file layout information may not be available (for example, when
|
|
|
|
+ syntax-checking locally or creating an archive). It is also useful when
|
|
|
|
+ the file being looked up may not exist--this is where OPT should be
|
|
|
|
+ used.</para>
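+
+ <para>For example, a sketch along these lines, in which
+ <emphasis>defaultstructure</emphasis> is given a single illustrative
+ <emphasis>id</emphasis> field and the definition names are placeholders,
+ shows where the default and OPT fit:</para>
+
+ <programlisting>defaultstructure := RECORD
+  STRING id;
+END;
+
+// Default is used when layout information is unavailable (for example, a local syntax check)
+layout1 := RECORDOF('myfile', defaultstructure, LOOKUP);
+
+// OPT additionally allows for the case where 'myfile' may not exist
+layout2 := RECORDOF('myfile', defaultstructure, LOOKUP, OPT);
+</programlisting>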
|
|
|
|
+
|
|
|
|
+ <para>The compiler checks that the actual record structure retrieved from
|
|
|
|
+ the distributed file system lookup contains all the fields specified, and
|
|
|
|
+ that the types are compatible.</para>
|
|
|
|
+
|
|
|
|
+ <para>For example, to read a file whose structure is unknown other than
|
|
|
|
+ that it contains an ID field, and create an output file containing all
|
|
|
|
+ records that matched a supplied value, you could write:</para>
|
|
|
|
+
|
|
|
|
+ <para><programlisting>myfile := DATASET('myinputfile', RECORDOF('myinputfile', { STRING id },
|
|
|
|
+ LOOKUP), FLAT);
|
|
|
|
+filtered := myfile(id='123');
|
|
|
|
+OUTPUT(filtered,,'myfilteredfile');</programlisting></para>
|
|
|
|
+ </sect2>
|
|
|
|
+
|
|
|
|
+ <sect2>
|
|
|
|
+ <title>Additional Details</title>
|
|
|
|
+
|
|
|
|
+ <itemizedlist>
|
|
|
|
+ <listitem>
|
|
|
|
+ <para>The syntax is designed so that it is not necessary to perform
|
|
|
|
+ file resolution to be able to syntax-check or create archives. This is
|
|
|
|
+ important for local-repository mode to work.</para>
|
|
|
|
+ </listitem>
|
|
|
|
+
|
|
|
|
+ <listitem>
|
|
|
|
+ <para>Foreign file resolution works the same way: just use the
|
|
|
|
+ standard filename syntax for foreign filename resolution.</para>
|
|
|
|
+ </listitem>
|
|
|
|
+
|
|
|
|
+ <listitem>
|
|
|
|
+ <para>You can use the LOOKUP attribute on INDEX declarations as
|
|
|
|
+ well as on DATASET declarations.</para>
|
|
|
|
+ </listitem>
|
|
|
|
+
|
|
|
|
+ <listitem>
|
|
|
|
+ <para>When using the RECORDOF form and supplying a default layout, you
|
|
|
|
+ may need to use the => form of the record layout syntax to specify
|
|
|
|
+ both keyed and payload fields in the same record.</para>
|
|
|
|
+ </listitem>
|
|
|
|
+
|
|
|
|
+ <listitem>
|
|
|
|
+ <para>Files that have been sprayed rather than created by ECL jobs may
|
|
|
|
+ not have record information (metadata) available in the distributed
|
|
|
|
+ file system.</para>
|
|
|
|
+ </listitem>
|
|
|
|
+
|
|
|
|
+ <listitem>
|
|
|
|
+ <para>There are some new parameters to eclcc that can be used if you
|
|
|
|
+ want to use this functionality for local compiles:</para>
|
|
|
|
+
|
|
|
|
+ <para><informaltable colsep="1" frame="all" rowsep="1">
|
|
|
|
+ <tgroup cols="2">
|
|
|
|
+ <colspec align="left" colwidth="125.55pt" />
|
|
|
|
+
|
|
|
|
+ <colspec />
|
|
|
|
+
|
|
|
|
+ <tbody>
|
|
|
|
+ <row>
|
|
|
|
+ <entry>-dfs=ip</entry>
|
|
|
|
+
|
|
|
|
+ <entry>Use specified Dali IP for filename
|
|
|
|
+ resolution.</entry>
|
|
|
|
+ </row>
|
|
|
|
+
|
|
|
|
+ <row>
|
|
|
|
+ <entry>-scope=prefix</entry>
|
|
|
|
+
|
|
|
|
+ <entry>Use specified scope prefix in filename
|
|
|
|
+ resolution.</entry>
|
|
|
|
+ </row>
|
|
|
|
+
|
|
|
|
+ <row>
|
|
|
|
+ <entry>-user=id</entry>
|
|
|
|
+
|
|
|
|
+ <entry>Use specified username in filename
|
|
|
|
+ resolution.</entry>
|
|
|
|
+ </row>
|
|
|
|
+
|
|
|
|
+ <row>
|
|
|
|
+ <entry>-password=xxx</entry>
|
|
|
|
+
|
|
|
|
+ <entry>Use specified password in filename resolution (leave
|
|
|
|
+ blank to prompt).</entry>
|
|
|
|
+ </row>
|
|
|
|
+ </tbody>
|
|
|
|
+ </tgroup>
|
|
|
|
+ </informaltable></para>
|
|
|
|
+ </listitem>
|
|
|
|
+ </itemizedlist>
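+
+ <para>As a hedged sketch of the INDEX-related points above, assuming a
+ hypothetical index file named <emphasis>myindexfile</emphasis> with a keyed
+ <emphasis>id</emphasis> field and a <emphasis>filepos</emphasis> payload
+ field, a default layout written in the => form might look like the
+ following. The field names, types, and the trailing position of LOOKUP on
+ the INDEX declaration are assumptions made for illustration:</para>
+
+ <programlisting>// Default layout: fields before => are keyed, fields after it are payload
+defaultindexrecord := { STRING10 id => UNSIGNED8 filepos };
+
+// RECORDOF form: the index layout is resolved from the file's metadata at compile time
+indexrecord := RECORDOF('myindexfile', defaultindexrecord, LOOKUP);
+myindex1 := INDEX(indexrecord, 'myindexfile');
+
+// LOOKUP placed directly on the INDEX declaration (assumed to mirror the DATASET usage)
+myindex2 := INDEX(defaultindexrecord, 'myindexfile', LOOKUP);
+</programlisting>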
|
|
|
|
+ </sect2>
|
|
|
|
+</sect1>
|