<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<sect1 id="File_layout_resolution">
<title>File Layout Resolution at Compile Time</title>
<para>When reading a disk file in ECL, the layout of the file is specified
in the ECL code. This allows the code to be compiled to access the data very
efficiently, but can cause problems if the file on disk actually uses a
different layout.</para>
<para>In particular, it can present a challenge to the version control
process if you have ECL queries that are being changed to add
functionality, but which must be applied without modification to data
files whose layout is changing on a different timeline.</para>
<para>Roxie has offered a partial solution to this dilemma for index
files: the ability to apply runtime translation from the fields in the
physical index file to the fields specified in the index. However, that
approach has significant potential overhead and is not available for flat
files or on Thor. The feature described in this section supports flat files
and Thor files.</para>
<para>A feature added in the HPCC Systems 6.4.0 release allows file
resolution to be performed at compile time, which provides the following
advantages:</para>
<itemizedlist>
<listitem>
<para>Code changes can be insulated from file layout changes: you only
need to declare the fields you actually want to use from a
data file.</para>
</listitem>
<listitem>
<para>File layout mismatches can be detected sooner.</para>
</listitem>
<listitem>
<para>The compiler can use information about file sizes to guide code
optimization decisions.</para>
</listitem>
</itemizedlist>
<para>There are two language constructs associated with this feature:</para>
<itemizedlist>
<listitem>
<para>The LOOKUP attribute on DATASET or INDEX declarations.</para>
</listitem>
<listitem>
<para>The LOOKUP attribute in a RECORDOF function.</para>
</listitem>
</itemizedlist>
<sect2 id="Using_LOOKUP_on_DATASET">
<title>Using LOOKUP on a DATASET</title>
<para>Adding the LOOKUP attribute to a DATASET declaration indicates that
the file layout should be looked up at compile time:</para>
<programlisting>myrecord := RECORD
  STRING field1;
  STRING field2;
END;

// Without LOOKUP: this will fail at runtime if the file layout
// does not match myrecord
f := DATASET('myfilename', myrecord, FLAT);

// With LOOKUP (use instead of the declaration above): this will
// automatically project from the actual to the requested layout
f := DATASET('myfilename', myrecord, FLAT, LOOKUP);
</programlisting>
<para>If we assume that the actual layout of the file on disk is:</para>
<programlisting>myactualrecord := RECORD
  STRING field1;
  STRING field2;
  STRING field3;
END;</programlisting>
<?hard-pagebreak ?>
<para>Then the effect of the LOOKUP attribute will be as if your code
were:</para>
<programlisting>actualfile := DATASET('myfilename', myactualrecord, FLAT);
f := PROJECT(actualfile, TRANSFORM(myrecord, SELF := LEFT; SELF := []));
</programlisting>
<para>Fields that are present in both record structures are assigned
across; fields that are present only in the disk version are dropped; and
fields that are present only in the ECL version receive their default
value (a warning is issued in this last case).</para>
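<para>To illustrate the last case, the following sketch assumes a
hypothetical record, <emphasis>myotherrecord</emphasis>, that requests
field1 plus a field4 that does not exist in the file on disk. With the disk
layout shown earlier, field2 and field3 would be dropped and field4 would be
left at its default value (an empty string for a STRING field), with a
warning:</para>
<programlisting>myotherrecord := RECORD
  STRING field1;
  STRING field4;   // hypothetical field, not present in the file on disk
END;

g := DATASET('myfilename', myotherrecord, FLAT, LOOKUP);

// The effect should be roughly as if the code were:
// g := PROJECT(actualfile, TRANSFORM(myotherrecord, SELF := LEFT; SELF := []));
</programlisting>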
<para>There is also a compiler directive that enables translation for all
files:</para>
<para><programlisting>#OPTION('translateDFSlayouts',TRUE);</programlisting></para>
<para>The LOOKUP attribute accepts a parameter (TRUE or FALSE) to allow
easier control of where and when translation occurs. Any
Boolean expression that can be evaluated at compile time can be
supplied.</para>
<para>When using the #OPTION for <emphasis>translateDFSlayouts</emphasis>,
you may want to use LOOKUP(FALSE) to override the default on specific
datasets.</para>
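<para>A minimal sketch of combining the two follows. The file names and
attribute names are placeholders chosen for illustration; translation is
enabled for all files and then switched off for one dataset:</para>
<programlisting>#OPTION('translateDFSlayouts', TRUE);

myrecord := RECORD
  STRING field1;
  STRING field2;
END;

// Translated, because of the #OPTION above
translated := DATASET('myfilename', myrecord, FLAT);

// Not translated: LOOKUP(FALSE) overrides the #OPTION for this dataset only
untranslated := DATASET('myotherfilename', myrecord, FLAT, LOOKUP(FALSE));
</programlisting>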
</sect2>
<sect2 id="Using_LOOKUP_in_RECORDOF">
<title>Using LOOKUP in a RECORDOF function</title>
<para>Using a LOOKUP attribute in a RECORDOF function is useful when
fields were present in the original layout and later dropped, or when you
want to write to a file that matches the layout of an existing file but you
don't know the layout.</para>
<para>The LOOKUP attribute in the RECORDOF function takes a filename
rather than a dataset. The result is expanded at compile time to the
record layout stored in the named file’s metadata. There are several forms
of this construct:</para>
<para><programlisting>RECORDOF('myfile', LOOKUP);
RECORDOF('myfile', defaultstructure, LOOKUP);
RECORDOF('myfile', defaultstructure, LOOKUP, OPT);</programlisting></para>
<para>You can also specify a DATASET as the first parameter instead of a
filename (a syntactic convenience), in which case the filename specified on
the dataset is used for the lookup.</para>
<para>The <emphasis>defaultstructure</emphasis> is useful for situations
where the file layout information may not be available (for example, when
syntax-checking locally or creating an archive). It is also useful when
the file being looked up may not exist; this is where OPT should be
used.</para>
<para>The compiler checks that the actual record structure retrieved from
the distributed file system lookup contains all the fields specified in the
<emphasis>defaultstructure</emphasis>, and that the types are
compatible.</para>
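<para>The forms above could be used as in the following sketch. The file
name and the <emphasis>defaultLayout</emphasis> record are placeholders
chosen for illustration, and the comments restate the behavior described in
this section rather than adding to it:</para>
<programlisting>defaultLayout := RECORD
  STRING id;
END;

// Layout resolved from the file's metadata; defaultLayout covers cases
// where that information is unavailable (for example, a local syntax check)
layout1 := RECORDOF('myfile', defaultLayout, LOOKUP);

// As above, but OPT also allows for the file not existing
layout2 := RECORDOF('myfile', defaultLayout, LOOKUP, OPT);

// A DATASET can be supplied instead of a filename;
// the filename on the dataset is used for the lookup
ds := DATASET('myfile', defaultLayout, FLAT);
layout3 := RECORDOF(ds, LOOKUP);
</programlisting>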
<para>For example, to read a file whose structure is unknown other than
that it contains an ID field, and create an output file containing all
records that match a supplied value, you could write:</para>
<para><programlisting>myfile := DATASET('myinputfile',
                  RECORDOF('myinputfile', { STRING id }, LOOKUP),
                  FLAT);
filtered := myfile(id='123');
OUTPUT(filtered,,'myfilteredfile');</programlisting></para>
</sect2>
<sect2 id="LOOKUP-Additional_Details">
<title>Additional Details</title>
<itemizedlist>
<listitem>
<para>The syntax is designed so that it is not necessary to perform
file resolution in order to syntax-check or create archives. This is
important for local-repository mode to work.</para>
</listitem>
<listitem>
<para>Foreign file resolution works the same way: just use the
standard filename syntax for foreign filename resolution.</para>
</listitem>
<listitem>
<para>You can use the LOOKUP attribute on INDEX declarations as
well as on DATASET declarations.</para>
</listitem>
<listitem>
<para>When using the RECORDOF form and supplying a default layout, you
may need to use the =&gt; form of the record layout syntax to specify
both keyed and payload fields in the same record.</para>
</listitem>
<listitem>
<para>Files that have been sprayed rather than created by ECL jobs may
not have record information (metadata) available in the distributed
file system.</para>
</listitem>
<listitem>
<para>The following eclcc parameters can be used if you
want to use this functionality for local compiles:</para>
<para><informaltable colsep="1" frame="all" rowsep="1">
<tgroup cols="2">
<colspec align="left" colwidth="125.55pt" />
<colspec />
<tbody>
<row>
<entry>-dfs=ip</entry>
<entry>Use specified Dali IP for filename
resolution.</entry>
</row>
<row>
<entry>-scope=prefix</entry>
<entry>Use specified scope prefix in filename
resolution.</entry>
</row>
<row>
<entry>-user=id</entry>
<entry>Use specified username in filename
resolution.</entry>
</row>
<row>
<entry>-password=xxx</entry>
<entry>Use specified password in filename resolution (leave
blank to prompt).</entry>
</row>
</tbody>
</tgroup>
</informaltable></para>
</listitem>
</itemizedlist>
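<para>As an illustration of the INDEX-related points in the list above, the
sketch below assumes a hypothetical index file,
<emphasis>~myscope::myindex</emphasis>, keyed on an id field with a name
payload. The file name, field names, and field sizes are all placeholders,
and the exact behavior should be confirmed against your platform
version:</para>
<programlisting>// Default layout: keyed field(s) before =&gt;, payload field(s) after
defaultIdxLayout := { STRING10 id =&gt; STRING20 name };

// Resolve the index layout at compile time, with a default layout supplied
idxLayout := RECORDOF('~myscope::myindex', defaultIdxLayout, LOOKUP);
idx := INDEX(idxLayout, '~myscope::myindex');

// Alternatively, use LOOKUP directly on the INDEX declaration
idx2 := INDEX(defaultIdxLayout, '~myscope::myindex', LOOKUP);
</programlisting>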
</sect2>
</sect1>