<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<sect1 id="Creating_and_Maintaining_SuperFiles">
<title>Creating and Maintaining SuperFiles</title>
<sect2 id="Creating_Data">
<title>Creating Data</title>
<para>First, we need to create some logical files to put into a
SuperFile.</para>
<para>The following filenames for the new sub-files are declared in the
DeclareData MODULE structure:</para>
<programlisting>EXPORT BaseFile := '~PROGGUIDE::SUPERFILE::Base';
EXPORT SubFile1 := '~PROGGUIDE::SUPERFILE::People1';
EXPORT SubFile2 := '~PROGGUIDE::SUPERFILE::People2';
EXPORT SubFile3 := '~PROGGUIDE::SUPERFILE::People3';
EXPORT SubFile4 := '~PROGGUIDE::SUPERFILE::People4';
EXPORT SubFile5 := '~PROGGUIDE::SUPERFILE::People5';
EXPORT SubFile6 := '~PROGGUIDE::SUPERFILE::People6';
</programlisting>
<para>The following code (in SuperFile1.ECL) creates the files that we'll
use to build SuperFiles:</para>
<programlisting>IMPORT $;
IMPORT Std;

// Split the Person file into six samples by the first letter of firstname
s1 := $.DeclareData.Person.File(firstname[1] = 'A');
s2 := $.DeclareData.Person.File(firstname[1] BETWEEN 'B' AND 'C');
s3 := $.DeclareData.Person.File(firstname[1] BETWEEN 'D' AND 'J');
s4 := $.DeclareData.Person.File(firstname[1] BETWEEN 'K' AND 'N');
s5 := $.DeclareData.Person.File(firstname[1] BETWEEN 'O' AND 'R');
s6 := $.DeclareData.Person.File(firstname[1] BETWEEN 'S' AND 'Z');
Rec := $.DeclareData.Layout_Person;

// Write each sample to its own logical file, unless that file already exists
IF(~Std.File.FileExists($.DeclareData.SubFile1),
   OUTPUT(s1,,$.DeclareData.SubFile1));
IF(~Std.File.FileExists($.DeclareData.SubFile2),
   OUTPUT(s2,,$.DeclareData.SubFile2));
IF(~Std.File.FileExists($.DeclareData.SubFile3),
   OUTPUT(s3,,$.DeclareData.SubFile3));
IF(~Std.File.FileExists($.DeclareData.SubFile4),
   OUTPUT(s4,,$.DeclareData.SubFile4));
IF(~Std.File.FileExists($.DeclareData.SubFile5),
   OUTPUT(s5,,$.DeclareData.SubFile5));
IF(~Std.File.FileExists($.DeclareData.SubFile6),
   OUTPUT(s6,,$.DeclareData.SubFile6));
</programlisting>
<para>This code takes data from the DeclareData.Person.File dataset (created
by the code in GenData.ECL and declared in the DeclareData MODULE structure
in the Default module) and writes six discrete samples to their own logical
files, but only if they do not already exist. We'll use these logical files
to build some SuperFiles.</para>
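<para>As a quick check before building any SuperFiles, a sketch like the
following could confirm that all six sub-files now exist. It assumes the code
above has run successfully. The NAMED result labels are illustrative only and
are not part of the guide's sample files:</para>
<programlisting>IMPORT $;
IMPORT Std;

// Each result should be true once the corresponding sub-file has been written
OUTPUT(Std.File.FileExists($.DeclareData.SubFile1), NAMED('SubFile1Exists'));
OUTPUT(Std.File.FileExists($.DeclareData.SubFile2), NAMED('SubFile2Exists'));
OUTPUT(Std.File.FileExists($.DeclareData.SubFile3), NAMED('SubFile3Exists'));
OUTPUT(Std.File.FileExists($.DeclareData.SubFile4), NAMED('SubFile4Exists'));
OUTPUT(Std.File.FileExists($.DeclareData.SubFile5), NAMED('SubFile5Exists'));
OUTPUT(Std.File.FileExists($.DeclareData.SubFile6), NAMED('SubFile6Exists'));
</programlisting>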
</sect2>
<sect2 id="A_Simple_Example">
<title>A Simple Example</title>
<para>We'll start with a simple example of how to create and use a
SuperFile. This dataset declaration is in the DeclareData MODULE structure
(contained in the Default module). It declares the SuperFile as a
DATASET that can be referenced in ECL code:</para>
<programlisting>EXPORT SuperFile1 := DATASET(BaseFile,Layout_Person,FLAT);
</programlisting>
<para>Then we'll create a SuperFile and add sub-files to it (this code is
contained in SuperFile2.ECL):</para>
<programlisting>IMPORT $;
IMPORT Std;

SEQUENTIAL(
  // Create the empty SuperFile
  Std.File.CreateSuperFile($.DeclareData.BaseFile),
  // Add both sub-files within a single transaction
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile($.DeclareData.BaseFile,$.DeclareData.SubFile1),
  Std.File.AddSuperFile($.DeclareData.BaseFile,$.DeclareData.SubFile2),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>If the workunit fails with a “logical name
progguide::superfile::base already exists” error message, open the
SuperFileRestart.ECL file and run it, then retry the above code. Once
this code has executed successfully in a builder window, you have created
the SuperFile and added two sub-files to it.</para>
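<para>The contents of SuperFileRestart.ECL are not shown here. As an
alternative sketch (not the guide's own restart code), the creation step can
be guarded with Std.File.SuperFileExists so that the code can be re-run
without the “already exists” error. This version also clears BaseFile before
adding the two sub-files, which resets the SuperFile to its initial state.
That reset is an assumption about what you want when repeating this
step:</para>
<programlisting>IMPORT $;
IMPORT Std;

SEQUENTIAL(
  // Create the SuperFile only if it is not already defined
  IF(~Std.File.SuperFileExists($.DeclareData.BaseFile),
     Std.File.CreateSuperFile($.DeclareData.BaseFile)),
  Std.File.StartSuperFileTransaction(),
  // Assumption: start from an empty SuperFile on every run
  Std.File.ClearSuperFile($.DeclareData.BaseFile),
  Std.File.AddSuperFile($.DeclareData.BaseFile,$.DeclareData.SubFile1),
  Std.File.AddSuperFile($.DeclareData.BaseFile,$.DeclareData.SubFile2),
  Std.File.FinishSuperFileTransaction());
</programlisting>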
<para>The SuperFile1 DATASET declaration makes the SuperFile available for
use just as any other DATASET would be. This is the key to using SuperFiles.
It means the following types of actions can be executed against the
SuperFile, just as with any other dataset:</para>
<programlisting>IMPORT $;
COUNT($.DeclareData.SuperFile1(PersonID &lt;&gt; 0));
OUTPUT($.DeclareData.SuperFile1);
</programlisting>
<para>Given the logical files previously built, the result of the COUNT
should be 317,000. The filter condition is always true, so the COUNT
returned is the total number of records in the SuperFile. The
(PersonID &lt;&gt; 0) record filter is necessary so that the actual COUNT
is performed each time and the result is not a shortcut value stored
internally by the ECL Agent. The OUTPUT produces the first 100
records in the SuperFile.</para>
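<para>To see that the SuperFile is simply the combination of its sub-files,
a sketch along these lines could compare the totals. It assumes that
SubFile1 and SubFile2 are the only sub-files in BaseFile at this point and
that they share the Layout_Person record layout. The sub1 and sub2
definitions and the NAMED labels are illustrative names:</para>
<programlisting>IMPORT $;
IMPORT Std;

// Read the two sub-files directly, using the same layout as the SuperFile
sub1 := DATASET($.DeclareData.SubFile1,$.DeclareData.Layout_Person,FLAT);
sub2 := DATASET($.DeclareData.SubFile2,$.DeclareData.Layout_Person,FLAT);

// Number of sub-files currently in the SuperFile
OUTPUT(Std.File.GetSuperFileSubCount($.DeclareData.BaseFile),NAMED('SubFileCount'));
// These two record totals should match
OUTPUT(COUNT(sub1) + COUNT(sub2),NAMED('SubFileRecords'));
OUTPUT(COUNT($.DeclareData.SuperFile1(PersonID &lt;&gt; 0)),NAMED('SuperFileRecords'));
</programlisting>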
</sect2>
<sect2 id="Nesting_SuperFiles">
<title>Nesting SuperFiles</title>
<para>Nesting SuperFiles (a SuperFile containing a sub-file that is itself
another SuperFile) is a technique that allows new data coming in on a
periodic basis (every day, every hour, and so on) to be “instantly”
available to the system. Since the ECL code that refers to a SuperFile
always references the DATASET declaration, the only change necessary to
make new data available to queries is to add the new data as a sub-file.
Since adding a new sub-file always takes place within a SuperFile
transaction, any queries are locked out while the update is in
progress.</para>
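<para>A transaction can also be abandoned instead of committed. As a sketch,
and assuming the optional rollback parameter of FinishSuperFileTransaction
behaves as described in the Standard Library Reference, passing TRUE discards
the changes made since StartSuperFileTransaction. The choice of BaseFile and
SubFile3 here is purely illustrative:</para>
<programlisting>IMPORT $;
IMPORT Std;

SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile($.DeclareData.BaseFile,$.DeclareData.SubFile3),
  // TRUE requests a rollback, so BaseFile should be left unchanged
  Std.File.FinishSuperFileTransaction(TRUE));
</programlisting>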
<para>This technique also implies the periodic roll-up and consolidation of
the new data into composite files. This is necessary because the practical
limit to the number of physical files you should combine into a SuperFile is
about one hundred (100). Every time you reference the SuperFile, every
sub-file must be physically opened and read from disk, and the more
sub-files there are, the more operating system resources are used just to
get at the data.</para>
<para>Therefore, you need to periodically run a process that physically
combines all the incremental logical files into a single logical file that
replaces them all. Periodic SuperFile data consolidation is a simple process
of using OUTPUT to write the complete contents of the SuperFile to a new,
single logical file. Once all the data is in a single file, a SuperFile
transaction can clear the old set of sub-files and then add the new “base”
logical file.</para>
</sect2>
<sect2 id="Nested_SuperFile_Example">
<title>Nested SuperFile Example</title>
<para>Here is an example of how to nest SuperFiles. This example assumes
you have new data coming in every day. It also assumes you want to roll up
the new data daily and weekly. The following filenames for the new
SuperFiles are declared in the DeclareData MODULE structure:</para>
<programlisting>EXPORT AllPeople := '~PROGGUIDE::SUPERFILE::AllPeople';
EXPORT WeeklyFile := '~PROGGUIDE::SUPERFILE::Weekly';
EXPORT DailyFile := '~PROGGUIDE::SUPERFILE::Daily';
</programlisting>
<para>The three new SuperFiles need to be created just once. Then you
add the sub-files to them (this code is contained in
SuperFile3.ECL):</para>
<programlisting>IMPORT $;
IMPORT Std;

SEQUENTIAL(
  // Create the three new (empty) SuperFiles
  Std.File.CreateSuperFile($.DeclareData.AllPeople),
  Std.File.CreateSuperFile($.DeclareData.WeeklyFile),
  Std.File.CreateSuperFile($.DeclareData.DailyFile),
  // Nest the Base, Weekly, and Daily SuperFiles inside AllPeople
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile($.DeclareData.AllPeople,$.DeclareData.BaseFile),
  Std.File.AddSuperFile($.DeclareData.AllPeople,$.DeclareData.WeeklyFile),
  Std.File.AddSuperFile($.DeclareData.AllPeople,$.DeclareData.DailyFile),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>Now the AllPeople SuperFile contains the BaseFile, WeeklyFile, and
DailyFile SuperFiles as sub-files, creating a hierarchy of SuperFiles,
only one of which yet contains any actual data. The Base SuperFile
contains all the currently known data, as of the time the logical files
were built. The Weekly and Daily SuperFiles will contain the interim
data updates as they come in the door, which avoids rebuilding the
entire database every time a new set of data comes in.</para>
<para>One important caveat to this scheme is that a given actual logical
file (real data file) should be contained in exactly one of the nested
SuperFiles at a time; otherwise you would have duplicate records in the
base SuperFile. Therefore, you have to be careful how you maintain your
hierarchy so that the same logical file is never referenced by
more than one of the nested SuperFiles at once, outside of a transaction
frame.</para>
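<para>One way to check for this condition is to ask which SuperFiles
currently own a given logical file. The following is a minimal sketch using
Std.File.LogicalFileSuperOwners. The owners definition and the NAMED labels
are illustrative names, and the check assumes the function reports only the
SuperFiles that directly contain the file:</para>
<programlisting>IMPORT $;
IMPORT Std;

// List every SuperFile that directly contains the SubFile3 logical file
owners := Std.File.LogicalFileSuperOwners($.DeclareData.SubFile3);
OUTPUT(owners,NAMED('OwningSuperFiles'));
// A count greater than 1 suggests the file's records would be duplicated
OUTPUT(COUNT(owners),NAMED('OwnerCount'));
</programlisting>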
<para>As new logical files arrive during the day, you can add them to
the Daily SuperFile like this (this code is contained in
SuperFile4.ECL):</para>
<programlisting>IMPORT $;
IMPORT Std;

SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  // Append the new logical file to the Daily SuperFile
  Std.File.AddSuperFile($.DeclareData.DailyFile,$.DeclareData.SubFile3),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>This appends the DeclareData.SubFile3 logical file to the list of
sub-files in the DailyFile SuperFile. This means that the very next query
against the AllPeople SuperFile, which contains DailyFile, will use the very
latest up-to-the-minute data.</para>
<para>This dataset declaration is in the DeclareData MODULE structure
(contained in the Default module). It declares the nested SuperFile as a
DATASET that can be referenced in ECL code:</para>
<programlisting>EXPORT SuperFile2 := DATASET(AllPeople,Layout_Person,FLAT);</programlisting>
<para>Execute the following action:</para>
<programlisting>IMPORT $;
COUNT($.DeclareData.SuperFile2(PersonID &lt;&gt; 0));
</programlisting>
<para>The result of the COUNT should now be 451,000.</para>
<para>Edit the code from SuperFile4.ECL to add DeclareData.SubFile4, like
this:</para>
<programlisting>IMPORT $;
IMPORT Std;

SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile($.DeclareData.DailyFile,$.DeclareData.SubFile4),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>Re-running the above COUNT action should now result in
620,000.</para>
<para>Once a day, you can roll all the sub-files up into the WeeklyFile
and clear out the DailyFile for the next day's data ingest processing,
like this (this code is contained in SuperFile5.ECL):</para>
<programlisting>IMPORT $;
IMPORT Std;

SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  // Fourth parameter TRUE: add DailyFile's sub-file references to WeeklyFile
  Std.File.AddSuperFile($.DeclareData.WeeklyFile,$.DeclareData.DailyFile,,TRUE),
  // Empty DailyFile so it is ready for the next day's data
  Std.File.ClearSuperFile($.DeclareData.DailyFile),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>This moves the references to all the sub-files from the DailyFile to
the WeeklyFile (setting the fourth parameter of the AddSuperFile function to
TRUE copies the references from one SuperFile to another), then clears out
the DailyFile.</para>
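<para>To confirm the result of the roll-up, a sketch like the following
could inspect both SuperFiles. If the steps above have been followed in
order, DailyFile should report zero sub-files and WeeklyFile should list the
SubFile3 and SubFile4 logical files. The NAMED labels are illustrative
only:</para>
<programlisting>IMPORT $;
IMPORT Std;

// DailyFile was cleared, so its sub-file count should be 0
OUTPUT(Std.File.GetSuperFileSubCount($.DeclareData.DailyFile),NAMED('DailySubFileCount'));
// WeeklyFile should now list the sub-files that were moved from DailyFile
OUTPUT(Std.File.SuperFileContents($.DeclareData.WeeklyFile),NAMED('WeeklyContents'));
</programlisting>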
</sect2>
<sect2 id="Data_Consolidation">
<title>Data Consolidation</title>
<para>Since the practical limit to the number of logical files you should
combine into a SuperFile is about a hundred, you'll need to periodically
run a process that physically combines all the incremental logical files
into a single logical file that replaces them all, like this:</para>
<programlisting>IMPORT $;
IMPORT Std;
OUTPUT($.DeclareData.SuperFile2,,'~PROGGUIDE::SUPERFILE::People14',OVERWRITE);
</programlisting>
<para>This will write a new file containing all the records from all the
sub-files in the SuperFile.</para>
<para>Once you've done that, you'll need to clear all the component
SuperFiles and add the new all-the-data-there-is data file into the
BaseFile, like this (this code is contained in SuperFile6.ECL):</para>
<programlisting>IMPORT $;
IMPORT Std;

SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  // Remove the old sub-file references from every component SuperFile
  Std.File.ClearSuperFile($.DeclareData.BaseFile),
  Std.File.ClearSuperFile($.DeclareData.WeeklyFile),
  Std.File.ClearSuperFile($.DeclareData.DailyFile),
  // Make the consolidated file the only sub-file of BaseFile
  Std.File.AddSuperFile($.DeclareData.BaseFile,'~PROGGUIDE::SUPERFILE::People14'),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>This action clears out the Base SuperFile and the incremental Weekly
and Daily SuperFiles, then adds the reference to the new all-inclusive
logical file to the Base SuperFile.</para>
<para>Re-running the above COUNT action should still result in
620,000.</para>
<para>Once again, edit the code from SuperFile4.ECL to add
DeclareData.SubFile5 and DeclareData.SubFile6 to the DailyFile, like
this:</para>
<programlisting>IMPORT $;
IMPORT Std;

SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile($.DeclareData.DailyFile,$.DeclareData.SubFile5),
  Std.File.AddSuperFile($.DeclareData.DailyFile,$.DeclareData.SubFile6),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>Once you've done that, re-running the above COUNT action should now
result in 1,000,000.</para>
</sect2>
<sect2 id="Getting_SuperFile_Components">
<title>Getting SuperFile Components</title>
<para>This macro (in the DeclareData MODULE structure)
demonstrates one technique to list the component sub-files of a
SuperFile:</para>
<programlisting>IMPORT STD;
EXPORT MAC_ListSFsubfiles(SuperFile) := MACRO
  // One-record seed dataset for NORMALIZE to expand
  #UNIQUENAME(SeedRec)
  %SeedRec% := DATASET([{''}], {STRING name});
  // C is the NORMALIZE counter, used as the sub-file's ordinal position
  #UNIQUENAME(Xform)
  TYPEOF(%SeedRec%) %Xform%(%SeedRec% L, INTEGER C) :=
    TRANSFORM
      SELF.name :=
        Std.File.GetSuperFileSubName(SuperFile,C);
    END;
  // Produce one output record per sub-file
  OUTPUT(NORMALIZE(%SeedRec%,
                   Std.File.GetSuperFileSubCount(SuperFile),
                   %Xform%(LEFT,COUNTER)));
ENDMACRO;
</programlisting>
<para>The interesting technique here is the use of NORMALIZE to call the
TRANSFORM function once for each sub-file, until all sub-files in the
SuperFile are listed. You can call this macro in a builder window like this
(this code is contained in SuperFile7.ECL):</para>
<programlisting>IMPORT $;
IMPORT Std;
$.DeclareData.MAC_ListSFsubfiles($.DeclareData.AllPeople);</programlisting>
<para>This will return a list of all the sub-files in the specified
SuperFile. However, this type of code is no longer necessary, since the
default mode of the SuperFileContents() function now returns exactly the
same result, like this:</para>
<programlisting>IMPORT $;
IMPORT Std;
OUTPUT(Std.File.SuperFileContents($.DeclareData.AllPeople));</programlisting>
<para>The SuperFileContents() function has an advantage over the macro: it
has an option to return the sub-files from any nested SuperFile (which the
macro can't do). That form looks like this:</para>
<programlisting>IMPORT $;
IMPORT Std;
OUTPUT(Std.File.SuperFileContents($.DeclareData.AllPeople,TRUE));</programlisting>
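<para>As a small sketch of the difference between the two modes, the
following compares the number of entries each one returns. For the AllPeople
hierarchy built above, the default mode should report the three nested
SuperFiles, while the result of the second form depends on how many logical
files those SuperFiles contain when the code runs. The direct and nested
definitions and the NAMED labels are illustrative names:</para>
<programlisting>IMPORT $;
IMPORT Std;

// Default mode: only the immediate sub-files of AllPeople
direct := Std.File.SuperFileContents($.DeclareData.AllPeople);
// Second parameter TRUE: also expand the nested SuperFiles
nested := Std.File.SuperFileContents($.DeclareData.AllPeople,TRUE);

OUTPUT(COUNT(direct),NAMED('DirectSubFiles'));
OUTPUT(COUNT(nested),NAMED('NestedLogicalFiles'));
</programlisting>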
</sect2>
</sect1>