<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE sect1 PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<sect1 id="Creating_and_Maintaining_SuperFiles">
<title><emphasis role="bold">Creating and Maintaining
SuperFiles</emphasis></title>
<sect2 id="Creating_Data">
<title>Creating Data</title>
<para>First, we need to create some logical files to put into a
SuperFile.</para>
<para>The following filenames for the new sub-files are declared in the
DeclareData MODULE structure:</para>
<programlisting>EXPORT BaseFile := '~PROGGUIDE::SUPERFILE::Base';
EXPORT SubFile1 := '~PROGGUIDE::SUPERFILE::People1';
EXPORT SubFile2 := '~PROGGUIDE::SUPERFILE::People2';
EXPORT SubFile3 := '~PROGGUIDE::SUPERFILE::People3';
EXPORT SubFile4 := '~PROGGUIDE::SUPERFILE::People4';
EXPORT SubFile5 := '~PROGGUIDE::SUPERFILE::People5';
EXPORT SubFile6 := '~PROGGUIDE::SUPERFILE::People6';
</programlisting>
<para>The following code (in SuperFile1.ECL) creates the files that we'll
use to build SuperFiles:</para>
<programlisting>IMPORT $;
IMPORT Std;
s1 := $.DeclareData.Person.File(firstname[1] = 'A');
s2 := $.DeclareData.Person.File(firstname[1] BETWEEN 'B' AND 'C');
s3 := $.DeclareData.Person.File(firstname[1] BETWEEN 'D' AND 'J');
s4 := $.DeclareData.Person.File(firstname[1] BETWEEN 'K' AND 'N');
s5 := $.DeclareData.Person.File(firstname[1] BETWEEN 'O' AND 'R');
s6 := $.DeclareData.Person.File(firstname[1] BETWEEN 'S' AND 'Z');
Rec := $.DeclareData.Layout_Person;
IF(~Std.File.FileExists($.DeclareData.SubFile1),
   OUTPUT(s1,,$.DeclareData.SubFile1));
IF(~Std.File.FileExists($.DeclareData.SubFile2),
   OUTPUT(s2,,$.DeclareData.SubFile2));
IF(~Std.File.FileExists($.DeclareData.SubFile3),
   OUTPUT(s3,,$.DeclareData.SubFile3));
IF(~Std.File.FileExists($.DeclareData.SubFile4),
   OUTPUT(s4,,$.DeclareData.SubFile4));
IF(~Std.File.FileExists($.DeclareData.SubFile5),
   OUTPUT(s5,,$.DeclareData.SubFile5));
IF(~Std.File.FileExists($.DeclareData.SubFile6),
   OUTPUT(s6,,$.DeclareData.SubFile6));
</programlisting>
<para>This code takes data from the DeclareData.Person.File dataset (created
by the code in GenData.ECL and declared in the DeclareData MODULE
structure in the Default module) and writes six discrete samples to their
own logical files, but only if those files do not already exist. We'll
use these logical files to build some SuperFiles.</para>
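<para>To confirm that all six sub-files were written, a quick check along
these lines can be run in a builder window. This is only a sketch and not
one of the example files; it assumes the DeclareData definitions shown
above, and the NAMED result labels are purely illustrative:</para>
<programlisting>IMPORT $;
IMPORT Std;
// Each result is TRUE if the corresponding logical file exists
OUTPUT(Std.File.FileExists($.DeclareData.SubFile1),NAMED('SubFile1_Exists'));
OUTPUT(Std.File.FileExists($.DeclareData.SubFile2),NAMED('SubFile2_Exists'));
OUTPUT(Std.File.FileExists($.DeclareData.SubFile3),NAMED('SubFile3_Exists'));
OUTPUT(Std.File.FileExists($.DeclareData.SubFile4),NAMED('SubFile4_Exists'));
OUTPUT(Std.File.FileExists($.DeclareData.SubFile5),NAMED('SubFile5_Exists'));
OUTPUT(Std.File.FileExists($.DeclareData.SubFile6),NAMED('SubFile6_Exists'));
</programlisting>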
</sect2>
<sect2 id="A_Simple_Example">
<title>A Simple Example</title>
<para>We'll start with a simple example of how to create and use a
SuperFile. This dataset declaration is in the DeclareData MODULE structure
(contained in the Default module). This declares the SuperFile as a
DATASET that can be referenced in ECL code:</para>
<programlisting>EXPORT SuperFile1 := DATASET(BaseFile,Layout_Person,FLAT);
</programlisting>
<para>Then we'll create and add sub-files to a SuperFile (this code is
contained in SuperFile2.ECL):</para>
<programlisting>IMPORT $;
IMPORT Std;
SEQUENTIAL(
  Std.File.CreateSuperFile($.DeclareData.BaseFile),
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile($.DeclareData.BaseFile,$.DeclareData.SubFile1),
  Std.File.AddSuperFile($.DeclareData.BaseFile,$.DeclareData.SubFile2),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>If the workunit fails with a “logical name
progguide::superfile::base already exists” error message, open the
SuperFileRestart.ECL file and run it, then retry the above code. Once
you've successfully executed this code in a builder window, you've created
the SuperFile and added two sub-files to it.</para>
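<para>One way to avoid the “already exists” error on the creation step is
to create the SuperFile only when it is not already present. The following
is a minimal sketch, not the content of SuperFileRestart.ECL; it relies on
the Std.File.SuperFileExists function and leaves any existing sub-file
list untouched:</para>
<programlisting>IMPORT $;
IMPORT Std;
// Create the SuperFile only if it does not already exist
IF(~Std.File.SuperFileExists($.DeclareData.BaseFile),
   Std.File.CreateSuperFile($.DeclareData.BaseFile));
</programlisting>
<para>This guard covers only the creation step. Adding a sub-file that the
SuperFile already contains may still fail, so the restart code remains the
simplest way to return to a clean state.</para>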
<para>The SuperFile1 DATASET declaration attribute makes the SuperFile
available for use just as any other DATASET would be. This is the key to
using SuperFiles. That means the following types of actions can be
executed against the SuperFile, just as with any other dataset:</para>
<programlisting>IMPORT $;
COUNT($.DeclareData.SuperFile1(PersonID &lt;&gt; 0));
OUTPUT($.DeclareData.SuperFile1);
</programlisting>
<para>Given the logical files previously built, the results of the COUNT
should be 317,000. The filter condition will always be true, so the COUNT
returned will be the total number of records in the SuperFile. The
(PersonID &lt;&gt; 0) record filter is necessary so that the actual COUNT
is performed each time and the result is not a shortcut value stored
internally by the ECL Agent. Of course, the OUTPUT produces the first 100
records in the SuperFile.</para>
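<para>As a cross-check, the SuperFile's record count can be compared with
the combined counts of its two sub-files. This sketch is not one of the
example files; it assumes the sub-files share the Layout_Person record
structure, and the definition names and NAMED labels are illustrative. The
two results should match:</para>
<programlisting>IMPORT $;
Rec := $.DeclareData.Layout_Person;
// Declare the two sub-files directly as ordinary datasets
Sub1 := DATASET($.DeclareData.SubFile1,Rec,FLAT);
Sub2 := DATASET($.DeclareData.SubFile2,Rec,FLAT);
// Same always-true filter as above, to force an actual count
SubTotal := COUNT(Sub1(PersonID &lt;&gt; 0)) + COUNT(Sub2(PersonID &lt;&gt; 0));
OUTPUT(SubTotal,NAMED('SubFileTotal'));
OUTPUT(COUNT($.DeclareData.SuperFile1(PersonID &lt;&gt; 0)),NAMED('SuperFileTotal'));
</programlisting>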
</sect2>
<sect2 id="Nesting_SuperFiles">
<title>Nesting SuperFiles</title>
<para>Nesting SuperFiles (a SuperFile containing a sub-file that is itself
another SuperFile) is a technique that allows new data coming in on a
periodic basis (every day, every hour, and so on) to be “instantly”
available to the system. Since the ECL code that refers to a SuperFile
always references the DATASET declaration, the only change necessary to
make new data available to queries is to add the new data as a sub-file.
Since adding a new sub-file always takes place within a SuperFile
transaction, any queries are locked out while the update is in
progress.</para>
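<para>A transaction frame also allows a set of changes to be abandoned as
a unit. As a sketch (assuming the BaseFile and SubFile3 names declared in
DeclareData), passing TRUE as the optional rollback parameter of
FinishSuperFileTransaction requests a rollback, so the AddSuperFile below
should leave the SuperFile unchanged:</para>
<programlisting>IMPORT $;
IMPORT Std;
SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile($.DeclareData.BaseFile,$.DeclareData.SubFile3),
  // TRUE = roll back instead of committing the changes in this frame
  Std.File.FinishSuperFileTransaction(TRUE));
</programlisting>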
<para>Implicit in this technique is the periodic roll-up and
consolidation of the new data into composite files. This is necessary
because the practical limit to the number of physical files you should
combine into a SuperFile is about one hundred (100). Every time you
reference the SuperFile, every sub-file must be physically opened and read
from disk, and the more sub-files there are, the more operating system
resources are used just to get at the data.</para>
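<para>A rough way to see how close a SuperFile is to that limit is to
count its sub-files. The sketch below is an illustration only: the
definition names, the NAMED labels, and the advisory message are
assumptions, and the threshold of 100 is simply the guideline quoted
above:</para>
<programlisting>IMPORT $;
IMPORT Std;
// Sub-files listed directly in the SuperFile
DirectCount := Std.File.GetSuperFileSubCount($.DeclareData.BaseFile);
// Logical files reachable through any nested SuperFiles as well
AllFiles := Std.File.SuperFileContents($.DeclareData.BaseFile,TRUE);
OUTPUT(DirectCount,NAMED('DirectSubFiles'));
OUTPUT(COUNT(AllFiles),NAMED('TotalLogicalFiles'));
// Flag the SuperFile once it exceeds the guideline
IF(COUNT(AllFiles) &gt; 100,
   OUTPUT('Consolidation recommended',NAMED('Advice')));
</programlisting>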
<para>Therefore, you need to periodically run a process that physically
combines all the incremental logical files into a single logical file
that replaces them all. Periodic SuperFile data consolidation
is a simple process of using OUTPUT to write the complete contents of the
SuperFile to a new, single logical file. Once all data is in a single
file, a SuperFile transaction can clear the old set of sub-files and then
add the new “base” logical file.</para>
</sect2>
<sect2 id="Nested_SuperFile_Example">
<title>Nested SuperFile Example</title>
<para>Here is an example of how to nest SuperFiles. This example assumes
you have new data coming every day. It also assumes you want to roll up
the new data daily and weekly. The following filenames for the new
sub-files are declared in the DeclareData MODULE structure
attribute:</para>
<programlisting>EXPORT AllPeople := '~PROGGUIDE::SUPERFILE::AllPeople';
EXPORT WeeklyFile := '~PROGGUIDE::SUPERFILE::Weekly';
EXPORT DailyFile := '~PROGGUIDE::SUPERFILE::Daily';
</programlisting>
<para>Creating three more SuperFiles has to be done just once, then you
need to add the sub-files to them (this code is contained in
SuperFile3.ECL):</para>
<programlisting>IMPORT $;
IMPORT Std;
SEQUENTIAL(
  Std.File.CreateSuperFile($.DeclareData.AllPeople),
  Std.File.CreateSuperFile($.DeclareData.WeeklyFile),
  Std.File.CreateSuperFile($.DeclareData.DailyFile),
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile($.DeclareData.AllPeople,$.DeclareData.BaseFile),
  Std.File.AddSuperFile($.DeclareData.AllPeople,$.DeclareData.WeeklyFile),
  Std.File.AddSuperFile($.DeclareData.AllPeople,$.DeclareData.DailyFile),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>Now the AllPeople SuperFile contains the BaseFile, WeeklyFile, and
DailyFile SuperFiles as sub-files, creating a hierarchy of SuperFiles,
only one of which yet contains any actual data. The Base SuperFile
contains all the currently known data, as of the time of the build of the
logical files. The Weekly and Daily SuperFiles will contain the interim
data updates as they come in the door, precluding the need to rebuild the
entire database every time a new set of data comes in.</para>
<para>One important caveat to this scheme is that a given actual logical
file (real data file) should be contained in exactly one of the nested
SuperFiles at a time; otherwise, you would have duplicate records in the
outermost SuperFile. Therefore, you have to be careful how you maintain
your hierarchy so as not to allow the same logical file to be referenced
by more than one of the nested SuperFiles at once, outside of a
transaction frame.</para>
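<para>Before adding a logical file to one of the nested SuperFiles, it can
help to check which SuperFiles already reference it. The following is a
sketch only (the NAMED labels are illustrative); it uses the
Std.File.LogicalFileSuperOwners function to list the SuperFiles that
contain a given sub-file:</para>
<programlisting>IMPORT $;
IMPORT Std;
// List every SuperFile that currently contains SubFile3
Owners := Std.File.LogicalFileSuperOwners($.DeclareData.SubFile3);
OUTPUT(Owners,NAMED('SubFile3_Owners'));
OUTPUT(COUNT(Owners),NAMED('OwnerCount'));
</programlisting>
<para>An owner count greater than one would suggest that the same data is
being read more than once through the hierarchy.</para>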
<para>As you get new logical files in during the day, you can add them to
the Daily SuperFile like this (this code is contained in
SuperFile4.ECL):</para>
<programlisting>IMPORT $;
IMPORT Std;
SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile($.DeclareData.DailyFile,$.DeclareData.SubFile3),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>This appends the DeclareData.SubFile3 logical file to the list of
sub-files in the DailyFile SuperFile. This means that the very next query
against the nested AllPeople SuperFile will be using the very latest
up-to-the-minute data.</para>
<para>This dataset declaration is in the DeclareData MODULE structure
(contained in the Default module). This declares the nested SuperFile as a
DATASET that can be referenced in ECL code:</para>
<programlisting>EXPORT SuperFile2 := DATASET(AllPeople,Layout_Person,FLAT);</programlisting>
<para>Execute the following action:</para>
<programlisting>IMPORT ProgrammersGuide AS PG;
COUNT(PG.DeclareData.SuperFile2(PersonID &lt;&gt; 0));
</programlisting>
<para>The result of the COUNT should now be 451,000.</para>
<para>Edit the code from SuperFile4.ECL to add DeclareData.SubFile4, like
this:</para>
<programlisting>IMPORT $;
IMPORT Std;
SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile($.DeclareData.DailyFile,$.DeclareData.SubFile4),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>Re-running the above COUNT action should now result in
620,000.</para>
<para>Once a day, you can roll all the sub-files up into the WeeklyFile
and clear out the DailyFile for the next day's data ingest processing,
like this (this code is contained in SuperFile5.ECL):</para>
<programlisting>IMPORT $;
IMPORT Std;
SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile($.DeclareData.WeeklyFile,$.DeclareData.DailyFile,,TRUE),
  Std.File.ClearSuperFile($.DeclareData.DailyFile),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>This moves the references to all the sub-files from the DailyFile to
the WeeklyFile (the fourth parameter to the AddSuperFile function being
TRUE copies the references from one SuperFile to another), then clears out
the DailyFile.</para>
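<para>The effect of the roll-up can be checked by inspecting both
SuperFiles afterwards. This is a sketch, not one of the example files, and
the NAMED labels are illustrative:</para>
<programlisting>IMPORT $;
IMPORT Std;
// DailyFile should report no sub-files once it has been cleared
OUTPUT(Std.File.GetSuperFileSubCount($.DeclareData.DailyFile),NAMED('DailySubCount'));
// WeeklyFile should now list the sub-files that were in DailyFile
OUTPUT(Std.File.SuperFileContents($.DeclareData.WeeklyFile),NAMED('WeeklyContents'));
</programlisting>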
</sect2>
<sect2 id="Data_Consolidation">
<title>Data Consolidation</title>
<para>Since the practical limit to the number of logical files you should
combine into a SuperFile is about a hundred, you'll need to periodically
run a process that physically combines all the incremental logical files
into a single logical file that replaces them all, like this:</para>
<programlisting>IMPORT $;
IMPORT Std;
OUTPUT($.DeclareData.SuperFile2,,'~PROGGUIDE::SUPERFILE::People14',OVERWRITE);
</programlisting>
<para>This will write a new file containing all the records from all the
sub-files in the SuperFile.</para>
<para>Once you've done that, you'll need to clear all the component
SuperFiles and add the new all-the-data-there-is data file into the
BaseFile, like this (this code is contained in SuperFile6.ECL):</para>
<programlisting>IMPORT $;
IMPORT Std;
SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.ClearSuperFile($.DeclareData.BaseFile),
  Std.File.ClearSuperFile($.DeclareData.WeeklyFile),
  Std.File.ClearSuperFile($.DeclareData.DailyFile),
  Std.File.AddSuperFile($.DeclareData.BaseFile,'~PROGGUIDE::SUPERFILE::People14'),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>This action clears out the Base SuperFile and all the incremental
SuperFiles, then adds the reference to the new all-inclusive logical file
to the BaseFile.</para>
<para>Re-running the above COUNT action should still result in
620,000.</para>
<para>Once again, edit the code from SuperFile4.ECL to add
DeclareData.SubFile5 and DeclareData.SubFile6 to the DailyFile, like
this:</para>
<programlisting>IMPORT $;
IMPORT Std;
SEQUENTIAL(
  Std.File.StartSuperFileTransaction(),
  Std.File.AddSuperFile($.DeclareData.DailyFile,$.DeclareData.SubFile5),
  Std.File.AddSuperFile($.DeclareData.DailyFile,$.DeclareData.SubFile6),
  Std.File.FinishSuperFileTransaction());
</programlisting>
<para>Once you've done that, re-running the above COUNT action should now
result in 1,000,000.</para>
</sect2>
<sect2 id="Getting_SuperFile_Components">
<title>Getting SuperFile Components</title>
<para>This macro (in the DeclareData MODULE structure attribute)
demonstrates one technique to list the component sub-files of a
SuperFile:</para>
<programlisting>IMPORT STD;
EXPORT MAC_ListSFsubfiles(SuperFile) := MACRO
  #UNIQUENAME(SeedRec)
  %SeedRec% := DATASET([{''}], {STRING name});
  #UNIQUENAME(Xform)
  TYPEOF(%SeedRec%) %Xform%(%SeedRec% L, INTEGER C) :=
    TRANSFORM
      SELF.name :=
        Std.File.GetSuperFileSubName(SuperFile,C);
    END;
  OUTPUT(NORMALIZE(%SeedRec%,
                   Std.File.GetSuperFileSubCount(SuperFile),
                   %Xform%(LEFT,COUNTER)));
ENDMACRO;
</programlisting>
<para>The interesting technique here is the use of NORMALIZE to call the
TRANSFORM function iteratively until all sub-files in the SuperFile are
listed. You can call this macro in a builder window like this (this code
is contained in SuperFile7.ECL):</para>
<programlisting>IMPORT $;
IMPORT Std;
$.DeclareData.MAC_ListSFsubfiles($.DeclareData.AllPeople);</programlisting>
<para>This will return a list of all the sub-files in the specified
SuperFile. However, this type of code is no longer necessary, since the
default mode of the SuperFileContents() function now returns exactly the
same result, like this:</para>
<programlisting>IMPORT $;
IMPORT Std;
OUTPUT(Std.File.SuperFileContents($.DeclareData.AllPeople));</programlisting>
<para>The SuperFileContents() function has an advantage over the macro: it
has an option to return the sub-files from any nested SuperFile (which the
macro can't do). That form looks like this:</para>
<programlisting>IMPORT $;
IMPORT Std;
OUTPUT(Std.File.SuperFileContents($.DeclareData.AllPeople,TRUE));</programlisting>
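<para>For comparison with the NORMALIZE technique used in the macro, the
same iteration can also be sketched with the DATASET(count, TRANSFORM)
form, which builds one record per iteration without needing a seed record.
This is an illustrative alternative, not part of the example files, and
the NameRec, SubCount, and SubNames definition names are assumptions:</para>
<programlisting>IMPORT $;
IMPORT Std;
NameRec := {STRING name};
SubCount := Std.File.GetSuperFileSubCount($.DeclareData.AllPeople);
// Build one record per sub-file; COUNTER runs from 1 to SubCount
SubNames := DATASET(SubCount,
                    TRANSFORM(NameRec,
                              SELF.name := Std.File.GetSuperFileSubName($.DeclareData.AllPeople,COUNTER)));
OUTPUT(SubNames);
</programlisting>
<para>Like the macro, this lists only the direct sub-files; the recursive
form of SuperFileContents() is still the simpler choice for nested
SuperFiles.</para>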
</sect2>
</sect1>