segmentlib.dox 13 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410
  1. /*! \page segmentlib GRASS Segment Library
  2. <!-- doxygenized from "GRASS 5 Programmer's Manual"
  3. by M. Neteler 8/2005
  4. -->
  5. \author CERL
  6. \section segmentintro Introduction
  7. <P>
  8. Large data files which contain data in a matrix format often need to be
  9. accessed in a nonsequential or random manner. This requirement complicates
  10. the programming.
  11. <P>
  12. Methods for accessing the data are to:
  13. <P>
  14. (1) read the entire data file into memory and process the data as a
  15. two-dimensional matrix,
  16. <P>
  17. (2) perform direct access i/o to the data file for every data value to be
  18. accessed, or
  19. <P>
  20. (3) read only portions of the data file into memory as needed.
  21. <P>
  22. Method (1) greatly simplifies the programming effort since i/o is done once
  23. and data access is simple array referencing. However, it has the
  24. disadvantage that large amounts of memory may be required to hold the data.
  25. The memory may not be available, or if it is, system paging of the module
  26. may severely degrade performance. Method (2) is not much more complicated to
  27. code and requires no significant amount of memory to hold the data. But the
  28. i/o involved will certainly degrade performance. Method (3) is a mixture of
  29. (1) and (2) . Memory requirements are fixed and data is read from the data
  30. file only when not already in memory. However the programming is more
  31. complex.
  32. <P>
  33. The routines provided in this library are an implementation of method (3) .
  34. They are based on the idea that if the original matrix were segmented or
  35. partitioned into smaller matrices these segments could be managed to reduce
  36. both the memory required and the i/o. Data access along connected paths
  37. through the matrix, (i.e., moving up or down one row and left or right one
  38. column) should benefit.
  39. <P>
  40. In most applications, the original data is not in the segmented format. The
  41. data must be transformed from the nonsegmented format to the segmented
  42. format. This means reading the original data matrix row by row and writing
  43. each row to a new file with the segmentation organization. This step
  44. corresponds to the i/o step of method (1) .
  45. <P>
  46. Then data can be retrieved from the segment file through routines by
  47. specifying the row and column of the original matrix. Behind the scenes, the
  48. data is paged into memory as needed and the requested data is returned to
  49. the caller.
  50. \note All routines and global variables in this library, documented
  51. or undocumented, start with the prefix \c Segment_. To avoid name
  52. conflicts, programmers should not create variables or routines in their own
  53. modules which use this prefix.
  54. \section Segment_Routines Segment Routines
  55. <P>
  56. The routines in the <I>Segment Library</I> are described below, more or
  57. less in the order they would logically be used in a module. They use a data
  58. structure called SEGMENT which is defined in the header file
  59. \c grass/segment.h that must be included in any code using these
  60. routines:
  61. \code
  62. #include <grass/segment.h>
  63. \endcode
  64. \see \ref Loading_the_Segment_Library.
  65. <P>
  66. A temporary file needs to be prepared and a SEGMENT structure needs to
  67. be initialized before any data can be transferred to the segment file.
  68. This can be done with the <I>Segment Library</I> routine:
  69. <P>
  70. <I>int Segment_open(SEGMENT *SEG, char *fname, off_t nrows, off_t ncols,
  71. int srows, int scols, int len, int nseg)</I>, open a new segment structure.
  72. <P>
  73. A new file with full path name <B>fname</B> will be created and
  74. formatted. The original nonsegmented data matrix consists of
  75. <B>nrows</B> and <B>ncols</B>. The segments consist of <B>srows</B> by
  76. <B>scols</B>. The data items have length <B>len</B> bytes. The number
  77. of segments to be retained in memory is given by <B>nseg</B>. This
  78. routine calls Segment_format() and Segment_init(), see below. If
  79. Segment_open() is used, the routines Segment_format() and Segment_init()
  80. must not be used.
  81. <P>
  82. Return codes are: 1 ok; else a negative number between -1 and -6 encoding
  83. the error type.
  84. <P>
  85. Alternatively, the first step is to create a file which is properly
  86. formatted for use by the <I>Segment Library</I> routines:
  87. <P>
  88. <I>int Segment_format (int fd, int nrows, off_t ncols,off_t srows, int scols,
  89. int len)</I>, format a segment file
  90. <P>
  91. The segmentation routines require a disk file to be used for paging
  92. segments in and out of memory. This routine formats the file open for
  93. write on file descriptor <B>fd</B> for use as a segment file. A segment
  94. file must be formatted before it can be processed by other segment
  95. routines. The configuration parameters <B>nrows, ncols, srows, scols</B>,
  96. and <B>len</B> are written to the beginning of the segment file which is
  97. then filled with zeros.
  98. <P>
  99. The corresponding nonsegmented data matrix, which is to be transferred to the
  100. segment file, is <B>nrows</B> by <B>ncols.</B> The segment file is to be
  101. formed of segments which are <B>srows</B> by <B>scols</B>. The data items
  102. have length <B>len</B> bytes. For example, if the data type is <I>int</I>,
  103. <B><I>len</I></B> is <I>sizeof(int)</I>.
  104. <P>
  105. Return codes are: 1 ok; else -1 could not seek or write <I>fd</I>, or -3
  106. illegal configuration parameter(s) .
  107. <P>
  108. The next step is to initialize a SEGMENT structure to be associated with a
  109. segment file formatted by Segment_format().
  110. <P>
  111. <I>int Segment_init (SEGMENT *seg, int fd, int nsegs)</I>, initialize segment
  112. structure
  113. <P>
  114. Initializes the <B>seg</B> structure. The file on <B>fd</B> is
  115. a segment file created by Segment_format() and must be open for
  116. reading and writing. The segment file configuration parameters <I>nrows,
  117. ncols, srows, scols</I>, and <I>len</I>, as written to the file by
  118. <I>Segment_format</I>, are read from the file and stored in the
  119. <B>seg</B> structure. <B>Nsegs</B> specifies the number of segments that
  120. will be retained in memory. The minimum value allowed is 1.
  121. \note The size of a segment is <I>scols*srows*len</I> plus a few
  122. bytes for managing each segment.
  123. <P>
  124. Return codes are: 1 if ok; else -1 could not seek or read segment file, or -2 out of memory.
  125. <P>
  126. Then data can be written from another file to the segment file row by row:
  127. <P>
  128. <I>int Segment_put_row (SEGMENT *seg, char *buf, int row)</I>, write row to
  129. segment file
  130. <P>
  131. Transfers nonsegmented matrix data, row by row, into a segment
  132. file. <B>Seg</B> is the segment structure that was configured from a call
  133. to Segment_init(). <B>Buf</B> should contain <I>ncols*len</I>
  134. bytes of data to be transferred to the segment file. <B>Row</B> specifies
  135. the row from the data matrix being transferred.
  136. <P>
  137. Return codes are: 1 if ok; else -1 could not seek or write segment file.
  138. <P>
  139. Then data can be read or written to the segment file randomly:
  140. <P>
  141. <I>int Segment_get (SEGMENT *seg, char *value, int row, int col)</I>, get value
  142. from segment file
  143. <P>
  144. Provides random read access to the segmented data. It gets
  145. <I>len</I> bytes of data into <B>value</B> from the segment file
  146. <B>seg</B> for the corresponding <B>row</B> and <B>col</B> in the
  147. original data matrix.
  148. <P>
  149. Return codes are: 1 if ok; else -1 could not seek or read segment file.
  150. <P>
  151. <I>int Segment_put (SEGMENT *seg, char *value, int row, int col)</I>, put
  152. value to segment file
  153. <P>
  154. Provides random write access to the segmented data. It
  155. copies <I>len</I> bytes of data from <B>value</B> into the segment
  156. structure <B>seg</B> for the corresponding <B>row</B> and <B>col</B> in
  157. the original data matrix.
  158. <P>
  159. The data is not written to disk immediately. It is stored in a memory segment
  160. until the segment routines decide to page the segment to disk.
  161. <P>
  162. Return codes are: 1 if ok; else -1 could not seek or write segment file.
  163. <P>
  164. After random reading and writing is finished, the pending updates must be
  165. flushed to disk:
  166. <P>
  167. <I>int Segment_flush (SEGMENT *seg)</I>, flush pending updates to disk
  168. <P>
  169. Forces all pending updates generated by Segment_put() to be
  170. written to the segment file <B>seg.</B> Must be called after the final
  171. Segment_put() to force all pending updates to disk. Must also be called
  172. before the first call to Segment_get_row().
  173. <P>
  174. Now the data in segment file can be read row by row and transferred to a normal
  175. sequential data file:
  176. <P>
  177. <I>int Segment_get_row (SEGMENT *seg, char *buf, int row)</I>, read row from
  178. segment file
  179. <P>
  180. Transfers data from a segment file, row by row, into memory
  181. (which can then be written to a regular matrix file) . <B>Seg</B> is the
  182. segment structure that was configured from a call to Segment_init().
  183. <B>Buf</B> will be filled with <I>ncols*len</I> bytes of data
  184. corresponding to the <B>row</B> in the data matrix.
  185. <P>
  186. Return codes are: 1 if ok; else -1 could not seek or read segment file.
  187. <P>
  188. Finally, memory allocated in the SEGMENT structure is freed:
  189. <P>
  190. <I>int Segment_release (SEGMENT *seg)</I>, free allocated memory
  191. <P>
  192. Releases the allocated memory associated with the segment file
  193. <B>seg.</B> Does not close the file. Does not flush the data which may
  194. be pending from previous Segment_put() calls.
  195. <P>
  196. The following routine both deletes the segment file and releases allocated
  197. memory:
  198. <P>
  199. <I>int Segment_close (SEGMENT *seg)</I>, close segment structure
  200. <P>
  201. Deletes the segment file and uses Segment_release() to release the
  202. allocated memory. No further cleaing up is required.
  203. <P>
  204. \section How_to_Use_the_Library_Routines How to Use the Library Routines
  205. The following should provide the programmer with a good idea of how to use the
  206. <I>Segment Library</I> routines. The examples assume that the data is integer.
  207. Creation of a segment file and initialization of the segment structure
  208. at once:
  209. \code
  210. SEGMENT seg;
  211. Segment_open (&seg, G_tempfile(), nrows, ncols, srows, scols, sizeof(int), nseg);
  212. \endcode
  213. Alternatively, the first step is the creation and formatting of a segment
  214. file. A file is created, formatted and then closed:
  215. \code
  216. fd = creat (file, 0666);
  217. Segment_format (fd, nrows, ncols, srows, scols, sizeof(int));
  218. close(fd);
  219. \endcode
  220. <P>
  221. The next step is the conversion of the nonsegmented matrix data into segment
  222. file format. The segment file is reopened for read and write and initialized:
  223. \code
  224. #include <fcntl.h>
  225. SEGMENT seg;
  226. fd = open (file, O_RDWR);
  227. Segment_init (&seg, fd, nseg);
  228. \endcode
  229. <P>
  230. Both the segment file and the segment structure are now ready to use, and
  231. data can be read row by row from the original data file and put into the
  232. segment file:
  233. \code
  234. for (row = 0; row < nrows; row++)
  235. {
  236. // code to get original matrix data for row into buf
  237. Segment_put_row (&seg, buf, row);
  238. }
  239. \endcode
  240. <P>
  241. Of course if the intention is only to add new values rather than update existing
  242. values, the step which transfers data from the original matrix to the segment
  243. file, using Segment_put_row(), could be omitted, since
  244. Segment_format() will fill the segment file with zeros.
  245. <P>
  246. The data can now be accessed directly using Segment_get(). For example,
  247. to get the value at a given row and column:
  248. \code
  249. int value;
  250. SEGMENT seg;
  251. Segment_get (&seg, &value, row, col);
  252. \endcode
  253. <P>
  254. Similarly Segment_put() can be used to change data values in the
  255. segment file:
  256. \code
  257. int value;
  258. SEGMENT seg;
  259. value = 10;
  260. Segment_put (&seg, &value, row, col);
  261. \endcode
  262. \warning It is an easy mistake to pass a value directly to
  263. Segment_put(). The following should be avoided:
  264. \code
  265. Segment_put (&seg, 10, row, col); // this will not work
  266. \endcode
  267. <P>
  268. Once the random access processing is complete, the data would be extracted
  269. from the segment file and written to a nonsegmented matrix data file as
  270. follows:
  271. \code
  272. Segment_flush (&seg);
  273. for (row = 0; row < nrows; row++)
  274. {
  275. Segment_get_row (&seg, buf, row);
  276. // code to put buf into a matrix data file for row
  277. }
  278. \endcode
  279. <P>
  280. Finally, the memory allocated for use by the segment routines would be
  281. released and the file closed:
  282. \code
  283. Segment_release (&seg);
  284. close (fd);
  285. \endcode
  286. \note The <I>Segment Library</I> does not know the name of the
  287. segment file. It does not attempt to remove the file. If the file is only
  288. temporary, the programmer should remove the file after closing it.
  289. <P>
  290. \section Segment_Library_Performance Segment Library Performance
  291. Performance of the <I>Segment Library</I> routines can be improved by
  292. about 10% if <B>srows, scols</B> are each powers of 2; in this case a
  293. faster alternative is used to access the segment file. An additional
  294. improvement can be achieved if <B>len</B> is also a power of 2. For
  295. scattered access to a large dataset, smaller segments, i.e. values
  296. for <B>srows, scols</B> of 32, 64, or 128 seem to provide
  297. the best performance. Calculating segment size as a fraction of the
  298. data matrix size, e.g. srows = nrows / 4 + 1, will result in very poor
  299. performance, particularly for larger datasets.
  300. \section Loading_the_Segment_Library Loading the Segment Library
  301. <P>
  302. The library is loaded by specifying
  303. \code
  304. $(SEGMENTLIB)
  305. \endcode
  306. in the Makefile.
  307. <P>
  308. See \ref Compiling_and_Installing_GRASS_Modules for a complete
  309. discussion of Makefiles.
  310. */