- /*! \page segmentlib GRASS Segment Library
- <!-- doxygenized from "GRASS 5 Programmer's Manual"
- by M. Neteler 8/2005
- -->
- \author CERL
- \section segmentintro Introduction
- <P>
- Large data files which contain data in a matrix format often need to be
- accessed in a nonsequential or random manner. This requirement complicates
- the programming.
- <P>
- Methods for accessing the data are to:
- <P>
- (1) read the entire data file into memory and process the data as a
- two-dimensional matrix,
- <P>
- (2) perform direct access i/o to the data file for every data value to be
- accessed, or
- <P>
- (3) read only portions of the data file into memory as needed.
- <P>
- Method (1) greatly simplifies the programming effort since i/o is done once
- and data access is simple array referencing. However, it has the
- disadvantage that large amounts of memory may be required to hold the data.
- The memory may not be available, or if it is, system paging of the module
- may severely degrade performance. Method (2) is not much more complicated to
- code and requires no significant amount of memory to hold the data. But the
- i/o involved will certainly degrade performance. Method (3) is a mixture of
- (1) and (2). Memory requirements are fixed and data is read from the data
- file only when not already in memory. However, the programming is more
- complex.
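- <P>
- To make the trade-off concrete, method (2) can be sketched as one seek and
- one read per value. The function name \c read_cell and the file layout (a
- headerless file of \c int values stored row by row) are assumptions made for
- illustration only; they are not part of this library.

```c
#include <assert.h>
#include <stdio.h>

/* Method (2): direct access i/o, one seek and one read per value.
 * fp holds a matrix of int stored row by row, ncols values per row.
 * Returns 1 if ok, -1 if the value could not be sought or read. */
static int read_cell(FILE *fp, long ncols, long row, long col, int *value)
{
    long pos;

    assert(fp != NULL && value != NULL);
    assert(row >= 0 && col >= 0 && col < ncols);

    /* byte offset of the cell in a row-major file */
    pos = (row * ncols + col) * (long)sizeof(int);
    if (fseek(fp, pos, SEEK_SET) != 0)
        return -1;
    return fread(value, sizeof(int), 1, fp) == 1 ? 1 : -1;
}
```

- No matrix data is kept in memory, but every access pays for a seek and a
- read, which is the cost that method (3) tries to avoid.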
- <P>
- The routines provided in this library are an implementation of method (3).
- They are based on the idea that if the original matrix were segmented or
- partitioned into smaller matrices, these segments could be managed to reduce
- both the memory required and the i/o. Data access along connected paths
- through the matrix (i.e., moving up or down one row and left or right one
- column) should benefit.
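- <P>
- The partitioning idea can be illustrated with a small address calculation.
- This is a sketch under assumptions: the type \c seg_addr, the function
- \c seg_locate, and the row-major numbering of segments are hypothetical and
- do not describe the library's internal layout.

```c
#include <assert.h>

/* Hypothetical helper, not part of the Segment Library. */
typedef struct {
    long segment;  /* index of the segment holding the cell */
    long offset;   /* index of the cell inside that segment */
} seg_addr;

/* Locate cell (row, col) of a matrix with ncols columns that has been
 * cut into segments of srows by scols cells, numbered row by row. */
static seg_addr seg_locate(long row, long col, long ncols, int srows, int scols)
{
    seg_addr a;
    long segs_per_row;

    assert(row >= 0 && col >= 0 && col < ncols && srows > 0 && scols > 0);

    /* segments needed to cover one matrix row, rounded up */
    segs_per_row = (ncols + scols - 1) / scols;
    a.segment = (row / srows) * segs_per_row + col / scols;
    a.offset = (row % srows) * scols + col % scols;
    return a;
}
```

- With a 1000-column matrix and 64 by 64 segments, cells (100, 200) and
- (101, 201) both fall in segment 19, so a step along a connected path usually
- stays inside a segment that is already in memory.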
- <P>
- In most applications, the original data is not in the segmented format. The
- data must be transformed from the nonsegmented format to the segmented
- format. This means reading the original data matrix row by row and writing
- each row to a new file with the segmentation organization. This step
- corresponds to the i/o step of method (1).
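- <P>
- As an illustration of this transformation, the sketch below copies one
- matrix row into a segmented buffer held in memory. The function
- \c scatter_row and the buffer layout (segments stored one after another,
- each padded to the full srows by scols size) are assumptions for the
- example; the library writes to a segment file and its on-disk layout may
- differ.

```c
#include <assert.h>
#include <string.h>

/* Copy one row of an int matrix (ncols values in row_buf) into dst,
 * a buffer organized as consecutive segments of srows by scols cells. */
static void scatter_row(int *dst, const int *row_buf, long row,
                        long ncols, int srows, int scols)
{
    long segs_per_row, seg_cells, col;

    assert(dst != NULL && row_buf != NULL);
    assert(row >= 0 && ncols > 0 && srows > 0 && scols > 0);

    segs_per_row = (ncols + scols - 1) / scols;
    seg_cells = (long)srows * scols;

    /* each pass copies the part of the row that lies in one segment */
    for (col = 0; col < ncols; col += scols) {
        long seg = (row / srows) * segs_per_row + col / scols;
        long n = ncols - col < scols ? ncols - col : scols;

        memcpy(dst + seg * seg_cells + (row % srows) * scols,
               row_buf + col, (size_t)n * sizeof(int));
    }
}
```

- Calling the function once per row, in any order, fills the segmented buffer;
- this mirrors the row-by-row conversion described above.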
- <P>
- Then data can be retrieved from the segment file through routines by
- specifying the row and column of the original matrix. Behind the scenes, the
- data is paged into memory as needed and the requested data is returned to
- the caller.
- \note All routines and global variables in this library, documented
- or undocumented, start with the prefix \c Segment_. To avoid name
- conflicts, programmers should not create variables or routines in their own
- modules which use this prefix.
- \section Segment_Routines Segment Routines
- <P>
- The routines in the <I>Segment Library</I> are described below, more or
- less in the order they would logically be used in a module. They use a data
- structure called SEGMENT which is defined in the header file
- \c grass/segment.h that must be included in any code using these
- routines:
- \code
- #include <grass/segment.h>
- \endcode
- \see \ref Loading_the_Segment_Library.
- <P>
- A temporary file needs to be prepared and a SEGMENT structure needs to
- be initialized before any data can be transferred to the segment file.
- This can be done with the <I>Segment Library</I> routine:
- <P>
- <I>int Segment_open(SEGMENT *SEG, char *fname, off_t nrows, off_t ncols,
- int srows, int scols, int len, int nseg)</I>, open a new segment structure.
- <P>
- A new file with full path name <B>fname</B> will be created and
- formatted. The original nonsegmented data matrix has <B>nrows</B> rows
- and <B>ncols</B> columns. Each segment is <B>srows</B> rows by
- <B>scols</B> columns. The data items have length <B>len</B> bytes. The number
- of segments to be retained in memory is given by <B>nseg</B>. This
- routine calls Segment_format() and Segment_init(), see below. If
- Segment_open() is used, the routines Segment_format() and Segment_init()
- must not be used.
- <P>
- Return codes are: 1 ok; else a negative number between -1 and -6 encoding
- the error type.
- <P>
- Alternatively, the first step is to create a file which is properly
- formatted for use by the <I>Segment Library</I> routines:
- <P>
- <I>int Segment_format (int fd, off_t nrows, off_t ncols, int srows, int scols,
- int len)</I>, format a segment file
- <P>
- The segmentation routines require a disk file to be used for paging
- segments in and out of memory. This routine formats the file open for
- write on file descriptor <B>fd</B> for use as a segment file. A segment
- file must be formatted before it can be processed by other segment
- routines. The configuration parameters <B>nrows, ncols, srows, scols</B>,
- and <B>len</B> are written to the beginning of the segment file which is
- then filled with zeros.
- <P>
- The corresponding nonsegmented data matrix, which is to be transferred to the
- segment file, is <B>nrows</B> by <B>ncols.</B> The segment file is to be
- formed of segments which are <B>srows</B> by <B>scols</B>. The data items
- have length <B>len</B> bytes. For example, if the data type is <I>int</I>,
- <B><I>len</I></B> is <I>sizeof(int)</I>.
- <P>
- Return codes are: 1 ok; else -1 could not seek or write <I>fd</I>, or -3
- illegal configuration parameter(s).
- <P>
- The next step is to initialize a SEGMENT structure to be associated with a
- segment file formatted by Segment_format().
- <P>
- <I>int Segment_init (SEGMENT *seg, int fd, int nsegs)</I>, initialize segment
- structure
- <P>
- Initializes the <B>seg</B> structure. The file on <B>fd</B> is
- a segment file created by Segment_format() and must be open for
- reading and writing. The segment file configuration parameters <I>nrows,
- ncols, srows, scols</I>, and <I>len</I>, as written to the file by
- <I>Segment_format</I>, are read from the file and stored in the
- <B>seg</B> structure. <B>Nsegs</B> specifies the number of segments that
- will be retained in memory. The minimum value allowed is 1.
- \note The size of a segment is <I>scols*srows*len</I> plus a few
- bytes for managing each segment.
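- <P>
- The note above gives a quick way to estimate how much memory a choice of
- <B>nsegs</B> implies. The helper \c seg_cache_bytes below is hypothetical and
- ignores the few bytes of per-segment bookkeeping, so it is a lower bound
- rather than an exact figure.

```c
#include <assert.h>
#include <stddef.h>

/* Approximate bytes of segment data held in memory when nsegs segments
 * of srows by scols items, each len bytes long, are retained. */
static size_t seg_cache_bytes(int srows, int scols, int len, int nsegs)
{
    assert(srows > 0 && scols > 0 && len > 0 && nsegs >= 1);
    return (size_t)srows * (size_t)scols * (size_t)len * (size_t)nsegs;
}
```

- For example, 16 segments of 64 by 64 four-byte items need about 256 KiB
- (262144 bytes) of segment data.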
- <P>
- Return codes are: 1 if ok; else -1 could not seek or read segment file, or -2 out of memory.
- <P>
- Then data can be written from another file to the segment file row by row:
- <P>
- <I>int Segment_put_row (SEGMENT *seg, char *buf, int row)</I>, write row to
- segment file
- <P>
- Transfers nonsegmented matrix data, row by row, into a segment
- file. <B>Seg</B> is the segment structure that was configured from a call
- to Segment_init(). <B>Buf</B> should contain <I>ncols*len</I>
- bytes of data to be transferred to the segment file. <B>Row</B> specifies
- the row from the data matrix being transferred.
- <P>
- Return codes are: 1 if ok; else -1 could not seek or write segment file.
- <P>
- Data can then be read from or written to the segment file in random order:
- <P>
- <I>int Segment_get (SEGMENT *seg, char *value, int row, int col)</I>, get value
- from segment file
- <P>
- Provides random read access to the segmented data. It gets
- <I>len</I> bytes of data into <B>value</B> from the segment file
- <B>seg</B> for the corresponding <B>row</B> and <B>col</B> in the
- original data matrix.
- <P>
- Return codes are: 1 if ok; else -1 could not seek or read segment file.
- <P>
- <I>int Segment_put (SEGMENT *seg, char *value, int row, int col)</I>, put
- value to segment file
- <P>
- Provides random write access to the segmented data. It
- copies <I>len</I> bytes of data from <B>value</B> into the segment
- structure <B>seg</B> for the corresponding <B>row</B> and <B>col</B> in
- the original data matrix.
- <P>
- The data is not written to disk immediately. It is stored in a memory segment
- until the segment routines decide to page the segment to disk.
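- <P>
- The deferred-write behaviour can be pictured with a toy one-slot cache. This
- is only a sketch of the general write-back technique: \c toy_cache,
- \c toy_put, \c toy_page_in and \c toy_flush are invented for the example and
- are not the library's SEGMENT structure or its paging policy.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SEG_CELLS 4            /* cells per segment in this toy example */

typedef struct {
    FILE *fp;                  /* backing file, open for reading and writing */
    long loaded;               /* segment number in memory, -1 if none */
    int dirty;                 /* 1 if buf holds updates not yet in the file */
    int buf[SEG_CELLS];
} toy_cache;

/* Write the cached segment back to the file if it has pending updates. */
static int toy_flush(toy_cache *c)
{
    if (c->loaded < 0 || !c->dirty)
        return 1;
    if (fseek(c->fp, c->loaded * (long)sizeof(c->buf), SEEK_SET) != 0 ||
        fwrite(c->buf, sizeof(int), SEG_CELLS, c->fp) != SEG_CELLS ||
        fflush(c->fp) != 0)
        return -1;
    c->dirty = 0;
    return 1;
}

/* Make segment n the cached one, paging the previous one out first. */
static int toy_page_in(toy_cache *c, long n)
{
    if (c->loaded == n)
        return 1;
    if (toy_flush(c) < 0)
        return -1;
    memset(c->buf, 0, sizeof(c->buf));
    if (fseek(c->fp, n * (long)sizeof(c->buf), SEEK_SET) != 0)
        return -1;
    /* a short read past the end of the file leaves zeros in buf */
    if (fread(c->buf, sizeof(int), SEG_CELLS, c->fp) < SEG_CELLS &&
        ferror(c->fp))
        return -1;
    c->loaded = n;
    return 1;
}

/* Store a value in memory only; the file is updated on a later page-out. */
static int toy_put(toy_cache *c, long cell, int value)
{
    assert(cell >= 0);
    if (toy_page_in(c, cell / SEG_CELLS) < 0)
        return -1;
    c->buf[cell % SEG_CELLS] = value;
    c->dirty = 1;
    return 1;
}
```

- In this toy version a value stored by \c toy_put reaches the file only when
- another segment is paged in or when \c toy_flush is called, which is why an
- explicit flush is needed after the last write.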
- <P>
- Return codes are: 1 if ok; else -1 could not seek or write segment file.
- <P>
- After random reading and writing is finished, the pending updates must be
- flushed to disk:
- <P>
- <I>int Segment_flush (SEGMENT *seg)</I>, flush pending updates to disk
- <P>
- Forces all pending updates generated by Segment_put() to be
- written to the segment file <B>seg.</B> Must be called after the final
- Segment_put() to force all pending updates to disk. Must also be called
- before the first call to Segment_get_row().
- <P>
- Now the data in the segment file can be read row by row and transferred to a
- normal sequential data file:
- <P>
- <I>int Segment_get_row (SEGMENT *seg, char *buf, int row)</I>, read row from
- segment file
- <P>
- Transfers data from a segment file, row by row, into memory
- (which can then be written to a regular matrix file). <B>Seg</B> is the
- segment structure that was configured from a call to Segment_init().
- <B>Buf</B> will be filled with <I>ncols*len</I> bytes of data
- corresponding to the <B>row</B> in the data matrix.
- <P>
- Return codes are: 1 if ok; else -1 could not seek or read segment file.
- <P>
- Finally, memory allocated in the SEGMENT structure is freed:
- <P>
- <I>int Segment_release (SEGMENT *seg)</I>, free allocated memory
- <P>
- Releases the allocated memory associated with the segment file
- <B>seg.</B> Does not close the file. Does not flush the data which may
- be pending from previous Segment_put() calls.
- <P>
- The following routine both deletes the segment file and releases allocated
- memory:
- <P>
- <I>int Segment_close (SEGMENT *seg)</I>, close segment structure
- <P>
- Deletes the segment file and uses Segment_release() to release the
- allocated memory. No further cleaning up is required.
- <P>
- \section How_to_Use_the_Library_Routines How to Use the Library Routines
- The following should provide the programmer with a good idea of how to use the
- <I>Segment Library</I> routines. The examples assume that the data is integer.
- Creation of a segment file and initialization of the segment structure
- at once:
- \code
- SEGMENT seg;
- Segment_open (&seg, G_tempfile(), nrows, ncols, srows, scols, sizeof(int), nseg);
- \endcode
- Alternatively, the first step is the creation and formatting of a segment
- file. A file is created, formatted and then closed:
- \code
- fd = creat (file, 0666);
- Segment_format (fd, nrows, ncols, srows, scols, sizeof(int));
- close(fd);
- \endcode
- <P>
- The next step is the conversion of the nonsegmented matrix data into segment
- file format. The segment file is reopened for read and write and initialized:
- \code
- #include <fcntl.h>
- SEGMENT seg;
- fd = open (file, O_RDWR);
- Segment_init (&seg, fd, nseg);
- \endcode
- <P>
- Both the segment file and the segment structure are now ready to use, and
- data can be read row by row from the original data file and put into the
- segment file:
- \code
- for (row = 0; row < nrows; row++)
- {
-     // code to get original matrix data for row into buf
-     Segment_put_row (&seg, buf, row);
- }
- \endcode
- <P>
- Of course, if the intention is only to add new values rather than update
- existing values, the step which transfers data from the original matrix to
- the segment file, using Segment_put_row(), could be omitted, since
- Segment_format() will fill the segment file with zeros.
- <P>
- The data can now be accessed directly using Segment_get(). For example,
- to get the value at a given row and column:
- \code
- int value;
- SEGMENT seg;
- Segment_get (&seg, &value, row, col);
- \endcode
- <P>
- Similarly Segment_put() can be used to change data values in the
- segment file:
- \code
- int value;
- SEGMENT seg;
- value = 10;
- Segment_put (&seg, &value, row, col);
- \endcode
- \warning It is an easy mistake to pass a value directly to
- Segment_put(), which expects the address of the value rather than the
- value itself. The following should be avoided:
- \code
- Segment_put (&seg, 10, row, col); // this will not work
- \endcode
- <P>
- Once the random access processing is complete, the data would be extracted
- from the segment file and written to a nonsegmented matrix data file as
- follows:
- \code
- Segment_flush (&seg);
- for (row = 0; row < nrows; row++)
- {
-     Segment_get_row (&seg, buf, row);
-     // code to put buf into a matrix data file for row
- }
- \endcode
- <P>
- Finally, the memory allocated for use by the segment routines would be
- released and the file closed:
- \code
- Segment_release (&seg);
- close (fd);
- \endcode
- \note The <I>Segment Library</I> does not know the name of the
- segment file. It does not attempt to remove the file. If the file is only
- temporary, the programmer should remove the file after closing it.
- <P>
- \section Segment_Library_Performance Segment Library Performance
- Performance of the <I>Segment Library</I> routines can be improved by
- about 10% if <B>srows, scols</B> are each powers of 2; in this case a
- faster alternative is used to access the segment file. An additional
- improvement can be achieved if <B>len</B> is also a power of 2. For
- scattered access to a large dataset, smaller segments, i.e. values
- for <B>srows, scols</B> of 32, 64, or 128 seem to provide
- the best performance. Calculating segment size as a fraction of the
- data matrix size, e.g. srows = nrows / 4 + 1, will result in very poor
- performance, particularly for larger datasets.
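- <P>
- This page does not say what the faster alternative is. One plausible reading,
- offered here purely as an assumption, is that power-of-2 sizes allow
- division and modulo to be replaced by shifts and masks. The helpers
- \c log2_exact, \c split_slow and \c split_fast are hypothetical names used to
- show the two forms side by side.

```c
#include <assert.h>

/* Return log2(n) if n is a power of two, otherwise -1. */
static int log2_exact(int n)
{
    int shift = 0;

    if (n <= 0 || (n & (n - 1)) != 0)
        return -1;
    while ((1 << shift) < n)
        shift++;
    return shift;
}

/* General case: split a row or column index with division and modulo. */
static void split_slow(long idx, int size, long *seg, long *off)
{
    assert(idx >= 0 && size > 0);
    *seg = idx / size;
    *off = idx % size;
}

/* Power-of-2 case: the same split with a shift and a mask.
 * Valid only when the segment dimension equals 1 << shift. */
static void split_fast(long idx, int shift, long *seg, long *off)
{
    assert(idx >= 0 && shift >= 0);
    *seg = idx >> shift;
    *off = idx & ((1L << shift) - 1);
}
```

- Both forms give the same result for a segment dimension of 64, while a value
- such as nrows / 4 + 1 is rarely a power of 2 and would always take the
- general path in a scheme like this.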
- \section Loading_the_Segment_Library Loading the Segment Library
- <P>
- The library is loaded by specifying
- \code
- $(SEGMENTLIB)
- \endcode
- in the Makefile.
- <P>
- See \ref Compiling_and_Installing_GRASS_Modules for a complete
- discussion of Makefiles.
- */