GRASS 6 vector TODO
-------------------
(Radim Blazek, May 2006)

This document is a summary of my ideas on how the vector part of GRASS GIS
could be improved.

You may come to the conclusion that vectors in GRASS are bad and that it
is necessary to start from scratch. In that case I would recommend
leaving the current library and modules intact and starting the new work
in parallel (the new modules could start with w.* or v2.*). I was
thinking, for example, about a completely new vector library based on
the OGC standard, using either OGR directly or an abstraction layer with
OGR as an option (driver). That does not mean that I prefer the simple
feature specification over the current GRASS implementation; I am not
sure which one is better. In any case it would be a pity to drop the
current topological format with all its flexibility. Each approach has
advantages and disadvantages. I think it is best for an open source GIS
to offer all the alternatives: file/database and topology/simple feature.

Historical notes
----------------
The current implementation of vectors is based on previous work which
was present in GRASS5 (the vector library and modules and the DBMI
library). We started this work together with David D. Gray in autumn
2000 (IIRC), but David had to leave the GRASS project soon afterwards, so
almost all responsibility for vector development in GRASS6 and its
results is mine.

The current design of GRASS vectors is the result of these factors:
+ very limited resources for development (necessity to use existing
  free code/libraries/applications whenever possible)
+ relatively little experience with development of GIS applications
+ respect for certain features of the GRASS5 vector model and for the
  existing community which is using it
+ bad experience with the quality of data produced in simple feature
  based applications (ArcView)

1. Library
----------

1.1 Geometry
------------

Keep topology and spatial index in file instead of in memory
------------------------------------------------------------
Scalability currently seems to be the biggest problem of GRASS
vectors. The geometry of GRASS vectors (coor file) is never loaded
into memory as a whole. OTOH the support structures (topology and
spatial index) are loaded into memory at runtime. It should be possible
to use files for topology and spatial index at runtime as well and in
that way decrease the memory occupied by a running module (practically
to zero). The speed will decrease a bit but not significantly because
files are usually cached by the system.
* Update: implemented in r38385 (2009/07) by Markus Metz
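
The idea can be illustrated with a minimal sketch that reads one
fixed-size support record straight from a file instead of holding the
whole table in memory. The record layout and the names node_rec,
node_write() and node_read() are assumptions made for this example;
they are not the GRASS topo file format or API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical fixed-size support record; NOT the real GRASS topo layout. */
struct node_rec {
    double x, y;
    int n_lines;
};

/* Write the record for node `id` (0-based) into its slot in the file. */
int node_write(FILE *fp, long id, const struct node_rec *rec)
{
    assert(id >= 0);
    if (fseek(fp, id * (long)sizeof *rec, SEEK_SET) != 0)
        return -1;
    return fwrite(rec, sizeof *rec, 1, fp) == 1 ? 0 : -1;
}

/* Read the record for node `id` directly from the file.
 * Only one record is ever held in memory; returns -1 if it does not exist. */
int node_read(FILE *fp, long id, struct node_rec *rec)
{
    assert(id >= 0);
    if (fseek(fp, id * (long)sizeof *rec, SEEK_SET) != 0)
        return -1;
    return fread(rec, sizeof *rec, 1, fp) == 1 ? 0 : -1;
}
```

Each lookup costs one seek and one read, which is the small speed
penalty the text expects the system file cache to absorb.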

Temporary vector
----------------
Analytical modules process data in the output vector (for example
v.overlay and v.buffer). Because many lines can be deleted (broken
lines for example) and new lines are written at the end of the coor
file, the output file can contain many 'dead' lines (unused space). It
would be better to do the processing in a temporary vector and copy
only the alive lines to the output when processing has finished. That
means implementing Vect_open_temporary(), which will work like
Vect_open_new() but will open the files in a temporary directory (it
should not write to $MAPSET/vector).
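
The final copy step might look like the following sketch. It works on
an in-memory array rather than on real vector files, and line_rec and
copy_alive() are names invented for the example.

```c
#include <assert.h>

/* Hypothetical in-memory stand-in for a line stored in the temporary vector. */
struct line_rec {
    int id;
    int alive; /* 0 = deleted during processing */
};

/* Copy only the alive records from `tmp` to `out`, preserving their order.
 * `out` must have room for n_tmp records. Returns the number copied. */
int copy_alive(const struct line_rec *tmp, int n_tmp, struct line_rec *out)
{
    int i, n_out = 0;

    assert(n_tmp >= 0);
    for (i = 0; i < n_tmp; i++)
        if (tmp[i].alive)
            out[n_out++] = tmp[i];
    return n_out;
}
```

The output then contains no dead space, whatever happened in the
temporary vector during processing.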

Recycle deleted lines
---------------------
The space which was occupied by a line in the coor file is lost after a
call to Vect_delete_line(). A list of the free positions should be kept
and Vect_write_line() should write into that free space if possible
instead of appending a new line to the end of the file. There is
already an empty structure 'recycle' in 'dig_head' where the list could
be implemented (without changing the 'dig_head' size, to keep binary
compatibility).
* Note: currently the wxGUI vector digitizer 'undo' depends on this 'feature'
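
A free list of this kind could be sketched as below. This is only an
illustration of the bookkeeping under simple assumptions (fixed
capacity, first-fit search). The types and functions are hypothetical
and say nothing about how 'recycle' in 'dig_head' is or would be laid
out.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FREE 64 /* arbitrary capacity chosen for this sketch */

/* One reusable gap in the file: where it starts and how many bytes it has. */
struct free_slot {
    long offset;
    size_t size;
};

/* Hypothetical free list; not the real 'recycle' structure. */
struct free_list {
    struct free_slot slots[MAX_FREE];
    int count;
};

/* Remember the space of a deleted line. Returns 0, or -1 if the list is full. */
int free_list_add(struct free_list *fl, long offset, size_t size)
{
    assert(fl != NULL);
    if (fl->count >= MAX_FREE)
        return -1;
    fl->slots[fl->count].offset = offset;
    fl->slots[fl->count].size = size;
    fl->count++;
    return 0;
}

/* First-fit search: return the offset of a gap of at least `size` bytes and
 * remove that gap from the list, or return -1 so the caller appends to the
 * end of the file. Any unused remainder of the gap is simply lost here. */
long free_list_take(struct free_list *fl, size_t size)
{
    int i;

    assert(fl != NULL);
    for (i = 0; i < fl->count; i++) {
        if (fl->slots[i].size >= size) {
            long offset = fl->slots[i].offset;

            fl->slots[i] = fl->slots[fl->count - 1]; /* swap-remove */
            fl->count--;
            return offset;
        }
    }
    return -1;
}
```

A delete would call free_list_add() with the offset and size of the
removed line, and a write would try free_list_take() before falling
back to appending.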

Vect_rewrite_line
-----------------
Implement Vect_rewrite_line() properly. Currently it simply calls
Vect_delete_line() and Vect_write_line(). It should be implemented so
that if the new size of the line is the same as the old size, the line
is written to the same place in the coor file where the original line
existed.
* Note: see the note under 'Recycle deleted lines' above
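
The proposed behaviour, reduced to plain file I/O, could look like this
sketch. rewrite_record() is a hypothetical helper that treats a line as
an opaque block of bytes; it does not use the coor file format.

```c
#include <assert.h>
#include <stdio.h>

/* Rewrite a record of `old_size` bytes stored at `offset` in `fp`.
 * Same size: overwrite in place and return the unchanged offset.
 * Different size: append at the end of the file and return the new offset
 * (the old space becomes dead, as with delete + write).
 * Returns -1 on I/O error. */
long rewrite_record(FILE *fp, long offset, size_t old_size,
                    const void *data, size_t new_size)
{
    assert(fp != NULL);
    if (new_size == old_size) {
        if (fseek(fp, offset, SEEK_SET) != 0)
            return -1;
    }
    else {
        if (fseek(fp, 0L, SEEK_END) != 0)
            return -1;
        offset = ftell(fp);
        if (offset < 0)
            return -1;
    }
    if (fwrite(data, 1, new_size, fp) != new_size)
        return -1;
    return offset;
}
```

Only the equal-size case is handled in place, as the text asks. A
smaller new line could also fit into the old space, but that would
leave a gap to track.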

Remove bounding box from support structures (?)
-----------------------------------------------
The vector structures (P_line, P_area, P_isle) store the bounding box
as N,S,E,W,T,B (doubles). Especially for element type GV_POINT the
bounding box occupies a lot of space (2-3 times more than the point
itself). I am not sure if this is really a good idea; it is also
necessary to evaluate how often Vect_line_box() is called and the
impact of always having to calculate the box on the fly (when it is
not stored in the structure), which can be time consuming, for example
for areas or long lines.
* See also http://trac.osgeo.org/grass/ticket/542
* Update: implemented in r46898 (2011/07) by Markus Metz
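
To make the trade-off concrete: with 8-byte doubles, six stored values
take 48 bytes, against 16 to 24 bytes for the two or three coordinates
of a point, which matches the 2-3 times overhead mentioned above. The
cost of dropping the stored box is a pass over all vertices on each
request, roughly as in this sketch. box3d and points_box() are
stand-ins written for the example, not the GRASS types or the actual
Vect_line_box() code.

```c
#include <assert.h>

/* Box with the six members the text lists; a stand-in, not the GRASS type. */
struct box3d {
    double N, S, E, W, T, B;
};

/* Compute the bounding box of n_points >= 1 vertices in a single pass. */
void points_box(const double *x, const double *y, const double *z,
                int n_points, struct box3d *box)
{
    int i;

    assert(n_points >= 1);
    box->E = box->W = x[0];
    box->N = box->S = y[0];
    box->T = box->B = z[0];
    for (i = 1; i < n_points; i++) {
        if (x[i] > box->E) box->E = x[i];
        if (x[i] < box->W) box->W = x[i];
        if (y[i] > box->N) box->N = y[i];
        if (y[i] < box->S) box->S = y[i];
        if (z[i] > box->T) box->T = z[i];
        if (z[i] < box->B) box->B = z[i];
    }
}
```

For a point the loop body never runs, so the on-the-fly cost is
negligible. For long lines and areas it grows linearly with the number
of vertices.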

Switching to update mode
------------------------
It would be useful to be able to switch a map which was opened by
Vect_open_old/new() to 'update' mode, and similarly to switch it back
to 'normal' mode. Currently it is necessary to call Vect_close() and
Vect_open_update().

Layer names
-----------
The layers are currently identified only by numbers, but it is possible
to assign a name to each layer number. The library can read these
names but it is not possible to use the name as a parameter for
modules. It is necessary to write
int Vect_get_layer_by_name(struct Map_info *map, char *name), which
will accept both names and numbers, and to use this function in vector
modules. This is also important for the OGR interface improvements (see
below).
* Update: implemented in r38548 (2009/07) by Martin Landa
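
A lookup that accepts both names and numbers could work as in the
sketch below. It runs on a small hypothetical table (layer_def) instead
of struct Map_info, so it shows only the matching logic, not the
function that was eventually added to the library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layer table entry: a number plus an optional name. */
struct layer_def {
    int number;
    const char *name; /* NULL when the layer has no name */
};

/* Resolve `key` to a layer number. `key` may be a decimal number ("2") or a
 * layer name ("roads"). Returns the layer number, or -1 if nothing matches. */
int layer_by_name(const struct layer_def *layers, int n_layers, const char *key)
{
    char *end;
    long num;
    int i;

    assert(key != NULL);
    num = strtol(key, &end, 10);
    if (end != key && *end == '\0') {
        /* the whole string is numeric: match on the layer number */
        for (i = 0; i < n_layers; i++)
            if (layers[i].number == num)
                return layers[i].number;
        return -1;
    }
    /* otherwise match on the layer name */
    for (i = 0; i < n_layers; i++)
        if (layers[i].name && strcmp(layers[i].name, key) == 0)
            return layers[i].number;
    return -1;
}
```

One consequence of trying the number first is that a layer whose name
consists only of digits can never be selected by name.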

OGR interface
-------------
It is important to enable direct access to OGR data sources without
v.external and without the necessity to store anything in files. The
problem with v.external is that topology is stored in a file, which
means it can be wrong when the source is opened the next time. It
should be relatively easy to call Vect_build_ogr() whenever an OGR
vector is opened with level 2 (topology) requested, so that topology is
built on the fly. OGR vectors would be specified by the virtual mapset
name 'OGR'. Each OGR datasource will be equivalent to a GRASS vector
and each OGR layer will be equivalent to a GRASS layer (it is necessary
to implement layer names, see above). It would then be possible, for
example, to display a shapefile or PostGIS layers directly:
  d.vect map=./shapefiles/@OGR layer=roads # display shapefile ./shapefiles/roads.shp
  d.vect map=PG:dbname=test@OGR layer=roads # display table roads from database test
* Update: in progress,
  see http://trac.osgeo.org/grass/wiki/Grass7/VectorLib/OGRInterface#DirectOGRreadaccess

Simple feature API and sequential reading
-----------------------------------------
Most GRASS modules currently use random access to the data, which
reflects the GRASS format. This works well with GRASS data but it can
become very slow or even impossible with OGR data sources, because some
OGR drivers don't support random access or their random access is very
slow. Because conversion from the topological format to simple features
is very simple and sequential reading of GRASS vectors is not a
problem, it would be desirable to implement a 'simple feature' API to
GRASS vectors in the GRASS vector library and to map it directly to the
OGR API in the case of OGR data sources. Many GRASS modules could then
be modified to use sequential reading and the simple feature API, which
would make processing of data read directly from OGR data sources more
efficient.

1.2 Attributes
--------------
In general I found the use of a true RDBMS for attributes to be a
problem. The data are stored in two distinct places (vector files +
database), which makes it difficult to keep them consistent and to
manage them (move, backup). Another problem is random access to the
data in an RDBMS from an application, which is terribly slow (due to
communication with the server). RDBMS is not bad; what is bad is the
combination of files and RDBMS. I think that either everything must be
stored in RDBMS (PostGIS) or nothing. Eric G. Miller (IIRC) was right
when he said that data are 'too distant' when RDBMS is used with
geometry in a file.

I think that more work should be done on the drivers which use
embedded databases stored in files (SQLite,MySQL,DBF), with the aim of
reaching functionality (functions, queries) similar to that of a true
RDBMS without the penalty of communication with a server. The
possibility of changing the default location of database files to the
vector directory ($MAPSET/vector/test) should also be considered. That
means keeping all the data of one vector in a single directory. It is
already possible but it is not the default setting, for example:
  db.connect driver=dbf database='$GISDBASE/$LOCATION_NAME/$MAPSET/vector/$MAP/'
  db.connect driver=sqlite database='$GISDBASE/$LOCATION_NAME/$MAPSET/vector/$MAP/db.sqlite'
  db.connect driver=mesql database='$GISDBASE/$LOCATION_NAME/$MAPSET/vector/$MAP/'

Implement insert/update cursors
-------------------------------
GRASS modules currently send all data to database drivers as
individual SQL insert/update statements. This makes the update process
slow (constructing and parsing queries) and numeric precision can be
lost. The solution is to implement db_open_insert/update_cursor() and
db_insert/update() in the database drivers and to use these functions
in modules. The drivers should then use precompiled statements
(e.g. SQLite) or they could update the database directly (DBF).

Note that it is not necessary to implement these functions in all
drivers at the same time. Until the functions are properly implemented
in all drivers, you can implement lib/db/stubs functions which will
create an SQL statement and send it to db_execute(), which is
implemented in all drivers.
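
The stub fallback could be organised roughly as in the following
sketch. The driver structure, row_insert() and the two toy drivers are
all invented for the example. The real DBMI driver interface is
different and is not reproduced here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical driver interface; NOT the real DBMI driver structure. */
struct driver {
    /* available in every driver: run one SQL statement */
    int (*execute)(const char *sql);
    /* optional native insert; NULL until the driver implements it */
    int (*insert)(const char *table, int cat, double value);
};

/* Insert one row. Prefer the driver's native insert; otherwise fall back to
 * the stub behaviour: build an SQL statement and pass it to execute(). */
int row_insert(const struct driver *drv, const char *table, int cat,
               double value)
{
    char sql[256];
    int len;

    assert(drv != NULL && drv->execute != NULL);
    if (drv->insert)
        return drv->insert(table, cat, value);

    /* 17 significant digits are enough to round-trip an IEEE 754 double */
    len = snprintf(sql, sizeof sql, "insert into %s values (%d, %.17g)",
                   table, cat, value);
    if (len < 0 || (size_t)len >= sizeof sql)
        return -1;
    return drv->execute(sql);
}

/* --- two toy drivers, used only to exercise the sketch --- */
static char last_sql[256];
static double last_value;

static int toy_execute(const char *sql)
{
    if (strlen(sql) >= sizeof last_sql)
        return -1;
    strcpy(last_sql, sql); /* record the statement instead of running it */
    return 0;
}

static int toy_insert(const char *table, int cat, double value)
{
    (void)table;
    (void)cat;
    last_value = value; /* the double arrives as binary, never as text */
    return 0;
}

const struct driver sql_only_driver = { toy_execute, NULL };
const struct driver native_driver = { toy_execute, toy_insert };
```

Modules would call only row_insert(), so a driver can gain a native
insert later without any change on the module side.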

SQLite driver
-------------
The current implementation is very slow with large updates/inserts. I
think this is because all statements are parsed, and it should be
possible to improve this with insert/update cursors (see above).

DBF driver
----------
Add an on-the-fly index for select/update.

Implement db_copy_table() in drivers
------------------------------------
db_copy_table() is implemented in the client library and it always reads
and writes all the data, which is slow. It would be better to send this
request to the driver (if possible, i.e. the input and output drivers
are the same), which can copy tables much faster. For example, a true
RDBMS can use 'create table new as select * from old' and the DBF driver
can simply copy files.

Load drivers as dynamic libraries
---------------------------------
Database drivers are implemented as executables which communicate with
modules via pipes. This implementation creates some problems with
portability (especially on Windows) and it makes communication slow (I
am not sure how much). It would probably be desirable to implement
drivers as loadable modules (dlopen() and equivalents).

2. Modules
----------

v.overlay
---------
Select only the relevant features which will be written to the output
if the 'and,not,nor' operators are used. Inspiration is available in
v.select.

v.pack/v.unpack
---------------
Write it. New modules to pack/unpack a vector to/from a single file
(probably tar). I am not sure about the format. Originally I was
thinking about ASCII+DBF as it can also be read without GRASS, but
ASCII and DBF can lose precision and DBF has other limitations. It
would probably be better to use a copy of the 'coor' file and
attributes written to an SQLite database.
* Update: see
  http://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.pack
  and
  http://trac.osgeo.org/grass/browser/grass-addons/grass7/vector/v.unpack
  by Luca Delucchi

1/2009: Other suggestions moved to
http://trac.osgeo.org/grass/