r.in.xyz.html 14 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357
  1. <h2>DESCRIPTION</h2>
  2. The <em>r.in.xyz</em> module will load and bin ungridded x,y,z ASCII data
  3. into a new raster map. The user may choose from a variety of statistical
  4. methods in creating the new raster. Gridded data provided as a stream of
  5. x,y,z points may also be imported.
  6. <p>
  7. Please note that the current region extents and resolution are used for
  8. the import. It is therefore recommended to first use the <b>-s</b>
  9. flag to get the extents of the input points to be imported, then
  10. adjust the current region accordingly, and only then proceed with the
  11. actual import.
  12. <p>
  13. <em>r.in.xyz</em> is designed for processing massive point cloud datasets,
  14. for example raw LIDAR or sidescan sonar swath data. It has been tested with
  15. datasets as large as tens of billion of points (705GB in a single file).
  16. <!-- Doug Newcomb, US Fish & Wildlife Service -->
  17. <p>
  18. Available statistics for populating the raster are (<b>method</b>):
  19. <p>
  20. <blockquote>
  21. <table>
  22. <tr><td><em>n</em></td> <td>number of points in cell</td></tr>
  23. <tr><td><em>min</em></td> <td>minimum value of points in cell</td></tr>
  24. <tr><td><em>max</em></td> <td>maximum value of points in cell</td></tr>
  25. <tr><td><em>range</em></td> <td>range of points in cell</td></tr>
  26. <tr><td><em>sum</em></td> <td>sum of points in cell</td></tr>
  27. <tr><td><em>mean</em></td> <td>average value of points in cell</td></tr>
  28. <tr><td><em>stddev</em></td> <td>standard deviation of points in cell</td></tr>
  29. <tr><td><em>variance</em></td> <td>variance of points in cell</td></tr>
  30. <tr><td><em>coeff_var</em></td><td>coefficient of variance of points in cell</td></tr>
  31. <tr><td><em>median</em></td> <td>median value of points in cell</td></tr>
  32. <tr valign="baseline"><td><em>percentile</em>&nbsp;</td>
  33. <td>p<sup><i>th</i></sup> percentile of points in cell</td></tr>
  34. <tr><td><em>skewness</em></td> <td>skewness of points in cell</td></tr>
  35. <tr><td><em>trimmean</em></td> <td>trimmed mean of points in cell</td></tr>
  36. </table>
  37. </blockquote>
  38. <ul>
  39. <li><em>Variance</em> and derivatives use the biased estimator (n). [subject to change]
  40. <li><em>Coefficient of variance</em> is given in percentage and defined as
  41. <tt>(stddev/mean)*100</tt>.
  42. </ul>
  43. <p>
  44. It is also possible to bin and store another data column (e.g. backscatter)
  45. while simultaneously filtering and scaling both the data column values and
  46. the z range.
  47. <h2>NOTES</h2>
  48. <h3>Gridded data</h3>
  49. If data is known to be on a regular grid <em>r.in.xyz</em> can reconstruct
  50. the map perfectly as long as some care is taken to set up the region
  51. correctly and that the data's native map projection is used. A typical
  52. method would involve determining the grid resolution either by examining
  53. the data's associated documentation or by studying the text file. Next scan
  54. the data with <em>r.in.xyz</em>'s <b>-s</b> (or <b>-g</b>) flag to find the
  55. input data's bounds. GRASS uses the cell-center raster convention where
  56. data points fall within the center of a cell, as opposed to the grid-node
  57. convention. Therefore you will need to grow the region out by half a cell
  58. in all directions beyond what the scan found in the file. After the region
  59. bounds and resolution are set correctly with <em><a href="g.region.html">g.region</a></em>, run
  60. <em>r.in.xyz</em> using the <i>n</i> method and verify that n=1 at all places.
  61. <em><a href="r.univar.html">r.univar</a></em> can help. Once you are confident that the region exactly
  62. matches the data proceed to run <em>r.in.xyz</em> using one of the <i>mean,
  63. min, max</i>, or <i>median</i> methods. With n=1 throughout, the result
  64. should be identical regardless of which of those methods are used.
  65. <h3>Memory use</h3>
  66. While the <b>input</b> file can be arbitrarily large, <em>r.in.xyz</em>
  67. will use a large amount of system memory for large raster regions (10000x10000).
  68. If the module refuses to start complaining that there isn't enough memory,
  69. use the <b>percent</b> parameter to run the module in several passes.
  70. In addition using a less precise map format (<tt>CELL</tt> [integer] or
  71. <tt>FCELL</tt> [floating point]) will use less memory than a <tt>DCELL</tt>
  72. [double precision floating point] <b>output</b> map. Methods such as <em>n,
  73. min, max, sum</em> will also use less memory, while <em>stddev, variance,
  74. and coeff_var</em> will use more.
  75. The aggregate functions <em>median, percentile, skewness</em> and
  76. <em>trimmed mean</em> will use even more memory and may not be appropriate
  77. for use with arbitrarily large input files<!-- without a small value for percent= -->.
  78. <!-- explained: memory use for regular stats will be based solely on region size,
  79. but for the aggregate fns it will also depend on the number of data points. (?) -->
  80. <p>
  81. The default map <b>type</b>=<tt>FCELL</tt> is intended as compromise between
  82. preserving data precision and limiting system resource consumption.
  83. If reading data from a <tt>stdin</tt> stream, the program can only run using
  84. a single pass.
  85. <h3>Setting region bounds and resolution</h3>
  86. You can use the <b>-s</b> scan flag to find the extent of the input data
  87. (and thus point density) before performing the full import. Use
  88. <em><a href="g.region.html">g.region</a></em> to adjust the region bounds to match. The <b>-g</b> shell
  89. style flag prints the extent suitable as parameters for <em><a href="g.region.html">g.region</a></em>.
  90. A suitable resolution can be found by dividing the number of input points
  91. by the area covered. e.g.
  92. <div class="code"><pre>
  93. wc -l inputfile.txt
  94. g.region -p
  95. # points_per_cell = n_points / (rows * cols)
  96. g.region -e
  97. # UTM location:
  98. # points_per_sq_m = n_points / (ns_extent * ew_extent)
  99. # Lat/Lon location:
  100. # points_per_sq_m = n_points / (ns_extent * ew_extent*cos(lat) * (1852*60)^2)
  101. </pre></div>
  102. <p>
  103. If you only intend to interpolate the data with <em><a href="r.to.vect.html">r.to.vect</a></em> and
  104. <em><a href="v.surf.rst.html">v.surf.rst</a></em>, then there is little point to setting the region
  105. resolution so fine that you only catch one data point per cell -- you might
  106. as well use "<tt>v.in.ascii&nbsp;-zbt</tt>" directly.
  107. <h3>Filtering</h3>
  108. Points falling outside the current region will be skipped. This includes
  109. points falling <em>exactly</em> on the southern region bound.
  110. (to capture those adjust the region with "<tt>g.region s=s-0.000001</tt>";
  111. see <em><a href="g.region.html">g.region</a></em>)
  112. <p>Blank lines and comment lines starting with the hash symbol (<tt>#</tt>)
  113. will be skipped.
  114. <p>
  115. The <b>zrange</b> parameter may be used for filtering the input data by
  116. vertical extent. Example uses might include preparing multiple raster
  117. sections to be combined into a 3D raster array with <em><a href="r.to.rast3.html">r.to.rast3</a></em>, or
  118. for filtering outliers on relatively flat terrain.
  119. <p>
  120. In varied terrain the user may find that <em>min</em> maps make for a good
  121. noise filter as most LIDAR noise is from premature hits. The <em>min</em> map
  122. may also be useful to find the underlying topography in a forested or urban
  123. environment if the cells are over sampled.
  124. <p>
  125. The user can use a combination of <em>r.in.xyz</em> <b>output</b> maps to create
  126. custom filters. e.g. use <em><a href="r.mapcalc.html">r.mapcalc</a></em> to create a <tt>mean-(2*stddev)</tt>
  127. map. [In this example the user may want to include a lower bound filter in
  128. <em><a href="r.mapcalc.html">r.mapcalc</a></em> to remove highly variable points (small <em>n</em>) or run
  129. <em><a href="r.neighbors.html">r.neighbors</a></em> to smooth the stddev map before further use.]
  130. <h3>Alternate value column</h3>
  131. The <b>value_column</b> parameter can be used in specialized cases when you
  132. want to filter by z-range but bin and store another column's data. For
  133. example if you wanted to look at backscatter values between 1000 and 1500
  134. meters elevation. This is particularly useful when using <em>r.in.xyz</em>
  135. to prepare depth slices for a 3D raster &mdash; the <b>zrange</b> option defines
  136. the depth slice but the data values stored in the voxels describe an
  137. additional dimension. As with the z column, a filtering range and scaling
  138. factor may be applied.
  139. <h3>Reprojection</h3>
  140. If the raster map is to be reprojected, it may be more appropriate to reproject
  141. the input points with <em><a href="m.proj.html">m.proj</a></em> or <em>cs2cs</em> before running
  142. <em>r.in.xyz</em>.
  143. <h3>Interpolation into a DEM</h3>
  144. The vector engine's topographic abilities introduce a finite memory overhead
  145. per vector point which will typically limit a vector map to approximately
  146. 3 million points (~ 1750^2 cells). If you want more, use the <em><a href="r.to.vect.html">r.to.vect</a></em>
  147. <b>-b</b> flag to skip building topology. Without topology, however, all
  148. you'll be able to do with the vector map is display with <em><a href="d.vect.html">d.vect</a></em> and
  149. interpolate with <em><a href="v.surf.rst.html">v.surf.rst</a></em>.
  150. Run <em><a href="r.univar.html">r.univar</a></em> on your raster map to check the number of non-NULL cells
  151. and adjust bounds and/or resolution as needed before proceeding.
  152. <p>
  153. Typical commands to create a DEM using a regularized spline fit:
  154. <div class="code"><pre>
  155. r.univar lidar_min
  156. r.to.vect -z type=point in=lidar_min out=lidar_min_pt
  157. v.surf.rst in=lidar_min_pt elev=lidar_min.rst
  158. </pre></div>
  159. <h3>Import of x,y,string data</h3>
  160. <em>r.in.xyz</em> is expecting numeric values as z column. In order to
  161. perform a occurrence count operation even on x,y data with non-numeric
  162. attribute(s), the data can be imported using either the x or y
  163. coordinate as a fake z column for <b>method</b>=<tt>n</tt> (count
  164. number of points per grid cell), the z values are ignored anyway.
  165. <h2>EXAMPLES</h2>
  166. <h3>Import of x,y,z ASCII into DEM</h3>
  167. Sometimes elevation data are delivered as x,y,z ASCII files instead of a raster
  168. matrix. The import procedure consists of a few steps: calculation of the
  169. map extent, setting of the computational region accordingly with an additional
  170. extension into all directions by half a raster cell in order to register the
  171. elevation points at raster cell centers.
  172. <p>
  173. Note: if the z column is separated by several spaces from the coordinate columns,
  174. it may be sufficient to adapt the <b>z</b> position value.
  175. <!-- NC: preparation for this example
  176. g.region raster=elevation res=15 -p
  177. #
  178. r.stats -1g elevation > elevation.xyz
  179. -->
  180. <div class="code"><pre>
  181. # Important: observe the raster spacing from the ASCII file:
  182. # ASCII file format (example):
  183. # 630007.5 228492.5 141.99614
  184. # 630022.5 228492.5 141.37904
  185. # 630037.5 228492.5 142.29822
  186. # 630052.5 228492.5 143.97987
  187. # ...
  188. # In this example the distance is 15m in x and y direction.
  189. # detect extent, print result as g.region parameters
  190. r.in.xyz input=elevation.xyz separator=space -s -g
  191. # ... n=228492.5 s=215007.5 e=644992.5 w=630007.5 b=55.578793 t=156.32986
  192. # set computational region, along with the actual raster resolution
  193. # as defined by the point spacing in the ASCII file:
  194. g.region n=228492.5 s=215007.5 e=644992.5 w=630007.5 res=15 -p
  195. # now enlarge computational region by half a raster cell (here 7.5m) to
  196. # store the points as cell centers:
  197. g.region n=n+7.5 s=s-7.5 w=w-7.5 e=e+7.5 -p
  198. # import XYZ ASCII file, with z values as raster cell values
  199. r.in.xyz input=elevation.xyz separator=space method=mean output=myelev
  200. # univariate statistics for verification of raster values
  201. r.univar myelev
  202. </pre></div>
  203. <h3>Import of LiDAR data and DEM creation</h3>
  204. Import the <a href="http://www.grassbook.org/ncexternal/index.html">Jockey's
  205. Ridge, NC, LIDAR dataset</a> (compressed file "lidaratm2.txt.gz"), and process it
  206. into a clean DEM:
  207. <div class="code"><pre>
  208. # scan and set region bounds
  209. r.in.xyz -s -g separator="," in=lidaratm2.txt
  210. g.region n=35.969493 s=35.949693 e=-75.620999 w=-75.639999
  211. g.region res=0:00:00.075 -a
  212. # create "n" map containing count of points per cell for checking density
  213. r.in.xyz in=lidaratm2.txt out=lidar_n separator="," method=n zrange=-2,50
  214. # check point density [rho = n_sum / (rows*cols)]
  215. r.univar lidar_n
  216. # create "min" map (elevation filtered for premature hits)
  217. r.in.xyz in=lidaratm2.txt out=lidar_min separator="," method=min zrange=-2,50
  218. # set computational region to area of interest
  219. g.region n=35:57:56.25N s=35:57:13.575N w=75:38:23.7W e=75:37:15.675W
  220. # check number of non-null cells (try and keep under a few million)
  221. r.univar lidar_min
  222. # convert to points
  223. r.to.vect -z type=point in=lidar_min out=lidar_min_pt
  224. # interpolate using a regularized spline fit
  225. v.surf.rst in=lidar_min_pt elev=lidar_min.rst
  226. # set color scale to something interesting
  227. r.colors lidar_min.rst rule=bcyr -n -e
  228. # prepare a 1:1:1 scaled version for NVIZ visualization (for lat/lon input)
  229. r.mapcalc "lidar_min.rst_scaled = lidar_min.rst / (1852*60)"
  230. r.colors lidar_min.rst_scaled rule=bcyr -n -e
  231. </pre></div>
  232. <h2>TODO</h2>
  233. <ul>
  234. <li> Support for multiple map output from a single run.<br>
  235. <tt>method=string[,string,...] output=name[,name,...]</tt><br>
  236. This can be easily handled by a wrapper script, with the added
  237. benefit of it being very simple to parallelize that way.
  238. </ul>
  239. <h2>KNOWN ISSUES</h2>
  240. <ul>
  241. <li> "<tt>nan</tt>" can leak into <em>coeff_var</em> maps.
  242. <br>Cause unknown. Possible work-around: "<tt>r.null setnull=nan</tt>"
  243. <!-- Another method: r.mapcalc 'No_nan = if(map == map, map, null() )' -->
  244. </ul>
  245. If you encounter any problems (or solutions!) please contact the GRASS
  246. Development Team.
  247. <h2>SEE ALSO</h2>
  248. <em>
  249. <a href="g.region.html">g.region</a>,
  250. <a href="m.proj.html">m.proj</a>,
  251. <a href="r.fillnulls.html">r.fillnulls</a>,
  252. <a href="r.in.ascii.html">r.in.ascii</a>,
  253. <a href="r.in.lidar.html">r.in.lidar</a>,
  254. <a href="r3.in.xyz.html">r3.in.xyz</a>,
  255. <a href="r.mapcalc.html">r.mapcalc</a>,
  256. <a href="r.neighbors.html">r.neighbors</a>,
  257. <a href="r.out.xyz.html">r.out.xyz</a>,
  258. <a href="r.to.rast3.html">r.to.rast3</a>,
  259. <a href="r.to.vect.html">r.to.vect</a>,
  260. <a href="r.univar.html">r.univar</a>,
  261. <a href="v.in.ascii.html">v.in.ascii</a>,
  262. <a href="v.surf.rst.html">v.surf.rst</a>
  263. </em>
  264. <p>
  265. <em>
  266. <a href="v.lidar.correction.html">v.lidar.correction</a>,
  267. <a href="v.lidar.edgedetection.html">v.lidar.edgedetection</a>,
  268. <a href="v.lidar.growing.html">v.lidar.growing</a>,
  269. <a href="v.outlier.html">v.outlier</a>,
  270. <a href="v.surf.bspline.html">v.surf.bspline</a>
  271. </em>
  272. <p>
  273. <em><a href="http://www.ivarch.com/programs/pv.shtml">pv</a></em>
  274. - The UNIX pipe viewer utility
  275. <p>
  276. Overview: <a href="https://grasswiki.osgeo.org/wiki/Interpolation">Interpolation and Resampling</a> in GRASS GIS
  277. <h2>AUTHORS</h2>
  278. Hamish Bowman, Department of Marine Science, University of Otagom New Zealand
  279. <br>
  280. Extended by Volker Wichmann to support the aggregate functions
  281. <i>median, percentile, skewness</i> and <i>trimmed mean</i>.
  282. <p>
  283. <i>Last changed: $Date$</i>