
April 12, 2006

Mike Watry
General Mgr., QCoherent Software

 


 

 

File formats and the LAS standard

Hang around the computer industry long enough, and you are bound to hear arguments over file formats. Which format is best and how best to exchange data can evoke emotional responses as passionate as the debates over the actual software. Some formats are proprietary, and some are open. Some formats have an inherent data structure that makes them more suitable for one task than another. With spatial data, an obvious distinction is between the raster/image and the vector/feature-based file formats. Some formats have a performance implication, i.e., they are designed for accessibility so that the software reading the data will be fast even when the amount of data is large. Let's review some considerations in file formats that affect the decision-making process.

  • Suitability - The file format must be capable of storing the data that needs to be stored. One wouldn't choose a video file format like MPEG to store a written document. Suitability also includes whether the format is over-qualified or otherwise inappropriate for the data. One could store audio in a movie file format, but one would not be using the video capabilities of the format.
  • Openness - How proprietary is the file format? If invented by a company or corporation, will they disclose the internal structure of the format? How much control does the inventor of the file format retain? The standardization of a format may affect the next item...
  • Shareability - Can you give the file to someone else and expect them to open or read the file? The format might be wonderful otherwise, but if an associate lacks the software/tools to read the file, then what's the point?
  • Security - On the other hand, maybe you don't want prying eyes to see the data without some authorization scheme. Is encryption part of the data scheme? Sometimes a binary scheme may be preferred over a text-based scheme for this very reason.
  • Supportability - How hard is it for a piece of software to support this new data format? While this question is often considered from the point of view of the software engineer, the time and effort in supporting a data format will be rolled into the cost of the software, if the software can be supported at all. Proprietary formats, for example, may be unsupportable for either technical or legal reasons.


Text vs. binary

One debate I have heard frequently is the one between text and binary formats. First, understand the difference. Text-based files, or ASCII files, are files encoded in a common (English) alphabet of characters and symbols; specifically, there are 95 printable characters in the ASCII code. In terms of file formats, this means that the format is built from some structuring of text and can be opened by any program that can read ASCII text. Binary files, on the other hand, may encode data in any suitable manner, and if a binary file is opened in a text editor, it will look mostly like gibberish. For example, the number one billion is stored in a text format as a "1" character followed by nine "0" characters (with perhaps some commas thrown in for readability). In a binary format, it would be encoded as a machine-understood 4-byte representation of the number (in hexadecimal, one billion is 0x3B9ACA00). Note the size difference: text requires 10 bytes versus 4 bytes for the binary representation.
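The size comparison can be checked with a few lines of Python. This is only a minimal sketch using the standard struct module; the big-endian byte order is an assumption chosen so the bytes print in the same order as hexadecimal notation, and a real file format would specify its own byte order.

```python
import struct

value = 1_000_000_000

# Text encoding: one byte per decimal digit.
as_text = str(value).encode("ascii")

# Binary encoding: a 4-byte unsigned integer. Big-endian is an assumption
# made here so the bytes appear in the same order as the hex notation.
as_binary = struct.pack(">I", value)

print(len(as_text), as_text.decode("ascii"))   # 10 1000000000
print(len(as_binary), as_binary.hex())         # 4 3b9aca00
```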

When comparing text to binary as a file format scheme, consider the following general observations. Text formats are more open and shareable, as any text editor can open the file. As such, the user of the file has some control. Power users in particular may take some comfort in their ability to manipulate the data through scripts or other programmatic means. However, binary files tend to be more efficient. Binary files will generally be smaller, since the encoding scheme is tuned to the data being stored. Smaller files are quicker to read or load because there is less to read and the contents already have a computer-friendly encoding and structure. Binary files can also be more secure, or at least are not trivially understood in a text editor.

A good candidate for a text format is a configuration file or, in the GIS world, a metadata file. On the other hand, binary formats are far superior for datasets that are hundreds of megabytes or gigabytes in size, as geographic datasets typically are. One geographic dataset that is becoming more common, and that certainly challenges the upper limits of file sizes on any operating system, is LiDAR data. Despite the size of LiDAR datasets and files, LiDAR is stored in both ASCII and binary, which makes it worthy of further discussion.


Case Study (LiDAR data: ASCII vs. LAS)

An interesting case study of file formats is LiDAR data. LiDAR point clouds have been stored in two main formats: ASCII text and LAS files. The first choice, ASCII, stores the coordinates of each LiDAR point as fields along with its attributes, and the fields are separated by commas (also described as a comma-separated-value, or CSV, file). The second, competing choice is LAS, a standardized binary format for storing LiDAR points. (Note well: A norm in much of the industry is converting LiDAR points (i.e., elevations) to raster. The conversion to raster, however, has significant drawbacks that the user must be willing to accept, such as the loss of attribution, dataset richness, and precision that results from resampling a dataset. Because raster does not store the point cloud, and for the reasons just given, raster will not be considered for the remainder of the case study discussion.)


File Size (ASCII vs. LAS)

Earlier we compared the size of a number stored in ASCII text as opposed to a binary representation. One billion was 10 bytes in ASCII but only 4 bytes in binary. Applying this example of storage savings to the attributes of a LiDAR point, the LAS format can reduce file sizes significantly compared with ASCII storage. Depending on the number of attributes stored in the ASCII text file, the LAS format can reduce the file size by roughly 35% to 80%. Given that LiDAR projects are typically on the order of hundreds of gigabytes, choosing LAS over ASCII results in huge savings in file size over the entire project.
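To make the per-point savings concrete, the sketch below encodes one made-up point both ways. The field list, the coordinate values, the 0.01 scale factor, and the "<iiiHB" record layout are all assumptions for illustration; this is not the actual LAS point record, which carries additional fields.

```python
import struct

# Hypothetical point: easting, northing, elevation, intensity, classification.
point = (512345.67, 4178901.23, 152.48, 187, 2)

# ASCII/CSV record: one comma-separated text line per point.
csv_line = "{:.2f},{:.2f},{:.2f},{},{}\n".format(*point).encode("ascii")

# Binary record (illustrative layout, not the real LAS record):
# coordinates stored as 32-bit integers after dividing by an assumed scale,
# intensity as a 16-bit integer, classification as a single byte.
SCALE = 0.01
RECORD = struct.Struct("<iiiHB")
x, y, z, intensity, classification = point
binary_record = RECORD.pack(
    round(x / SCALE), round(y / SCALE), round(z / SCALE),
    intensity, classification,
)

saving = 1 - len(binary_record) / len(csv_line)
print(len(csv_line), len(binary_record))   # 34 15
print(f"{saving:.0%}")                     # 56%
```

For this particular point the binary record is a little under half the size of the text line. The actual ratio depends on how many attributes are stored and how many digits each one takes in text.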


Access to Points (ASCII vs. LAS)

Among other things, the reduction in file size from using the LAS format also speeds up the process of reading the files. Fewer bytes per point in the file means fewer bytes that need to be read from the file. In addition, points stored in binary form do not require conversion from text and are readily available to the software application. LiDAR data involves hundreds of millions of points, and the collective savings of reading fewer bytes, already in a machine-readable form, results in a much faster application than one reading ASCII text.

The LAS file format has a regular structure (i.e., a 227-byte header, a variable length record (VLR) section, and point data records of equal size in bytes) that enables the file to be spatially indexed. An example of a spatial index for an LAS file would be tiling the file into a set of virtual grids of a particular size and recording the byte offsets of the points that belong within each grid. If the user zoomed into the northwest section of the file, the software application would locate the virtual grids that the view intersected and know exactly where to go in the file using the byte offsets. An ASCII text file lacks the regular structure required to index the points in the file, rendering it impractical for use with datasets such as LiDAR. Without the index, the software application would need to read every point in the file to determine whether the point is within the view, producing a very frustrating experience at best.
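The virtual-grid idea can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how any particular LAS reader works: the record layout ("<iii"), the cell size, and the function names build_index and read_view are hypothetical, the VLR section is omitted, and the "file" is a small in-memory buffer with a zero-filled header.

```python
import io
import struct
from collections import defaultdict

HEADER_SIZE = 227               # header length quoted in the text; VLRs omitted
RECORD = struct.Struct("<iii")  # hypothetical record: x, y, z as 32-bit integers
CELL = 100                      # virtual grid cell size, in coordinate units

def build_index(f, n_points):
    """Map each grid cell (col, row) to the byte offsets of its points."""
    index = defaultdict(list)
    for i in range(n_points):
        # Equal-size records make every point's offset directly computable.
        offset = HEADER_SIZE + i * RECORD.size
        f.seek(offset)
        x, y, _ = RECORD.unpack(f.read(RECORD.size))
        index[(x // CELL, y // CELL)].append(offset)
    return index

def read_view(f, index, xmin, ymin, xmax, ymax):
    """Read only the points in grid cells that intersect the view rectangle."""
    points = []
    for col in range(xmin // CELL, xmax // CELL + 1):
        for row in range(ymin // CELL, ymax // CELL + 1):
            for offset in index.get((col, row), []):
                f.seek(offset)
                points.append(RECORD.unpack(f.read(RECORD.size)))
    return points

# Tiny in-memory "file": a zeroed header followed by four made-up points.
sample = [(10, 950, 5), (20, 980, 6), (910, 40, 7), (500, 500, 8)]
f = io.BytesIO(bytes(HEADER_SIZE) + b"".join(RECORD.pack(*p) for p in sample))

index = build_index(f, len(sample))
northwest = read_view(f, index, 0, 900, 99, 999)
print(northwest)   # [(10, 950, 5), (20, 980, 6)]
```

Building the index still requires one pass over the file, but every later view query seeks straight to the relevant records instead of scanning all of them. A text file with variable-length lines offers no such shortcut, because the position of the n-th point cannot be computed without reading everything before it.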

Perhaps the lack of software tools using the LAS format in the past drove the industry to use ASCII text files (later converted to raster) for LiDAR data. However, the LAS format is clearly the better file format for LiDAR data. The LAS format, thanks to the foresight of the contributing industry leaders, is an ASPRS-approved standard. The file format is available from www.lasformat.org, and a mailing list is available for discussion of the file format.

More and more software packages using the LAS format are arriving on the market. Along with standalone tools for viewing LAS files, there are integrated plug-ins for Microstation and ESRI's ArcGIS that use LAS files. QCoherent Software's LP360 LiDAR plug-in for the ESRI environment is an exciting new option that allows users to see and use their LiDAR data with their GIS data. Creative and innovative tools such as LP360 may reduce how often users convert LiDAR to raster in the future. In addition, products such as LP360 could reduce or eliminate the need to deliver LiDAR data as ASCII text files. LP360 is planned for release in May of this year (downloads available at www.qcoherent.com).

LP360 is available now; see the LP360 product page.