Archive for August, 2007

Storage, Background, Amazon, FERI, mapping, WeoGeo Server

How do you deliver 100 40GB imagery files?

This is a bit tougher than the problem discussed in this earlier post. When we (FERI) first started developing HSI sensors and flying them for others, we distributed imagery data mainly on DVDs. As the research groups got larger, we started getting more and more requests for data. This eventually led to the WeoGeo Server solution, which allows for customization and asynchronous delivery.

However, 100 files of 40 GB each, like the one in Figure 2 of my HSI post, mean 4 TB of data through our lab’s pipe in a relatively short period of time. Our bandwidth at the time we were trying to develop these solutions was a dedicated T1, or about 1.5 Mbit/s. Transferring 4 TB of imagery files with full use of our pipe would require about 259 days.
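As a rough check on that number, here is a minimal sketch of the arithmetic. The function name is made up for illustration, and the choice of binary units (1 TB = 2**40 bytes, 1 Mbit = 2**20 bits) is an assumption; it happens to be the reading that reproduces the 259-day figure.

```python
def transfer_days(total_bytes: float, bits_per_second: float) -> float:
    """Days needed to push total_bytes through a link running flat out."""
    seconds = total_bytes * 8 / bits_per_second
    return seconds / 86_400  # seconds in a day


# Assumption: binary units, which reproduce the figure quoted in the post.
FOUR_TB = 4 * 2**40          # bytes
T1_LINK = 1.5 * 2**20        # bits per second

print(round(transfer_days(FOUR_TB, T1_LINK)))         # 259

# With decimal units (10**12 bytes, 1.5 * 10**6 bit/s) it is a little less.
print(round(transfer_days(4 * 10**12, 1.5 * 10**6)))  # 247
```

Either way the answer is most of a year, and that is with nothing else using the line.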

Clearly there are some solutions these days that would have helped this type of large file distribution effort. Akamai, Limelight Networks, or some BitTorrent solution would provide capabilities to deliver large files over distributed networks. However, we were also providing search and customization solutions, which required modification of the data before delivery. This meant that we had a scalability problem in processing as well as delivery. Edge distribution solutions would solve one part of our problem, but not necessarily the processing part.

We began to explore co-location solutions, but these seemed to require a lot of upfront costs, as well as travel and maintenance expenses. As a small business, those capital expenditures were more than we could absorb. It was at this point that we were introduced to Amazon Web Services by a former co-worker who had been recruited by Amazon. AWS allowed us to build a distribution system for large data files on top of a very large pipe via S3. (I’ll discuss the processing using EC2 later.) It provided us with scalable distribution at a reasonable cost for those 100 40GB files.

To be honest, there are some devils in the details in using S3 for our operations. But (to date), the service has been more valuable than costly. The rapid ingestion of large files into S3 is a current problem that we are trying to solve. Moving forward we hope to build on the expansion of S3 as Amazon develops more physical data storage locations. This will provide us with some of the edge distribution advantages of the above solutions, while keeping us connected with our virtual computing solutions on EC2.

I’m also curious to see how others are using S3 in geospatial solutions; if you have a unique one, please let me know.

Background, Hyperspectral, Amazon, WeoGeo, grid computing, FERI, WeoGeo Server

40 GB Imagery File Redux

An obvious question that drops out of yesterday’s post on the right file format to use to distribute large raster files is, “How do you distribute a 40 GB file?” The distribution of a single 40 GB file would overwhelm the bandwidth of many small businesses. That was one of the reasons we originally developed the WeoGeo Server.

Figure 1. WeoGeo Server (click on the image to see more information)

The Server allows the mapping organization to distribute customer-defined products, which reduces the file size, and thus the bandwidth, needed to satisfy their customers’ demand. However, there is still the use case where the customer wants the whole file.

Since FERI is a small business, we couldn’t have our daily research activities impacted by an imagery request. So the first (obvious) step was to develop a customization and distribution system that processes a data request in an asynchronous manner, i.e. the order is taken during business hours, but it is processed and delivered after business hours. This allowed us to optimize our bandwidth in our labs and still reasonably satisfy customer demands (assuming they did not need instantaneous data delivery). We also tweaked the system to allow some small files and all of our own requests to be processed immediately, while larger ones for external users were processed in the evenings.
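The routing rule described above can be sketched in a few lines. This is only an illustration, not the WeoGeo Server code: the business hours, the size cutoff for a "small" file, and the function name are all assumptions.

```python
from datetime import datetime, time

# All of these values are assumptions for illustration only.
BUSINESS_OPEN = time(8, 0)
BUSINESS_CLOSE = time(18, 0)
SMALL_FILE_BYTES = 100 * 2**20  # hypothetical cutoff for a "small" order


def process_now(size_bytes: int, internal: bool, now: datetime) -> bool:
    """Return True if an order should run immediately, False if it waits.

    Internal requests and small files always run right away. Large external
    orders run only outside business hours (weekends are ignored here).
    """
    if internal or size_bytes <= SMALL_FILE_BYTES:
        return True
    during_business_hours = BUSINESS_OPEN <= now.time() < BUSINESS_CLOSE
    return not during_business_hours
```

An order that returns False would simply sit in a queue until the evening run, which is the "delayed gratification on delivery" part of the design.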

The asynchronous data delivery is also a fundamental difference between our technology and online GIS servers. We optimized for discovery, customization, and ordering in a way that allows the customer to receive near-instant gratification on the discovery and ordering, while (possibly) delaying gratification on the delivery.

While the customization of product selection and the asynchronous processing and delivery bought us some additional help in terms of distributing large geospatial content files, it still did not help us with the problem of what to do with multiple requests for 40 GB image files. This is where some of my earlier posts, where I described our use of Amazon Web Services, begin to make some sense (and maybe why Jinesh digs what we are doing).

However, I am late for dinner, so I’ll pick up this theme in a later post…

Background, Remote Sensing, Hyperspectral, WeoGeo, FERI, mapping, BigTIFF

What file format do you use for a 40GB image? (BigTIFF!)

Large imagery files are a problem. In the hyperspectral world, we send things via ENVI’s file format (BSQ, BIL, or BIP). ENVI was designed by folks doing HSI remote sensing and was optimized to easily handle large raster images. The use of this file format allows us to deliver extremely large raster files, with a separate header that describes all the channels, bands, or layers in the image.
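For readers who have not met the three interleaves, here is a minimal sketch of where a single value lives in a flat raster under each one. The function and parameter names are made up for illustration; the result counts samples, so multiply by the bytes per sample (and add any header offset) to get a file position.

```python
def sample_index(interleave: str, line: int, sample: int, band: int,
                 lines: int, samples: int, bands: int) -> int:
    """Index (in samples, not bytes) of one value in a flat raster."""
    if interleave == "bsq":   # band sequential: one full image per band
        return (band * lines + line) * samples + sample
    if interleave == "bil":   # band interleaved by line
        return (line * bands + band) * samples + sample
    if interleave == "bip":   # band interleaved by pixel
        return (line * samples + sample) * bands + band
    raise ValueError(f"unknown interleave: {interleave!r}")
```

BSQ keeps each band contiguous, which suits viewing one band at a time, while BIP keeps each pixel's full spectrum contiguous, which suits per-pixel spectral work.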

Unfortunately, not everyone owns a copy of ENVI. It is an expensive image processing package. While other remote sensing and GIS packages claimed to handle multi-band imagery data, we found that support for imagery with more than three bands (n > 3) was difficult at best. So if our customers at FERI didn’t have ENVI, the transport of the imagery had to be accomplished in another file format. The most common format other than the ENVI format for us was GeoTIFF.

Unfortunately, the GeoTIFF format is limited to 4 GB. This is clearly problematic for the image shown here in Figure 1.

Figure 1. HSI imagery of St. Joseph Bay, FL (click on the image to see the data set at WeoGeo Market.)

This image is a 156-band hyperspectral mosaic. The entire image at its native spatial resolution is 40 GB in size. Cutting this data into 10 tiles of 4 GB apiece would be one way to deliver this data set. But this is problematic for both us and the receiver of the images, as the time, energy, and effort to tile and then re-mosaic is less than efficient.

You could also say that HSI data is, for the most part, a relatively small backwater of the remote sensing community, so why worry about it? To this I would respond with the imagery in Figure 2, which we collected at the same time.

Figure 2. 3-Band DSS imagery of St. Joseph Bay, FL (click on the image to see the data set at WeoGeo Market.)

This is a 3-band RGB image from an Applanix DSS. Its pixel size was about 1/6 that of the HSI sensor. The higher spatial resolution makes this image nearly as large as the HSI image. We actually incurred the pain of tiling the full image set for our original customer because they had only ESRI software with which to analyze this image.
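A back-of-the-envelope sketch shows why three bands can rival 156. It assumes both images cover the same ground footprint, store the same number of bytes per sample, and that "1/6" applies to the pixel size in both directions; none of that is stated above, so treat the result as an order-of-magnitude check only.

```python
def relative_size(bands: int, pixel_scale: float, ref_bands: int) -> float:
    """Size of an image relative to a reference image of the same area.

    pixel_scale is this image's pixel size divided by the reference pixel
    size, so the pixel count grows with 1 / pixel_scale**2.
    """
    return bands / (ref_bands * pixel_scale ** 2)


# 3-band image with 1/6 the pixel size, compared to a 156-band image.
print(round(relative_size(bands=3, pixel_scale=1 / 6, ref_bands=156), 2))  # 0.69
```

Thirty-six times as many pixels makes up for most of the difference in band count.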

Our friends at GDAL asked us about sponsoring a new file format, BigTIFF, which would be based on extending the TIFF format. We were happy to step up to help make this happen. I believe that the other sponsors had similar file storage and distribution issues, and we look forward to broad acceptance of this file format.

It will certainly make our distribution issues easier.
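For the curious, the 4 GB ceiling comes from classic TIFF's 32-bit file offsets, and BigTIFF lifts it by moving to 64-bit offsets. The two can be told apart from the first four bytes of the file: a byte-order mark followed by version 42 (classic) or 43 (BigTIFF). A minimal sketch, with a made-up function name:

```python
import struct


def tiff_flavor(header: bytes) -> str:
    """Classify a file as classic TIFF or BigTIFF from its first 4 bytes."""
    if len(header) < 4:
        raise ValueError("need at least 4 header bytes")
    # "II" means little-endian, "MM" means big-endian.
    order = {b"II": "<", b"MM": ">"}.get(header[:2])
    if order is None:
        raise ValueError("not a TIFF byte-order mark")
    (version,) = struct.unpack(order + "H", header[2:4])
    if version == 42:
        return "classic TIFF (32-bit offsets, 4 GB ceiling)"
    if version == 43:
        return "BigTIFF (64-bit offsets)"
    raise ValueError(f"unexpected TIFF version {version}")
```

To try it on a real file, read the first four bytes with `open(path, "rb").read(4)` and pass them in.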

Background, Remote Sensing, Hyperspectral, WeoGeo, FERI, mapping

HyperSpectral Imaging (HSI) and the Path to a Digital Marketplace

WeoGeo was born from a need to preview, share, and distribute geospatial content. Our experience with this goes back nearly 9 years in developing a technology called environmental HyperSpectral Imaging (HSI) spectroscopy (see our non-profit research efforts at the Florida Environmental Research Institute). HSI technology is built upon collecting images at many narrow discrete wavelengths to build up a calibrated spectrum for each pixel in the image (Figure 1). Each of these discrete wavelengths is stored as a unique spectral channel yielding dozens, even hundreds, of bands of color information (as opposed to consumer cameras with three bands: Red, Green, and Blue). We created some novel techniques (including WeoGeo) to process, store, and deliver those hundreds of bands efficiently.


Figure 1. HyperSpectral Imaging Concept.

HSI is not a new field. The US government has been actively supporting its development for over 2 decades. The best known aircraft HSI instrument is run by NASA JPL. They have been operating the AVIRIS sensor since the early 1990s for earth sciences studies. Two recent satellite HSI missions include NASA’s Hyperion and ESA’s CHRIS sensors. Our contribution to this field has been focused on dark target spectroscopy for water applications. Our primary patrons in the development of HSI for water have included the Office of Naval Research (ONR) and the National Oceanic and Atmospheric Administration (NOAA). Both agencies have an interest in finding and identifying things in the water using automated targeting and classification techniques. Basically we have been trying to “see” through the water to determine the depth of the water, the bottom habitat, and the water quality (Figure 2).

Figure 2. Imaging through the water. The color of light leaving the water is affected by the depth of the water, the stuff in the water, and stuff on the bottom.

Water is called a “dark target” because the reflectance of light from beneath the water is usually less than 1% (“bright” land targets can be greater than 50%). This is important for signal processing, where the quality of the feature map is strongly dependent on the signal-to-noise ratio in the imagery, which is directly dependent on the target reflectance. The Spectrographic Aerial Mapping System with On-board Navigation (SAMSON) that FERI built and deploys is specifically designed to simultaneously handle bright and dark targets.

Figure 3. FERI’s Spectrographic Aerial Mapping System with On-board Navigation (SAMSON; top image) and Ground Processing Unit (GPU; bottom image).

During September of 2006 FERI conducted a mission for NOAA to demonstrate the capabilities of HSI for detecting red tides. Figure 4 shows some results from one of the largest Harmful Algal Blooms (HABs) ever recorded in the US. This three-band false color composite was created with 3 narrow bands in the blue, green, and near infrared from the full 188-band hyperspectral imaging cube.


Figure 4. False color composite of red tide in Monterey Bay created from HSI image.
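Picking the three bands for a composite like this comes down to finding the band centers closest to the wavelengths you want. The sketch below is only an illustration: the evenly spaced band centers, the target wavelengths, and the mapping of near infrared to the red display channel are all assumptions, not the actual sensor configuration or the recipe used for Figure 4.

```python
def nearest_band(wavelengths_nm: list[float], target_nm: float) -> int:
    """Index of the band whose center wavelength is closest to target_nm."""
    return min(range(len(wavelengths_nm)),
               key=lambda i: abs(wavelengths_nm[i] - target_nm))


# Hypothetical band centers: 188 bands evenly spaced from 400 to 960 nm.
centers = [400 + i * (960 - 400) / 187 for i in range(188)]

# Hypothetical targets: near infrared, green, and blue, displayed as R, G, B.
composite = [nearest_band(centers, t) for t in (710, 550, 450)]
print([round(centers[i]) for i in composite])
```

With real data, `centers` would come from the band list in the image header rather than from a formula.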

An example of how imaging spectroscopy is useful in quantitatively determining the extent of the HAB in this region may be seen in Figure 5, where the full spectrum (uncorrected for atmospheric interference and illumination effects) is shown in comparison to a spectrum collected outside of the red tide region. The biggest difference is seen in the near infrared region, which is responding to increased reflectance of light by the dinoflagellates in the bloom.

Figure 5. A quantitative look at the spectra from an HSI image inside and outside of the bloom. The green line is the spectrum inside of the bloom; the pink line is from outside of the bloom. The big difference around 710 nm results from the large numbers of dinoflagellates that reflect light out of the water. A different effect accounts for the difference seen in the 400 to 600 nm range, where the dinoflagellates have pigments that absorb light. These pigments result in less light being reflected out of the water where high concentrations of these dinoflagellates are found.

The more subtle differences in the blue and green regions relate to the differences in absorption of light by the pigments in the dinoflagellates. The change in relative reflectance is what gives this bloom its characteristic “red” color (Figure 6).

Figure 6. Red tide (HAB) as seen from the research vessels collecting data during the experiment. (Photo courtesy of Dr. R. Kudela, UCSC.)

An advantage of HSI is that its data can be rendered automatically into feature-extracted maps. Automated, in this case, means that an algorithm (as opposed to an expert) can render the imaging data stream into maps of bathymetry, red tides, sea grass beds, wetlands vegetation, habitat maps, land use change, etc. Automation is important because these imaging data can be terabytes in size. The time requirements just to load the imagery into computer memory for viewing and editing can be onerous. Trying to manipulate and analyze the imagery for features, targets, and materials taxes the time and computer systems requirements to the point of making HSI technology and products the realm of the few.

The ideal approach is to use well calibrated sensors to remove atmospheric and illumination effects (the subjects of future blog entries) to generate HSI imagery that can be directly processed into target and feature maps during the initial image processing. This approach can render products like Figure 7 in less than 8 hours of processing on FERI’s field processing station (right side of Figure 3). These map products are much smaller in size than the original imagery data and contain valuable information for users that are unfamiliar with spectroscopy itself. Using automated feature extraction techniques with HSI provides a mechanism for mapping our world more quantitatively and more frequently than is currently being accomplished with traditional field and photogrammetry techniques. It is the future of remote sensing.

Figure 7. The concept of automated feature extraction and classification applied to the wetlands of Morro Bay, CA using HSI data.

The concept for a server that could handle TBs of HSI imagery was originally conceived as a mechanism for FERI to serve its research partners. WeoGeo Market and Server took this concept and expanded it to handle a larger number of map forms, in a more intuitive manner. The Market provides a portal where others can contribute their value-added mapping content and be compensated. Server gives an enterprise the ability to manage its geospatial content, as well as easily monetize that content. Together they help address what became one of our hardest technical challenges at FERI: how do we serve our partners the maps that they want?

Amazon, WeoGeo, geospatial, mapping

The Expansion of Geospatial Content

I ran into an interesting article on the Amazon Web Services Blog on Metropix. The company appears to be trying to help real estate agents market properties with the creation of 2-D and 3-D floor plans of available properties. (They are using S3 to host their data files.)

What struck me was a comment at the end of the blog, which stated “that content is still king on the Net.” In this case, the content is a form of geospatial content, just not a form that we in the professional services industry might think of as geospatial content. The GIS or survey community might overlook this content as being advertising or marketing driven, not the quantitative content with which you could determine water flows through a flood plain or predict this season’s forest fires.

However, Metropix’s content is a shining example of what I believe will be an explosion of “geospatial” content. Any digital content that can be tied to a point on the ground and that has value to someone else should be considered geospatial. This wider view dramatically expands our “traditional” concept of geospatial content, and points the way to a larger future for our field.

The question will be how to monetize this digital information. The most frequently used revenue stream in the professional services field is direct consulting services sales (e.g., CH2M Hill), but there are many individual companies that are also providing direct internet sales of data to their customers (e.g., Digital Globe). I would put Metropix in this category of revenue. There is of course the advertising-based model that would appear to be Google Earth’s focus (in spite of the sales of Pro and Enterprise editions).

We believe that the WeoGeo model, which combines the marketing of digital data with the services of discovery, customization, hosting, and delivery, as well as content and derivative license management, will help facilitate the expansion of the geospatial content market. A bigger geomarket helps us all generate more revenue, drawing more people and better content to the field.

I’ll match the “content is king” quote with this one, “A rising tide raises all boats.”

Amazon, WeoGeo, FERI, mapping

AWS and Web 2.0 Mapping

I have been a bit delinquent in posting to this blog as of late. I am shaking the dust off of my blog because of the post that Jinesh Varia made about WeoGeo. Mapping, particularly quantitative mapping like GIS, and AWS go together like peanut butter and jelly (I have 3 small kids who have been out of school all summer, so this was the first analogy that came to mind). The utility computing of EC2 and the large web-addressable disk storage of S3 provide opportunities for developing and sharing of mapping products that previously were cost prohibitive. Being Jinesh’s favorite in this category is way cool (and I plan to send him a PB&J for lunch).

We have been very busy, with some real exciting things happening. I hope to share many of them shortly. One of the things we have been working on is the delivery of our first WeoGeo Server to the College of Ocean and Atmospheric Sciences at Oregon State University. You can see their front page here, but you have to register to get access. Access is currently limited to those involved with a red tide experiment in Monterey Bay, CA during September 2006. (We were involved in the NOAA experiment through FERI, operating our HyperSpectral Imaging (HSI) system.) In addition, we have been working on bringing the Seller site of WeoGeo Market out of Private Beta.

I know I have been remiss in posting, but between the kids’ summer vacation, the delivery of Server, beta responsibilities for WeoGeo and WeoCEO, and the scientific responsibilities of FERI, I have let the job of blog posting slide. I promise more posts on imaging sciences, GIS, and utility computing real soon.