Archive for the 'Hyperspectral' Category

Remote Sensing, Hyperspectral, FERI, mapping

Image Fusion and Sharpening with Multi and Hyperspectral Data

The panchromatic limitations of WorldView-1, recently launched by DigitalGlobe, have prompted a few posts (e.g. free geography tools and the confused life) on the fusion of high spatial resolution panchromatic imagery (PAN) with lower spatial resolution multispectral imagery (MSI). I thought I would briefly comment on image fusion because over the years it has become easier to accomplish, but the results and limitations of the fused product can still be difficult to understand.

There are many ways to accomplish pan-sharpening, including band substitution, color space transformation and substitution, and Principal Component Substitution (Jensen, 2005). As mentioned on the confused life, temporal decorrelations introduce artifacts into a fused, or PAN-sharpened, image. However, there are other artifacts that can be equally important if one is trying to create a quantitative product for classification mapping or target detection.

The inherent difficulty with all of the PAN-sharpening methods is that they are fundamentally based on the technical and environmental conditions under which the PAN imagery was collected. Since it is difficult, if not impossible, to accurately correct for illumination and atmospheric conditions in PAN-sharpened imagery (subject for a much longer post), the PAN-sharpened images may be limited to classification and detection within a scene. Inter-scene comparisons (i.e. change detection between scenes or cross-scene classifications) using spectral properties require the aforementioned corrections. In addition, when the instantaneous field of view (IFOV) of the PAN and/or MSI sensor is too large, spectral and illumination changes will be present at the edges of the image, making even within-scene classifications difficult. Because of these issues, PAN-sharpened multispectral images are frequently used to identify features based on relative color differences within an image, rather than for target identification or environmental characterization based on a spectral signature itself.

Figure 1. The fusion of high spatial resolution MSI (left figure) with lower spatial resolution HSI (middle figure) into a high spatial resolution, high spectral resolution image (right image). The bottom row of images represents the spectral plots at the pixel located at the center of the red cross hairs in the images directly above them.

We have done some work in this area, mainly focused on sharpening hyperspectral imagery (HSI) with multispectral imagery (MSI). Figure 1 shows the results of some of our efforts. The left image is a high resolution MSI from an Applanix DSS. Underneath it is the digital value of the RGB channel of the image. The middle image is the lower spatial resolution HSI; and underneath it is the full spectrum resolution of the HSI vector (~3 nm resolution). By fusing these two images together (right image), we were able to create a high spatial resolution sharpened HSI image whose spectral vector matched reasonably well with the spectral vector from the original HSI image. The use of atmospheric- and illumination-corrected HSI imagery means that we could make classification comparisons or target detections using these spectra much more robustly across scenes in time and space.

When making fused, or derivative, mapping products, the value of the map is critically determined by the base mapping material and the skill of the map producer. Understanding the limitations of the base mapping material, as well as of the fusion techniques themselves, is a critical determinant of the value of a derivative mapping product.

References
Jensen, John R., Introductory Digital Image Processing: A Remote Sensing Perspective. Prentice-Hall, Englewood Cliffs, NJ, 2005, 526 pp.

Storage, Background, Remote Sensing, Hyperspectral, Amazon, WeoGeo, geospatial, grid computing, WeoCEO, mapping, WeoGeo Server

Image Processing and Delivery using Virtual Computing on EC2

I posted last week about bandwidth issues associated with geospatial data and our AWS S3 solution. The deciding factor for us to use Amazon’s offerings was not necessarily the edge distribution capabilities of S3, but the synergy from combining S3 data storage and distribution with virtual computing capabilities of EC2. There are multiple issues in image processing that require a ton of memory space and CPU horsepower. In both Market and Server, we offer the following basic map distribution options to our map providers -

Geo Clipping (6 zoom levels, allowing for ~125 million possible selections per data set)
Spatial Resampling (4 levels)
Layer Resampling (depends on data)
Output File types (5 - JPEG, GeoTIFF, ENVI, ESRI BIL, ERDAS IMG)
Projections (5 - UTM, Transverse Mercator, Lambert Conic, Albers Equal, Geographic)
Datums (3 - WGS84, NAD 83, NAD 27)

These options result in millions of possible map variants, which preclude the storage of each variant for distribution. So processing power for conversion is critical; and this processing power needs to be connected to a large, web-addressable, temporary data storage array to house the unique variant that a map user has selected. Now for a true mapping marketplace, this infrastructure needs to support 100s to possibly 1000s of simultaneous map requests from the same base map like the 40 GB image in Figure 1. Doing our NeoMapping Market correctly requires the creation of enormous processing, storage, and bandwidth infrastructure.
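To put a rough number on that claim, the option counts in the list above can simply be multiplied together. The sketch below is a back-of-the-envelope calculation, not how Market or Server actually enumerate products. It leaves out layer resampling (which depends on the data) and assumes every combination of options is valid, which may not hold in practice.

```python
# Back-of-the-envelope count of distinct map variants for one data set.
# Counts come from the option list above; layer resampling is left out
# because it depends on the data. Treating every combination as valid
# is an assumption made for illustration.
from math import prod

OPTION_COUNTS = {
    "spatial_resampling": 4,
    "output_file_type": 5,
    "projection": 5,
    "datum": 3,
}
CLIP_SELECTIONS = 125_000_000  # "~125 million possible selections per data set"


def variants_per_clip(option_counts):
    """Number of resampling/format/projection/datum combinations for one clip."""
    return prod(option_counts.values())


def total_variants(option_counts, clip_selections):
    """Upper bound on variants if every clip can take every combination."""
    return variants_per_clip(option_counts) * clip_selections


per_clip = variants_per_clip(OPTION_COUNTS)
total = total_variants(OPTION_COUNTS, CLIP_SELECTIONS)
print(f"{per_clip} combinations per clip, about {total:.2e} variants in total")
```

Even a single clip has hundreds of possible outputs under these assumptions, so pre-generating and storing every variant is clearly off the table.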

Figure 1. 40 GB, 156 layer HyperSpectral Imagery (HSI) map listed on WeoGeo Market. (Click on image to go to the listing in the Market).

However, who could afford that infrastructure upfront? Our original estimates for acquiring base computation needs and placing them into a co-location facility were around $500K. While not a lot of money in the scale of today’s internet operations, it was big for us. In addition, we were trying to develop the software architecture to support the Market and Server, and these expenses were large in and of themselves. AWS provided a unique and simultaneous answer to many of our immediate storage, processing, and distribution needs.

Developing our infrastructure on the scalable AWS solution allows us to say we can support the 1000s of map requests required for a functioning digital marketplace. The user experience is vital to the service’s credibility and therefore our success. However, there is a true (and in a number of cases unexpectedly high) cost in this decision. We traded high capital expenditures for high operating expenditures. In an upcoming post, I’ll talk about the Total Cost of Operations (TCO) on AWS, and some of the ways we are moving to reduce these high operating expenses through stability and scaling solutions. Some of these solutions we have turned into products that we provide to others (e.g. WeoCEO).

I would be interested in hearing about the actual experience of others on AWS and whether S3 and EC2 could or could not meet their needs.

Background, Hyperspectral, Amazon, WeoGeo, grid computing, FERI, WeoGeo Server

40 GB Imagery File Redux

An obvious question that drops out of yesterday’s post on the right file format to use to distribute large raster files is, “How do you distribute a 40 GB file?” The distribution of a single 40 GB file would overwhelm the bandwidth of many small businesses. That was one of the reasons we originally developed the WeoGeo Server.

Figure 1. WeoGeo Server (click on the image to see more information)

The Server allows the mapping organization to distribute customer-defined customized products that would reduce the required file size, and thus bandwidth, to satisfy their customers’ demand. However, there is still the use case where the customer wants the whole file.

Since FERI is a small business, we couldn’t have our daily research activities impacted by an imagery request. So the first (obvious) step was to develop a customization and distribution system that processes a data request in an asynchronous manner, i.e. the order is taken during business hours, but it is processed and delivered after business hours. This allowed us to optimize our bandwidth in our labs and still reasonably satisfy customer demands (assuming they did not need instantaneous data delivery). We also tweaked the system to allow some small files and all of our own requests to be processed immediately, while larger ones for external users were processed in the evenings.
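The deferral rule described above can be sketched in a few lines. This is only an illustration of the idea, not our production code: the business-hours window, the "small file" cutoff, and the `Order` type are all hypothetical, and weekends and holidays are ignored.

```python
# Minimal sketch of the deferral rule described above. The business-hours
# window and the "small file" threshold are hypothetical values chosen for
# illustration, not FERI's actual settings. Weekends are ignored.
from dataclasses import dataclass
from datetime import datetime, time

BUSINESS_START = time(9, 0)        # assumed start of business hours
BUSINESS_END = time(17, 0)         # assumed end of business hours
SMALL_FILE_BYTES = 100 * 1024**2   # assumed cutoff for "small" files


@dataclass
class Order:
    size_bytes: int
    internal: bool       # True for our own requests
    placed_at: datetime


def processing_time(order: Order) -> datetime:
    """Return the earliest time the order should be processed.

    Internal and small orders run immediately; large external orders
    placed during business hours wait until business hours end.
    """
    if order.internal or order.size_bytes <= SMALL_FILE_BYTES:
        return order.placed_at
    if BUSINESS_START <= order.placed_at.time() < BUSINESS_END:
        return datetime.combine(order.placed_at.date(), BUSINESS_END)
    return order.placed_at  # already outside business hours
```

The order itself is always accepted right away; only the heavy processing and delivery are pushed to the evening, which is what keeps the lab bandwidth free during the day.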

The asynchronous data delivery is also a fundamental difference between our technology and online GIS servers. We optimized for discovery, customization, and ordering in a way that allows the customer to receive near-instant gratification on the discovery and ordering, while (possibly) delaying gratification on the delivery.

While the customization of product selection and the asynchronous processing and delivery bought us some additional help in terms of distributing large geospatial content files, it still did not help us with the problem of what to do with multiple requests for 40 GB image files. This is where some of my earlier posts, where I described our use of Amazon Web Services, begin to make some sense (and maybe why Jinesh digs what we are doing).

However, I am late for dinner, so I’ll pick up this theme on a later post…

Background, Remote Sensing, Hyperspectral, WeoGeo, FERI, mapping, BigTIFF

What file format do you use for a 40GB image? (BigTIFF!)

Large imagery files are a problem. In the hyperspectral world, we send things via ENVI’s file format (BSQ, BIL, or BIP). ENVI was designed by folks doing HSI remote sensing and was optimized to easily handle large raster images. The use of this file format allows us to deliver extremely large raster files, with a separate header that describes all the channels, bands, or layers in the image.
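For readers who have not met the three interleaves, the difference is simply the order in which samples are written to the flat binary file. The sketch below computes where one sample lives under each ordering. The function and parameter names are illustrative only (they are not part of ENVI), and the dimensions and sample size would normally be read from the separate header.

```python
# Where a single sample lives in a flat raster file for each interleave.
# Dimensions and sample size would normally come from the separate header;
# the function and parameter names here are illustrative, not ENVI's API.

def sample_offset(interleave, row, col, band, rows, cols, bands,
                  bytes_per_sample=2, header_offset=0):
    """Byte offset of the sample at (row, col, band), all zero-based."""
    if not (0 <= row < rows and 0 <= col < cols and 0 <= band < bands):
        raise IndexError("row, col, or band out of range")
    key = interleave.lower()
    if key == "bsq":    # band sequential: one full image per band
        index = (band * rows + row) * cols + col
    elif key == "bil":  # band interleaved by line: every band of a row together
        index = (row * bands + band) * cols + col
    elif key == "bip":  # band interleaved by pixel: full spectrum per pixel
        index = (row * cols + col) * bands + band
    else:
        raise ValueError(f"unknown interleave: {interleave!r}")
    return header_offset + index * bytes_per_sample
```

The choice matters for access patterns: pulling the full spectrum of one pixel is a single contiguous read in BIP, but one seek per band in BSQ, while viewing a single band is cheapest in BSQ.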

Unfortunately, not everyone owns a copy of ENVI. It is an expensive image processing package. While other remote sensing and GIS packages claimed to handle multi-band imagery data, we found that support for imagery with bands n > 3 was difficult at best. So if our customers at FERI didn’t have ENVI, the transport of the imagery had to be accomplished in another file format. The most common format other than the ENVI format for us was GeoTIFF.

Unfortunately, the GeoTIFF format is limited to 4 GB. This is clearly problematic for the image shown here in Figure 1.

Figure 1. HSI imagery of St. Joseph Bay, FL (click on the image to see the data set at WeoGeo Market.)

This image is a 156-band hyperspectral mosaic. The entire image at its native spatial resolution is 40 GB in size. Cutting this data into 10 tiles of 4 GB apiece would be one way to deliver this data set. But this is problematic for both us and the receiver of the images, as the time, energy, and effort to tile and then re-mosaic is less than efficient.
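The 4 GB ceiling comes from classic TIFF storing file offsets as 32-bit integers, so nothing past byte 2^32 can be addressed; BigTIFF widens those offsets to 64 bits. A quick sketch of the tile arithmetic follows. It treats "40 GB" as 40 GiB and ignores per-file header overhead, both of which are simplifying assumptions.

```python
# Why a 40 GB image needs ten classic-TIFF tiles. Treating "40 GB" as
# 40 GiB and ignoring per-file header overhead are simplifying assumptions.

CLASSIC_TIFF_LIMIT = 2**32   # bytes addressable with 32-bit file offsets
BIGTIFF_LIMIT = 2**64        # bytes addressable with 64-bit file offsets


def tiles_needed(image_bytes, limit=CLASSIC_TIFF_LIMIT):
    """Smallest number of files of at most `limit` bytes that hold the image."""
    return -(-image_bytes // limit)  # integer ceiling division


image_bytes = 40 * 2**30
print(tiles_needed(image_bytes))                  # 10
print(tiles_needed(image_bytes, BIGTIFF_LIMIT))   # 1
```

In practice each tile also carries its own header and georeferencing tags, so real tiles have to be cut slightly smaller than the limit.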

You could also say that for the most part that HSI data is a relatively small backwater of the remote sensing community, so why worry about it. To this I would respond with this imagery that we collected at the same time in Figure 2.

Figure 2. 3-Band DSS imagery of St. Joseph Bay, FL (click on the image to see the data set at WeoGeo Market.)

This is a 3-band RGB image from an Applanix DSS. Its pixel size was about 1/6 that of the HSI sensor. The higher spatial resolution makes this image nearly as large as the HSI image. We actually incurred the pain of tiling the full image set for our original customer because they had only ESRI software with which to analyze this image.
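The reason a 3-band image can rival a 156-band one is that pixel count grows with the square of the linear resolution. A rough sketch of the comparison, assuming both images cover the same ground area and use the same number of bytes per sample (the real bit depths may differ):

```python
# Rough size comparison of two images over the same ground area.
# Assumes identical bytes per sample in both images, which may not
# match the actual sensors.

def relative_size(bands_fine, bands_coarse, linear_factor):
    """Size of the finer-resolution image relative to the coarser one.

    linear_factor is how many fine pixels fit along one side of a
    coarse pixel, so each coarse pixel covers linear_factor**2 fine pixels.
    """
    return bands_fine * linear_factor**2 / bands_coarse


ratio = relative_size(bands_fine=3, bands_coarse=156, linear_factor=6)
print(f"DSS image is roughly {ratio:.0%} the size of the HSI image")
```

Under these assumptions the RGB image comes out at roughly two-thirds the size of the hyperspectral cube, so it runs into the same file-size ceiling.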

Our friends at GDAL asked us about sponsoring a new file format, BigTIFF, which would be based on extending the TIFF format. We were happy to step up to help make this happen. I believe that the other sponsors had similar file storage and distribution issues, and we look forward to broad acceptance of this file format.

It will certainly make our distribution issues easier.

Background, Remote Sensing, Hyperspectral, WeoGeo, FERI, mapping

HyperSpectral Imaging (HSI) and the Path to a Digital Marketplace

WeoGeo was born from a need to preview, share, and distribute geospatial content. Our experience with this goes back nearly 9 years in developing a technology called environmental HyperSpectral Imaging (HSI) spectroscopy (see our non-profit research efforts at the Florida Environmental Research Institute). HSI technology is built upon collecting images at many narrow discrete wavelengths to build up a calibrated spectrum for each pixel in the image (Figure 1). Each of these discrete wavelengths is stored as a unique spectral channel yielding dozens, even hundreds, of bands of color information (as opposed to consumer cameras with three bands: Red, Green, and Blue). We created some novel techniques (including WeoGeo) to process, store, and deliver those hundreds of bands efficiently.


Figure 1. HyperSpectral Imaging Concept.

HSI is not a new field. The US government has been actively supporting its development for over two decades. The best known aircraft HSI instrument is run by NASA JPL. They have been operating the AVIRIS sensor since the early 1990’s for earth sciences studies. Two recent satellite HSI missions include NASA’s Hyperion and ESA’s CHRIS sensors. Our contribution to this field has been focused on dark target spectroscopy for water applications. Our primary patrons in the development of HSI for water have included the Office of Naval Research (ONR) and the National Oceanic and Atmospheric Administration (NOAA). Both agencies have an interest in finding and identifying things in the water using automated targeting and classification techniques. Basically we have been trying to “see” through the water to determine the depth of the water, the bottom habitat, and the water quality (Figure 2).

Figure 2. Imaging through the water. The color of light leaving the water is affected by the depth of the water, the stuff in the water, and stuff on the bottom.

Water is called a “dark target” because the reflectance of light from beneath the water is usually less than 1% (“bright” land targets can be greater than 50%). This is important for signal processing, where the quality of the feature map is strongly dependent on the signal-to-noise ratio in the imagery, which is directly dependent on the target reflectance. The Spectrographic Aerial Mapping System with On-board Navigation (SAMSON) that FERI built and deploys is specifically designed to simultaneously handle bright and dark targets.

Figure 3. FERI’s Spectrographic Aerial Mapping System with On-board Navigation (SAMSON; top image) and Ground Processing Unit (GPU; bottom image).

During September of 2006, FERI conducted a mission for NOAA to demonstrate the capabilities of HSI for detecting red tides. Figure 4 shows some results from one of the largest Harmful Algal Blooms (HABs) ever recorded in the US. This three-band false color composite was created with 3 narrow bands in the blue, green, and near infrared from the full 188-band hyperspectral imaging cube.
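Building such a composite amounts to picking three bands out of the cube and stretching each one to the display range. The sketch below shows the idea in plain Python. The target wavelengths (710, 550, and 450 nm), the mapping of near infrared to the red display channel, and the simple min/max stretch are assumptions for illustration, not the settings used for Figure 4.

```python
# Sketch of a three-band false color composite from a hyperspectral cube.
# The cube is indexed as cube[band][row][col]. Target wavelengths, the
# NIR -> red channel mapping, and the min/max stretch are assumptions.

def nearest_band(wavelengths_nm, target_nm):
    """Index of the band whose center wavelength is closest to target_nm."""
    return min(range(len(wavelengths_nm)),
               key=lambda i: abs(wavelengths_nm[i] - target_nm))


def stretch(band):
    """Linearly rescale a 2-D band (list of rows) to integers 0..255."""
    flat = [v for row in band for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid dividing by zero on a constant band
    return [[round(255 * (v - lo) / span) for v in row] for row in band]


def false_color(cube, wavelengths_nm, targets_nm=(710, 550, 450)):
    """Return image[row][col] = (r, g, b) built from three selected bands."""
    channels = [stretch(cube[nearest_band(wavelengths_nm, t)])
                for t in targets_nm]
    rows, cols = len(channels[0]), len(channels[0][0])
    return [[tuple(ch[r][c] for ch in channels) for c in range(cols)]
            for r in range(rows)]
```

Because each band is stretched independently, the colors in the result are relative within the scene and should not be read as calibrated reflectance.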


Figure 4. False color composite of red tide in Monterey Bay created from HSI image.

An example of how imaging spectroscopy is useful in quantitatively determining the extent of the HAB in this region may be seen in Figure 5, where the full spectrum (uncorrected for atmospheric interference and illumination effects) is shown in comparison to a spectrum collected outside of the red tide region. The biggest difference is seen in the near infrared region, which is responding to increased reflectance of light by the dinoflagellates in the bloom.

Figure 5. A quantitative look at the spectra from an HSI image inside and outside of the bloom. The green line is the spectrum inside of the bloom; the pink line is from outside of the bloom. The big difference around 710 nm results from the large numbers of dinoflagellates that reflect light out of the water. A different effect accounts for the difference seen in the 400 to 600 nm range, where the dinoflagellates have pigments that absorb light. These pigments result in less light being reflected out of the water where high concentrations of these dinoflagellates are found.

The more subtle differences in the blue and green regions relate to the differences in absorption of light by the pigments in the dinoflagellates. The change in relative reflectance is what gives this bloom its characteristic “red” color (Figure 6).

Figure 6. Red tide (HAB) as seen from the research vessels collecting data during the experiment. (Photo courtesy of Dr. R. Kudela, UCSC.)

An advantage of HSI is automatically rendering data into feature extracted maps. Automated, in this case, means that an algorithm (as opposed to an expert) can render the imaging data stream into maps of bathymetry, red tides, sea grass beds, wetlands vegetation, habitat maps, land use change, etc. Automated is important because these imaging data can be terabytes in size. The time requirements just to load the imagery into computer memory for viewing and editing can be onerous. Trying to manipulate and analyze the imagery for features, targets, and materials taxes the time and computer systems requirements to the point of making HSI technology and products the realm of the few.
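To make "an algorithm as opposed to an expert" concrete, here is a small sketch of one widely used approach, the spectral angle mapper, which labels each pixel with the library spectrum it most closely resembles in shape. This is a stand-in chosen for illustration and is not a description of FERI's own algorithms; the library entries and the angle threshold in the usage lines are made up.

```python
# Illustrative spectral angle mapper: label a pixel spectrum with the
# closest reference spectrum by angle. A generic stand-in technique,
# not FERI's processing chain; library values and threshold are made up.
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return math.pi / 2  # treat an all-zero spectrum as unrelated
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def classify(spectrum, library, max_angle=0.1):
    """Name of the best-matching library spectrum, or None if none is close."""
    best = min(library, key=lambda name: spectral_angle(spectrum, library[name]))
    if spectral_angle(spectrum, library[best]) <= max_angle:
        return best
    return None


library = {"water": [0.02, 0.03, 0.01], "vegetation": [0.05, 0.10, 0.50]}
print(classify([0.04, 0.06, 0.02], library))  # same shape as "water", twice as bright
```

The angle depends on the shape of the spectrum rather than its overall brightness, which helps when illumination varies across a scene, and the same function runs unchanged on three bands or several hundred.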

The ideal approach is to use well calibrated sensors to remove atmospheric and illumination effects (the subjects of future blog entries) to generate HSI imagery that can be directly processed into target and feature maps during the initial image processing. This approach can render products like Figure 7 in less than 8 hours of processing on FERI’s field processing station (bottom image of Figure 3). These map products are much smaller in size than the original imagery data and contain valuable information for users that are unfamiliar with spectroscopy itself. Using automated feature extraction techniques with HSI provides a mechanism for mapping our world more quantitatively and more frequently than is currently being accomplished with traditional field and photogrammetry techniques. It is the future of remote sensing.

Figure 7. The concept of automated feature extraction and classification applied to the wetlands of Morro Bay, CA using HSI data.

The concept for a server that could handle TBs of HSI imagery was originally conceived as a mechanism for FERI to serve its research partners. WeoGeo Market and Server took this concept and expanded it to handle a larger number of map forms, in a more intuitive manner. The Market provides a portal where others can contribute their value-added mapping content and be compensated. Server gives an enterprise the ability to manage its geospatial content, as well as easily monetize that content. Together they help address what became one of our hardest technical challenges at FERI – How do we serve our partners the maps that they want?

Background, Remote Sensing, Hyperspectral, Amazon, FERI

Mapping with Amazon’s Mechanical Turk

I was saddened today by the news of Jim Gray. I heard about it from my colleague who pointed me to the efforts of Michael Arrington at TechCrunch and Werner Vogels at Amazon. I feel somewhat connected to the effort because of the hours spent on Michael’s site, and our development of a new internet business using Amazon’s S3/EC2 systems. Mostly I feel connected because finding things in the ocean using imagery is what we do.

My first thought was we can help, particularly after I saw that the NASA ER2 flew with a hyperspectral imager. This is what we do. We recently demonstrated (see here as well) the capability to NOAA NESDIS to collect and process nearly 4000 square kilometers of coastal ocean hyperspectral (5 m resolution, 256 channels in the visible and near infrared) and multispectral (0.8 m resolution, 3 channels) data in less than 18 hours. Our flight imagery is ~1 TB in raw form, and up to 5 TB processed, and we are some of the best people I know at the imagery and processing game. I figured that since we have an EC2/S3 account for WeoGeo, we could upload some of our image processing software and get in there and help.

It was then that my colleague had to rein me in. Jim Gray had been missing since last Sunday, and the ER2 data was very limited. The oceanographer in me took a deep breath, and after reviewing more about the availability of the imagery, I realized there was probably very little that we could do to help. The ocean is a big place, and while the amount of imagery was large, the ocean was a lot larger.

In addition, the visible imagery was limited to just a few bands. Just a few bands means that there are limited degrees of freedom to use automated feature extraction techniques (that is a techie term that just means to use the computer to sift through the imagery to yield the information for which you are searching). The fewer the bands, the more that sensor, illumination, and environmental noise dominate the imagery, and the less likely you will be able to find the object of your search.

Werner Vogels sought to use one of the best tools he had available, the Mechanical Turk. It was one of the quickest methods to put eyeballs on the imagery. By using S3, they had the means to store and distribute large volumes of imagery. Unfortunately, people’s eyes are just not that sensitive to noisy, low spectral information. It is very hard to “see” something in ocean imagery, particularly if it has been compressed in some part of the processing, which frequently removes all the targets you are interested in finding. That’s why we use high resolution spectral and spatial data and develop the processing algorithms to have the computer render these volumes of data into the maps that tell us something important. In military parlance, it is called actionable geospatial intelligence. In this case, it is about saving lives.

Spectral imaging is not the only means to find things on the water. There are other systems that can be used for ship tracking. Microsoft’s Vexcel has the capabilities to use SAR data for this purpose, and I am sure they will put these to use. It is a credit to Werner and the community that they have been able to respond as rapidly as they have. However, I am still feeling a sense of failure. Our community (scientific, engineering, imaging, GIS, etc.) knows how to accomplish these types of mapping goals to save lives and property. The problem is that there has not been enough demand for the results to justify the expenditures at the current price of the systems and products.

The systems that we fly are $1 million+. The processing costs are $10,000s (sometimes up to $100,000s) per day of operation. The issue is one of scalability and demand pull. For an integrated Search And Rescue (SAR) system to have provided help to Jim Gray, it would have needed to be a fraction of those costs, rapidly deployed on manned and unmanned vehicles flying at high altitudes (including space), delivering actionable maps within hours (if not minutes) of landing or downlink. Such technology is obtainable, but the capital investment is large.

We are trying as hard as we can, to the best of our abilities, to change the mapping game by creating and sharing knowledge, not just pictures. This will take time.

My heart and prayers go out to the friends and family of Jim Gray. I just wish we could help today.

 

Update: 1730 EST, February 5, 2007
I spoke with a contact at NASA JPL. It appears that the NASA ER2 flew without the hyperspectral sensor, but with another imaging package. WPB

Storage, Background, Hyperspectral, FERI

We need more space.

We have to buy more disks today. Actually not just disks, but most probably another RAID cabinet. I work for the Florida Environmental Research Institute, which specializes in making maps from aircraft and satellites. We primarily create maps using a special form of automated feature extraction to interpret remote sensing data, or more specifically HyperSpectral Imaging (HSI). More on this subject in a later blog, but simply put, we quantitatively separate features on the ground based on their color. “Quantitatively” is important because it means we can do it on a computer, routinely and automatically, using algorithms and computer code. It’s very cool stuff; we think it will change the world. Suffice to say, this is cutting-edge mapping technology.

The problem we have is that our imaging sensors produce about 1 TB of flight operations data per day. Already massive, these data then have to be calibrated and processed to accurately map the pixel-data in the image to an exact location on the ground. Finally we stitch all of the frames or individual lines of data together into mosaics of entire areas, which then allows us to create thematic maps for our clients. At each step of this processing, we at least double the data. For every day we fly to collect data, we could end up needing at least 5 TB online for continued imagery R&D. Ouch.

We’re usually processing multiple projects from multiple flight days in our lab. As you can imagine, our data storage requirements — barely manageable for even a large organization with an infinite budget — are nearly impossible for our small non-profit. We buy disks when we need them, unfortunately many times after we need them. By this “absolute necessity” approach, we conserve money by assuring we always use the latest, most cost-effective technology.

Today, I walked back to one of our image engineers and asked if I could have the early versions of a processed data set for St. Joseph Bay, Florida (see here for a Google Earth kml link for one of the flight segments). The final processed image data for this segment was only ~70 GB, but the interim processed data was ~2 TB. My colleague told me he’d dumped the interim data because he needed the space to work on one of our other missions. It would take him several days to restore and recreate the previous products without interrupting his current process. This was definitely not what I wanted to hear.

The immediate solution is to buy more storage space (which of course requires money we have yet to procure), but what about next time? Do we always just throw money at the problem? Clearly disk space will get cheaper, but increases in productivity and efficiency, which lead to better business opportunities and greater margins, are built on creatively using today’s infrastructure for tomorrow’s solutions. Waiting until tomorrow’s technology arrives, e.g. cheaper disks, is not going to create a better business model for this field. Mapping, particularly quantitative mapping — the kind that forms the basis for financial and resource management decisions — is still prohibitively cumbersome and expensive. In order to deal with the fact that it’s plain hard and expensive to collect, manage, and distribute mapping products, this field needs to get creative.