We have to buy more disks today. Actually not just disks, but most probably another RAID cabinet. I work for the Florida Environmental Research Institute, which specializes in making maps from aircraft and satellites. We primarily create maps using a special form of automated feature extraction to interpret remote sensing data, or more specifically HyperSpectral Imaging (HSI). More on this subject in a later post, but simply put, we quantitatively separate features on the ground based on their color. “Quantitatively” is important because it means we can do it on a computer, routinely and automatically, using algorithms and computer code. It’s very cool stuff; we think it will change the world. Suffice it to say, this is cutting-edge mapping technology.
The problem we have is that our imaging sensors produce about 1 TB of flight operations data per day. Already massive, these data then have to be calibrated and processed to accurately map the pixel data in the image to an exact location on the ground. Finally, we stitch all of the frames or individual lines of data together into mosaics of entire areas, which then allows us to create thematic maps for our clients. At each step of this processing, we at least double the data. For every day we fly to collect data, we could end up needing at least 5 TB online for continued imagery R&D. Ouch.
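As a rough back-of-the-envelope sketch, the figures above can be turned into a lower-bound storage estimate. The function names and the 20 TB capacity in the usage note are hypothetical, chosen only for illustration; the 5 TB per flight day is the figure quoted above and is a floor, not a measurement.

```python
# Figures taken from the post; treat both as rough lower bounds.
RAW_TB_PER_FLIGHT_DAY = 1.0      # about 1 TB of flight operations data per day
ONLINE_TB_PER_FLIGHT_DAY = 5.0   # at least 5 TB online per flight day after processing


def online_storage_tb(flight_days, tb_per_day=ONLINE_TB_PER_FLIGHT_DAY):
    """Lower-bound estimate (in TB) of online storage for a number of flight days."""
    if flight_days < 0:
        raise ValueError("flight_days must be non-negative")
    return flight_days * tb_per_day


def shortfall_tb(flight_days, capacity_tb):
    """How many TB are missing if all flight days must stay online at once."""
    return max(0.0, online_storage_tb(flight_days) - capacity_tb)
```

For example, keeping 6 flight days online against a hypothetical 20 TB of capacity gives `online_storage_tb(6)` of 30 TB and `shortfall_tb(6, 20.0)` of 10 TB.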
We’re usually processing multiple projects from multiple flight days in our lab. As you can imagine, our data storage requirements, barely manageable for even a large organization with an infinite budget, are nearly impossible for our small non-profit. We buy disks when we need them and, unfortunately, often after we need them. With this “absolute necessity” approach, we conserve money by ensuring we always use the latest, most cost-effective technology.
Today, I walked back to one of our image engineers and asked if I could have the early versions of a processed data set for St. Joseph Bay, Florida (see here for a Google Earth KML link for one of the flight segments). The final processed image data for this segment was only ~70 GB, but the interim processed data was ~2 TB. My colleague told me he’d dumped the interim data because he needed the space to work on one of our other missions. It would take him several days to restore and recreate the previous products without interrupting his current process. This was definitely not what I wanted to hear.
The immediate solution is to buy more storage space (which of course requires money we have yet to procure), but what about next time? Do we always just throw money at the problem? Clearly disk space will get cheaper, but increases in productivity and efficiency, which lead to better business opportunities and greater margins, are built on creatively using today’s infrastructure for tomorrow’s solutions. Waiting for tomorrow’s technology to arrive, such as cheaper disks, is not going to create a better business model for this field. Mapping, particularly quantitative mapping (the kind that forms the basis for financial and resource management decisions), is still prohibitively cumbersome and expensive. Because it’s plain hard and expensive to collect, manage, and distribute mapping products, this field needs to get creative.
Tags: Background, FERI, Hyperspectral, Storage