Archive for the 'Background' Category

Background, WeoGeo, geospatial, mapping

Profiting from Collective Intelligence

I have had a number of questions from our private beta Providers that basically ask, “What maps should I be making?” To be honest, I wish I knew. In reality, WeoGeo Market was established to answer this very question.

We set up WeoGeo to lower the risks of creating and selling mapping products (most importantly by reducing marketing and transaction costs). We believe that by lowering the risk of creating and selling geo-content, more products could be created at lower prices. By combining more products at lower prices with a greater ability to find and customize those products, Users of those products would be more apt to purchase more geo-content. The overall goal is to create a truly functioning marketplace for geo-content. The end result would be a collective intelligence expressed through the market that would help all of us focus on making the most valuable geospatial products.

Answering the question of what product to create is one of the hardest parts of running FERI’s operations. While FERI is a research and development organization, we still had to perform the basic sales and marketing efforts of finding paying customers to support our development of value-added mapping products. It is a very time consuming and difficult process, requiring a lot of telephone calls and a lot of travel to search out programs that would value our imaging and mapping efforts. Such is the nature of sales and marketing, and every good salesman would tell you that is just the way you have to generate business.

However, as a scientist I want to focus on the generation of new mapping products. While I could (and still do) focus on sales and marketing, my real interest is in generating new mapping products that could help people make decisions with their resources or help save lives. With the hyperspectral imagery, we could develop maps focused on a variety of topics. These maps could range from Harmful Algal Blooms (HABs, more commonly called red tides) to Submerged Aquatic Vegetation (SAV) to detecting probable locations of Improvised Explosive Devices (IEDs). Yet finding sufficient demand for these products to overcome their high initial production cost is difficult. (I have a whole other story on the IEDs and how the DoD does business with contractors and appropriation earmarks that I’ll save for another time.)

Over the years, we have watched with great interest how the internet has impacted other businesses. One of the most interesting impacts that we have seen is the rise of shared intelligence from the accumulation of individual choices. For example, search engines have used the individual linkages of web page creators to develop a collective intelligence estimate of the most likely desired result for a search term (and a new industry of search engine optimization). In particular, we were fascinated by eBay’s ability to enable millions of people to develop larger markets for their niche products.

By establishing a functioning marketplace for these niche goods, eBay created liquidity and demand for products that previously had limited marketability. In the process of creating a market whose niches could be efficiently filled, they also provided opportunities for entrepreneurs to develop new markets. In effect, eBay created a platform that enabled individuals to make choices, create products, and satisfy the needs of others, which in turn created a positive feedback mechanism for everyone who participated. This led to the creation of whole businesses that did not exist prior to eBay, and to a rise in the valuation of goods that previously had limited market penetration and thus an underdetermined recognized value.

The increased liquidity of products and the collective actions of many individuals led to a self-sustaining marketplace that enriches all of the participants. eBay is a lesson in economic theory, and gives truth to the concept that “a rising tide lifts all boats.”

So what does this have to do with answering the question from our Providers about which maps to produce? The answer to that question is that I am not sure, but I can make sure that the Provider’s risk is low enough for them to make some reasonable choices, and to give them the agility to respond to market demands. Through this process, I believe that our collective intelligence will point Providers in the most profitable direction.

This marketplace will give those with the skills and those with the content the ability to connect as never before possible. The new network of connections will lead to the creation of new geo-content that will enhance and enrich the lives of our community. And our community will profit from it because we will know which maps to make.

Background, WeoGeo, geospatial, FERI, mapping, WeoGeo Server

Follow-up to Directions Magazine’s Podcast on WeoGeo

Adena Schutzberg did a podcast with me last week on the business model for WeoGeo. It was my first podcast and I hope that I made sense to people (I welcome comments and/or critiques in the comments section here). I would like to thank Adena for giving us the opportunity to tell our story.

However, I am not sure I was as clear as I could have been about our history and the importance of that history in the development of WeoGeo. I could not quite put my finger on what was missing until after the AWS StartUp Event - Boston (see here as well for my comments) when someone asked how many man-years of effort went into developing the site.

My first response was to take the number of years that FERI was in operation times the number of people involved at FERI. Kind of silly, I know. But when I think of why we built WeoGeo, this response seems relevant. Their response, of course, was, “no really, how much technical development time?” I understood the question; the person was trying to ascertain how difficult it would be to recreate what we are doing.

Our technical development on this project did start back around 2001 with a project called Hyperspectral Data Repository On-line (HyDRO). This was our first distribution system, developed to help alleviate the problems associated with delivering HSI data to our customers. This concept and technology eventually evolved into the WeoGeo Server (see post here as well). Between 2001 and 2005 we had 4 PhDs and masters-trained personnel spending a portion of their time on HyDRO because it was a critical element of our research programs. In the last couple of years, we increased the number of people working on WeoGeo Market/Server to more than 12 currently, if you include outside contractors. For the most part, they are highly trained GIS and MIS/CIS/CS personnel.

The technology is hot, no question about it. I am amazed on a daily basis by what our group of people has developed for mapping on both commodity computers and utility computing systems. Yet, here is the rub to this type of man-years calculation. I really believe that the reasons for WeoGeo, and its associated development time, stem from our history at FERI, which makes such calculations difficult. The “technical development time” is not just time spent coding; it includes the needs assessment and the development of the system architecture to address critical problems and/or pain. What we have developed at WeoGeo is a direct function of two critical needs of our operations as a research and imagery services organization.

These two critical needs were (and still are):
1) Delivery of our survey grade, high volume mapping content;
2) Finding and acquiring other survey grade mapping content to fuse with ours to create value-added geocontent for our clients.

WeoGeo was built to solve these two critical problems (there are others, but not nearly as critical to our organization as these). If you have never been faced with these problems, then you might not appreciate the depth of the solutions we have built to service these needs (and its potential). But if you have, then you have felt our pain - and I hope value our solution.

Storage, Background, Remote Sensing, Hyperspectral, Amazon, WeoGeo, geospatial, grid computing, WeoCEO, mapping, WeoGeo Server

Image Processing and Delivery using Virtual Computing on EC2

I posted last week about bandwidth issues associated with geospatial data and our AWS S3 solution. The deciding factor for us to use Amazon’s offerings was not necessarily the edge distribution capabilities of S3, but the synergy from combining S3 data storage and distribution with virtual computing capabilities of EC2. There are multiple issues in image processing that require a ton of memory space and CPU horsepower. In both Market and Server, we offer the following basic map distribution options to our map providers -

Geo Clipping (6 zoom levels, allowing for ~125 million possible selections per data set)
Spatial Resampling (4 levels)
Layer Resampling (depends on data)
Output File types (5 - JPEG, GeoTIFF, ENVI, ESRI BIL, ERDAS IMG)
Projections (5 - UTM, Transverse Mercator, Lambert Conic, Albers Equal, Geographic)
Datums (3 - WGS84, NAD 83, NAD 27)

These options result in millions of possible map variants, which precludes storing each variant for distribution. So processing power for conversion is critical, and this processing power needs to be connected to a large, web-addressable, temporary data storage array to house the unique variant that a map user has selected. Now for a true mapping marketplace, this infrastructure needs to support 100s to possibly 1000s of simultaneous map requests from the same base map, like the 40 GB image in Figure 1. Doing our NeoMapping Market correctly requires the creation of enormous processing, storage, and bandwidth infrastructure.
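To get a feel for why storing every variant is impractical, the option counts in the list above can simply be multiplied together. This is only a rough sketch: it assumes every option can be combined freely with every other (some projection and datum pairs may not be offered together), and it leaves out layer resampling because that count depends on the data set.

```python
# Option counts taken from the list above. Layer resampling is omitted
# because its count depends on the data set.
clip_selections = 125_000_000    # "~125 million possible selections"
spatial_resampling_levels = 4
output_file_types = 5
projections = 5
datums = 3

# Assumption: all options are independent and can be freely combined.
format_variants = (
    spatial_resampling_levels * output_file_types * projections * datums
)
total_variants = clip_selections * format_variants

print(f"format variants per clip selection: {format_variants}")
print(f"upper bound on variants per data set: {total_variants:,}")
```

Even before geo clipping is considered, each data set has hundreds of format combinations under these assumptions, so generating the requested variant on demand is the only workable route.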

Figure 1. 40 GB, 156 layer HyperSpectral Imagery (HSI) map listed on WeoGeo Market. (Click on image to go to the listing in the Market).

However, who could afford that infrastructure upfront? Our original estimates for acquiring base computation needs and placing them into a co-location facility were around $500K. While not a lot of money in the scale of today’s internet operations, it was big for us. In addition, we were trying to develop the software architecture to support the Market and Server, and these expenses were large in and of themselves. AWS provided a unique and simultaneous answer to many of our immediate storage, processing, and distribution needs.

Developing our infrastructure on the scalable AWS solution allows us to say we can support the 1000s of map requests required for a functioning digital marketplace. The user experience is vital to the service’s credibility and therefore our success. However, there is a true (and in a number of cases unexpectedly high) cost in this decision. We traded high capital expenditures for high operating expenditures. In an upcoming post, I’ll talk about the Total Cost of Operations (TCO) on AWS, and some of the ways we are moving to reduce these high operating expenses through stability and scaling solutions. Some of these solutions we have turned into products that we provide to others (e.g., WeoCEO).

I would be interested in hearing about the actual experience of others on AWS and whether S3 and EC2 could or could not meet their needs.

Storage, Background, Amazon, FERI, mapping, WeoGeo Server

How do you deliver 100 40GB imagery files?

This is a bit tougher than the solution discussed in this earlier post. When we (FERI) first started developing HSI sensors and flying them for others, the distribution of imagery data was mainly through DVDs. As the research groups got larger, we started getting more and more requests for data. This eventually led to the WeoGeo Server solution, which allows for customization and asynchronous delivery.

However, 100 40GB files that look like Figure 2 in my HSI post mean 4 TB of data through our lab’s pipe in a relatively short period of time. Our bandwidth at the time we were trying to develop these solutions was a dedicated T1, or 1.5 Mbit per second. To transfer 4 TB of imagery files with full use of our pipe would require 259 days.
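The 259-day figure can be checked with a few lines of arithmetic. This sketch assumes binary units (1 TB = 2**40 bytes, 1 Mbit = 2**20 bits) and a link with no protocol overhead, which is the convention that reproduces the number quoted above; with decimal units the result comes out somewhat lower.

```python
def transfer_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to move size_bytes over a fully dedicated link."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day

# Assumption: binary units for both the data volume and the link rate.
four_tb = 4 * 2**40        # 100 files of roughly 40 GB each
t1_link = 1.5 * 2**20      # the "1.5 Mbit per second" T1

days = transfer_days(four_tb, t1_link)
print(f"{days:.0f} days")
```

The same function makes it easy to see how little a modest bandwidth upgrade would help: even a link ten times faster would still need almost a month for the full set.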

Clearly there are some solutions these days that would have helped this type of large file distribution effort. Akamai, Limelight Networks, or some BitTorrent solution would provide capabilities to deliver large files over distributed networks. However, we were also providing search and customization solutions, which required modification of the data before delivery. This meant that we had a scalability problem in processing as well as delivery. Edge distribution solutions would solve one part of our problem, but not necessarily the processing part.

We began to explore co-location solutions, but these seemed to require a lot of upfront costs, as well as travel and maintenance expenses. As a small business, those capital expenditures were more than we could absorb. It was at this point that we were introduced to Amazon Web Services by a former co-worker who had been recruited by Amazon. AWS allowed us to build a distribution system for large data files on top of a very large pipe via S3. (I’ll discuss the processing using EC2 later.) It provided us scalable distribution at reasonable cost for those 100 40GB files.

To be honest, there are some devils in the details in using S3 for our operations. But (to date), the service has been more valuable than costly. The rapid ingestion of large files into S3 is a current problem that we are trying to solve. Moving forward we hope to build on the expansion of S3 as Amazon develops more physical data storage locations. This will provide us with some of the edge distribution advantages of the above solutions, while keeping us connected with our virtual computing solutions on EC2.

I’m also curious to see how others are using S3 in geospatial solutions; if you have a unique one, please let me know.

Background, Hyperspectral, Amazon, WeoGeo, grid computing, FERI, WeoGeo Server

40 GB Imagery File Redux

An obvious question that drops out of yesterday’s post on the right file format to use to distribute large raster files is, “How do you distribute a 40 GB file?” The distribution of a single 40 GB file would overwhelm the bandwidth of many small businesses. That was one of the reasons we originally developed the WeoGeo Server.

Figure 1. WeoGeo Server (click on the image to see more information)

The Server allows the mapping organization to distribute customer-defined customized products that would reduce the required file size, and thus bandwidth, to satisfy their customers’ demand. However, there is still the use case where the customer wants the whole file.

Since FERI is a small business, we couldn’t have our daily research activities impacted by an imagery request. So the first (obvious) step was to develop a customization and distribution system that processes a data request in an asynchronous manner, i.e. the order is taken during business hours, but it is processed and delivered after business hours. This allowed us to optimize our bandwidth in our labs and still reasonably satisfy customer demands (assuming they did not need instantaneous data delivery). We also tweaked the system to allow some small files and all of our own requests to be processed immediately, while larger ones for external users were processed in the evenings.
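The scheduling rule described above (small files and internal requests run immediately, large external orders wait until the evening) can be sketched as a single decision function. The size cutoff and the business hours below are hypothetical placeholders, since the post does not give the values we actually used, and weekends are ignored for simplicity.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical values; the actual thresholds are not stated in the post.
SMALL_FILE_BYTES = 100 * 1024**2   # cutoff for a "small" order
BUSINESS_START_HOUR = 8
BUSINESS_END_HOUR = 18


@dataclass
class Order:
    size_bytes: int
    internal: bool  # True for requests from the lab's own staff


def process_immediately(order: Order, now: datetime) -> bool:
    """Return True if the order should run now, False if it waits for the evening."""
    after_hours = not (BUSINESS_START_HOUR <= now.hour < BUSINESS_END_HOUR)
    return order.internal or order.size_bytes <= SMALL_FILE_BYTES or after_hours


noon = datetime(2008, 1, 15, 12, 0)
late = datetime(2008, 1, 15, 22, 0)
big_external = Order(size_bytes=40 * 1024**3, internal=False)

print(process_immediately(big_external, noon))  # deferred during business hours
print(process_immediately(big_external, late))  # runs once the lab is quiet
```

Orders for which this returns False would sit in a queue that a worker drains after hours, which is what keeps a large external request from competing with daytime research traffic.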

The asynchronous data delivery is also a fundamental difference between our technology and online GIS servers. We optimized for discovery, customization, and ordering in a way that allows the customer to receive near-instant gratification on the discovery and ordering, while (possibly) delaying gratification on the delivery.

While the customization of product selection and the asynchronous processing and delivery bought us some additional help in terms of distributing large geospatial content files, it still did not help us with the problem of what to do with multiple requests for 40 GB image files. This is where some of my earlier posts, where I described our use of Amazon Web Services, begin to make some sense (and maybe why Jinesh digs what we are doing).

However, I am late for dinner, so I’ll pick up this theme on a later post…

Background, Remote Sensing, Hyperspectral, WeoGeo, FERI, mapping, BigTIFF

What file format do you use for a 40GB image? (BigTIFF!)

Large imagery files are a problem. In the hyperspectral world, we send things via ENVI’s file format (BSQ, BIL, or BIP). ENVI was designed by folks doing HSI remote sensing and was optimized to easily handle large raster images. The use of this file format allows us to deliver extremely large raster files, with a separate header that describes all the channels, bands, or layers in the image.

Unfortunately, not everyone owns a copy of ENVI. It is an expensive image processing package. While other remote sensing and GIS packages claimed to handle multi-band imagery data, we found that support for imagery with more than three bands (n > 3) was difficult at best. So if our customers at FERI didn’t have ENVI, the transport of the imagery had to be accomplished in another file format. The most common format other than the ENVI format for us was GeoTIFF.

Unfortunately, the GeoTIFF format is limited to 4 GB. This is clearly problematic for the image shown here in Figure 1.

Figure 1. HSI imagery of St. Joseph Bay, FL (click on the image to see the data set at WeoGeo Market.)

This image is a 156-band hyperspectral mosaic. The entire image at its native spatial resolution is 40 GB in size. Cutting this data into 10 tiles of 4 GB apiece would be one way to deliver this data set. But this is problematic for both us and the receiver of the images, as the time, energy, and effort to tile and then re-mosaic is less than efficient.
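The 4 GB ceiling comes from classic TIFF, on which GeoTIFF is built, storing file offsets as 32-bit unsigned integers. A quick sketch of the arithmetic behind the 10-tile figure, assuming binary gigabytes and ignoring the small per-file header overhead:

```python
import math

# Classic TIFF uses 32-bit offsets, so nothing past 2**32 bytes is addressable.
CLASSIC_TIFF_LIMIT = 2**32

# Assumption: "40 GB" is read as 40 binary gigabytes.
image_bytes = 40 * 2**30

tiles_needed = math.ceil(image_bytes / CLASSIC_TIFF_LIMIT)
print(f"limit: {CLASSIC_TIFF_LIMIT / 2**30:.0f} GB, tiles needed: {tiles_needed}")
```

In practice each tile also carries its own header and tags, so ten is a lower bound rather than an exact count.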

You could also say that HSI data is, for the most part, a relatively small backwater of the remote sensing community, so why worry about it? To this I would respond with the imagery in Figure 2, which we collected at the same time.

Figure 2. 3-Band DSS imagery of St. Joseph Bay, FL (click on the image to see the data set at WeoGeo Market.)

This is a 3-band RGB image from an Applanix DSS. Its pixel size was about 1/6 that of the HSI sensor. The higher spatial resolution makes this image nearly as large as the HSI image. We actually incurred the pain of tiling the full image set for our original customer because they had only ESRI software with which to analyze this image.

Our friends at GDAL asked us about sponsoring a new file format, BigTIFF, which would be based on extending the TIFF format. We were happy to step up to help make this happen. I believe that the other sponsors had similar file storage and distribution issues, and we look forward to broad acceptance of this file format.
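For readers curious about how the two formats differ on disk, the distinction is visible in the first few bytes of the file: classic TIFF carries version number 42 and uses 32-bit offsets, while BigTIFF carries version number 43 and uses 64-bit offsets. The sketch below is a minimal header check written for illustration (the function name is made up here, and it is not part of GDAL or any other library).

```python
import struct


def tiff_flavor(header: bytes) -> str:
    """Classify the first bytes of a file as classic TIFF, BigTIFF, or neither."""
    if len(header) < 4:
        return "not TIFF"
    if header[:2] == b"II":
        endian = "<"   # little-endian byte order
    elif header[:2] == b"MM":
        endian = ">"   # big-endian byte order
    else:
        return "not TIFF"
    (version,) = struct.unpack(endian + "H", header[2:4])
    if version == 42:
        return "classic TIFF (32-bit offsets)"
    if version == 43:
        return "BigTIFF (64-bit offsets)"
    return "not TIFF"


# Hand-built little-endian headers, for illustration only.
classic_header = struct.pack("<2sHI", b"II", 42, 8)          # 4-byte IFD offset
bigtiff_header = struct.pack("<2sHHHQ", b"II", 43, 8, 0, 16)  # 8-byte IFD offset

print(tiff_flavor(classic_header))
print(tiff_flavor(bigtiff_header))
```

To check a real file, pass it the first 16 bytes read in binary mode. Moving from 4-byte to 8-byte offsets is what lifts the 4 GB ceiling.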

It will certainly make our distribution issues easier.

Background, Remote Sensing, Hyperspectral, WeoGeo, FERI, mapping

HyperSpectral Imaging (HSI) and the Path to a Digital Marketplace

WeoGeo was born from a need to preview, share, and distribute geospatial content. Our experience with this goes back nearly 9 years in developing a technology called environmental HyperSpectral Imaging (HSI) spectroscopy (see our non-profit research efforts at the Florida Environmental Research Institute). HSI technology is built upon collecting images at many narrow discrete wavelengths to build up a calibrated spectrum for each pixel in the image (Figure 1). Each of these discrete wavelengths is stored as a unique spectral channel yielding dozens, even hundreds, of bands of color information (as opposed to consumer cameras with three bands: Red, Green, and Blue). We created some novel techniques (including WeoGeo) to process, store, and deliver those hundreds of bands efficiently.


Figure 1. HyperSpectral Imaging Concept.

HSI is not a new field. The US government has been actively supporting its development for over 2 decades. The best known aircraft HSI instrument is run by NASA JPL. They have been operating the AVIRIS sensor since the early 1990’s for earth sciences studies. Two recent satellite HSI missions include NASA’s Hyperion and ESA’s CHRIS sensors. Our contribution to this field has been focused on dark target spectroscopy for water applications. Our primary patrons in the development of HSI for water have included the Office of Naval Research (ONR) and the National Oceanic and Atmospheric Administration (NOAA). Both agencies have an interest in finding and identifying things in the water using automated targeting and classification techniques. Basically we have been trying to “see” through the water to determine the depth of the water, the bottom habitat, and the water quality (Figure 2).

Figure 2. Imaging through the water. The color of light leaving the water is affected by the depth of the water, the stuff in the water, and stuff on the bottom.

Water is called a “dark target” because the reflectance of light from beneath the water is usually less than 1% (“bright” land targets can be greater than 50%). This is important for signal processing, where the quality of the feature map is strongly dependent on the signal-to-noise ratio in the imagery, which is directly dependent on the target reflectance. The Spectrographic Aerial Mapping System with On-board Navigation (SAMSON) that FERI built and deploys is specifically designed to simultaneously handle bright and dark targets.

Figure 3. FERI’s Spectrographic Aerial Mapping System with On-board Navigation (SAMSON; top image) and Ground Processing Unit (GPU; bottom image).

During September of 2006 FERI conducted a mission for NOAA to demonstrate the capabilities of HSI for detecting red tides. Figure 4 shows some results from one of the largest Harmful Algal Blooms (HABs) ever recorded in the US. This three-band false color composite was created with 3 narrow bands in the blue, green, and near infrared from the full 188-band hyperspectral imaging cube.


Figure 4. False color composite of red tide in Monterey Bay created from HSI image.
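Building a composite like Figure 4 comes down to picking three bands out of the cube and mapping them to the display channels. The sketch below shows the band-picking step only. The sensor layout (188 evenly spaced band centers from 400 to 900 nm), the target wavelengths, and the mapping of near infrared to the red display channel are all hypothetical choices made for illustration, not the values used for the figure.

```python
def nearest_band(band_centers_nm, target_nm):
    """Index of the band whose center wavelength is closest to target_nm."""
    return min(
        range(len(band_centers_nm)),
        key=lambda i: abs(band_centers_nm[i] - target_nm),
    )


# Hypothetical sensor layout: 188 evenly spaced band centers, 400 to 900 nm.
NUM_BANDS = 188
band_centers = [400 + i * (900 - 400) / (NUM_BANDS - 1) for i in range(NUM_BANDS)]

# Hypothetical targets: near infrared, green, and blue wavelengths.
targets_nm = {"red_channel": 710, "green_channel": 550, "blue_channel": 470}
composite_bands = {
    name: nearest_band(band_centers, wavelength)
    for name, wavelength in targets_nm.items()
}


def false_color_pixel(spectrum):
    """Reduce one pixel's full spectrum to a three-value display pixel."""
    return (
        spectrum[composite_bands["red_channel"]],
        spectrum[composite_bands["green_channel"]],
        spectrum[composite_bands["blue_channel"]],
    )


print(composite_bands)
```

Applying `false_color_pixel` to every pixel shrinks the cube from 188 values per pixel to 3, which is why the composite is so much easier to view and share than the full imagery.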

An example of how imaging spectroscopy is useful in quantitatively determining the extent of the HAB in this region may be seen in Figure 5, where the full spectrum (uncorrected for atmospheric interference and illumination effects) is shown in comparison to a spectrum collected outside of the red tide region. The biggest difference is seen in the near infrared region, which is responding to increased reflectance of light by the dinoflagellates in the bloom.

Figure 5. A quantitative look at the spectra from an HSI image inside and outside of the bloom. The green line is the spectrum inside of the bloom; the pink line is from outside of the bloom. The big difference around 710 nm results from the large numbers of dinoflagellates that reflect light out of the water. A different effect accounts for the difference seen in the 400 to 600 nm range, where the dinoflagellates have pigments that absorb light. These pigments result in less light being reflected out of the water where high concentrations of these dinoflagellates are found.

The more subtle differences in the blue and green regions relate to the differences in absorption of light by the pigments in the dinoflagellates. The change in relative reflectance is what gives this bloom its characteristic “red” color (Figure 6).

Figure 6. Red tide (HAB) as seen from the research vessels collecting data during the experiment. (Photo courtesy of Dr. R. Kudela, UCSC.)

An advantage of HSI is automatically rendering data into feature extracted maps. Automated, in this case, means that an algorithm (as opposed to an expert) can render the imaging data stream into maps of bathymetry, red tides, sea grass beds, wetlands vegetation, habitat maps, land use change, etc. Automated is important because these imaging data can be terabytes in size. The time requirements just to load the imagery into computer memory for viewing and editing can be onerous. Trying to manipulate and analyze the imagery for features, targets, and materials taxes the time and computer systems requirements to the point of making HSI technology and products the realm of the few.

The ideal approach is to use well calibrated sensors and to remove atmospheric and illumination effects (the subjects of future blog entries) to generate HSI imagery that can be directly processed into target and feature maps during the initial image processing. This approach can render products like Figure 7 in less than 8 hours of processing on FERI’s field processing station (the Ground Processing Unit in Figure 3). These map products are much smaller in size than the original imagery data and contain valuable information for users that are unfamiliar with spectroscopy itself. Using automated feature extraction techniques with HSI provides a mechanism for mapping our world more quantitatively and more frequently than is currently being accomplished with traditional field and photogrammetry techniques. It is the future of remote sensing.

Figure 7. The concept of automated feature extraction and classification applied to the wetlands of Morro Bay, CA using HSI data.

The concept for a server that could handle TBs of HSI imagery was originally conceived as a mechanism for FERI to serve its research partners. WeoGeo Market and Server took this concept and expanded it to handle a larger number of map forms, in a more intuitive manner. The Market provides a portal where others can contribute their value-added mapping content and be compensated. Server gives an enterprise the ability to manage its geospatial content, as well as easily monetize that content. Together they help address what became one of our hardest technical challenges at FERI: how do we serve our partners the maps that they want?

Background, Amazon, WeoGeo, FERI

Building a Web 2.0 Mapping Solution

I am writing from San Francisco today, where I am attending both the Web 2.0 Expo and Location Intelligence conferences. I find it a bit humorous that we serendipitously discovered real potential value in the concept of “Web 2.0” while developing our solution for B2B mapping. My original take on the Web 2.0 business was that it was all about social networking and advertising. However, our industry (the global mapping industry) is ripe for a true SOA solution, and we are trying to build something that will release the potential of both the internet and mapping beyond just the ability to share mashups. In order to accomplish our goals we needed to overcome some critical infrastructure hurdles in the development of a platform that allowed real internet commerce to proceed within the mapping industry. As I am preparing for both of these conferences this morning, I thought I would begin to share some thoughts on how we are planning to build an SOA, which may be considered a Web 2.0 application.

The global mapping industry is a $4 to $7 billion a year market (depending on which report you read). It is a B2B industry, dominated by large investments in infrastructure (think satellites, airplanes, computers, software, and content), as well as large investments in highly skilled technicians. The data volumes are enormous; our own mapping efforts (at FERI) run upwards of 10 terabytes of mapping products per day, requiring multiple distributed processors just to generate the maps, which we then have to serve to clients and users in near-real time. WeoGeo (www.weogeo.com) is our B2B portal and server solution to rapidly deliver mapping products to end user customers.

Imagine building the computational and internet infrastructure to deliver gigabyte- to terabyte-size maps. A terabyte map takes 90 days to be transported over a 1.5 megabit per second link. WeoGeo has developed the technology to dramatically reduce this effort, but to service a global market of such maps would require mind-boggling infrastructure support. Enter Amazon Web Services (AWS).

The initial beauty of AWS is in the cost structure, where we are paying for our computing time (EC2) and data storage (S3) on a pay-as-you-go basis. Our initial budgeted start-up infrastructure costs were ~$300,000 plus first year expansion of ~$200,000. When we budgeted the same effort on AWS, its pro forma was somewhere between $10,000 and $20,000. AWS allowed us to spend our limited start-up dollars on developing the technology of WeoGeo, rather than buying and maintaining computers. But the initial beauty is quickly overtaken by something a bit more sublime. With EC2 and S3 our processing and storage requirements are totally scalable. The term scalable is so prevalent in today’s business press that it often loses its significance. However, scalable to us has very significant time lag and cost implications. Besides pitching to potential customers who have map archive inventories approaching petabytes, we are talking about a web services business that currently counts 200 million Google Earth users. If we are as successful as we hope to be, an exponential growth in business would rapidly overcome our abilities to assemble hardware, much less install and maintain servers to service the business. Our business requires scalability, with a capital S.

So we made a bet at the beginning of WeoGeo that a business model built on commodity computing cycles or elastic computing, as opposed to commodity computers, would best enable us to handle growth in this industry. The fact that the upfront cost was cheaper was a bonus. The bet required focus, so we decided to make an all-inclusive AWS service platform that required no outside data center processing or storage. For this to work, our web and database services had to be robust and durable in a virtual machine environment that in and of itself might not be durable. It had to handle spiking (think “Digg Insurance”) and cyclic patterns in processing to assure up-time and optimize costs. It also had to address load balancing and stable IP addressing in an environment where the virtual machines’ IP addresses and domain name records may be lost.

With a lot of brain sweat and great interaction with the AWS team, we created an internal EC2 management solution that accomplished these goals. After some prodding by the AWS team, we have begun to offer one of these solutions as a product. WeoCEO (www.weoceo.com) is a management solution for stable IP addressing, fail-safe monitoring, load balancing, and auto-scaling of EC2 resources. Besides the insurance aspect of this solution that provides for robust ecommerce activities, the auto-scaling feature actually provides a tremendous cost savings over daily and seasonal cyclic usage patterns. We look forward to providing an extension of the WeoCEO services for durable database operations in the near future.
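To illustrate why auto-scaling saves money on a cyclic load, here is a toy comparison of instance-hours when capacity follows demand versus when it is fixed at the daily peak. This is not WeoCEO’s actual algorithm; the load profile, the per-instance capacity, and the function names are all made up for the example.

```python
import math


def instances_needed(requests_per_hour, capacity_per_instance, minimum=1):
    """Smallest instance count that covers the load, never below `minimum`."""
    return max(minimum, math.ceil(requests_per_hour / capacity_per_instance))


def instance_hours(hourly_load, capacity_per_instance):
    """Return (auto-scaled, fixed-at-peak) instance-hours for a load profile."""
    scaled = [instances_needed(load, capacity_per_instance) for load in hourly_load]
    return sum(scaled), max(scaled) * len(scaled)


# Made-up daily cycle: quiet nights, busy working hours (24 hourly values).
hourly_load = [20] * 8 + [400] * 10 + [20] * 6

scaled, fixed = instance_hours(hourly_load, capacity_per_instance=50)
print(f"auto-scaled: {scaled} instance-hours, fixed at peak: {fixed} instance-hours")
```

With pay-as-you-go pricing the bill tracks instance-hours, so in this made-up profile following the load costs roughly half of what provisioning for the peak around the clock would.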

In short, the mapping industry is a competitive B2B market that has high infrastructure costs to support large processing and storage requirements. WeoGeo has created an SOA on AWS that will allow for the unleashing of huge volumes of archived mapping products to create a geospatial information exchange that will scale from the smallest to largest users. While cost containment will certainly be a key component to our viability, we believe that quick, reliable, and scalable service will be more important to our eventual success.

Adena Schutzberg at Directions Magazine (which is hosting the Location Intelligence conference) has indexed a podcast to be available on April 17th titled, “Is Web 2.0 Mapping “Dead”?”. All I can say is that we don’t think so (and I sure hope not).

Background, Amazon, WeoGeo, grid computing

Cycles in the Sky

There is a revolution happening quietly in the development of web computing infrastructure. To those who have been involved in the development of large scale distributed computing, i.e., cluster and grid computing, the concepts and applications of the revolution are decades old. To the computational science community, including weather forecasters, climate change scientists, numerical ecologists, artificial intelligence experts, bomb developers, etc., these types of efforts have been at the forefront of supercomputing technology development. I have been involved in various types of cluster computers for oceanic ecological modeling, and some of our collaborators at Rutgers University are experts in the field of distributed computing for coupled atmospheric and oceanic modeling. However, for the average person, terms like cluster or grid computing have little or no tactile meaning. Perhaps a few could tie it to the SETI grid computing project, but even these few might not understand the implications at the average business or consumer level.

One of the favorite terms today for large scale distributed computing at the business and consumer level is “Web-Scale Computing”. You see it in the sessions for a couple of the O’Reilly Conferences (ETech and Web 2.0 Expo) mainly discussing Amazon Web Services’ (AWS) EC2 and S3 services. AWS is one of the first mainstream applications that put the power of cluster computing into the hands of commercial web application developers. With these services, and those that will surely follow, we as a society/culture/business community move one step closer to the concept of on-demand purchasing of computer cycles, and the development of markets in these cycles.

These services, what I will refer to as Commodity Computer Cycles (C3), are different from commodity computing. In commodity computing you are still responsible for assembling the components and the network of processors into a cluster for your distributed processing application. You are still required to pay for the power, cooling, and maintenance, as well as the personnel involved in development, care, and security of the systems. These expenses are upfront and continuing throughout the life of the business, regardless of total computational use. With C3, you buy the FLOPS or the storage space needed for your application, on-demand.
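The difference between the two cost models can be sketched in a few lines. This is a toy comparison only: the fixed monthly cost, the hourly rate, and the usage figures are hypothetical placeholders, not AWS prices or WeoGeo numbers.

```python
def owned_cluster_cost(months, fixed_monthly_cost):
    """In-house cluster: the bill accrues every month, whatever the usage."""
    return months * fixed_monthly_cost


def on_demand_cost(hours_used_per_month, hourly_rate):
    """C3 model: the bill is proportional to the hours actually consumed."""
    return sum(hours * hourly_rate for hours in hours_used_per_month)


def break_even_hours(fixed_monthly_cost, hourly_rate):
    """Monthly usage at which the two models cost the same."""
    return fixed_monthly_cost / hourly_rate


# Hypothetical figures, for illustration only.
FIXED_MONTHLY_COST = 1000.0  # power, cooling, maintenance, staff (assumed)
HOURLY_RATE = 0.5            # price per compute hour (assumed)
usage = [200, 400, 3000]     # compute hours in three successive months (assumed)

if __name__ == "__main__":
    print("owned cluster:", owned_cluster_cost(len(usage), FIXED_MONTHLY_COST))
    print("on demand:    ", on_demand_cost(usage, HOURLY_RATE))
    print("break-even hours per month:",
          break_even_hours(FIXED_MONTHLY_COST, HOURLY_RATE))
```

Under these assumed numbers the on-demand model is cheaper in the two quiet months and more expensive in the busy one, which is the trade-off in a nutshell: owning pays off only when usage stays consistently above the break-even point.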

With C3, your business can then focus on developing better applications and services for your customers, rather than the development of the in-house infrastructure to rack, cool, and take care of your computers. If the outsourcing of FLOPS and storage makes business sense (which I truly believe it does), we should expect that the demand for C3 services will increase, leading to the building of more C3 infrastructure and therefore feeding virtuously into the creation of ever more efficient web applications and services. If the revolution seriously takes root and spreads across the whole of the business and consumer communities, it will affect us all.

To bring this discussion back home, we at WeoGeo are trying to change the dynamics of quantitative mapping. Our own maps are terabytes in size, and require petaFLOPS of processing. We developed our web exchange and server application on EC2 and S3 for many reasons, including the costs associated with the growth in our computing needs and the requirement to automatically scale as a function of computing cycle demand. In addition, by developing on a fully scalable C3 model (see below), we could pass the infrastructure savings directly to our user community. This should help enable them to develop new markets for their mapping products and hopefully lead them into a new model for generating revenue in the field of geospatial maps, services, and technologies.

The EC2 version of C3 marks the beginning of the widespread commercial use of on-demand distributed computing. I believe it is a harbinger of things to come. For our purposes, EC2 was not quite ready for prime time and we had to overlay additional intelligent management software to provide stability and optimized scaling to take full advantage of the C3 potential offered by AWS (see this WeoCEO blog post, as well as this AWS forum post by Robert Banfield). I am sure that our solution is but one of the first of many to come. The important thing to recognize is that the delivery of scalable, fully optimized Commodity Computer Cycles is happening right now and will only get better, easier, and cheaper with time. I believe that the next phase of productivity enhancement in the business and consumer markets begins now, and it is truly exciting to be a part of this wave.

Background, Remote Sensing, Amazon, geospatial

Whether it is $3.6 or $7.0 Billion, it is still a big market

I ran across a recent post by Roger Hart at GeoCarta that highlighted a remote sensing market report (BCC Research) suggesting the total world-wide market for remote sensing products was on the order of $7 billion in 2006. This number is similar to the $3.6 billion for 2006 estimated by Daratech, if you remove weather forecasting and climate change studies from their 2006 estimate.

These are big numbers. However, the total remote sensing and geospatial markets are also segmented, with lots of niches that make it difficult to develop economies of scale in the collection of data or the creation of derivative products.

I have a sense that this is changing. In other words, I expect that the growing demand for products will run right into the ability of individuals to create content using base maps provided by large scale mapping projects (e.g. NAIP). I believe that we may be approaching a cusp period in the development of geospatial markets, where the benefits of low cost powerful servers and commodity computing (a la Amazon Web Services EC2/S3), combined with robust open source geospatial software (e.g. GDAL) and the innovative power of individuals and small businesses, will begin to impact the traditional government services model. I see the impact as a greater supply of content at lower price points, resulting in an ever increasing demand for geospatial products.

I am not quite sure who wins or loses in this period. I would like to think that a rising tide raises all boats. I do think that it will be a period of rapid change, so if you are doing the same old thing, with the same old tools, it might be time to reassess your business model.
