Posts Tagged ‘Amazon’

The Internet Is Not the Computer – Yet

Tuesday, October 6th, 2009

Has an IT professional in your organization told you that Software-as-a-Service (SaaS) just does not work for your enterprise?  I am going to help their argument; then I’ll tear it down.

The biggest problem for a professional engineering organization with SaaS is latency, a term I use loosely here to cover disk access time, data transfer rate, and bit rate.  Rapid communication between processing units and data storage pools is critical for today’s professional computing efforts.  If you don’t believe me, try opening (and editing) a large GIS or CAD file over a VPN from a network storage device in a satellite office across the country.

For example – Our realized transfer rate from our facility in Portland to Amazon Web Services (AWS) in Virginia is 3.5 megabits per second (mb/s), even though we have a 100 megabit per second dedicated service pipe to our facility.  This latency is not Amazon’s fault; rather, it is a function of the speed of light and the number of Internet “hops” or network transfer points between WeoGeo and AWS.  From Florida, we average about 7 mb/s to the same facility.  This issue is one of the fundamental reasons AWS released their Import/Export feature for S3.

To put this into perspective, a typical desktop disk drive (circa 2008) spinning at 7200 rpm delivers a disk access transfer speed of 560 – 2400 mb/s (70 – 300 megabytes per second).  That is 2 – 3 orders of magnitude faster than accessing the same file via the Internet.  The last time desktop computer users “suffered” through 5 mb/s disk access transfer speeds was in 1987 when IBM released the PS/2 with a 5 MegaByte ST-506 Seagate Technologies hard drive.
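To make the gap concrete, here is a small back-of-the-envelope sketch using the rates quoted above.  The 1 GB file size is a hypothetical value chosen for illustration, and the function names are mine.

```python
import math

# Rates quoted in the post, in megabits per second.
WAN_RATE_MBPS = 3.5             # Portland to AWS in Virginia
DISK_RATE_MBPS = (560, 2400)    # 7200 rpm desktop drive, low and high end

# Hypothetical file size for illustration (not a figure from the post).
FILE_SIZE_GB = 1.0


def transfer_seconds(size_gb: float, rate_mbps: float) -> float:
    """Seconds needed to move size_gb (decimal gigabytes) at rate_mbps."""
    megabits = size_gb * 1000 * 8   # GB -> megabytes -> megabits
    return megabits / rate_mbps


def speedup_orders(fast_mbps: float, slow_mbps: float) -> float:
    """How many orders of magnitude faster the fast link is."""
    return math.log10(fast_mbps / slow_mbps)


if __name__ == "__main__":
    wan = transfer_seconds(FILE_SIZE_GB, WAN_RATE_MBPS)
    print(f"Internet: {wan / 60:.1f} minutes")
    for rate in DISK_RATE_MBPS:
        local = transfer_seconds(FILE_SIZE_GB, rate)
        orders = speedup_orders(rate, WAN_RATE_MBPS)
        print(f"Disk at {rate} mb/s: {local:.1f} s ({orders:.1f} orders faster)")
```

At these rates the same file that the local disk reads in seconds takes well over half an hour to cross the country.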

When will latency be reduced to the point where we might consider “the internet is the computer”?  That is kind of hard to say.  Even with ubiquitous broadband access at >100 mb/s to all businesses, we will still suffer the “speed of light” problems – photons can only go so fast.  In addition, every time you have to go through an Internet junction or telecommunication switch (i.e., a hop) you will increase the packet transfer times.  A report from the National Broadband Coalition (also covered here) suggests the pipe speeds to small and medium businesses will not approach those of hard disk drives until sometime between 2015 and 2020 (see table below).
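A rough sketch of why distance and hops matter even on a fat pipe follows.  Every number in it is an assumption for illustration: the route length, the hop count, the per-hop delay, the 64 KB window, and the 150 ms round trip are not measurements from our network.  The model is the simple one in which a sender that waits for acknowledgements can have at most one window of data in flight per round trip.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_FACTOR = 2 / 3   # light in glass fiber travels at roughly two thirds of c


def round_trip_ms(distance_km: float, hops: int, per_hop_ms: float) -> float:
    """Round-trip time: propagation delay plus a fixed delay at every hop."""
    one_way_ms = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR) * 1000
    return 2 * (one_way_ms + hops * per_hop_ms)


def window_limited_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on throughput when one window is sent per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


if __name__ == "__main__":
    # Hypothetical cross-country route: 4000 km of fiber, 15 hops at 1 ms each.
    rtt = round_trip_ms(distance_km=4000, hops=15, per_hop_ms=1.0)
    print(f"Estimated round trip: {rtt:.0f} ms")

    # Hypothetical 64 KB window at several round-trip times.
    for assumed_rtt in (rtt, 100.0, 150.0):
        rate = window_limited_mbps(65_535, assumed_rtt)
        print(f"RTT {assumed_rtt:.0f} ms -> at most {rate:.1f} mb/s")
```

Under this model the ceiling depends on the round-trip time and not on the size of the pipe, which is why a faster connection alone does not solve the problem.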

This suggests that the internet-as-the-computer to replace your desktop is still some time away, maybe as long as 3 – 5 years.  This latency argument is what many in corporate IT departments would use to strike against SaaS, PaaS, or IaaS services within your organizations.  (I am purposely ignoring security, but will address this issue in another post).  I would counter that those arguments are very similar to the ones once used against the IBM PS/2 as a corporate workhorse back-in-the-day; and ultimately I believe they are rooted more in bureaucratic inertia than true cost/benefit analysis to the enterprise.

Internet-based services have a place in today’s enterprise environment.  Depending on the use case, they can be more efficient in managing your limited IT dollars.  These services can also provide greater and timelier software support, which, together with the lower costs, increases the bottom-line productivity of your organization.  More importantly, these managed, off-site services will have a greater place in your organization tomorrow.  Data transfer speeds will increase to a point that the latency issues will be negligible for the services you require.  As a decision maker in your organization – are you planning for greater productivity and enhanced profits tomorrow, or adding to the “mainframe” infrastructure of today?

Scaling FME Engines on WeoGeo

Friday, June 19th, 2009

I presented the movie below as part of a presentation at the Safe Software FME User Conference. We had a great time and the Safe crew put on a marvelous show.

The movie shows WeoGeo scaling up to 64 Safe Software distributed FME Engines in the production of tile caches from a world-wide elevation database. The FME Workspace script was created by Dmitri Bagh, and processed on WeoGeo’s FME Constellation built on Amazon Web Services.

The scaling occurred automatically, spinning up FME Engine AMIs, and then shutting them down when the job queue was completed. This is one of our first examples of bringing scalable processing to difficult geospatial tasks.

Examples of the tiles created by Dmitri’s script for Virtual Earth (Bing Maps for Enterprise) and Google Earth can be found here.

Panel 1 (upper left hand corner) refers to the total number of engines in the constellation processing job.

Panel 2 (upper right hand corner) refers to the total constellation utilization percentage. The constellation is polled, and when the utilization exceeds the pre-set threshold (50% in this example), it increases (doubles here) the number of engines until it reaches the pre-set maximum number of engines (64 here). The downward spikes occur when each new set of engines is added.

Panel 3 (lower left hand corner) is the average job processing time. There is an increase in velocity when the number of engines exceeds 16, which may be a function of increased overhead costs on the FME Core or bandwidth to the database.

Panel 4 (lower right hand corner) is the total number of jobs completed. 2000 jobs were submitted for this test. The job completion rate accelerates until the maximum number of engines is brought on-line.
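The scaling rule described for Panel 2 can be sketched in a few lines. This is an illustration only, not the FME Constellation code: the function names and the starting count of one engine are assumptions, while the 50% threshold, the doubling step, and the cap of 64 come from the description above.

```python
def next_engine_count(current: int, utilization: float,
                      threshold: float = 0.50, max_engines: int = 64) -> int:
    """Double the engine count when utilization exceeds the threshold."""
    if utilization > threshold and current < max_engines:
        return min(current * 2, max_engines)
    return current


def scale_up_sequence(start: int, max_engines: int = 64) -> list[int]:
    """Engine counts seen at each poll while utilization stays above threshold."""
    counts = [start]
    while counts[-1] < max_engines:
        counts.append(next_engine_count(counts[-1], 1.0, max_engines=max_engines))
    return counts


if __name__ == "__main__":
    # Assumed starting point of one engine under sustained heavy load.
    print(scale_up_sequence(1))
```

Doubling the engine count roughly halves the utilization until the new engines pick up work, which is consistent with the downward spikes in Panel 2.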

Some Thoughts on Mechanical Turk and Geo-Processing

Wednesday, July 16th, 2008

We use Amazon Web Services (AWS) quite a bit. Mostly we use EC2 and S3, but recently we have been using a limited bit of Mechanical Turk (MTurk) for some testing of the web site.

For those of you who don’t know what MTurk is, from the web site -

…The Mechanical Turk web service enables companies to programmatically access this marketplace and a diverse, on-demand workforce. Developers can leverage this service to build human intelligence directly into their applications.

Our use has been limited to testing of the web site. However, there have been some image-processing uses of MTurk, including the search-and-rescue (SAR) efforts to find Jim Gray and Steve Fossett.

I wear two hats these days. We are still actively involved in the development of HyperSpectral Imaging (HSI) sensors and algorithms (see the Florida Environmental Research Institute). It was from these efforts that we developed the cataloging, discovery, and distribution systems that we spun out into WeoGeo.

The holy grail of imaging techniques is the automatic extraction of features and classification of materials within the raster data. It is something we have been trying to develop for over a decade. There are others who have been working at it longer.

After all these years, there are some problems that are still difficult to solve in processing imagery. They frequently require just looking at the images frame by frame to resolve features and classify stuff that just defies algorithmic development. It strikes me that there may be some parts of this processing that may not be easily solved using computer algorithms. Things like finding seam lines in overlapping aerial photographs.

Several major imaging vendors send a chunk of their current image processing to low cost countries like China and India to complete their large-scale projects. It seems that there might be a better way to accomplish geo-processing tasks that still require eyes than to incur the time and expense of sending these tasks overseas. Perhaps MTurk and some smart programming might offer a different approach.

I also wonder what other sorts of QC/QA tasks in geo-processing might be solved by MTurk. I might try to kick it around a bit at GeoWeb. Find me if you have some thoughts.

(Ford assembly line, 1913)

Innovation in Web Mapping Systems

Thursday, January 10th, 2008

There is a nice discussion happening on James Fee’s Blog about Web Mapping Systems and Services and the future of hosted mapping services. I was reading it and thought back to an interesting Wall Street Journal article on Monday about Circuit City that said same store sales in December fell by 12% in the US. While this news was depressing for the stock market, the silver lining for the geo-community was that navigational products were the only product line with increasing sales over the period.

Geo-devices are becoming more ubiquitous. The sheer number of curious and talented people moving into our industry combined with these devices will drive product and service innovation in directions that may not be completely clear at the moment.

Converging with the mass market penetration of geo-devices and geo-content (geoware?) are the cloud computing efforts by AWS (and soon to be others). While the production of quality mapping today may require high end desktop workstations and servers, I think that Moore’s Law is eventually going to allow our field to produce geo-content and services far more easily, leading to a feedback into future product innovation. How we in the professional community create products and services today may be radically different in the future.

I offer this anecdote – today, after 10 years of running a Microsoft Exchange Server for our email requirements, we switched to Google Mail Premium. Over the 10 year period, we incurred costs of $10,000s, possibly greater than $100,000. These costs included licensing, hardware, server room, service personnel, etc. Our spam filter alone on the MSFT Exchange Server cost us $35 per year per mailbox. Our cost for Google Mail Premium service is $50 a mailbox per year. It is easier to use, cheaper to implement, and offers more robust service than the Exchange product.

I think there might be parallels for our industry in this anecdote. It is probably a good exercise to be thinking about what products might be replacing the ones we are using today.

The future of GIS, geo-content, geo-entertainment, etc. will belong to those who can think outside of the traditional methods of production and product delivery. For historical evidence of the difference between companies that focus on the future and those that focus on their current narrow niche, look at the change in market capitalization of Trimble (TRMB) and Garmin (GRMN) over the last decade.

Above Chart taken from Google Finance

Rock the Vote! Geospatial Moving Mainstream

Wednesday, December 5th, 2007

I am writing this in Seattle as we prepare for the finals of the Amazon Start-Up Challenge. We are truly excited to be a part of this Challenge. It is an amazing opportunity to be recognized for our technology and business model.

I am in love with our technology and how easy it makes it for all players, large and small, to participate in a global mapping market based on the quality of their skills and geo-content. WeoGeo will make it easy for all of us in the geospatial field to do our jobs, while at the same time increase our operating margins and productivity.

A really exciting part of this Challenge is that our business model is also being recognized by an internet service company with a $39 billion market capitalization. Everyone in our field has seen the explosion of interest in geospatial over the last few years. One only has to look at the >200 million downloads of Google Earth, as well as the consolidation by Tele Atlas and NavTeq, to know that our industry is moving mainstream. This is excellent news for all of us, for it will provide more resources and revenues for our field, which translates into better opportunities for us all.

Come see the videos of the finalists in this Challenge and vote for your favorite (preferably WeoGeo!). But make sure you see our video. I hope the passion for what we are trying to accomplish is evident. I also hope that you will find what we are attempting to accomplish as exciting as we do.

Big thanks to Adena Schutzberg at All Points Blog and James Fee at Spatially Adjusted for helping us rock the vote!

WeoGeo’s Mapping Marketplace Makes Final Cut in Amazon’s Start-Up Challenge

Wednesday, November 21st, 2007

The only thing I can say is, “Wow!” Followed by the biggest grin you have ever seen on my face. By naming us one of 7 finalists, Amazon has expressed its confidence in our technology and business strategy. In all honesty, I am humbled and honored, and I truly thank them for the selection.

I believe (passionately) in what we are trying to create. I believe that WeoGeo will change the paradigm in how we discover and access geo-content. I believe that we (the geospatial industry) as a community will more easily synthesize new mapping products that will help us create a better world. But these are my beliefs, and I tend to view everything we do through these rose colored glasses.

The selection as a finalist by Amazon Web Services (AWS) means that someone else out there sees the same potential for the mapping and geo-content industry as we do. It provides validation for the people who have worked so hard on this project beyond anything that I could offer, and for this I am eternally grateful.

In addition, Amazon will offer the winner of this contest a venture investment. I believe this says a lot about the geospatial industry, as well as WeoGeo. For WeoGeo to be among those considered a suitable investment opportunity by a $32 billion company, we must have (1) a great business plan, (2) a great set of technology, and (3) be in an industry with high growth potential. Our industry, the geospatial industry, is now recognized by a leader in the internet services industry as having high growth potential.

I’ve been grinning so much, my face hurts…

WeoCEO Emerging From Private Beta

Thursday, October 18th, 2007

WeoGeo has created a scalable, fault-tolerant infrastructure to manage its use of Amazon Web Services Elastic Compute Cloud (EC2) operations. I’ve written about it a couple of times (see this link for a listing of the Amazon tagged blogs). The latest version of WeoCEO (Version 0.1.0) is ready for release and with it we are moving from private to public Beta.

This version includes the Assistant to back up WeoCEO (see this feature described in this Amazon Web Services StartUp Event Slide Show). WeoCEO Version 0.1.0 also provides enhancements to the stable IP addressing, failure detection, and automatic scaling and load balancing. These enhancements include automatic emailing to your site administrator during trouble events and detailed logging capabilities.

WeoCEO Version 0.1.0 (including the load-balancing and auto-scaling capabilities) will be free of charge at least until December 1, 2007. It will continue to be free if you only use the stable IP addressing and auto-recovery features for a single client instance.

There will be a charge for the load-balancing and auto-scaling features of WeoCEO, which support running multiple EC2 instances and optimizing your network. The charge for these features will be $0.05 per managed client instance per hour. The charge will be on the average usage over an hour, calculated at <15 minute intervals.
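As a rough illustration of how such a charge works out, the sketch below averages the managed instance count over the samples taken in one hour and multiplies by the hourly rate. The function name, the sample values, and the absence of any rounding are assumptions; only the $0.05 rate and the averaging over sub-15-minute samples come from the terms above.

```python
RATE_PER_INSTANCE_HOUR = 0.05   # dollars per managed client instance per hour


def hourly_charge(samples: list[int],
                  rate: float = RATE_PER_INSTANCE_HOUR) -> float:
    """Charge for one hour, given instance counts sampled during that hour."""
    if not samples:
        return 0.0
    average_instances = sum(samples) / len(samples)
    return average_instances * rate


if __name__ == "__main__":
    # Hypothetical hour sampled four times: two instances, then scaled to four.
    samples = [2, 2, 4, 4]
    print(f"${hourly_charge(samples):.2f} for this hour")
```

Averaging within the hour means a short burst of extra instances is billed in proportion to how long it lasted.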

You can obtain a WeoCEO ISO with the setup and installation instructions, by visiting http://www.WeoCEO.com and clicking the “Signup” button, or by clicking the Signup button below. We are still in beta, so constructive comments on any of the components that make up this service will be met with exuberance and free goodies.

Amazon Web Services EC2 Outage

Monday, October 1st, 2007

This weekend was a bit crazy for some of the AWS EC2 users. EC2’s “management software erroneously terminate[d] a small number of user’s instances” (from the AWS forum post). Some of our instances were among them, which gave us an opportunity to test the fail-safe mechanisms in WeoCEO. We received the following email:

From: Amazon Web Services
Sent: Saturday, September 29, 2007 5:46 PM
To: David Kohler
Subject: Amazon EC2 Notification of Terminated Instances

Hello,

This is just a quick note to let you know that some of your instances were erroneously terminated today. We have resolved the underlying issue, and the service is fully available.

You can find a summary of the issue here:

http://developer.amazonwebservices.com/connect/thread.jspa?messageID=6816

These are your affected instances:
i-8004e0e9
i-681ef101

We apologize for this inconvenience.

Sincerely,

The Amazon EC2 Team

Please be aware of the limitation of utility computing, as well as the promise. Planning for these outages will be a requirement for safely outsourcing your metal resources.

If we had not prepared for this by building WeoCEO, this could have been a real issue for us. We would have needed to scramble staff at 6 AM on a Saturday morning. Fortunately, WeoCEO recovered from the failure, and it was not until Monday afternoon that we noticed it had happened to a lot of other people.

From the forum post by WeoCEO’s architect, Bob Banfield:

Here is a quick shot from our WeoCEO logs. We told WeoCEO that regardless of usage we want a minimum of two instances running, so that is the initial number of instances at 6am in the morning, even though we are receiving next to no traffic. At 6:09, i-681ef101 stops responding (the first of five allowed consecutive failures). At 6:10 it still hasn’t responded, and at 6:11 both it and instance i-52907e3b have now stopped responding. Instance i-52907e3b comes back up in another 2 minutes, but instance i-681ef101 is ruled dead after 5 failures. It is automatically terminated and a new one is brought up in its place.

(SSS) Sat Sep 29 06:07:24 2007 Weoceo[6562]: Overall usage = 0% NumInstances = 2
(SSS) Sat Sep 29 06:08:25 2007 Weoceo[6562]: Overall usage = 0% NumInstances = 2
(EEE) Sat Sep 29 06:09:25 2007 Weoceo[6562]: Instance i-681ef101 has not reported statistics (1/5)
(SSS) Sat Sep 29 06:09:25 2007 Weoceo[6562]: Overall usage = 0% NumInstances = 2
(EEE) Sat Sep 29 06:10:25 2007 Weoceo[6562]: Instance i-681ef101 has not reported statistics (2/5)
(SSS) Sat Sep 29 06:10:25 2007 Weoceo[6562]: Overall usage = 0% NumInstances = 2
(EEE) Sat Sep 29 06:11:26 2007 Weoceo[6562]: Instance i-681ef101 has not reported statistics (3/5)
(EEE) Sat Sep 29 06:11:26 2007 Weoceo[6562]: Instance i-52907e3b has not reported statistics (1/5)
(EEE) Sat Sep 29 06:11:26 2007 Weoceo[6562]: No instances have reported statistics.
(EEE) Sat Sep 29 06:12:26 2007 Weoceo[6562]: Instance i-681ef101 has not reported statistics (4/5)
(EEE) Sat Sep 29 06:12:26 2007 Weoceo[6562]: Instance i-52907e3b has not reported statistics (2/5)
(EEE) Sat Sep 29 06:12:26 2007 Weoceo[6562]: No instances have reported statistics.
(EEE) Sat Sep 29 06:13:26 2007 Weoceo[6562]: Instance i-681ef101 has not reported statistics (5/5)
(EEE) Sat Sep 29 06:13:26 2007 Weoceo[11310]: Terminating i-681ef101 due to lack of statistics
(SSS) Sat Sep 29 06:13:26 2007 Weoceo[6562]: Overall usage = 0% NumInstances = 1
(III) Sat Sep 29 06:13:26 2007 Weoceo[6562]: Launching 1 instance(s)
(III) Sat Sep 29 06:13:26 2007 Weoceo[11310]: Terminating 1 instance
(SSS) Sat Sep 29 06:14:28 2007 Weoceo[6562]: Overall usage = 0% NumInstances = 1
(SSS) Sat Sep 29 06:15:28 2007 Weoceo[6562]: Overall usage = 0% NumInstances = 1
(SSS) Sat Sep 29 06:16:29 2007 Weoceo[6562]: Overall usage = 0% NumInstances = 1
(SSS) Sat Sep 29 06:17:29 2007 Weoceo[6562]: Overall usage = 0% NumInstances = 1
(SSS) Sat Sep 29 06:18:29 2007 Weoceo[6562]: Overall usage = 0% NumInstances = 1
(III) Sat Sep 29 06:19:05 2007 Weoceo[11351]: Added ID=i-94ce20fd, PublicHost=ec2-67-202-13-222.z-1.compute-1.amazonaws.com, Host=domU-12-31-36-00-1D-B4.z-1.compute-1.internal, PublicIP=67.202.13.222, IP=10.253.34.66
(SSS) Sat Sep 29 06:19:32 2007 Weoceo[6562]: Overall usage = 0% NumInstances = 2
(SSS) Sat Sep 29 06:20:32 2007 Weoceo[6562]: Overall usage = 0% NumInstances = 2

Email warnings were delivered to me at 6am on Saturday alerting me to the problem; however, I was fast asleep, and WeoCEO correctly identified and corrected the problem.
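The behavior visible in the log can be sketched as a consecutive-failure counter. This is not WeoCEO’s source code: the class and method names are hypothetical, and only the limit of five consecutive failures and the sequence of missed reports are taken from the log above.

```python
from collections import defaultdict

MAX_FAILURES = 5   # consecutive missed reports allowed, as in the log


class HealthMonitor:
    """Hypothetical monitor that counts consecutive missed statistics reports."""

    def __init__(self, max_failures: int = MAX_FAILURES) -> None:
        self.max_failures = max_failures
        self.misses: dict[str, int] = defaultdict(int)

    def poll(self, expected: list[str], reported: set[str]) -> list[str]:
        """Record one polling round; return instances to terminate and replace."""
        dead = []
        for instance in expected:
            if instance in reported:
                self.misses[instance] = 0        # any report resets the count
            else:
                self.misses[instance] += 1
                if self.misses[instance] >= self.max_failures:
                    dead.append(instance)
        for instance in dead:
            del self.misses[instance]            # forget terminated instances
        return dead


if __name__ == "__main__":
    a, b = "i-681ef101", "i-52907e3b"
    # Which instances reported at each one-minute poll from 06:09 to 06:13.
    rounds = [{b}, {b}, set(), set(), {b}]
    monitor = HealthMonitor()
    for minute, reported in enumerate(rounds, start=9):
        dead = monitor.poll([a, b], reported)
        print(f"06:{minute:02d} terminate and replace: {dead}")
```

Resetting the counter on any successful report is what lets an instance with a brief hiccup survive while a truly dead one is replaced.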

We believe in the future of scalable utility computing. Dealing with events such as these is just one of the issues with these types of systems that we’ll all have to overcome to make this future work. Our goal is to share what we are creating for WeoGeo in a way that helps others overcome such problems.

I do not wish to minimize the impact of this AWS outage, but it would be unrealistic to assume that this type of event will not happen in the future. We should all consider this in building our virtual computing architectures. The use of AWS means that you are outsourcing your metal infrastructure. This means that your system design must be organic and self-healing (see also slideshare link).

Our solution is simple to use and operate, but does expect that you have some working knowledge of EC2. There are others who can help in building these types of architectures on AWS from the ground up (some of those contributed to the above AWS Forum thread including Thorsten at RightScale and Reuven at Enomaly).

WeoCEO was built to help us at WeoGeo survive these types of outages. We are completing our private beta shortly, and are releasing the latest version of WeoCEO that we will be bringing into open beta. Contact us at WeoCEO [at] WeoGeo [dot] com if you would like to participate. Open beta will provide the stable IP addressing and recovery options for one instance for free.

Amazon Web Services StartUp – Boston Presentation

Monday, October 1st, 2007

I was out of town last week. I’ll try and catch up on a number of subjects this week.

One of the reasons I was out of town was that I was invited by AWS to present at their StartUp event in Boston.


A copy of the presentation may be seen on Slideshare.net (or just click on the image embedded above). It was a great event, and I enjoyed sharing the stage with the talented people from AideRSS, Praxeon, and Geezeo. It was good to interact with others who are building (and bootstrapping) new web services using AWS.

I truly believe that utility computing is going to change the way businesses get started and (eventually) operate. However, we are going to have to build systems that are organic in how they handle resources, i.e. scale up and down as a function of load. In addition, these systems need to be self-healing by automatically addressing processor and storage outages.

The importance of self-healing will be evident in the next post.

Image Processing and Delivery Using Virtual Computing on EC2

Thursday, September 6th, 2007

I posted last week about bandwidth issues associated with geospatial data and our AWS S3 solution. The deciding factor for us to use Amazon’s offerings was not necessarily the edge distribution capabilities of S3, but the synergy from combining S3 data storage and distribution with virtual computing capabilities of EC2. There are multiple issues in image processing that require a ton of memory space and CPU horsepower. In both Market and Server, we offer the following basic map distribution options to our map providers -

Geo Clipping (6 zoom levels, allowing for ~125 million possible selections per data set)
Spatial Resampling (4 levels)
Layer Resampling (depends on data)
Output File types (5 – JPEG, GeoTIFF, ENVI, ESRI BIL, ERDAS IMG)
Projections (5 – UTM, Transverse Mercator, Lambert Conic, Albers Equal, Geographic)
Datums (3 – WGS84, NAD 83, NAD 27)

These options result in millions of possible map variants, which preclude the storage of each variant for distribution. So processing power for conversion is critical; and this processing power needs to be connected to a large, web-addressable, temporary data storage array to house the unique variant that a map user has selected. Now for a true mapping marketplace, this infrastructure needs to support 100s to possibly 1000s of simultaneous map requests from the same base map like the 40 GB image in Figure 1. Doing our NeoMapping Market correctly requires the creation of enormous processing, storage, and bandwidth infrastructure.
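A quick count shows why storing every variant is out of the question. The sketch below multiplies the option counts from the list above, assuming every combination is valid; Layer Resampling is left out because its count depends on the data set, and the names used are mine.

```python
from math import prod

# Option counts taken from the list above.
OPTIONS = {
    "spatial_resampling": 4,
    "output_file_types": 5,
    "projections": 5,
    "datums": 3,
}
CLIP_SELECTIONS = 125_000_000   # approximate geo-clipping selections per data set


def variants_per_clip(options: dict[str, int]) -> int:
    """Number of output variants for a single clipped area."""
    return prod(options.values())


def total_variants(options: dict[str, int], clip_selections: int) -> int:
    """Variants per data set if every option combination is valid."""
    return variants_per_clip(options) * clip_selections


if __name__ == "__main__":
    print(f"{variants_per_clip(OPTIONS)} variants per clipped area")
    print(f"{total_variants(OPTIONS, CLIP_SELECTIONS):,} variants per data set")
```

Even before layer resampling is counted, the combinations per data set are far too many to pre-compute, so each variant has to be produced on demand.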

Figure 1. 40 GB, 156 layer HyperSpectral Imagery (HSI) map listed on WeoGeo Market. (Click on image to go to the listing in the Market).

However, who could afford that infrastructure upfront? Our original estimates for acquiring base computation needs and placing them into a co-location facility were around $500K. While not a lot of money in the scale of today’s internet operations, it was big for us. In addition, we were trying to develop the software architecture to support the Market and Server, and these expenses were large in and of themselves. AWS provided a unique and simultaneous answer to many of our immediate storage, processing, and distribution needs.

Developing our infrastructure on the scalable AWS solution allows us to say we can support the 1000s of map requests required for a functioning digital marketplace. The user experience is vital to the service’s credibility and therefore our success. However, there is a true (and in a number of cases unexpectedly high) cost in this decision. We traded high capital expenditures for high operating expenditures. In an upcoming post, I’ll talk about the Total Cost of Operations (TCO) on AWS, and some of the ways we are moving to reduce these high operating expenses through stability and scaling solutions. Some of these solutions we have turned into products that we provide to others (e.g., WeoCEO).

I would be interested in hearing about the actual experience of others on AWS and whether S3 and EC2 could or could not meet their needs.