Can Outages be Good for the Cloud?

I was just thinking about the recent AWS outage and came to the conclusion that infrequent events like this probably help Amazon. While the first wave of response is usually criticism and doubt, the end result is probably increased adoption. Here is why.

I don’t think that events like these are chasing anyone away from the cloud. The reality is that technology occasionally breaks, especially under extreme conditions. No matter where you have a data center, bad events are going to happen. I was just talking to a friend who works in the Baltimore area, and their self-hosted site has been down for days due to power outages. When people take a close look, they realize that most organizations cannot provide better availability than cloud computing resources.

Rather than abandon the cloud, I expect that most customers will do what I found myself doing, which is the opposite: increase their cloud investment. If they were in one availability zone, they will expand their cluster into multiple availability zones and even multiple regions (which I am doing). They will create warm standby servers to switch over to in the event of a catastrophic failure. All these changes increase their monthly AWS bill and lead to higher Amazon revenues. It’s just like the insurance business. The best time to sell new policies is right after an unlikely disaster, which is also usually the least likely time for the disaster to happen again (especially when new controls are put in place to prevent it).
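
To put rough numbers on the multi-zone argument, here is a minimal back-of-the-envelope sketch. It assumes each availability zone fails independently and that a single zone is up 99% of the time. Both are illustrative assumptions of mine, not published AWS figures, and the function names are made up for this example. Real zones can share failure causes, so treat the result as an optimistic upper bound.

```python
HOURS_PER_YEAR = 24 * 365


def combined_availability(per_zone: float, zones: int) -> float:
    """Chance that at least one zone is up, ASSUMING zones fail independently."""
    if not 0.0 <= per_zone <= 1.0:
        raise ValueError("per_zone must be between 0 and 1")
    if zones < 1:
        raise ValueError("zones must be at least 1")
    # The site is down only when every zone is down at the same time.
    return 1.0 - (1.0 - per_zone) ** zones


def expected_downtime_hours(availability: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    per_zone = 0.99  # hypothetical single-zone availability, not an AWS number
    for zones in (1, 2, 3):
        a = combined_availability(per_zone, zones)
        print(f"{zones} zone(s): {a:.6f} available, "
              f"about {expected_downtime_hours(a):.2f} hours down per year")
```

Under these assumptions, each extra zone cuts expected downtime sharply, which is why a bigger monthly bill can look like cheap insurance right after an outage.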

  • Andy

    Yep. I’m with you. Sometimes we take the shortcuts we know we shouldn’t because they are not visible to the customer, be it an external or internal customer. But as IT continues to put more onto the Internet and into the cloud, the customer begins to expect the service delivery maturity of an Internet player. The expectation is that you will be available 100% of the time. Perhaps not all functions will perform at peak levels, but you will never be unavailable. This concept has been creeping in and growing for years now. Here is a post, over a year old, that refers to another post, itself over a year old, on the same topic: http://www.codinghorror.com/blog/2011/04/working-with-the-chaos-monkey.html

    In general I believe that the consumerization of IT (I don’t like the term but it gets the job done) is good for our industry. It means a new level of rigor but in the end the service delivered is better for it.

  • http://www.netsight.co.uk Matt Hamilton

    I think you are right about it being good in the long run. See Netflix’s Chaos Monkey for a company that is taking a proactive approach to this. I think it certainly helps a lot of the other providers too, and it can help to make visible the amount of engineering (i.e. cost) that may be needed to keep a site up and running. We had an hour-long outage last week in the datacentre we run. One customer of a customer of ours complained that their site was down for that time and asked how come another customer’s site was not. It was explained to them that the other customer pays about ten times the annual cost for their infrastructure, as they have multiple sites, replication, DR plans, etc. When everything is rosy, people often have a hard time justifying the additional expense.

    Conversely, I think the likes of Facebook and Twitter in their early days were also very good at raising public awareness that the world does *not* end if a site is unavailable for a short period of time. People go and do something else and then come back. Obviously different businesses have different priorities and different risk profiles, but for some, 100% availability isn’t needed.

    -Matt