Outages Happen: Disaster Recovery, the Cloud and a Lesson From Cycling

I’m an avid cyclist, and after about 2,000 miles of riding I recently replaced my tires without ever having gotten a single flat. I ride on fairly inexpensive tires (my friends pay $65 per tire whereas I pay $25) and on a few poor roads, but I learned years ago that as long as I keep my tires at 110 psi, I won’t get a flat. Now ask me if I ever ride without a spare tube, a pump, tools, and a mobile phone. God no! Those items are my plans B and C in the unlikely event that I do get a flat 50 miles from home.

Why do I bring this up? Simple. There was a big cloud outage last week, and as a result many are declaring the cloud dead. This is outrageous. Outages happen. They happen in the cloud, they happen in managed hosting sites, and they happen with self-managed servers in a customer-owned, super-redundant data center. So we should all be prepared for when they do. Even if you choose a carrier-grade, enterprise-grade cloud provider, you still need to plan for the unexpected. Lance Armstrong probably spends a lot of money on tires. I imagine his bikes are equipped with some of the lightest, most durable tires in the world, yet he has a whole team of people driving behind him, ready to change his wheel if he gets a flat.

Now ask me if my 8-year-old daughter has a spare tube, a pump, tools, and a mobile phone when she rides up and down my street. God no! She is simply developing her riding skills, testing what works and what doesn’t on a bike, and certainly won’t miss school or sleep if she’s sidelined with a flat. Why do I bring this up? Simple. Some workloads can run in the cloud with no plan B at all. Development, test, and quality assurance (QA) projects are great for the cloud. If an outage occurs, there’s no impact on customers or revenue. And provided there’s geographic diversity, the cloud is a great place for your disaster recovery (DR) or backup site.

To illustrate this, let’s remind ourselves of how we calculate availability:

[Figure: Calculations for Cloud Computing Availability]

In the last equation, two sites running in parallel, each with an availability of 90 percent, will result in a system that is 99 percent available, because the system is down only when both sites are down at the same time (10 percent of 10 percent, or 1 percent). So what if you have a hosting provider who delivers 99 percent availability and you use a cloud for DR that has 90 percent availability? This yields a system that is 99.9 percent available, a considerably higher service level. In other words, the cloud is a great place for DR.
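For readers who want to try their own numbers, here is a minimal sketch of that parallel calculation. It assumes the sites fail independently of one another (which is what geographic diversity is meant to approximate), so the combined availability is 1 minus the product of each site's unavailability. The function name `parallel_availability` is mine, chosen for illustration only.

```python
def parallel_availability(*sites):
    """Availability of sites running in parallel.

    Assumption: site failures are independent, so the system is down
    only when every site is down at the same time.
    Each argument is an availability between 0 and 1 (0.90 means 90 percent).
    """
    if not sites:
        raise ValueError("at least one site is required")
    all_down = 1.0
    for availability in sites:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        all_down *= 1.0 - availability  # probability this site is also down
    return 1.0 - all_down


# The two cases from the text, printed to three decimal places.
print(f"{parallel_availability(0.90, 0.90):.3f}")  # two 90 percent sites
print(f"{parallel_availability(0.99, 0.90):.3f}")  # 99 percent host plus 90 percent cloud DR
```

The two calls reproduce the 99 percent and 99.9 percent figures above. If the two sites share a failure cause, such as the same region or the same network path, the independence assumption breaks down and the real figure will be lower.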

Please note that these numbers are used only to show the math; check with your hosting and cloud providers for their actual availability figures.

While significant cloud outages are scary, let’s be careful not to make them out to be something bigger than they are. At the end of the day, there is no substitute for good planning and engineering. Some projects can run in the cloud with no backup and some will need a backup site. So take a step back, decide what your tolerance for failure is and design accordingly.

Have you been affected by outages? How is your company adjusting its hosting strategy to mitigate this risk moving forward?
Don Parente, Technology Strategy and Chief Architect Director, AT&T