February 27, 2013, 9:00 AM — Public clouds might be getting a bad rap. It turns out that public clouds aren’t more outage prone than private clouds.
RightScale scoured public records to examine cloud outages in 2012 and came up with a few findings (link launches PDF) that might surprise you.
It found 27 “notable” publicly reported outages worldwide. Public cloud naysayers might be surprised to learn that private and public clouds had an equal share of the outages – 26 percent each.
“The private data center number probably underrepresents the outages because they don’t get publicly reported all the time,” Michael Crandell, CEO of RightScale, said.
“The reality we all face is outages happen,” he said.
Equally surprising to me is that RightScale found nearly half -- 41 percent -- of the outages were experienced by hosting providers.
Crandell suggested that the hosting providers may have suffered disproportionately from Hurricane Sandy, since some of them are located on the East Coast in Sandy’s path. Six of the 27 outages during the year were attributed to Sandy.
Sandy had another interesting effect on cloud users. Businesses running workloads in AWS’s East Coast region in Virginia wanted to prepare for an outage. “The logical thing to do is prepare to launch resources in another region, like US West,” Crandell said. “What’s interesting is US East is quite a bit bigger than the other regions.”
Some businesses were worried that if the east region went down, and everyone tried to move at once to the west, there wouldn’t be enough capacity. As a precaution, some spun up instances in advance in the west region just to prepare for that scenario. (Check out the interesting set of challenges one company faced when trying to shift from east to west in advance of the hurricane.)
In the end, Sandy caused only intermittent outages in AWS’s east region. Crandell said that Amazon surely noticed the spike in demand on the West Coast and may be thinking about the problem that could unfold should its entire East Coast operation go down.
RightScale also looked at what caused all the outages during the year. Power was by far the biggest reason – 33 percent of the outages were caused by a power outage or failed backup power. Natural disasters and traffic/DNS routing issues tied for second, causing 21 percent of the outages. Software bugs, human error, failed storage systems and network connectivity rounded out the failures.
Many of those, even the natural disasters, have human error at their roots. Bad software, failed generators and poor backup plans tend to be the work of humans.
The first-ever big AWS outage happened due to a misconfigured router, Crandell said. The Azure issues last Friday occurred when a security certificate wasn’t updated properly. “That’s more or less an administrative matter,” he noted.
Across public and private clouds, the length of outages should give IT admins heartburn. The average was 7.5 hours, RightScale found. Twenty-three percent lasted eight to 12 hours.
In fact, for end users, outages are even worse, Crandell said. “They’re magnified by the fact that often recovery time is longer,” he said.
Once a system is back up and running, a user may have to restore data from backups, for instance, a process that takes additional time.
Read more of Nancy Gohring's "To the Cloud" blog and follow the latest IT news at ITworld. Follow Nancy on Twitter at @ngohring.