March 12, 2012, 8:30 PM — To make up for a string of outages caused by a software bug in its Azure cloud services, Microsoft is granting affected customers a 33% credit for the time they were left stranded during the Feb. 29 failure.
BACKGROUND: Microsoft's Azure cloud suffers serious outage
The problem stemmed from two overlapping circumstances: Feb. 29 comes around only once every four years, and when Azure initializes virtual machines for customer applications, a certificate is exchanged with a valid-to date one year in the future. Certificates issued starting at 4 p.m. PST on Feb. 28 were given a valid-to date of Feb. 29, 2013, a date that doesn't exist and was therefore interpreted as invalid.
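The article doesn't show Azure's actual code, but the failure mode it describes is easy to reproduce: naive "same day next year" date arithmetic breaks on Feb. 29. The function names below are illustrative, and the fallback to Feb. 28 is just one hedged way a fix might look, not Microsoft's actual patch.

```python
from datetime import date

def naive_valid_to(issued: date) -> date:
    # Same month/day, one year later -- the arithmetic the article
    # implies. For an issue date of Feb. 29, 2012 this asks for
    # Feb. 29, 2013, which does not exist, so it raises ValueError.
    return issued.replace(year=issued.year + 1)

def safe_valid_to(issued: date) -> date:
    # Illustrative fix: when the anniversary date doesn't exist in
    # the target year, fall back to Feb. 28.
    try:
        return issued.replace(year=issued.year + 1)
    except ValueError:
        return issued.replace(year=issued.year + 1, day=28)

# A certificate issued on leap day now gets a valid expiry date.
print(safe_valid_to(date(2012, 2, 29)))  # 2013-02-28
```

Any date arithmetic that constructs "this date plus one year" by copying the month and day has this edge case; the validity period itself (one year) was never the problem.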
This glitch set off a series of retries that also failed, leading the system to conclude that the hardware on which the virtual machines were running had failed. That conclusion triggered attempts to migrate the supposedly failed virtual machines to other server hardware within the same Azure cluster, which consists of about 1,000 physical servers.
The migrated VMs also failed to initialize for the same reason, and more and more hardware was judged failed until a threshold was reached that halted all further attempts to restart virtual machines anywhere in the affected clusters. That allowed those clusters to stay in service at reduced capacity, the blog says.
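The escalation and halt described above can be sketched in a few lines. This is a hypothetical model, not Azure's code: the class, method names, and the threshold value are all assumptions made for illustration, since the article gives no numbers beyond the roughly 1,000 servers per cluster.

```python
# Hypothetical sketch of the cascade the article describes: each
# failed VM initialization marks its host as failed and triggers a
# migration, until a cluster-wide threshold halts further recovery.

FAILED_HOST_THRESHOLD = 50  # assumed value; the article gives none

class Cluster:
    def __init__(self, num_hosts: int):
        self.num_hosts = num_hosts
        self.failed_hosts: set[int] = set()
        self.recovery_halted = False

    def report_vm_init_failure(self, host_id: int) -> str:
        if self.recovery_halted:
            # Stop blaming hardware; keep the cluster serving its
            # running workloads at reduced capacity.
            return "halted"
        self.failed_hosts.add(host_id)
        if len(self.failed_hosts) >= FAILED_HOST_THRESHOLD:
            self.recovery_halted = True
            return "halted"
        return "migrate"  # retry the VM on another host

cluster = Cluster(num_hosts=1000)
# The certificate bug makes every initialization fail, so migrations
# cascade from host to host until the threshold trips.
outcomes = [cluster.report_vm_init_failure(h) for h in range(60)]
```

The design point the article credits to the threshold is visible here: without it, the certificate bug would eventually mark every healthy server in the cluster as failed.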
Azure also shut down the customer service management platform so customers couldn't add applications or expand capacity for running applications, both of which would have made the problem worse by calling for new virtual machines. "This is the first time we've ever taken this step," the blog says. Running applications were left intact.
It took 13 hours and 23 minutes to patch the bug in all but seven Azure clusters. Those seven were in the midst of a software upgrade and so posed a separate problem: should the host agents and guest agents that were exchanging the invalid certificates be upgraded to the newest patched versions, or rolled back to the old versions and then patched?