October 15, 2009, 11:16 AM — The good news is that Microsoft says it thinks it can get most of your data back. The bad news is how the problem happened in the first place.
OK, here's the good news about the Microsoft/Sidekick data loss fiasco: Microsoft corporate vice president Roz Ho claims that "We have recovered most, if not all, customer data for those Sidekick customers whose data was affected by the recent outage." Here's the bad news: the data was lost in the first place because of a "system failure that created data loss in the core database and the back-up."
I'll get back to why that's bad, but first, let me point out that Microsoft isn't saying the problem's all better now. They're saying that they think they can get most of the data back for most Sidekick users. Microsoft engineers are working on restoring customer data, starting with contact lists and then moving on to other information. But, "Before Microsoft begins this process, the company must first check the data to make sure it is stable and finalize the data restoration plan."
In other words, don't hold your breath. I'll be interested in seeing how many of the burned Sidekick users actually get all their data back at the end of the day.
At least there's some hope, though. What I find more disturbing is that somehow a single system failure took out both the core database and its back-up. How does that happen? In every serious DBMS (database management system) project I've ever worked on, the core databases were kept on entirely different systems from the back-ups. For a real-time DBMS like the Sidekick's, the active databases and their back-ups wouldn't even be in the same data-center, lest a single disaster knock out the entire system. Oh wait, that's what happened here, isn't it?
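To make that rule concrete, here is a minimal sketch of the kind of sanity check I have in mind. Everything in it is hypothetical: the `Copy` record, the `shared_failure_domains` function, and the host and data-center names are made up for illustration, and none of it reflects how the Sidekick back-end was actually laid out, which we don't know.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a database (hypothetical record for illustration)."""
    name: str
    role: str        # "primary" or "backup"
    host: str        # machine or storage system holding this copy
    datacenter: str  # site where that host lives


def shared_failure_domains(copies):
    """Return the levels at which every copy sits in the same place.

    An empty list means no single host or data-center failure
    can take out all of the copies at once.
    """
    shared = []
    for level in ("host", "datacenter"):
        # If every copy reports the same value, that level is a
        # single point of failure for the whole set.
        if len({getattr(c, level) for c in copies}) == 1:
            shared.append(level)
    return shared


# Primary and back-up on the same storage system: one failure kills both.
risky = [
    Copy("core-db", "primary", "san-1", "dc-a"),
    Copy("core-db-bak", "backup", "san-1", "dc-a"),
]

# Primary and back-up on different systems in different data-centers.
safer = [
    Copy("core-db", "primary", "san-1", "dc-a"),
    Copy("core-db-bak", "backup", "san-7", "dc-b"),
]

print(shared_failure_domains(risky))
print(shared_failure_domains(safer))
```

The first layout reports both `host` and `datacenter` as shared, while the second reports nothing shared. A real deployment would have more failure domains to worry about (power, network, the storage firmware itself), but the principle is the same.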
We still don't know exactly what went wrong with Sidekick. Was it an upgrade to the Sidekick back-end SAN (storage area network) gone wrong? It sounds like it. But, regardless of the details, we now know that this major, real-time system was vulnerable to some sort of single-point-of-failure problem.
That's completely unacceptable, and it makes me wonder what kind of morons Microsoft has running their data-centers. It also calls into question the claims that Sidekick's back-end was running on a cloud. One reason why clouds are supposed to be so wonderful is that they spread the data and processing out across multiple systems, not just around a data center but around an entire country. In other words, there shouldn't be a single point of failure with any real cloud-based application.
I've long suspected that most people claiming they had a cloud-based application were just marketers grabbing onto the latest buzzword, and that most clouds were just ordinary servers and clusters.