Analysts today say regular gap analysis is still a key component of disaster preparedness.
Cantor Fitzgerald LP, a bond-trading firm located in the World Trade Center, lost 658 employees and its primary data center on Sept. 11. It was the worst-case scenario.
"They were one of the major bond traders on the globe. We had not imagined the scope of that disaster," Nelsestuen said.
What was remarkable about Cantor Fitzgerald's recovery was that its competitors jumped in to lend a hand, taking over its bond trades so that the firm could continue operating while it recovered from the devastation.
"There was no one who planned that: 'If we have a disaster, will you do our processing and credit it to us?'" Nelsestuen said. "But those are the kinds of things that came out of that level of disaster.... People [had] to start thinking about the human contingency that we'd never thought about before."
RPO and RTO
Tower Group classifies business disasters in three categories: natural, such as hurricanes and earthquakes; technological failures; and human, whether deliberate or accidental. But no matter what causes a disaster, the question of how best to recover is constantly being reexamined, Nelsestuen said.
"Companies are asking: 'How can we change our technology infrastructure to make it more recoverable and dynamic?' When failure occurs, your data is still preserved up to that point," he said.
Disaster recovery and business continuity today are often thought of in terms of recovery point objectives (RPO) and recovery time objectives (RTO). In other words, how much data is a company willing to lose if its systems go down, and how long can it afford to wait for them to be restored?
For example, a company that synchronously replicates all backups to separate data centers that are actively up and running 24/7 has created an architecture with a tight RPO and RTO. A firm that allows data to be replicated off site asynchronously or backed up only to tape expects that it will lose some of the data being transmitted at the time of failure and assumes it will take longer to restore systems.
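The trade-off between the two approaches can be made concrete with a small calculation. The Python sketch below is illustrative only: the `ProtectionScheme` class, its field names, and every figure in it are assumptions made for this example, not numbers from Tower Group or any firm mentioned here. It treats the interval between copies as the worst-case data loss and compares that, along with an assumed restore time, against a target RPO and RTO.

```python
from dataclasses import dataclass


@dataclass
class ProtectionScheme:
    """Hypothetical description of how a firm protects its data."""
    name: str
    copy_interval_s: float  # time between copies; 0 models synchronous replication
    restore_time_s: float   # assumed time to bring systems back after a failure


def worst_case_loss_s(scheme: ProtectionScheme) -> float:
    """Anything written since the last copy is what a failure can destroy."""
    return scheme.copy_interval_s


def meets_objectives(scheme: ProtectionScheme, rpo_s: float, rto_s: float) -> bool:
    """True if the scheme stays within both the RPO and the RTO."""
    return worst_case_loss_s(scheme) <= rpo_s and scheme.restore_time_s <= rto_s


# Illustrative figures only; none of these numbers come from the article.
sync_pair = ProtectionScheme("synchronous, active-active", 0, 60)
nightly_tape = ProtectionScheme("nightly tape backup", 24 * 3600, 48 * 3600)

for scheme in (sync_pair, nightly_tape):
    ok = meets_objectives(scheme, rpo_s=300, rto_s=3600)
    print(f"{scheme.name}: meets objectives = {ok}")
```

Under these assumed targets (five minutes of data, one hour of downtime), the synchronous pair qualifies and the nightly tape scheme does not. A real assessment would also account for replication lag, network failures and the time needed to detect an outage.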
"The whole concept before was we have a production data center and then we have the disaster recovery site and that will take 24 to 72 hours to set up and get going," Nelsestuen said. "Now they're looking at making internal backups between the two. There are many institutions running data in multiple data centers throughout the day now."
Virtualization has allowed firms to be more dynamic in their recoveries because of self-healing systems and automated failover capabilities; when one server or data center goes down, another with the same data can come up almost instantly.
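At its simplest, automated failover comes down to picking the first healthy site from a priority list. The Python sketch below is a deliberately minimal illustration of that idea, not a description of any vendor's product: the function name, the site names and the health map are all hypothetical, and production systems add health probes, quorum checks and data-consistency safeguards that are omitted here.

```python
from typing import Mapping, Optional, Sequence


def pick_active_site(
    priority: Sequence[str], healthy: Mapping[str, bool]
) -> Optional[str]:
    """Return the highest-priority site reported healthy, or None if all are down."""
    for site in priority:
        # A site missing from the health map is treated as down.
        if healthy.get(site, False):
            return site
    return None


# Hypothetical site names, listed in order of preference.
sites = ["primary", "secondary", "tertiary"]

# Normal operation: the primary site serves traffic.
print(pick_active_site(sites, {"primary": True, "secondary": True}))

# Primary fails: the secondary site, holding the same data, takes over.
print(pick_active_site(sites, {"primary": False, "secondary": True}))
```

The handoff is only as fast as the article describes if the standby site already holds current data, which is why this kind of failover is usually paired with tight replication.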