When backup is a disaster

Utility Company AMP overhauls a mishmash of backup and recovery procedures to create one coherent plan

Shortly after Branndon Kelley joined American Municipal Power (AMP) as CIO, the company's financial system went down.

It took four days to restore the system, and Kelley, who had previously consulted with state governments on business continuity issues, immediately started exploring AMP's backup and recovery strategy.

He quickly discovered that there was none. No coherent plan. "We had a whole bag of tricks," Kelley says, including more than 10 different backup systems and processes. There were outdated off-the-shelf packages and hand-coded scripts, none of them documented or interconnected. There were backups of backups, and fewer than half of the backups succeeded on the first try.

CIO Branndon Kelley overhauled AMP's backup and recovery process, replacing "a whole bag of tricks" with a single process.

That's when Kelley started losing sleep. AMP, a nonprofit utility that provides electricity to member utilities in several states, had begun acquiring and operating power plants, making the company's business continuity strategy especially critical.

"It made us feel uneasy because in the event that there was some sort of natural disaster or some kind of technical failure or even human error, it wasn't clear that we could continue operations," he says.

Trial by Fire

Kelley and his team documented the existing environment, gathered requirements, issued an RFP and chose a system from CommVault, which AMP implemented two months later, in November 2011. It wasn't the cheapest option, and the finance and executive committee made its displeasure over that fact clear.

"We have a competitive bid process, and I took some heat over that," Kelley says. But CommVault's system met all of AMP's requirements, providing, among other things, a single backup and recovery platform not just for IT systems but also for the company's plant operating technology (which was previously managed separately).

Within the first week, Kelley recalls, the new system "paid for itself for the next 10 years" during a surprise crisis: Recovery of the company's finance system was inadvertently started and then stopped at 3 p.m., in the middle of the monthly close, interrupting the system once again. A patch had been applied and an Oracle application stopped working, Kelley says.

"I'm sitting at my desk when I get the news, and I start thinking I need to pack my things," he recalls. "We're in the middle of the month-end close for October, heading into the new year. We've never done a recovery with the new software. I'm thinking the financial system will be down for days."

Kelley called CommVault. The vendor's database administrator started troubleshooting with AMP's DBA, and the system was back online within a few hours. They lost just two hours out of finance's working day, lost and then recovered just 15 minutes' worth of transactions, and had the whole system restored by 9 p.m.

Today, 95 percent of backup jobs complete successfully on the first try and AMP has a 100 percent successful restore rate.

"It's hard to put a number on [the benefits]," says Kelley. "We've got a lot going on. We're busier than hell. As a 24/7 shop, backups used to literally keep me up at night. Now when a system goes down, I don't even hear about it. It's a commodity task, not a firefight."

This story, "When backup is a disaster," was originally published by CIO.
