Microsoft site outages show DNS as point of failure
Microsoft Corp. this week fought off two major Web site outages, claiming the first was due to an internal human error and the second was caused by hackers.
The first outage, which began Tuesday sometime around 11 p.m. EST and lasted approximately 20 hours, occurred when a Microsoft technician incorrectly configured a router on the edge of Microsoft's DNS network, thereby preventing Web traffic from reaching a number of Microsoft sites, according to a company statement.
Maintaining that the outages were separate events, the Redmond, Wash.-based company issued a second statement late Thursday saying that the second set of outages had been caused by a denial-of-service (DoS) attack, in which a hacker intentionally floods servers with traffic in an attempt to bring Web traffic to a standstill.
The blackouts highlight the fact that DNS failure can render the network beneath it inaccessible -- even if all the other pieces are in place and are otherwise functioning properly.
The outages also open Microsoft up to criticism from those who claim that basic networking tactics could have prevented hackers from bringing down the company's entire DNS setup, or at least could have made their task considerably more difficult.
The company was criticized, for instance, when it was discovered that all four of its DNS servers are on the same network.
"Whatever the cause, all the [DNS machines] are sitting in the same place in one room. That's generally considered to be bad practice when you're aiming for reliability," said Steve Hotz, CTO of UltraDNS, a San Mateo, Calif.-based company that provides outsourced DNS services.
Microsoft's DNS servers merely point Web traffic in the direction of its pages, but having all four of them on the same network means a single failure or attack on that network can make every one of those pages unreachable.
If the DNS servers were spread across separate networks, hackers would have to attack each server individually, making their task all the more challenging.
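The point about server placement can be checked mechanically. The following is a minimal Python sketch, not a description of Microsoft's actual setup: the addresses are hypothetical values from the reserved documentation ranges, and treating a /24 block as "one network" is an assumption made purely for illustration.

```python
import ipaddress

def distinct_networks(addresses, prefix_len=24):
    """Return the set of IPv4 networks (at prefix_len) that contain the addresses."""
    return {
        ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        for addr in addresses
    }

def shares_single_network(addresses, prefix_len=24):
    """True if every address falls inside the same network block."""
    return len(distinct_networks(addresses, prefix_len)) == 1

# Hypothetical name server addresses (documentation ranges, not real servers).
clustered = ["192.0.2.10", "192.0.2.11", "192.0.2.12", "192.0.2.13"]
spread = ["192.0.2.10", "198.51.100.10", "203.0.113.10", "192.0.2.11"]

if shares_single_network(clustered):
    print("All name servers sit in one network block")
print(len(distinct_networks(spread)), "separate network blocks in the spread layout")
```

A check like this only compares address blocks. Servers in different blocks can still share a room, a router or an upstream link, so it is a first test rather than proof of redundancy.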
DNS failures due to DoS attacks inevitably result in lost customers and lost revenue. Worse, DNS failure is a problem most companies don't think much about until they are confronted with it, Hotz said.
"If you're a security guy, DNS is an important thing to protect because it's a single point of failure," said Tom Shaw, chief engineer at OITC, a Melbourne Beach, Fla.-based systems engineering and consulting company.
Analysts said that DNS is simple and straightforward and that it does what it is supposed to do.
"The actual problem isn't so much with DNS itself, the problem is with managing it," said Michael Hoch, an analyst at Aberdeen Group, in Boston.
For instance, if a technician types one dot out of place when entering a DNS address, the visitors looking to reach that Web site can be routed to an address that doesn't exist.
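A misplaced dot can fail in two ways, which the following Python sketch illustrates. The function name and the addresses are hypothetical, chosen only for this example: some typos produce a string that is no longer a valid address, while others produce a perfectly valid address that simply points somewhere else, which a syntax check alone will not catch.

```python
import ipaddress

def classify_entry(entered, intended):
    """Compare a hand-typed IPv4 address against the one that was intended."""
    try:
        parsed = ipaddress.IPv4Address(entered)
    except ipaddress.AddressValueError:
        # The typo broke the address format entirely.
        return "malformed"
    if parsed == ipaddress.IPv4Address(intended):
        return "ok"
    # Syntactically fine, but it names a different host.
    return "valid but wrong"

intended = "192.0.2.10"  # hypothetical address from a documentation range
for typed in ["192.0.2.10", "1920.2.10", "192.0.21.0"]:
    print(typed, "->", classify_entry(typed, intended))
```

The third case is the troublesome one: the entry passes validation, so the mistake typically surfaces only when visitors fail to reach the site.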
Keynote Systems, a San Mateo, Calif.-based company that measures Internet performance, lists address inconsistencies as the second-most frequent reason for connection failures; the first is a connection timing out while a machine waits for a response.
"If you're a Web site trying to provide 99.999 percent uptime, DNS becomes very important," Aberdeen's Hoch said. "It's going to take more and more time to manage DNS, and it's going to take more resources."