Don’t blame DNS for Facebook outage, experts say

Facebook gave little detail about the cause of its outage

By Network World staff, Network World |  Networking, DNS, Facebook

"This is what DNS does by default," Galvin said. "If it gets bad data in the cache, that is where the TTL comes to play. You may or may not be able to do something about that depending on how long your TTL is."

While the Facebook outage does not appear to be related to DNS, similar misconfigurations of DNS data have caused massive outages, most recently in Germany and Sweden.

In May, Germany suffered an outage of its .de top-level domain zone servers that kicked millions of Web sites such as www.ford.de offline for several hours because of a truncated zone file.

A year ago, all Web sites with Sweden's .se extension were unavailable for an hour or more because an incorrect script used to update the .se domain was missing a dot.
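The article does not say exactly where the dot was missing in the .se update. One common form of this mistake in zone files is a missing trailing dot: under the standard master-file rules, a name that does not end in a dot is treated as relative and has the zone origin appended. The Python sketch below illustrates that rule and a simple heuristic for flagging suspicious names. The function names and the sample names are hypothetical, chosen only for illustration.

```python
def expand_name(name: str, origin: str) -> str:
    """Expand a zone-file name the way master-file rules do.

    `origin` is expected to be fully qualified (ending in a dot).
    """
    if name == "@":
        return origin
    if name.endswith("."):
        # Already fully qualified; left untouched.
        return name
    # Relative name: the origin is appended.
    return f"{name}.{origin}"


def looks_like_missing_dot(name: str, origin: str) -> bool:
    """Heuristic: a relative name that already ends in the origin's labels
    was probably meant to be fully qualified."""
    if name == "@" or name.endswith("."):
        return False
    candidate = name + "."
    return candidate == origin or candidate.endswith("." + origin)


# Hypothetical example: the trailing dot was forgotten.
print(expand_name("ns1.example.se", "se."))   # ns1.example.se.se.
print(looks_like_missing_dot("ns1.example.se", "se."))   # True
print(looks_like_missing_dot("ns1.example.se.", "se."))  # False
```

A check like this is only a heuristic, but it shows how a single missing character can silently turn a valid name into one that does not exist.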

"These types of outages happen frequently," Hyatt said. "They happen through poorly managed systems. The one that happened in Germany and the one that happened in Sweden - those were mistakes or errors in automated scripts that should never happen...They could have been avoided."

Hyatt said DNS appliances, including BlueCat's, feature configuration-checking software that can alert administrators when a DNS data change they are making is invalid.

"We have data checking rules that look at the configuration you're trying to deploy and won't push it out...if the system doesn't exist or the system isn't configured right," Hyatt said. "Our system has a lot of smarts. It will give you an alert and tell you what's wrong."

BlueCat's appliances have featured DNS configuration checking since they were introduced back in 2001.

"We're looking for anomalies, logical errors that don't make sense," Hyatt said. "We definitely would have caught the Germany and Sweden errors because those were logic errors."

Similarly, Afilias checks zone file changes for the top-level domains that it operates before the changes get published to prevent errors like those experienced by the operators of .de and .se.

"We notice when zone files are changed. It pops an alert so it gets investigated," Galvin said. "We check the percentage of change...It would have helped prevent the Germany and Sweden problems, where there were very dramatic zone file changes."
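A percentage-of-change check of the kind Galvin describes can be sketched in a few lines of Python. This is not Afilias's implementation, which the article does not detail; the function names, the record format, and the 10 percent threshold are all assumptions made for illustration.

```python
def change_fraction(old: set, new: set) -> float:
    """Fraction of the old zone's records that were added or removed."""
    if not old:
        return 1.0 if new else 0.0
    # Symmetric difference: records present in one version but not the other.
    return len(old ^ new) / len(old)


def safe_to_publish(old: set, new: set, max_fraction: float = 0.10) -> bool:
    """Hold back any update that changes more than `max_fraction` of the zone.

    The 10% default is an arbitrary illustrative threshold.
    """
    return change_fraction(old, new) <= max_fraction


# Hypothetical zone with 100 records.
current = {f"host{i}.example. A 192.0.2.{i}" for i in range(100)}

# A truncated file that lost 60 of the 100 records.
truncated = {f"host{i}.example. A 192.0.2.{i}" for i in range(40)}

print(change_fraction(current, truncated))  # 0.6
print(safe_to_publish(current, truncated))  # False
```

A routine update that touches one or two records passes this check, while a truncated file stands out immediately because most of the zone appears to have vanished.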

But Galvin added that there's not much a service provider like Afilias can do if a customer has bad data in its DNS database, much like the scenario Facebook experienced.

"You're wholly responsible for your own data; all we guarantee is that your data is available," Galvin said. "You cannot recover faster [from your bad data] than your TTL allows recovery to occur."
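Galvin's point about TTL can be made concrete with a little arithmetic. A resolver that cached a bad record keeps serving it until the record's TTL runs out, no matter when the authoritative data is corrected. The Python sketch below computes how long a given cache stays stale after a fix; the function name and the sample values are hypothetical.

```python
def stale_seconds_remaining(cached_at: float, ttl: int, fixed_at: float) -> float:
    """Seconds a resolver keeps serving a bad record after the fix.

    cached_at: when the resolver cached the bad record (seconds)
    ttl:       the record's time to live (seconds)
    fixed_at:  when the authoritative data was corrected (seconds)
    """
    expires_at = cached_at + ttl
    return max(0.0, expires_at - fixed_at)


# Bad record cached at t=0 with a one-hour TTL, fixed ten minutes later.
print(stale_seconds_remaining(0, 3600, 600))    # 3000.0

# Same mistake with a five-minute TTL: the cache has already expired.
print(stale_seconds_remaining(0, 300, 600))     # 0.0

# Worst case: cached at the moment of the fix, stale for the full TTL.
print(stale_seconds_remaining(600, 3600, 600))  # 3600.0
```

The worst case is always the full TTL, which is why a long TTL stretches out recovery from bad data even after the source has been fixed.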


Originally published on Network World.