December 12, 2000, 2:17 PM — Somewhere, deep in the bowels of the data center, a mission-critical server goes
belly-up. This server is as bulletproof as they come. It has clustered hard drives;
should a drive fail, the others take over. It has multiple NICs, all dual-homed to
redundant switches. Those switches are, in turn, dual-homed to a redundant core
One morning, the server disappears from the network. Users arriving at work attempt
to log in and can't. Mission-critical applications and databases are offline. Work
comes to a screeching halt. Soon word begins to circulate: "The network is down!"
This cannot be considered a network failure in any way, shape, or form -- the server
has abended (ended abnormally) and needs to be rebooted. However, to your users,
anything attached to the
network is "the network." This extends to Internet sites and even to users' own
The more you argue that the network is perfectly fine, the more convinced users become
that there is a network problem. ("Methinks thou dost protest too much!")
Stop arguing. There's a better way to correct this misconception, serve your users,
and keep your network's good PR in place.
- Let your attitude be "I am guilty until proven innocent." (In
other words, do exactly the opposite of what telephone carriers do when they
have a network failure.) Assure your users that you know this is a serious situation
that must be taken care of quickly. Assume responsibility even if you aren't convinced
it is a network problem.
- Communicate and work closely with the server administrators.
This may not always be easy to do, but it pays off in times of crisis. A good working
relationship allows you to assuage user complaints while solving their problems as
quickly as possible.
- Educate your users. You can't expect ordinary users to
distinguish between a server abend and a router crash, so don't push this too far.
However, you can help them understand that being unable to reach a certain host or
website does not necessarily mean the network is at fault.
- Monitor your network closely so you know what actually went
wrong. Most servers can run an SNMP agent that exposes status through its MIBs and
sends a trap when there is a problem. If you have hundreds of servers, you might not
want to run an agent on every one, but you should include your mission-critical
servers in your SNMP
If you follow these axioms, you may still hear those feared words, "the network is
down," but you will hear them less often and with less venom.
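
To make the monitoring advice concrete, here is a minimal sketch of how you might tell a dead server from a dead network path. It is not SNMP. It only tries a TCP connection to the server and to a reference device that sits on the same path, such as a switch management interface. The function names, the result strings, and the host names in the usage note are all hypothetical, and a real monitoring system would check far more than one port.

```python
import socket
from typing import Tuple

Endpoint = Tuple[str, int]  # (host, TCP port); illustrative type alias


def tcp_reachable(endpoint: Endpoint, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds within the timeout."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def classify_outage(server_ok: bool, reference_ok: bool) -> str:
    """Make a rough guess at where the fault lies (assumed heuristic, not proof)."""
    if server_ok:
        return "ok"
    if reference_ok:
        # The path to a neighboring device works, so suspect the server itself.
        return "server fault likely"
    # Neither the server nor its neighbor answers, so suspect the path.
    return "network fault likely"


def diagnose(server: Endpoint, reference: Endpoint, timeout: float = 2.0) -> str:
    """Probe the server and a reference device on the same path, then classify."""
    return classify_outage(
        tcp_reachable(server, timeout),
        tcp_reachable(reference, timeout),
    )
```

A call such as `diagnose(("db1.example.net", 1433), ("core-sw1.example.net", 22))` would return one of the three strings above. A "server fault likely" result gives you something specific to take to the server administrators, while a "network fault likely" result tells you to start with your own equipment.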