Somewhere, deep in the bowels of the data center, a mission-critical server goes belly-up. This server is as bulletproof as they come. It has clustered hard drives; should a drive fail, the others take over. It has multiple NICs, all dual-homed to redundant switches. Those switches are, in turn, dual-homed to a redundant core network.
One morning, the server disappears from the network. Users arriving at work attempt to log in and can't. Mission-critical applications and databases are offline. Work comes to a screeching halt. Soon word begins to circulate: "The network is down!"
This is not a network failure in any way, shape, or form -- the server has abended and needs to be rebooted. However, to your users, anything attached to the network is "the network." This extends to Internet sites and even to users' own workstations.
The more you argue that the network is perfectly fine, the more convinced users become that there is a network problem. ("Methinks thou dost protest too much!")
Stop arguing. There's a better way to correct this misconception, serve your users, and keep your network's good PR in place.
- Let your attitude be "I am guilty until proven innocent." (In other words, do exactly the opposite of what telephone carriers do when they have a network failure.) Assure your users that you know this is a serious situation that must be taken care of quickly. Assume responsibility even if you aren't convinced it is a network problem.
- Communicate and work closely with the server administrators. This may not always be easy to do, but it pays off in times of crisis. A good working relationship allows you to assuage user complaints while solving their problems as quickly as possible.
- Educate your users. You can't expect ordinary users to distinguish between a server abend and a router crash, so don't push this too far. However, you can help them understand that being unable to reach a certain host or Web site does not necessarily mean the network is at fault.
- Monitor your network closely so you know what actually went wrong. Most servers can run an SNMP agent that sends a trap to your management station when there is a problem. If you have hundreds of servers, you might not want to run an agent on every one, but you should include your mission-critical servers in your SNMP system.
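As a rough illustration of the monitoring point, a quick check can separate "the server is down" from "the network is down" by probing the server and a neighboring device on the same segment. The sketch below is a minimal example and not a substitute for SNMP monitoring; the function names, host names, and ports are hypothetical, and it assumes both devices accept TCP connections on the ports you choose.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def diagnose(server_up: bool, neighbor_up: bool) -> str:
    """Turn two probe results into a first guess about where the fault lies."""
    if server_up:
        return "server reachable"
    if neighbor_up:
        # The path to the server's segment works, so the server is the suspect.
        return "network path OK; suspect the server"
    return "server and neighbor unreachable; suspect the network"


# Example use (hypothetical hosts and ports; substitute your own):
#   server_up = tcp_reachable("db1.example.com", 5432)
#   neighbor_up = tcp_reachable("switch1.example.com", 22)
#   print(diagnose(server_up, neighbor_up))
```

The result is only a first guess: a neighbor that answers shows that the path to that segment works, not that every NIC or switch port serving the server is healthy.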
If you follow these guidelines, you may still hear those feared words, "the network is down," but you will hear them less often and with less venom.