August 27, 2007, 3:48 PM — You have to feel for the 20,000 people stranded recently at Los Angeles International Airport's Tom Bradley International Terminal and the 200 million more around the globe who lost access to Skype. Television and general-interest news organizations ascribed the woes to the ubiquitous "computer glitch." We, of course, know better. And usually the truth is a lot scarier.
According to a spokesman at LAX, passengers were stuck in terminals and in 60 planes starting at about 2 p.m. on Aug. 11, and service was not restored until nearly midnight. The failed system is one that contains information about incoming international passengers, including law enforcement data and outstanding arrest warrants. It's a reflection on the times in which we live, but that's another discussion for another venue.
As it turns out, the problem was rooted neither in changes to application software nor in a cyber-attack launched by "evil-doers and their henchmen" against networks operated by U.S. Customs and Border Protection or by the airport. It was a plain old garden-variety hardware failure. But of what? Various published reports lay the blame on a network switch, a failed server, a rogue network interface card, and a generally ancient network infrastructure.
I don't buy it. Yes, it was hardware that failed, but the blame also lies with people, lots of them, with lots of fingers to point. There are the IT organizations from the airport and Customs, including administrators, troubleshooters, security experts, and more. There's the ISP that provides connectivity. There are the public-sector bean counters who try to do things on the cheap (except for the occasional $100 million toilet). There's the lack of network monitoring tools that could zero in on misbehaving hardware. There are even solutions integrators who may not have persisted enough in getting these organizations to replace hardware and cabling that had reached old age -- as measured in IT years.
The system normally accesses a database residing in Washington, D.C., and the network wisely maintains a local backup copy for the rare event that communications back to Washington are lost. Good plan. Unfortunately, because it was the local network itself that went down, accessing the local backup became impossible. Further complicating matters was an initial misdiagnosis, leading to a six-hour hunt before it was determined that the problem lay within the local network and not elsewhere.
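To make that single point of failure concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of the actual Customs system: the function names, the layer names, and the idea of a simple up/down probe for each layer are all hypothetical. It shows why a local backup doesn't help when the local network is the thing that breaks, and how checking layers nearest-first might shorten the kind of hunt described above.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical probe: returns True when the layer it checks is responding.
Probe = Callable[[], bool]


def lookup_source(lan_up: Probe, wan_up: Probe) -> Optional[str]:
    """Choose a data source under the failover design described in the text."""
    if not lan_up():
        # Both the remote primary and the local backup sit behind the
        # local network, so neither can be reached when it is down.
        return None
    if wan_up():
        return "primary (Washington)"
    # The link to Washington is lost, so fall back to the local copy.
    return "local backup"


def first_failed_layer(probes: List[Tuple[str, Probe]]) -> Optional[str]:
    """Check layers in the order given (nearest first); name the first failure."""
    for name, probe in probes:
        if not probe():
            return name
    return None  # every layer responded


# Assumed scenario: the local network is down, everything else is healthy.
layers = [
    ("local network", lambda: False),
    ("local backup server", lambda: True),
    ("link to Washington", lambda: True),
]
print(lookup_source(lambda: False, lambda: True))
print(first_failed_layer(layers))
```

In this assumed scenario the lookup has nowhere to go even though the backup copy itself is intact, and the nearest-first check names the local network immediately instead of sending anyone looking farther afield.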