April 03, 2001, 6:08 PM —
Here I am, still setting up the VarLinux.org Weblog and Webzine. As I mentioned last week, I need a new server to double as the VarLinux.org server and act as my secondary name server. The computer outlet Citycom (see Resources for more information) is right around the corner from me and has served me well in the past. So I pay them a visit and purchase an Asus A7V133 with a 1-GHz Athlon for a very good price.
The best way to exploit the new chipsets on this A7V133 motherboard is to use the 2.4 kernel. The older kernels don't take advantage of the UDMA100 IDE controller and drives. Unfortunately, there are some problems with the new motherboards and the newest kernels, and sometimes they feed off each other.
Reiserfs isn't fully debugged in the new 2.4 kernels. I happen to be a very big fan of Reiserfs; I use it on all my machines. But I notice that when I upgrade my kernel to fix some of the most recently discovered Reiserfs bugs, suddenly my 3Com 3C905B card goes bahooties and starts generating tons of error messages.
I eventually find out that the Linux kernel folks blame this behavior on a motherboard PCI BIOS setting. They're probably right, but I don't know this at the time, so I assume the new kernel PCI driver simply doesn't like the 3C905B card. I can't afford to sacrifice the Reiserfs fixes to go back to a previous kernel version, and I don't want to back-port the Reiserfs fixes manually. So I swap the 3Com network interface card (NIC) for an Intel EtherExpress Pro 100 NIC from another machine that is using a different PCI chipset.
Little do I know that this decision is about to throw me into duplex hell. In its new home, the 3C905B doesn't generate the same error messages, but all network traffic to that machine suddenly slows to a crawl. It is an unlikely coincidence that the card would go bad just when I upgrade to a new kernel patch, but I'm afraid I'll have to assume that the 3C905B has expired. I replace it with an extra 3C905B that I have lying around. That doesn't fix the problem; it just changes it. I still get poor performance and network errors -- only different ones.
I look at the network status using a variety of command-line utilities. I should recognize what is going on, since I've encountered this kind of behavior once before. But brain-block prevents me from seeing the obvious. So I play musical NICs for a couple of hours in an attempt to hunt down the problem. Each time I swap NICs, the behavior changes, but a problem always exists somewhere on the network that doesn't seem to correspond to any specific NIC.
Suddenly, memories of a similar situation come rushing back, and I realize that most of my NICs are not properly autonegotiating the full-duplex or half-duplex setting when they are initialized. You can get some really interesting network behavior when you try to mix full-duplex with half-duplex -- the full-duplex NIC transmits whenever it likes, even while it is receiving, and the half-duplex hub or NIC on the other end sees that overlapping traffic as, well, duh, packet collisions. Needless to say, these cards don't talk to each other or to the hub very well when one end treats ordinary traffic as collisions.
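The mismatch described above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not a simulation of real Ethernet timing: frames are hypothetical (start, length) pairs in arbitrary time units, the half-duplex end listens before sending and aborts if the other end starts talking mid-frame, and the full-duplex end never listens at all. The function name and counter names are made up for this example.

```python
def mismatch_counters(full_frames, half_frames):
    """Toy model of one link with a full-duplex end and a half-duplex end.

    Each frame is a (start, length) pair in arbitrary time units.
    Returns the counters the two ends would accumulate in this model.
    """
    def busy(frames, t):
        # True if any frame in `frames` is on the wire at time t.
        return any(start <= t < start + length for start, length in frames)

    counters = {
        "half_deferred": 0,    # half-duplex end waited for a quiet line
        "half_collisions": 0,  # half-duplex end aborted a frame
        "half_sent_ok": 0,     # half-duplex frame got through intact
        "full_rx_damaged": 0,  # full-duplex end received a fragment
    }

    for start, length in half_frames:
        if busy(full_frames, start):
            # Carrier sense: the line is busy, so this frame is held back.
            counters["half_deferred"] += 1
            continue
        # The full-duplex end does not listen first, so it may start
        # transmitting in the middle of this frame.
        interrupted = any(start < s < start + length for s, _ in full_frames)
        if interrupted:
            counters["half_collisions"] += 1
            counters["full_rx_damaged"] += 1
        else:
            counters["half_sent_ok"] += 1
    return counters


if __name__ == "__main__":
    # Hypothetical traffic pattern, chosen only to exercise each case.
    print(mismatch_counters(full_frames=[(0, 4), (10, 4)],
                            half_frames=[(2, 3), (8, 4), (20, 3)]))
```

In this model the errors pile up on both ends at once: the half-duplex side counts collisions while the full-duplex side counts damaged frames, which fits the way the symptoms change but never disappear as cards are swapped.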
My home office uses two Ethernet hubs. One is a four-port half-duplex 10BaseT hub that is part of my Cayman DSL router. Some of my cards can correctly identify the router as half-duplex. But many of my NICs choose full-duplex mode for this router, which mucks up the network communication to the Internet. In theory, this should be easy to fix. You can tell many cards to switch to half-duplex against their better (or worse) judgment. In fact, I have already set the option "full_duplex=0" for my 3C905B cards that talk to the Cayman router.
Indeed, that is why I play musical NICs for so long. I assume that at least the 3C905B cards talking to the Cayman router are set in half-duplex mode, due to the configuration switch I'm already using. What I fail to realize, until I check each card individually with a diagnostic program, is that one of my 3C905B cards is stuck permanently in full-duplex mode. I'm supposed to be able to unstick this card with a DOS configuration utility or a Linux diagnostic program, but neither works. This particular card has decided that the whole world is full-duplex, and nobody is going to tell it different.
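Checking every card individually, rather than trusting the driver option, is the step that finally exposes the stuck NIC. As a rough sketch of that kind of check, the script below reads the negotiated speed and duplex of each interface. It assumes a present-day Linux system that exposes `speed` and `duplex` files under `/sys/class/net`, an interface that postdates the 2.4 kernels discussed here, and it is not the diagnostic program used in this story. The function names are hypothetical.

```python
from pathlib import Path


def read_attr(iface_dir, name):
    """Return one attribute of an interface, or "unknown" if unreadable."""
    try:
        return (iface_dir / name).read_text().strip()
    except OSError:
        # Loopback, downed links, and non-interface entries land here.
        return "unknown"


def link_report(sys_net=Path("/sys/class/net")):
    """Map each interface name to its reported speed and duplex."""
    report = {}
    if not sys_net.is_dir():
        return report  # not a Linux system with sysfs mounted here
    for iface_dir in sorted(sys_net.iterdir()):
        report[iface_dir.name] = {
            "speed": read_attr(iface_dir, "speed"),
            "duplex": read_attr(iface_dir, "duplex"),
        }
    return report


def stuck_in_wrong_mode(report, expected_duplex):
    """List interfaces whose known duplex differs from the expected one."""
    return [
        name
        for name, info in report.items()
        if info["duplex"] not in (expected_duplex, "unknown")
    ]


if __name__ == "__main__":
    report = link_report()
    for name, info in report.items():
        print(name, info["speed"], info["duplex"])
    # Anything attached to a half-duplex hub should report "half".
    print("not half-duplex:", stuck_in_wrong_mode(report, "half"))
```

The point of the design is to compare what each card actually reports against what the hub requires, instead of assuming a configuration switch took effect.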













