From: www.itworld.com

A showcase of clustering diversity

by Rawn Shah

March 23, 2001

 


This year's LinuxWorld Conference & Expo boasted a whole new crop of clustering products. Commercial clustering solutions are finally ready for harvest. No longer are we stuck in a relative wilderness of Beowulf-only clustering -- the focus of attention for most academic institutions and scientific applications. Now, high-availability and load-balancing products prance about in gleeful attempts to attract the most customers.



It is great to see a wide range of clustering products finally emerge as commercially supported systems. That support legitimizes Linux clustering, since companies that once hesitated because of the lack of available support can now confidently accept Linux-based clusters from small and large vendors alike.



The show's events took place both on the exhibition floor and in the session presentations. The exhibits were mostly manned by salespeople, but I was able to find technical staff and product managers to explain their products in detail, free of sales-speak.



Now readily available for you


I witnessed Mission Critical Linux's working demonstration of its tightly coupled high-availability Convolvo Cluster system for pairs of Linux nodes. The two systems are connected by a private 100BaseT Ethernet link over a crossover cable, as well as a direct null-modem serial cable. They attach to the same storage system through standard SCSI cables, allowing either node to issue disk commands. This requires a SCSI storage system with a dual-attachment feature, which raises the cost. Mission Critical also offers special intelligent power supplies for each node, with cross-connected controls between nodes so a failed node can be hard-rebooted by the working node.



The nodes need not be identical, as long as they communicate with each other and the SCSI channel at approximately equal speeds and run the same Linux distribution with the same system changes applied. Mission Critical has not published numbers for actual failover response times. The nodes work by maintaining a quorum partition on the attached storage system. Through this partition, they can also exchange data -- such as which node will run a specific application, and which applications should fail over to the other node.
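To make the quorum-partition idea concrete, here is a minimal Python sketch. It is not Mission Critical's implementation: the vendor's on-disk format was not described at the show, so the in-memory QuorumPartition class, the timestamp-based liveness test, the 10-second timeout, and the node and service names are all my own assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QuorumPartition:
    """Hypothetical stand-in for the shared partition both nodes can read and write."""
    heartbeats: dict = field(default_factory=dict)  # node name -> last timestamp written
    owners: dict = field(default_factory=dict)      # service name -> node that runs it

    def beat(self, node, now):
        # A live node periodically records a fresh timestamp.
        self.heartbeats[node] = now

    def is_alive(self, node, now, timeout):
        # A node counts as alive if its last timestamp is recent enough.
        last = self.heartbeats.get(node)
        return last is not None and now - last <= timeout


def take_over_services(partition, me, peer, now, timeout=10.0):
    """If the peer's timestamp is stale, claim every service the peer owned.

    Returns the sorted list of services that moved (empty if the peer is alive).
    """
    if partition.is_alive(peer, now, timeout):
        return []
    moved = [svc for svc, owner in partition.owners.items() if owner == peer]
    for svc in moved:
        partition.owners[svc] = me
    return sorted(moved)
```

In a real pair, each node would call something like beat() on a timer and run the takeover check against the shared disk. The sketch leaves out the hard-reboot of the failed node through the cross-connected power controls, which is what keeps a half-dead node from writing to the shared storage after its services have moved.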



The system is managed through a command-line interface or a Web-based GUI, either of which can execute all the functions needed to operate the cluster. When a node fails, the cluster executes an application failover script to restart the application on the working node. While several major applications, like Apache and Samba, have been implemented on Convolvo clusters, there is no published API that lets software developers build failover directly into their applications. The software costs about $995 per node, or $1,990 per cluster pair.



Alongside its current multinode clusters of PIIIs and Xeons, SGI demonstrated an impressive eight-node cluster of IA-64 systems from its 1200, 1400, and 1450 Linux server lines. SGI's real contribution to Linux is still under the hood, so it does not get the respect it truly deserves. SGI's Advanced Cluster Environment is a well-put-together package for Beowulf clustering, complete with proper management, development, debugging, and profiling tools. This package can help you build a Beowulf cluster right; you can download the tools from SGI.



SGI is also bringing NUMA support to Linux, to help in the transition of its NUMA-architecture Onyx servers to Linux (like the servers' Irix siblings). The Onyx servers, still the preferred choice for graphics-rendering shops, have received some competition from Beowulf PC clusters. The move to Linux, along with the introduction of even higher-performing models, is an attempt to shape up the Onyx. At this time, support is still low-level, though, and SGI has not announced the availability of any Linux-based Onyx models.



Several other big-name vendors, including HP, Compaq, and IBM, had Trillian Linux running on the upcoming Itanium IA-64 -- on individual systems as well as clusters. Nothing extraordinary was demonstrated, besides the fact that it actually runs.



I had a conversation with Linux NetworX's people about their new Beowulf cluster management software. The Sandy, Utah-based company, once known as Alta Technologies, builds custom Beowulf solutions, including the large-scale Road Runner cluster at the University of New Mexico.



The leading Linux server distribution vendors, Red Hat and TurboLinux, also demonstrated their failover clustering capabilities. PolyServe, whose Understudy software-based failover clustering system was recently covered by Joe Barr, demonstrated its per-application, per-port failover system with an Ethernet-based heartbeat monitoring system.
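For readers new to per-port monitoring, the idea can be sketched in a few lines of Python. This is not PolyServe's code, and nothing at the show revealed how Understudy actually probes services. The TCP-connect probe, the PortMonitor class, and the threshold of three consecutive misses are assumptions of mine, chosen only to show the general shape of the technique.

```python
import socket


def port_is_up(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the service as down.
        return False


class PortMonitor:
    """Hypothetical monitor that tracks consecutive failed probes of one service port."""

    def __init__(self, host, port, max_misses=3):
        self.host = host
        self.port = port
        self.max_misses = max_misses
        self.misses = 0

    def probe(self):
        """Probe once; return True when the miss count reaches the failover threshold."""
        if port_is_up(self.host, self.port):
            self.misses = 0          # any success resets the counter
        else:
            self.misses += 1
        return self.misses >= self.max_misses
```

Requiring several misses in a row is a common way to avoid failing over on a single dropped packet. The price is a slower reaction to a genuine outage, since the delay is roughly the probe interval times the threshold.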


High-end storage products abound



Several storage vendors were present -- including Adaptec, ICP vortex, 3ware, and Raidion -- offering SCSI- and IDE-based smart controllers, and storage systems for servers. Adaptec presented its usual set of dual-channel cards, but ICP introduced some five-channel Ultra3 cards that made quite an impression. Each channel on the card can maintain its own RAID 0, 1, 5, or 10 system, even though all share the same on-card memory -- limited to only 128 MB, unfortunately. For five channels running at 160 MBps, that is barely adequate for any substantial caching, especially when used with read-mostly applications like Web servers. ICP also offered a similar card with Fibre Channel interfaces and protocols.
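A quick back-of-the-envelope calculation shows why 128 MB looks small. The Python below uses only the figures quoted above (five channels, 160 MBps each, 128 MB of cache). It assumes every channel runs at its full bus rate at once, which is a best case that real drives will not sustain, and the function names are mine.

```python
CHANNELS = 5            # channels on the card
CHANNEL_MB_PER_S = 160  # peak Ultra3 bus rate per channel
CACHE_MB = 128          # on-card memory shared by all channels


def aggregate_bandwidth(channels=CHANNELS, per_channel=CHANNEL_MB_PER_S):
    """Peak combined transfer rate in MB/s if every channel is saturated."""
    return channels * per_channel


def cache_turnover_seconds(cache_mb=CACHE_MB, bandwidth=None):
    """Seconds of peak traffic needed to cycle through the whole cache."""
    if bandwidth is None:
        bandwidth = aggregate_bandwidth()
    return cache_mb / bandwidth


if __name__ == "__main__":
    print(f"aggregate: {aggregate_bandwidth()} MB/s")
    print(f"cache turnover: {cache_turnover_seconds():.2f} s")
```

At peak rate the five channels can move 800 MB each second, so the entire cache turns over in about 0.16 seconds. Split evenly, each channel gets roughly 25 MB of cache.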



3ware aimed for the low end of the market, with multichannel RAID cards for IDE devices. Although IDE is limited to two drives per chain (the equivalent of a SCSI channel), 3ware offered up to eight smart IDE controllers per card -- each controlling its own RAID 1 or 10 -- with a maximum of 16 drives. In addition, the 8-port card was priced under $500. The catch: with internal connectors only, the cabling can become a tangle if it is not routed carefully, which can make the rare hardware maintenance check downright nightmarish.



I bumped into Dave McAllister while heading to one of the many vendor parties. Formerly CTO of Maxspeed and product manager for Linux systems at SGI, he now leads the technical side of 3ware alongside CEO Breau Vrolyk, also from SGI's Linux initiative. McAllister let me in on upcoming news of 3ware's Gigabit-Ethernet-based storage cards, which create a combination of direct-attached SANs or network-attached storage systems, giving both high-speed access and easy network integration. He also mentioned that the company may have a 10-Gigabit Ethernet solution by mid-2001.



Raidion offered Fibre Channel and SCSI Storage Area Network systems based on its Virtual Storage Server, a SAN manager and network interface. The unit is a storage node that can be accessed from clusters or server nodes over a separate Ethernet interface. I also noticed that the company's high-density, low-profile 3U-height storage enclosures with 10 hot-swap bays for Ultra2/Wide LVD SCSI drives were being used in several other vendors' booths to demonstrate their own products.



I made it to a few sessions on clustering, but was surprised at how similar they were to the birds-of-a-feather sessions. In one session, the speaker simply assumed that everyone was intimately familiar with the topic and ran through the session, leaving a lot of comments, terms, and software unexplained. The information was too vague to make much sense. The one session I thoroughly enjoyed was Peter Braam's "Linux Clusters" presentation, although the title was a little misleading.

