From: www.itworld.com

The supercomputer in my basement

by Louis Chua

May 21, 2003 —

 

In the world of High Performance Computing (HPC), Beowulf clustering stands in a class of its own.

Beowulf is an approach to building a supercomputer by means of clustering commodity off-the-shelf (COTS) computers that are interconnected with a local area network technology like Ethernet and running programs optimized for parallel processing.

It is possible to assemble a collection of commodity hardware components and freely available software packages in a day and have them executing real-world applications. Traditionally, Beowulf is a class of Pile-of-PCs that leverages on low cost mass-market systems that support Unix-like operating systems at low or no cost and for which source code is readily available.

The benefits of a Beowulf cluster is that it allows even small research firms to build their own supercomputers. In addition to possible cost savings, building their own supercomputers is very much a learning investment in itself for the researchers as well as making them less dependent in the future on particular hardware and software vendors. A Beowulf cluster can upgrade and evolve as off-the-shelf technology evolves.

The development of this technology is due to the development of the causal computing market such as office automation, home computing, games and entertainment which provided system designers with cost-effective components. The COTS industry provides fully assembled subsystems such as microprocessors, motherboards, disks and network interface cards for which mass market competition has driven the prices down and reliability up for these subsystems.

Coupled with the development of standards for interoperability of subsystems which has generated an open market of COTS, it is possible to customize different versions of Beowulf or just maximizing cost advantages. Beowulf developers often choose the Linux operating system and use standard message passing protocols between the computers within the cluster for cost advantages.

This presents an opportunity for parallel computer vendors to provide a low entry-level cost to parallel systems, expanding the role of parallel computing and the number of people capable of using parallel computers, creating a larger customer base for parallel computers in the long term. In the taxonomy of parallel computing, a Beowulf cluster is placed somewhere between a massively parallel processor (MPP) and a network of workstations (NOW) that is clustered for the purpose of load balancing.

The original Beowulf cluster was developed in 1994 at the Center of Excellence in Space Data and Information Sciences (CESDIS), a contractor to the U.S. National Aeronautics and Space Administration (NASA) at the Goddard Space Flight Center in Greenbelt, Maryland. Thomas Sterling and Don Becker built a cluster computer that consisted of 16 Intel DX4 processors connected by channel-bonded 10M bps Ethernet. Their success led to the Beowulf Project, which fosters the development of similar COTS clusters. A number of such clusters have been developed in universities and research groups around the world.

While the original Beowulf cluster was designed to squeeze out additional computing power and life span out of spare equipment to maximize cost advantage, nowadays, while cost is still an advantage, researchers are creating Beowulf clusters out of new hardware devices instead of just spare equipment.

"People in the past were either used to keeping older hardware or they re-cycle hardware. Today, Beowulf clusters are part of any researchers' arsenal of tools, so people buy new hardware to implement Linux Beowulf clusters," explained Lawrence Liew, manager of Singapore Computer Systems' Linux Competency Centre (SCS-LCC). "Treat Beowulf clusters as a cost-effective supercomputing solution. It does not make sense to run on old P2 clusters as your code will probably run faster on a single 2.8GHz P4 then on a cluster of P2s," said Liew. "So people buy latest, faster CPUs they can find to build clusters."

Locally, the Singapore-MIT Alliance (SMA) at the National University of Singapore, together with the SCS-LCC, recently commissioned a new Beowulf cluster based on Intel Itanium 2. The SMA is a science and engineering education and research collaboration between three engineering research universities, namely, National University of Singapore (NUS), Nanyang Technological University (NTU) and Massachusetts Institute of Technology (MIT).

The SMA cluster, named HydraIII, is the first large-scale Intel Itanium 2-based Beowulf cluster to be deployed into production using the open-source NPACI Rocks cluster toolkit according to SCS. "The increasing demand for high performance computing power will be a major driver of computing innovation throughout the next decade. We expect clusters and grids using the open standard Intel Itanium processor family to deliver the performance and affordability required by the industry," said William Wu, Itanium processor family marketing manager, Asia Pacific.

The Rocks cluster toolkit's development is led by the San Diego Supercomputer Centre (SDSC). The SDSC is a research unit of the University of California, San Diego (U.S. and runs the National Partnership For Advanced Computational Infrastructure (NPACI) site. SDSC's mission is to develop and use technology to advance science, and to provide leadership internationally in areas such as computing, data management and biosciences.

SCS-LCC is a co-developer of NPACI Rocks cluster toolkit with the SDSC, and provides commercial Rocks cluster toolkit support and turnkey Linux high performance computing cluster solutions based on Rocks.

"SCS Linux Competency Centre collaborates closely with the San Diego Supercomputer Centre on NPACI Rocks and provides critical support in the areas of file systems and queuing systems," said Philip Papadopoulos, program director for SDSC's Grid and Cluster Computing group. "The Rocks user community benefits greatly from SCS's expertise and their significant contributions to this community toolkit."

HydraIII cluster supports about 50 SMA researchers and post-graduate students involved in various projects, ranging from computational fluid dynamics to bio-engineering. "We are very pleased with the performance and ease of management of the Rocks-based Itanium 2 cluster," said Professor Khoo Boo Cheong, program co-chair of High Performance Computation for Engineered Systems at SMA. "We intend to encourage more researchers to migrate to HydraIII over the next few months. The technical expertise and assistance that the SCS-LCC team has provided to us made a huge difference to our transition to 64-bit Linux parallel computing."

According to Liew, SCS is providing level 2 and 3 support to SMA. SCS will also conduct monthly NPACI Rocks administrator as well as providing end user training for the internal SMA's engineers to manage the cluster. While managing a cluster used to be a tedious and time-consuming job, Liew believes that this is no longer true.

NPACI Rocks provides a very scalable and easy-to-manage methodology for building large scalable clusters, he said.

The SMA cluster consists of 15 Hewlett-Packard (HP) rx5670 nodes, each with four Itanium 2 processors, and is interconnected with a high performance, high bandwidth, low latency switching Myrinet system. Myrinet is a high performance, packet communication and switching technology that is widely used to interconnect clusters of workstations, PCs, servers, or single-board computers. Myrinet is also an American National Standard -- ANSI/VITA 26-1998 -- for which the link and routing specifications are public, published and open.

While conventional networks such as Ethernets can be used to build clusters, they lack certain features as well as the performance required for high-performance or high-availability clustering that SMA requires. Myrinet is able to provide full-duplex 2+2 Gigabit/second data rate links, switch ports and interface ports as well as flow control, error control and "heartbeat" continuity monitoring on every link.

It is a form of switch networks that can scale to tens of thousands of hosts and that can also provide alternative communication paths between hosts. The host interfaces also allow the execution of a control program to interact directly with host processes, bypassing the operating system, for low-latency communication and directly with the network to send, receive and buffer packets. "Myrinet powers around 70 to 80 percent of all large-scale, production-grade Beowulf clusters," said Liew.

The SMA's cluster used the Red Hat Linux operating system and is managed by the tools of NPACI Rocks version 2.3.2. Current Linpack performance achieves around 70 percent of theoretical peak processing power (240GFLOPS) at 167GFLOPS over the Myrinet interconnect. According to SCS, the cluster was installed with Rocks and had applications running in less than a day.

According to Liew, using NPACI Rocks provides many benefits such as easy cluster management and complete out-of-the-box set of HPC softwares and libraries as well as support on Itanium 2 using Red Hat Linux OS. This allows the cluster to be easily expanded in the future when the need arises.

"The team took less than a day to install the cluster with Rocks and getting the cluster operational. This is a testimony to the amount of work that has gone into making Rocks one of the best and easiest to use cluster toolkits in the world," said Liew.

According to Liew, the Rocks is supported on any systems that can support Red Hat Linux. The decision to go with Itanium 64 is due to SMA's requirement for 64-bit processing. "During the purchase cycle, Linux was not generally available on any other 64-bit platform that is commercially well supported," added Liew.

"The rapid deployment by SCS of the HP system demonstrates that 64-bit high performance clusters are now as easy to build as 32-bit x86 processor systems," said Leslie Ong, director, HP Business Critical Systems, South East Asia. "Such efficiency in rollout underscores the growing momentum to move to open standards from proprietary systems in the scientific community," he added.

However, despite all the benefits of a Beowulf cluster, Beowulf is not for everyone. Any site that would include a Beowulf cluster should have a systems administrator already involved in supporting the network of workstations and PCs that inhabit the workers' desks.

This is because Beowulf is a parallel computer which must run parallel programs. A site without such a skill base should probably not follow the Beowulf path. Beowulf is loosely coupled and is a distributed memory environment that runs message passing parallel programs which do not assume a shared address space across processors. As such, its long latencies require a favorable balance of computation to communication and code written to balance the workload across processing nodes.

Incidentally, Beowulf is the name of the legendary hero who slayed the monster Grendel and became the king of the Geats in an anonymous Old English epic poem that is believed to have been composed in the early eighth century.