September 25, 2012, 5:34 PM — This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.
The high-performance computing (HPC) scientific/academic sector is accustomed to using commodity server and storage clusters to deliver massive processing power, but comparable large-scale cluster deployments are now found in the high-end enterprise as well.
Large Internet businesses, cloud computing suppliers, media and entertainment organizations, and high-frequency trading environments, for example, now run clusters that are on par with, and in some cases considerably larger than, the top 100 clusters used in HPC.
What differentiates the two environments is the type of network used, together with the application programming models and the problem sets they serve. In the scientific/academic sector, it is typical to use proprietary solutions to achieve the best latency and bandwidth, at the cost of the standardization that simplifies support, manageability and closer integration with IT infrastructure. Within the enterprise the use of standards is paramount, and that means heavy reliance upon Ethernet. But plain old Ethernet won't cut it. What we need is a new approach, a new "maverick fabric."
Such a fabric should eliminate network congestion within a multi-switch Ethernet framework, freeing up available bandwidth in the network fabric. It should also significantly improve performance by negotiating load-balancing flows between switches with no performance hit, and use a "fairness" algorithm that prioritizes packets in the network and ensures that broadcast data or other large-frame traffic, such as that from localized storage sub-systems, will not unfairly consume bandwidth.
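The article does not say which fairness algorithm such a fabric would use. As an illustration only, the sketch below uses deficit round robin, a well-known scheduling technique substituted here as an assumption, to show how a switch port could keep large storage frames from starving small messages. The function name, the flow names and the `quantum` parameter are all hypothetical.

```python
from collections import deque


def deficit_round_robin(queues, quantum):
    """Return the order in which packets leave a port under deficit round robin.

    queues:  dict mapping a flow name to a list of packet sizes in bytes.
    quantum: bytes of credit each backlogged flow earns per round
             (a hypothetical tuning knob, not taken from the article).
    """
    if quantum <= 0:
        raise ValueError("quantum must be positive")

    pending = {flow: deque(sizes) for flow, sizes in queues.items()}
    deficit = {flow: 0 for flow in pending}
    sent = []

    # Keep cycling over the flows until every queue is drained.
    while any(pending.values()):
        for flow, packets in pending.items():
            if not packets:
                continue
            deficit[flow] += quantum
            # Send packets while the head of the queue fits in the credit.
            while packets and packets[0] <= deficit[flow]:
                size = packets.popleft()
                deficit[flow] -= size
                sent.append((flow, size))
            if not packets:
                deficit[flow] = 0  # an emptied flow does not bank credit

    return sent


if __name__ == "__main__":
    order = deficit_round_robin(
        {"storage": [9000, 9000], "rpc": [500, 500, 500]},
        quantum=1500,
    )
    for flow, size in order:
        print(flow, size)
```

In this toy run the three 500-byte "rpc" packets are sent before either 9000-byte "storage" frame, because each flow earns the same byte credit per round regardless of its frame size. A real switch would apply a policy like this in hardware and per priority class; the sketch only conveys the idea.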
Adaptive routing and lossless switching
A fundamental problem with legacy Ethernet architecture is congestion, a byproduct of the very nature of conventional large-scale Ethernet switch architectures and also of Ethernet standards. Managing congestion within multi-tiered, standards-based networks is a key requirement to ensure high utilization of computational and storage capability. The inability to cope with typical network congestion causes: