After trying several popular file systems for Clemson's cluster, researchers determined they needed higher performance and greater reliability, says Boyd Wilson, executive director of computing, systems and operations. The result: revival of development work on the open source Parallel Virtual File System (PVFS) with the original architect, Clemson faculty member Walt Ligon. Ligon is working with a Clemson spin-off company called Omnibond that is providing commercial services for the file system.
In the Clemson cluster, OrangeFS is used to virtualize 32 commodity Dell storage servers while providing a single name space for the cluster nodes, Wilson says. Directory and file metadata are distributed on 1.6TB of solid state drives across the 32 storage nodes and there is a total of 256TB of raw rotational disk storage.
Unlike other high-performance file systems such as Lustre, which can only have a single metadata server, OrangeFS' distributed metadata approach and unified name space enable the file system to scale nicely while also simplifying operations, Wilson says.
These capabilities may ultimately benefit enterprise computing environments. "With a unified name space across potentially hundreds of storage nodes, you can add and remove nodes as needed and customers won't notice their files moving or ever have to be pointed to a new storage location," Wilson says. "Your unstructured data stores can grow and resize and be redundant and you won't have all of these different little silos of data. So it holds some potential to become an enterprise computing solution a couple of years down the road."
One Clemson researcher, Sebastien Goasguen, is using OrangeFS to develop a cloud-based infrastructure that can launch and work with tens of thousands of cluster-based virtual machines at once. "It leverages OrangeFS by enabling you to have a shared high-performing file system between all cluster nodes," Wilson says.
Goasguen is collaborating with KC (Kuang-Ching) Wang to build software-defined networks between VMs and client machines using OpenFlow, "which represents a nice convergence point with the university's work on OpenFlow," he says.
Clemson is one of seven collaborators with Stanford on the initial OpenFlow deployment. What started out as a tool to facilitate network research by adding an open, centralized, software-defined layer of network routing, OpenFlow promises to "change the whole way we think about networking," Wilson says. "A lot of people are realizing they would like more software-based control over their network infrastructure. ... You can do some really neat stuff."