Making Project Serengeti available free through Apache also continues a trend by VMware to embrace open standards. Its platform-as-a-service (PaaS) offering, Cloud Foundry, for example, is also open source. Ibarra says VMware wants Project Serengeti to be widely adopted within the Hadoop community and compatible with all the various Hadoop distributions, so open source was the way to go.
Project Serengeti is an important move to make Hadoop enterprise-friendly, says Tony Baer, an analyst at Ovum. "This will help Hadoop become more mainstream," he says. There are a variety of use cases where Hadoop could benefit from running in a virtualized environment, for example when an enterprise wants to experiment with a new feature on a dataset without exposing the entire cluster.
Ibarra says VMware officials have seen three major use cases for Hadoop among customers. The first is companies that are testing the platform and have fewer than 20 nodes or so. These customers, he says, are ideal for virtualized distributions of Hadoop because running Hadoop on legacy vSphere private clouds will not require large new capital expenses.
A second customer set has an expanded use of Hadoop, up to 100 nodes or so, Ibarra says, and may be looking to take advantage of the dynamic elasticity Project Serengeti allows Hadoop to leverage. A third use case is the early Hadoop adopters, who are running hundreds of nodes and are looking for advanced uses. Almost any business today, he says, will find some use for Hadoop, given the vast amounts of unstructured data produced through web traffic that can be analyzed.
Carl Brooks, a cloud analyst at the 451 Research Group, says VMware is not the first to run Hadoop on virtualized machines, so the more significant news is that more vendors are recognizing Hadoop's importance and potential, and are offering services around it. Hortonworks, for example, announced on Tuesday a Hadoop distribution compatible with VMware vSphere.
Hadoop is still early in its enterprise adoption phase, though, says Ovum's Baer. A lack of skilled workers to manage Hadoop clusters and interpret the data Hadoop creates is another challenge for curious enterprises, he says.