February 01, 2011, 4:05 PM — Yahoo is discontinuing its distribution of the Hadoop platform and will instead focus on Apache Hadoop, the Hadoop Team at Yahoo said this week.
Hadoop, which was built initially by Apache Chairman Doug Cutting while he was at Yahoo, has become prominent in data centers and cloud computing. Yahoo will halt its own distribution, remove all references to a Yahoo distribution from its Web site, and close its Hadoop repository on GitHub. "Our intent is to return to helping Apache produce binary releases of Apache Hadoop that are so bulletproof that Yahoo and other production Hadoop users can run them unpatched on their clusters," said Eric Baldeschwieler, vice president of Hadoop development at Yahoo, in the company's announcement.
The Apache Hadoop community has been "very turbulent" lately, according to Baldeschwieler. "Over the last few months we have been developing Hadoop enhancements in our internal git repository while doing a complete review of our options. Our commitment to open sourcing our work was never in doubt, but the future of the Yahoo distribution of Hadoop was far from clear. We've concluded that focusing on Apache Hadoop is the way forward," Baldeschwieler said.
Yahoo will have to sort out how to contribute several man-years' worth of work to Apache to "unwind the Yahoo git repositories," Baldeschwieler said. Yahoo has proposed a 0.20.100 release of Hadoop, focused on stability and high performance. Yahoo has also set up a feature branch called hadoop-future. A draft list of proposed features includes federation, which would allow more storage per Hadoop cluster; a new metrics framework; and optimization of the Hadoop MapReduce parallel applications framework for small jobs.
Yahoo said that until the Hadoop 0.20 release, Yahoo committers worked as release masters to produce binary Apache Hadoop releases for the entire community to use on clusters. "As the community grew, we experimented with using the Yahoo distribution of Hadoop as the vehicle to share our work. Unfortunately, Apache is no longer the obvious place to go for Hadoop releases. The Yahoo team wants to return to a world where anyone can download and directly use releases of Hadoop from Apache. We want to contribute to the stabilization and testing of those releases," Baldeschwieler said.