Apache Hadoop to get more user friendly

The forthcoming Hadoop release, likely to be dubbed version 0.23, will also get improvements in availability, performance, and scalability

By Paul Krill, InfoWorld

Relief is on the way for users of the open source Apache Hadoop distributed computing platform who have wrestled with the complexity of the technology.

A planned upgrade to the Hadoop distributed computing platform, which has become popular for analyzing large volumes of data, is intended to make the platform more user-friendly, said Eric Baldeschwieler, CEO of HortonWorks, which was unveiled as a Yahoo spinoff last month with the intent of building a support and training business around Hadoop. The upgrade will also feature improvements for high availability, installation, and data management. The release, which will probably be called Hadoop 0.23, is due in beta later this year, with general availability eyed for the second quarter of 2012.


"A big focus for us is going to be adding tools for monitoring and distributing and management, [with the goal of making it] much easier for organizations to use Hadoop. The problem now is it takes a pretty sophisticated operations staff to install and use it," Baldeschwieler said during an interview at HortonWorks's Silicon Valley offices this week. He was formerly vice president of Hadoop engineering at Yahoo, which has been instrumental in Hadoop development.

Version 0.23 is also set for improvements in availability, performance, and scalability. "That's a big one for very large customers," such as Yahoo and Facebook, Baldeschwieler said. Addressing single points of failure in Hadoop's master nodes will be a goal.

Also, the new HCatalog data management software layer planned for Hadoop 0.23 will let users store data in a more traditional table style and move it transparently between tools. It also yields benefits for the MapReduce programming model used with Hadoop. Currently, users can work with two higher-level languages on top of Hadoop, Pig and Hive, said Baldeschwieler, and each has its own specialty data store. "What HCatalog's going to allow is for Pig and Hive and MapReduce itself to operate on one set of tables," he said.

An Apache representative concurred that goals for Hadoop include improvements for high availability, data management, and user friendliness, but Apache would not confirm what will be in the next release or what the version number will be. Because of Hadoop's culture of continuous beta releases, there has yet to be a formal 1.0 release, Baldeschwieler said. "There will come a point where we will want to call it 1.0 or 2.0."

This article, "Apache Hadoop to get more user friendly," was originally published at InfoWorld.com.
