How to use Hadoop to overcome storage limitations

By Frank J. Ohlhorst, CIO | Storage, Hadoop

Big data analytics requires that organizations select the data to analyze, consolidate it and then apply aggregation methods before it can be subjected to the ETL process. What's more, all of that has to happen with large volumes of data, which can be structured or unstructured and can come from multiple sources, such as social networks, data logs, websites, mobile devices and sensors.

Hadoop meets those demands through pragmatic design choices: a fault-tolerant clustered architecture, the ability to move computing power closer to the data, and parallel, batch-oriented processing of large data sets. It also provides an open ecosystem that supports enterprise architecture layers from data storage through analytics.
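
To make those ideas concrete, here is a minimal sketch of the canonical word-count job written against Hadoop's Java MapReduce API. It is only a sketch: the job would be packaged as a JAR and submitted to the cluster, the input and output paths come from the command line, and each map task is scheduled, where possible, on a node that already holds a block of the input -- the "move computation to the data" principle in practice.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Each map task runs on (or near) a node holding a block of the input
  // file and emits a (word, 1) pair for every word in that block.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce tasks run in parallel across the cluster, each summing the
  // counts for the subset of words assigned to it.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // pre-aggregate on the map side
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Submitted with something like hadoop jar wordcount.jar WordCount /data/raw /data/counts (illustrative paths), the same code scales from a two-node pilot to a large cluster without change; the framework handles splitting, scheduling and retrying failed tasks.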

Not all enterprises require the capabilities that big data analytics has to offer. Those that do, however, must weigh Hadoop's ability to meet the challenge. Hadoop cannot accomplish everything on its own; enterprises also need to determine which additional components are required to build out a Hadoop project.

For example, a starter set of Hadoop components may consist of HDFS and HBase for data management, MapReduce for processing and Oozie for workflow scheduling, Pig and Hive for developer productivity, and open source Pentaho for BI.
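
As a small illustration of how the HDFS layer in that starter set is used, the sketch below relies on Hadoop's Java FileSystem API to copy a local file into HDFS and then report which nodes hold its blocks. The NameNode address and file paths are placeholders invented for this example; in a real deployment the cluster's core-site.xml normally supplies fs.defaultFS.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCopyExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder NameNode address for this sketch only.
    conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");

    FileSystem fs = FileSystem.get(conf);

    // Copy a local log file into HDFS, where it is split into blocks
    // and replicated across the DataNodes. Paths are illustrative.
    Path local = new Path("/tmp/weblogs.txt");
    Path remote = new Path("/data/raw/weblogs.txt");
    fs.copyFromLocalFile(local, remote);

    // Show which nodes hold each block -- the locality information
    // MapReduce uses to schedule tasks near the data.
    FileStatus status = fs.getFileStatus(remote);
    for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
      System.out.println("Block at offset " + block.getOffset()
          + " on hosts " + String.join(",", block.getHosts()));
    }

    fs.close();
  }
}
```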

From a hardware perspective, a pilot project does not require massive amounts of equipment. A pair of servers with multiple cores, 24GB or more of RAM and a dozen or so 2TB hard disk drives each should prove sufficient to get a pilot project off the ground.
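
As a rough sizing check, two servers with a dozen 2TB drives apiece works out to about 48TB of raw capacity. After HDFS replication -- the default factor is three, though a two-node pilot can hold at most two copies of each block -- usable space lands somewhere between roughly 16TB and 24TB, which is typically more than enough for a pilot data set.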

However, be forewarned that effective management and implementation of Hadoop require some expertise and experience. If that expertise is not readily available, IT management should consider partnering with a service provider that can offer full support for the Hadoop project. Expertise proves especially important when it comes to security: Hadoop, HDFS and HBase offer very little in the way of integrated security, so data still needs additional protection against compromise or theft.

All things considered, an in-house Hadoop project makes the most sense for a pilot test of big data analytics capabilities. After the pilot, a plethora of commercial or hosted solutions are available to those who want to tread further into the realm of big data analytics.

Frank J. Ohlhorst is a New York-based technology journalist and IT business consultant.

Originally published on CIO.