February 23, 2012, 7:18 AM — One of the challenges companies often face when using Hadoop to aggregate massive volumes of structured and unstructured data is finding a way to efficiently control and manage user access to that data.
Zettaset, a Mountain View, Calif.-based vendor of tools for managing big data, on Wednesday announced a new security initiative to help companies address that issue.
Under its SHadoop initiative, Zettaset will add new functions to its existing Hadoop Orchestrator platform that let IT administrators implement role-based access control over Hadoop environments.
The new tools will let administrators define more precisely what different categories of users can and cannot do with data in a Hadoop platform, for example by restricting users from executing certain jobs or from importing or exporting certain kinds of data, according to Zettaset.
SHadoop will allow administrators to establish a baseline security policy for all users with access to a Hadoop system, the company said. It will then allow them to track, log and audit all user or group activity within the Hadoop platform.
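Role-based access control of the kind Zettaset describes can be illustrated generically. The sketch below is not SHadoop's API, which the announcement does not detail. The action names (`RUN_JOB`, `IMPORT_DATA`, `EXPORT_DATA`), the roles and the `AccessPolicy` class are all hypothetical. It shows the three ideas in the announcement: permissions attach to roles rather than individual users, a deny-by-default baseline applies to everyone, and every access decision is logged for later audit.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical actions a Hadoop user might attempt (illustrative only, not Zettaset's API).
RUN_JOB = "run_job"
IMPORT_DATA = "import_data"
EXPORT_DATA = "export_data"


@dataclass
class AccessPolicy:
    """Role-based access control with a deny-by-default baseline and an audit trail."""
    role_permissions: dict[str, set[str]]  # role -> actions that role may perform
    user_roles: dict[str, set[str]]        # user -> roles assigned to that user
    audit_log: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def is_allowed(self, user: str, action: str) -> bool:
        # Baseline policy: deny unless one of the user's roles grants the action.
        allowed = any(
            action in self.role_permissions.get(role, set())
            for role in self.user_roles.get(user, set())
        )
        # Log every decision, allowed or denied, so user activity can be audited.
        timestamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((timestamp, user, action, allowed))
        return allowed


policy = AccessPolicy(
    role_permissions={
        "analyst": {RUN_JOB},
        "data_engineer": {RUN_JOB, IMPORT_DATA, EXPORT_DATA},
    },
    user_roles={"alice": {"analyst"}, "bob": {"data_engineer"}},
)
print(policy.is_allowed("alice", RUN_JOB))      # True
print(policy.is_allowed("alice", EXPORT_DATA))  # False: analysts cannot export
print(policy.is_allowed("mallory", RUN_JOB))    # False: no role, so the baseline denies
```

Keeping permissions on roles means an administrator changes one role definition instead of editing every user, which is the main reason the approach scales to large user populations.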
Future versions of SHadoop will enable companies to encrypt data stored in a Hadoop cluster or transmitted between Hadoop nodes, Zettaset said.
Those features all address enterprise concerns around using Hadoop, according to analysts.
The Apache Hadoop Distributed File System allows companies to store and manage petabytes of data from disparate data sources far more efficiently than relational database management systems can. A growing number of companies have begun using the open-source technology to aggregate and analyze huge volumes of structured and unstructured data captured from websites, social media networks, emails, audio and video files, sensors and machines.
While this data aggregation has enabled new levels of social media mining, sentiment analysis and fraud detection, it also creates new access control problems. Unlike traditional database management technologies, Hadoop does not give administrators many ways to control access to data beyond Access Control Lists and Kerberos-based authentication.
"So, while you can authenticate users, how do you go about setting up fine-grained access provisions?" said David Menninger, an analyst with Ventana Research. "You can segregate sensitive data into separate files, nodes or clusters, but that still doesn't give you the row-level access control that people are used to in relational databases."
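The gap Menninger describes can be shown with a toy comparison. This is a conceptual sketch, not how Hadoop or any vendor enforces access; the records, user names and the `region` attribute are hypothetical. A file-level check is all or nothing: a user who can read the file sees every row in it. A row-level check applies a per-user predicate to each record, so users with different rights can share one data set without splitting it into separate files.

```python
# Hypothetical records stored together in one file.
RECORDS = [
    {"customer": "A", "region": "EU", "balance": 120},
    {"customer": "B", "region": "US", "balance": 340},
    {"customer": "C", "region": "EU", "balance": 75},
]

# File-level control: a user can read either the whole file or none of it.
FILE_READERS = {"alice", "bob"}


def read_file_level(user: str) -> list[dict]:
    return list(RECORDS) if user in FILE_READERS else []


# Row-level control: each user sees only the rows matching a per-user predicate,
# similar in spirit to row-level security in relational databases.
USER_REGIONS = {"alice": {"EU"}, "bob": {"US", "EU"}}


def read_row_level(user: str) -> list[dict]:
    regions = USER_REGIONS.get(user, set())
    return [row for row in RECORDS if row["region"] in regions]


print([r["customer"] for r in read_file_level("alice")])  # ['A', 'B', 'C']
print([r["customer"] for r in read_row_level("alice")])   # ['A', 'C']
```

Achieving the row-level result with file-level controls alone would mean writing EU and US records to separate files and granting access to each, which is the segregation approach Menninger says falls short.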