Big data analytics moves to the cloud
What if you could store your data in the cloud and run complex queries and analytics on it where it resides, without moving it? That's the question high-performance cloud infrastructure and big data analytics specialist Joyent looks to answer with its new Joyent Manta Storage Service.
Manta is a next-generation cloud object store and data services platform designed to bring compute and analysis capabilities directly to customers' data in the cloud.
"The killer app for the personal computer was the spreadsheet," says Bryan Cantrill, vice president of Engineering at Joyent, who has spent the past three years immersed in the creation of Manta. "That was the killer app because it represented the convergence of data and compute. It allowed business folks to run that data on their desks, and they didn't need a time-sharing program to do that."
Cloud storage, he says, combined two other pillars of technology--data and the network. Now, Manta goes a step further, Cantrill says.
Is Manta the Spreadsheet of the Cloud?
"I believe Manta represents the spreadsheet of cloud computing," Cantrill says. "It opens up vistas of ad hoc analysis of unstructured data. What we're doing is converging compute with data with the network. We're converging all three of those into a single facility. This is the holy trinity of compute, network and data unified into a single offering."
By removing the need to manage infrastructure and move data, Manta gives enterprises the capability to process big data faster and more easily, Cantrill says, all while keeping it secure and at prices on par with Amazon.
"You really can't bring a product to market that is not going to be competitive on price," he says. "We think there's more to cloud computing than just price, but it's simply a non-starter to overprice a product. You'll pay S3 prices for the storage, and you pay what are effectively EC2 or Joyent Infrastructure as a Service prices for any compute you spin up on your data. You may spin up for a second or two seconds or three seconds. If so, you just pay for one or two or three seconds of compute."
For instance, he says, maybe you want to run periodic validation checks on your backups. It's typically not done because it would mean dragging all your backups back to compute. But, he says, it becomes trivially easy to do such validation tests with Manta--you just have to spin up compute on the backups where they reside for a few microseconds.
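As a rough illustration of the kind of validation check Cantrill describes, the sketch below compares each backup's SHA-256 digest against a previously recorded manifest. It is a minimal local example, not Manta's API: the backup names, the in-memory byte strings standing in for stored objects, and the `validate` helper are all hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hex."""
    return hashlib.sha256(data).hexdigest()


def validate(backups: dict, manifest: dict) -> list:
    """Return the sorted names of backups whose digest does not match the manifest."""
    return sorted(
        name
        for name, data in backups.items()
        if sha256_hex(data) != manifest.get(name)
    )


# Hypothetical backups; in-memory bytes stand in for stored objects.
backups = {
    "db-monday.tar": b"monday data",
    "db-tuesday.tar": b"tuesday data",
}

# Record the expected digests at backup time.
manifest = {name: sha256_hex(data) for name, data in backups.items()}

# Simulate silent corruption of one backup, then re-check.
backups["db-tuesday.tar"] = b"tuesday dat\x00"
failed = validate(backups, manifest)
print(failed)
```

The check only reads each object and emits a small result, which is why running it next to the data avoids the transfer cost the article describes.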
Manta gives you the capability to execute compute tasks including log analysis, search index generation, financial analysis and other data-intensive tasks without moving data or setting up compute clusters and processing software. Cantrill explains that code is brought in parallel to physical servers in secure containers while data is automatically merged using the industry-standard MapReduce pattern.
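To make the MapReduce pattern concrete, here is a minimal sketch of a log-analysis job run entirely in local memory. It does not use Manta's interface; the object names, log contents, and function names are assumptions made for illustration. Each map step looks at a single object on its own, and the reduce step merges the partial results.

```python
from collections import Counter
from functools import reduce

# Hypothetical stand-ins for stored log objects; names and contents are invented.
OBJECTS = {
    "logs/web-01.log": "GET /a 200\nGET /b 404\nPOST /a 200\n",
    "logs/web-02.log": "GET /a 200\nGET /c 500\n",
}


def map_phase(text: str) -> Counter:
    """Count HTTP status codes in one object. Needs no other object's data."""
    return Counter(
        line.split()[-1] for line in text.splitlines() if line.strip()
    )


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Merge two partial counts into one."""
    return left + right


def run_job(objects: dict) -> Counter:
    """Apply the map step to every object, then fold the partial results together."""
    partials = [map_phase(body) for body in objects.values()]
    return reduce(reduce_phase, partials, Counter())


totals = run_job(OBJECTS)
print(dict(totals))
```

Because each map step depends only on its own object, a platform is free to run those steps in parallel wherever the objects are stored and ship only the small partial counts to the reduce step.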
Eliminating Need to Copy Data from Storage to Compute Clusters
"Copying data across a network from storage onto a compute cluster can take hours," says Konstantin Gredeskoul, CTO of online shopping community Wanelo.
"Joyent Manta Storage Service strips the need to invest any time moving the data around, making ad hoc querying and analysis near-instantaneous, seamless and cost effective. We are now able to perform complex cohort analysis and retention reports across hundreds of gigabytes of data in a couple of minutes. When compared to traditional methods such as data warehousing, this is game changing," Gredeskoul says.
"Fifty percent of the world's smartphone traffic goes through Ericsson, and we are continuously evaluating new technologies to increase the ability of the network to manage growing data volumes in the most responsive, secure, cost effective ways possible," adds Vish Nandlall, Ph.D., CTO and head of Strategy and Marketing for Ericsson North America. "Joyent's new compute-on-storage innovation is a fundamental paradigm shift that changes the economics and utility of object storage and high-performance big data analysis."
The Joyent Manta Storage Service provides the following:
A multi-datacenter object store with fine-grained replication controls.
No object size limits.
Strongly consistent writes and highly available reads.
Per-object replication policies.
A file system-like namespace, including directory queries.
Manta Derives Its Capabilities from OS-Level Virtualization
Cantrill notes that Joyent achieves these capabilities by focusing its efforts on OS-level virtualization instead of hardware virtualization. OS virtualization gives Joyent the power to spin up compute on the objects in the Internet-facing object store where the objects live.
"Virtual hardware is great if you need to support a legacy operating system," he says. "But if you want to deliver the highest-performance experience, you shouldn't virtualize hardware, you should virtualize the operating system. The application runs directly on the hardware but in a secure and containerized way."
"You bring your analytics tools to the cloud, effectively," he adds. "You can run whatever you want on top of your data--Java, Python, Perl or just a Unix Shell. If you've got your own analytics tools that are using R or what have you, you can bring whatever analysis software you've got, effectively, and you can push into the cloud and run it where your data is."
Joyent has partnered with data storage management company Panzura to help customers securely migrate their data from existing NAS, backup and archive storage to Manta.
Thor Olavsrud covers IT Security, Big Data, Open Source, Microsoft Tools and Servers for CIO.com.