Making big data smaller

MIT researchers develop a new approach to working with big data: reduce it to a size that can be managed and analyzed with conventional tools

As an example, they tested the approach on the problem of discerning patterns in GPS data. Instead of having to process every coordinate collected, they reduced the problem to estimating a set of common routes or paths (the coresets) via linear regression, which could then be used for analysis. With this technique they compressed a data set of more than 2.6 million GPS points from San Francisco taxis down to between 0.14% and 1.79% of its original size while still preserving the core information in the data. The amount of compression achieved depended on how closely the approximations tracked the true data.
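To make the idea concrete, here is a minimal sketch of that kind of compression, assuming a simple chunk-and-fit scheme: fit one least-squares line to each chunk of a GPS trace and keep only the fitted segment endpoints. The data, chunking strategy, and function names are invented for illustration; the researchers' actual coreset construction is more sophisticated.

```python
# Illustrative sketch only -- not the researchers' algorithm. It compresses a
# GPS trace by fitting one least-squares line per chunk of points and keeping
# just the fitted segment endpoints.
import numpy as np

def compress_trace(points, n_segments=8):
    """Return (start, end) endpoints of one fitted line segment per chunk."""
    segments = []
    for chunk in np.array_split(points, n_segments):
        x, y = chunk[:, 0], chunk[:, 1]
        slope, intercept = np.polyfit(x, y, deg=1)   # ordinary linear regression
        x0, x1 = x.min(), x.max()
        segments.append(((x0, slope * x0 + intercept),
                         (x1, slope * x1 + intercept)))
    return segments

# Synthetic "taxi trace": 10,000 noisy GPS points along one gently curving route.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 10_000)
trace = np.column_stack([t, np.sin(2 * np.pi * t)]) + rng.normal(0.0, 0.01, (10_000, 2))

segments = compress_trace(trace)
kept = len(segments) * 4                 # 2 endpoints x 2 coordinates per segment
print(f"kept {kept} of {trace.size} numbers ({kept / trace.size:.2%} of the original)")
```

With eight segments this toy version keeps roughly 0.16% of the original numbers, the same order of magnitude as the compression reported for the taxi data.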

This last point indicates a potential drawback to the approach: approximating the meaningful portion of the data in order to achieve compression introduces errors. Analytics based on the compressed data, then, potentially aren't as accurate as they would be if based on the original information. However, because the estimates are drawn from large data sets, the researchers find that the margin of error can be small enough to be an acceptable tradeoff for the compression achieved.
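The tradeoff is easy to see in the same toy setting: keep fewer segments and you store less, but the raw points sit farther from their fitted lines. The sketch below (again illustrative, re-creating the synthetic trace so it runs on its own) simply measures that residual at a few compression levels; the numbers are not the error figures reported by the researchers.

```python
# Illustrative sketch continuing the example above, self-contained so it runs
# on its own: measure how the residual error shrinks as more segments (a
# larger "coreset") are kept. Data and numbers are synthetic.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 10_000)
trace = np.column_stack([t, np.sin(2 * np.pi * t)]) + rng.normal(0.0, 0.01, (10_000, 2))

def mean_residual(points, n_segments):
    """Average vertical distance from each raw point to its chunk's fitted line."""
    total = 0.0
    for chunk in np.array_split(points, n_segments):
        x, y = chunk[:, 0], chunk[:, 1]
        slope, intercept = np.polyfit(x, y, deg=1)
        total += np.abs(y - (slope * x + intercept)).sum()
    return total / len(points)

for n in (2, 8, 32):
    stored = (n * 4) / trace.size        # fraction of numbers actually kept
    print(f"{n:>3} segments: {stored:.2%} of the original values, "
          f"mean residual {mean_residual(trace, n):.4f}")
```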

What does all this mean for the average business wrestling with taming big data? In the short term, not much. But in the longer run these methods could lead to newer, cheaper approaches to a wide range of big data problems. The authors argue that their methods have “many potential applications in map generation and matching, activity recognition, and analysis of social networks.”

We’ll keep an eye on it, so stay tuned...
