Take all that traffic, take all that metadata, and what do you have? You have exabytes of data. Google's Eric Schmidt said in 2010, "There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days, and the pace is increasing ... People aren't ready for the technology revolution that's going to happen to them."
Well, people in general may not be, but the NSA has been working hard on it. In the obscure Utah town of Bluffdale, the NSA is building the blandly named Utah Data Center. In this million-square-foot data storehouse, the NSA will be keeping its -- and your -- data.
It takes more than massive amounts of storage, though, to make big data usable. It takes software, but thanks to tools such as Hadoop, Hive, NoSQL databases, and Scala, we're getting there. Hadoop in particular offers four advantages:
Scalable: New nodes can be added as needed, and added without needing to change data formats, how data is loaded, how jobs are written, or the applications on top.
Cost effective: Hadoop brings massively parallel computing to commodity servers. The result is a sizable decrease in the cost per terabyte of storage, which in turn makes it affordable to model all your data.
Flexible: Hadoop is schema-less, and can absorb any type of data, structured or not, from any number of sources. Data from multiple sources can be joined and aggregated in arbitrary ways enabling deeper analyses than any one system can provide.
Fault tolerant: When you lose a node, the system redirects work to another location of the data and continues processing without missing a beat.
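To make the flexibility and fault-tolerance points concrete, here is a minimal Python sketch. It is not Hadoop's actual API; the node names, block names, and records are invented for illustration, and the whole "cluster" is just a pair of dictionaries. It shows schema-less records being counted block by block, with work redirected to another replica when a node is lost.

```python
from collections import Counter

# Hypothetical cluster layout: every block of data lives on two nodes.
BLOCK_REPLICAS = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-b", "node-c"],
    "block-3": ["node-c", "node-a"],
}

# Schema-less records: the fields differ from one record to the next.
BLOCK_DATA = {
    "block-1": [{"type": "call", "from": "alice"}, {"type": "email", "from": "bob"}],
    "block-2": [{"type": "call", "from": "bob", "duration": 42}],
    "block-3": [{"type": "web", "url": "example.com"}, {"type": "call", "from": "carol"}],
}


def pick_node(block, failed_nodes):
    """Return the first live node that holds a replica of the block."""
    for node in BLOCK_REPLICAS[block]:
        if node not in failed_nodes:
            return node
    raise RuntimeError(f"all replicas of {block} are unavailable")


def map_block(block):
    """Map step: count the records in one block by their type."""
    return Counter(record["type"] for record in BLOCK_DATA[block])


def run_job(failed_nodes=frozenset()):
    """Process every block on a live replica, then merge the counts."""
    totals = Counter()
    placement = {}
    for block in BLOCK_REPLICAS:
        placement[block] = pick_node(block, failed_nodes)
        totals += map_block(block)  # reduce step: merge per-block counts
    return dict(totals), placement
```

Calling `run_job()` and `run_job(failed_nodes={"node-a"})` gives the same totals; only the placement changes, because the blocks that were on the failed node are read from their other replica.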
What all this means is that the NSA can use Hadoop, or a similar program, to take in huge amounts of data of all kinds, store it cheaply, and immediately get to work on it. With sufficient computing power, real-world data could be analyzed in close to real time.
The NSA isn't the only one working on this kind of speedy processing of massive data sets. Apache Drill is a relatively new open-source project that's building a distributed system for interactive analysis of large-scale datasets. Inspired by Google's Dremel research, Drill is designed to scale to 10,000 servers and query petabytes of data in seconds. Companies like IBM, HP, and Teradata are already making hundreds of millions of dollars helping customers like GE, Walmart, and Wells Fargo extract useful business information from petabytes of what once seemed like unrelated, even irrelevant data.
Still other companies, like Facebook, Google and Microsoft, use every bit of your data that comes their way from your use of their search engines and services to present you with customized ads. They've been using the triad of big data, traffic analysis, and metadata to make our Web experience more engaging for over a decade.
Put it all together and what do you get? You get a world where even if the NSA isn't actually looking at your Internet messages' content or listening to your phone calls, they can already find out a vast amount about you, whenever they want.
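As a rough illustration of how much metadata alone can reveal, consider this small Python sketch. The call records, the names, and the "late night" cutoff are all made up for the example, and it is not a description of any agency's or company's actual method. It looks only at who called whom and when, never at what was said.

```python
from collections import Counter

# Hypothetical call metadata: (caller, callee, hour of day). No content.
CALLS = [
    ("you", "doctor", 9),
    ("you", "pharmacy", 10),
    ("you", "friend", 23),
    ("you", "friend", 23),
    ("you", "doctor", 9),
    ("you", "friend", 22),
]


def profile(calls, person):
    """Summarize one person's habits from call metadata alone."""
    contacts = Counter(callee for caller, callee, _ in calls if caller == person)
    # Assumed cutoff: calls placed from 22:00 to 05:59 count as late night.
    late = sum(
        1
        for caller, _, hour in calls
        if caller == person and (hour >= 22 or hour < 6)
    )
    return {"top_contacts": contacts.most_common(2), "late_night_calls": late}
```

Even on six toy records, `profile(CALLS, "you")` picks out the most frequent contacts and a pattern of late-night calls. Scale that up to years of real records and the picture gets detailed fast.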
In the meantime, all the major Web companies are already doing the same things. We traded our privacy for the convenience of a customized Web experience years ago.
It's not just that the NSA has long been looking into our affairs while we remained oblivious; it's that we gave up our privacy to businesses ages ago as well.
Ready or not, like it or not, welcome to the 21st century and the death of privacy.