Big data, metadata, and traffic analysis: What the NSA is really doing

The NSA doesn't have to intercept and read all your messages to know what you're doing -- and neither do many Internet businesses.

What I find most remarkable about all the hubbub about the National Security Agency's Prism program is how little new "news" there is to Edward Snowden's "revelations."

After all, the NSA's mission has been to intercept communications and break codes ever since it was founded in 1952. Combine that with the Patriot Act, and anyone who's bothered to read the books of NSA expert James Bamford over the last few years won't find anything in the least bit surprising about Prism.

It's possible, of course, that the NSA is doing something technically interesting, like intercepting and breaking SSL-protected Internet communications. But the NSA doesn't have to bother with deciphering your PGP-protected love notes to your sweetie to know what you're up to. No, they can combine their age-old techniques of working with metadata and traffic analysis with 21st-century big data analysis to have a darn good idea of what you, along with everyone else, are doing.

It's not just the NSA, though. Big Internet businesses have been using the same techniques to deliver customized Web experiences to you for almost twenty years.


It's metadata that gives anyone with access to your data, not just the NSA, the ability to work out what you're up to even if your data is locked up and encrypted.

Unless you're a serious photography, video, or music collector, you may not know about metadata. It's "data about data" -- or, more properly in this context, it's data about content. When you look at a Web page, a photo, or an e-mail message, what you see is the human-readable content. Hiding underneath that picture of a kitten, the ITworld Web page, or a note from your mom, is all kinds of data about what you see.

With a digital photograph, there can be dozens of data fields. There are multiple formats for this data. The most popular are Exchangeable Image File Format (Exif), International Press Telecommunications Council (IPTC), and Adobe's Extensible Metadata Platform (XMP).

A photograph's metadata can record the camera that was used to take it, and the date and time it was taken -- along with the location, if the camera has a GPS. If you edit your photograph, the metadata can also be used to record what software and operating system you used. And with the right software, or even a Website like exifdata, you can read any image's metadata.
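To see how little work it takes to turn a photo's location tag into a point on a map, here is a minimal Python sketch. It assumes the tags have already been read out of the file by some metadata tool; the `sample_tags` values and the camera name are invented for illustration. Exif stores latitude and longitude as degrees, minutes, and seconds (each a fraction) plus a hemisphere letter, and the helper `dms_to_decimal` is a hypothetical name for the usual conversion.

```python
from fractions import Fraction

def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere letter
    into signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(num, den) for num, den in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South and west are negative by convention.
    if ref in ("S", "W"):
        value = -value
    return float(value)

# Invented sample tags, each number given as a (numerator, denominator) pair.
sample_tags = {
    "Model": "ExampleCam 100",
    "DateTimeOriginal": "2013:06:10 14:32:07",
    "GPSLatitude": ((40, 1), (44, 1), (5472, 100)),
    "GPSLatitudeRef": "N",
    "GPSLongitude": ((73, 1), (59, 1), (843, 100)),
    "GPSLongitudeRef": "W",
}

lat = dms_to_decimal(sample_tags["GPSLatitude"], sample_tags["GPSLatitudeRef"])
lon = dms_to_decimal(sample_tags["GPSLongitude"], sample_tags["GPSLongitudeRef"])
print(f"{sample_tags['Model']} at {sample_tags['DateTimeOriginal']}: {lat:.5f}, {lon:.5f}")
```

Paste the two numbers into any mapping site and you know where the photographer was standing, without ever looking at the picture itself.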

Web pages are the same way. You probably know about cookies and your Web browser history, but there's far more data available out there about your Web interactions than you might think.

For example, when you use Twitter, a host of metadata about each of your tweets is preserved in JavaScript Object Notation (JSON). This data can, in turn, be used by others, including companies such as Gnip, which specializes in analyzing social-networking metadata for enterprises. How much data? There's the stuff that's obvious, such as your Twitter ID and the time and date you sent the tweet, but there's also additional metadata, such as your location and the program and device you used to send the tweet. So it is that Gnip and MapBox can create maps of smartphone users for any given location.

Welcome to Manhattan, which, unlike most places, still has some BlackBerry users contending with many iPhone tweeters and Android doing well in the suburbs. Is that you in the upper right corner?
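A rough Python sketch shows how a device map like that could be started from tweet metadata alone. The three records below are invented, and the field names (`created_at`, `user`, `source`, `coordinates`) are only loosely modeled on Twitter's tweet JSON, so treat them as assumptions rather than the real schema. The helper `metadata_only` is a hypothetical name. Note that the text of the tweet is never read.

```python
import json
from collections import Counter

# Invented sample records; field names are an assumption, not the exact Twitter schema.
raw = """
[
  {"created_at": "2013-06-10T09:15:00Z", "user": {"screen_name": "alice_example"},
   "source": "Twitter for iPhone",
   "coordinates": {"type": "Point", "coordinates": [-73.98, 40.75]},
   "text": "ignored"},
  {"created_at": "2013-06-10T09:20:00Z", "user": {"screen_name": "bob_example"},
   "source": "Twitter for BlackBerry",
   "coordinates": {"type": "Point", "coordinates": [-74.00, 40.71]},
   "text": "ignored"},
  {"created_at": "2013-06-10T09:31:00Z", "user": {"screen_name": "carol_example"},
   "source": "Twitter for iPhone", "coordinates": null,
   "text": "ignored"}
]
"""

def metadata_only(tweet):
    """Keep who, when, which client, and where; drop the message text."""
    point = tweet.get("coordinates")
    # GeoJSON-style points list longitude first, then latitude.
    lon, lat = point["coordinates"] if point else (None, None)
    return {
        "who": tweet["user"]["screen_name"],
        "when": tweet["created_at"],
        "client": tweet["source"],
        "lat": lat,
        "lon": lon,
    }

records = [metadata_only(t) for t in json.loads(raw)]
device_counts = Counter(r["client"] for r in records)
located = [r for r in records if r["lat"] is not None]

print(device_counts.most_common())
print(f"{len(located)} of {len(records)} tweets can be placed on a map")
```

Scale that up from three records to millions and you have the raw material for a map of who carries which phone, block by block.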

If you think that's bad, consider all the information that the MIT Media Lab Immersion program can pull up about you from just the From:, To:, CC: and Timestamp fields of the messages in your Gmail account. Stunning, isn't it? When you take a closer look at the traffic analysis, you'll see it's actually far more revealing than it looks at a casual glance.

You don't need to be a traffic analysis expert to figure out who I interact with -- you just need four fields from my Gmail messages. That small square of blue dots at the lower left is ITworld.
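Here is a minimal Python sketch of that kind of header-only traffic analysis. It is not how Immersion itself is built; it simply assumes you have raw messages in hand and uses Python's standard `email` module to read the same four fields. The addresses and messages are invented, and the bodies are never examined.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Invented sample messages; only the headers matter here.
raw_messages = [
    "From: me@example.com\nTo: editor@example.org\nCc: friend@example.net\n"
    "Date: Mon, 10 Jun 2013 09:15:00 -0400\n\n(body ignored)",
    "From: me@example.com\nTo: editor@example.org\n"
    "Date: Tue, 11 Jun 2013 23:40:00 -0400\n\n(body ignored)",
    "From: me@example.com\nTo: friend@example.net\n"
    "Date: Wed, 12 Jun 2013 23:55:00 -0400\n\n(body ignored)",
]

pair_counts = Counter()   # (sender, recipient) -> number of messages
hour_counts = Counter()   # local hour of day -> number of messages

for raw in raw_messages:
    msg = message_from_string(raw)
    sender = getaddresses(msg.get_all("From", []))[0][1]
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    for _name, address in recipients:
        pair_counts[(sender, address)] += 1
    hour_counts[parsedate_to_datetime(msg["Date"]).hour] += 1

for (sender, recipient), count in pair_counts.most_common():
    print(f"{sender} -> {recipient}: {count}")
print("Busiest hour:", hour_counts.most_common(1)[0][0])
```

The pair counts are the edges of a social graph, and the hour counts hint at when someone is awake and working. Neither needed a single word of message content.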

Not worried? Think you can dodge around e-mail tracking with a few simple tricks? That's what former CIA director David Petraeus thought -- and he was wrong, wrong, wrong. Petraeus and his mistress Paula Broadwell used Gmail to communicate, but never actually sent messages to each other. Instead, they used anonymous e-mail accounts to leave drafts of messages for the other to read. Safe? Anything but.

While they did avoid the common mistake of using their home Internet accounts, Broadwell, at least, logged into the various mail accounts from public hotel Wi-Fi networks. From there, it was simply a matter of collating guest lists from various hotels, IP login records, and, eventually, it appears, access to the actual drafts.
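The collating step is little more than set intersection. The Python sketch below is an illustration of the idea under made-up data, not a reconstruction of what investigators actually ran: the hotel names, IP addresses, dates, and guest labels are all hypothetical. Each login to the anonymous account points to a hotel network on a given night, and the person of interest is whoever appears on every matching guest list.

```python
# Hypothetical data: which public IP address belongs to which hotel's Wi-Fi.
hotel_ips = {
    "Hotel A": "198.51.100.10",
    "Hotel B": "203.0.113.20",
    "Hotel C": "192.0.2.30",
}

# Hypothetical guest lists, keyed by (hotel, date).
guest_lists = {
    ("Hotel A", "2012-05-01"): {"guest_1", "guest_2", "guest_3"},
    ("Hotel B", "2012-06-14"): {"guest_2", "guest_4"},
    ("Hotel C", "2012-07-09"): {"guest_1", "guest_2", "guest_5"},
}

# Hypothetical login records for the anonymous mail account: (source IP, date).
logins = [
    ("198.51.100.10", "2012-05-01"),
    ("203.0.113.20", "2012-06-14"),
    ("192.0.2.30", "2012-07-09"),
]

ip_to_hotel = {ip: hotel for hotel, ip in hotel_ips.items()}

candidates = None
for ip, date in logins:
    guests = guest_lists.get((ip_to_hotel.get(ip), date))
    if guests is None:
        continue  # no guest list available for this login
    # Keep only the people present at every login location so far.
    candidates = set(guests) if candidates is None else candidates & guests

print("Present at every login:", sorted(candidates or []))
```

Three hotel stays are enough here to narrow five guests down to one. Every extra login makes the overlap smaller, which is why an "anonymous" account used from a handful of places stops being anonymous quickly.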

So it was that the head of the CIA itself was brought down by an FBI investigation of anonymous e-mails. Do you think you can do better? I doubt it.
