The 31 things inside everything you tweet

A keen look into the huge data pile Twitter is building out of your breakfast photos and jokes

Credit: Image via michielsikkes/Flickr

With Twitter making its initial public stock offering today, you will more than likely hear some commentary and jokes about how something so ephemeral, so unserious, so simple could be worth billions of dollars and capture the world's attention.

"Simple" is not the best way to describe Twitter when you consider that every tweet sent out, however short, carries along with it 31 different fields of data. Paul Ford's cover story for Bloomberg Businessweek on Twitter's "Hidden Technology" does a lot of heavy lifting in just a few remarkably graceful movements. That's how a writer might describe it, anyway.

Here's how Ford scales the huge data collection Twitter is building with all your complaints about coffee service and television series plotlines:

You know how the National Security Agency collects “metadata” about the phone calls Americans make? Well, that’s what these fields are, except instead of metadata about phone calls, this is metadata about tweets. In fact, those 140 characters are less than 10 percent of all the data you’ll find in a tweet object. Twitter’s metadata is publicly documented by the company, open for perusal by all and available to anyone who wants to sign up for an API key.
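To get a feel for that ratio, here is a minimal Python sketch that measures how much of a serialized tweet object is the text itself. The sample object is hypothetical and heavily trimmed, with invented values; the field names `text`, `source`, `retweet_count`, `favorite_count` and `possibly_sensitive` follow Twitter's public documentation, and the helper `text_share` is a name made up for this illustration. A real tweet object carries many more fields, so the text's share would be smaller than what this toy example reports.

```python
import json

# Hypothetical, heavily trimmed tweet object. Every value here is invented
# for illustration; a real object from the API has many more fields.
sample_tweet = {
    "id_str": "0000000000",
    "created_at": "Thu Nov 07 14:30:00 +0000 2013",
    "text": "Coffee line is out the door again.",
    "source": "web",
    "retweet_count": 0,
    "favorite_count": 0,
    "lang": "en",
    "possibly_sensitive": False,
}


def text_share(tweet):
    """Return the fraction of the serialized tweet's bytes taken up by its text."""
    total = len(json.dumps(tweet, ensure_ascii=False).encode("utf-8"))
    text = len(tweet.get("text", "").encode("utf-8"))
    return text / total


print(f"The text is {text_share(sample_tweet):.0%} of this object")
```

Run against a full tweet object pulled from the API, the same function gives a rough check on Ford's "less than 10 percent" figure.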

Much of that metadata is common stuff: the number of retweets and "favorites," the app or site that sent the tweet, and so on. But some fields are trickier to think about, especially at the scale of millions of messages every day.

Another tweet field, “withheld_copyright,” if set to “true,” lets you know that a tweet is in trouble—that its content has raised flags and hackles over copyright. The text of the tweet, in that case, may be suppressed. The “withheld_in_countries” field can provide a list of the nations in which the tweet is banned. Another field has a telling name: “possibly_sensitive.” It’s set to either true or false. The field indicates whether a tweet links to potentially offensive things such as “nudity, violence, or medical procedures.” (If ever you wanted a snapshot of our world’s anxieties in three terms, there you have it.)
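The three fields Ford describes are easy to check once a tweet has been parsed into a dictionary. The sketch below is one way to do it, not Twitter's own code: the function name `moderation_flags` is made up for this example, the country codes are placeholders, and it assumes any of the three fields may be missing from a given tweet.

```python
def moderation_flags(tweet):
    """Summarize the withholding and sensitivity fields of a parsed tweet."""
    notes = []
    # Assumption: a field may be absent, so treat missing as "not set".
    if tweet.get("withheld_copyright", False):
        notes.append("withheld over a copyright complaint")
    countries = tweet.get("withheld_in_countries") or []
    if countries:
        notes.append("withheld in: " + ", ".join(countries))
    if tweet.get("possibly_sensitive", False):
        notes.append("links to possibly sensitive content")
    return notes


# Hypothetical tweet; "XX" and "YY" are placeholder country codes.
flagged = {
    "text": "(text may be suppressed)",
    "withheld_copyright": True,
    "withheld_in_countries": ["XX", "YY"],
    "possibly_sensitive": False,
}

for note in moderation_flags(flagged):
    print(note)
```

An ordinary tweet with none of these fields set simply yields an empty list.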

Twitter is building one of the most interesting "Big Data" sets around, one populated almost entirely by people willingly typing into small boxes whenever they feel the urge. I won't claim here that Twitter isn't overvalued, or that it isn't a strange fixation among web-focused firms. But after thinking about the kinds of connections Twitter is parsing and storing, I can no longer use the word "simple" in its proximity. Read the whole piece.

Disclosure: Paul Ford once edited a piece I wrote for a magazine, and is an occasional email correspondent and friend of a friend. He is also quite fun to follow on Twitter, in his own way.
