May 14, 2013, 6:00 AM
There are a number of Twitter statistics and facts that most of us probably already know. For instance, that it was founded in 2006, that tweets are limited to 140 characters and that Justin Bieber has more Twitter followers than anyone else (although a good portion are fake accounts). But the sheer volume of data that Twitter generates every day means there are plenty of other interesting statistics waiting to be discovered by anyone with the time and inclination to mine it.
One group that has done just that is a team of researchers from the University of Illinois, who recently dug into a very large slice of Twitter data. They obtained a month-long sample from the Twitter Decahose (a real-time, random selection of 10 percent of all global tweets) covering 1.5 billion tweets from 71 million unique Twitter users in late 2012. The primary focus of their research was the geography of the Twitter universe, which they studied by looking at tweets and users with associated geolocation data. They came up with a long list of interesting findings, not all of them geographic in nature.
Here are some things they found that you may not know (I didn’t):
A small number of users drive most Twitter traffic
The top 1% of Twitter users account for 20% of all tweets, while the top 5% of users account for almost half of all tweets. At the other end, half of all users tweet very infrequently (four times or fewer during the sample period, October 23 through November 30, 2012).
Half of all tweets mention another Twitter user
One half of those tweets, or one quarter of all tweets, are retweets.
16% of tweets contain links to external web sites
Twitter.com is the most popular domain that gets linked to (17% of tweets with links), followed by Instagram.com (13%), Facebook.com (12%) and YouTube.com (6%).
More tweets come from Jakarta than any other city in the world
About 3% of the sampled tweets (46 million) had location information, either exact GPS coordinates or a less precise location specified by the user. When cities were ranked by the number of georeferenced tweets, Jakarta came out on top, followed by New York City, São Paulo, Kuala Lumpur and Paris. Five other U.S. cities made the top 20: Chicago (9), Los Angeles (11), Houston (13), Philadelphia (15) and Dallas (16).
New York City users are the most retweeted of any city in the world
Rounding out the top five cities whose users get retweeted the most are São Paulo, London, Paris and Kuala Lumpur. Because the number of times a city’s tweets get retweeted is strongly correlated with the city’s total number of tweets, this list closely resembles the list of cities that tweet the most. On this list, however, Jakarta drops to number nine.
The average distance between two users mentioning each other in tweets is 744 miles
Not surprisingly, people don’t confine their Twitter interactions to users who are physically close to them. This average distance is greater than both the distance from San Francisco to San Diego and the entire length of France.
The study contains many more detailed findings and is pretty fascinating. I encourage Twitter and big data fans to take a look. Of course, if you like it, go ahead and tweet about it.
Read more of Phil Johnson's #Tech blog and follow the latest IT news at ITworld. Follow Phil on Twitter at @itwphiljohnson. For the latest IT news, analysis and how-tos, follow ITworld on Twitter and Facebook.