The GitHub Archive is a large source of interesting data on software development activity and the developers behind it. Lots of people (me included) have mined it to learn all sorts of things about what programmers are up to. Since the archive includes some basic information on the developers committing code to GitHub, such as company and location, you can also use it to learn about the developers themselves, at least to a certain extent.
Recently, developer Lane Aasen used the GitHub data in one of the more creative ways I’ve seen. He created a fun data visualization of the locations of GitHub contributors. I must say it is mesmerizingly fun to spin that globe around.
Aasen made his code available, of course, on GitHub, so you can grab it and play with it yourself. Using Google BigQuery to query a snapshot of GitHub data from 2012, he gets the top 1,000 locations, geocodes them using the Google Geocoding API and then plots them all using WebGL Globe.
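As a rough illustration of the last step in that pipeline, here is a minimal Python sketch that turns already-geocoded locations into the flat latitude, longitude, magnitude array that WebGL Globe reads, as far as its README describes the format. This is not Aasen's code: the function name to_globe_series, the series name, the scaling of counts to the 0-1 range, and the sample counts are all assumptions made for the example.

```python
import json


def to_globe_series(name, points):
    """Flatten (lat, lng, count) tuples into [name, [lat, lng, magnitude, ...]].

    Magnitudes are scaled so the largest count becomes 1.0; that scaling is
    an assumption of this sketch, not something the source article specifies.
    """
    if not points:
        return [name, []]
    max_count = max(count for _, _, count in points)
    flat = []
    for lat, lng, count in points:
        flat.extend([lat, lng, count / max_count])
    return [name, flat]


# Made-up counts for illustration only; coordinates are approximate.
sample_points = [
    (37.7749, -122.4194, 200),  # San Francisco
    (40.7128, -74.0060, 100),   # New York
]

# The globe expects a JSON list of series.
globe_json = json.dumps([to_globe_series("contributors", sample_points)])
```

The geocoding call itself is left out here, since it needs an API key and a network connection; the sketch starts from coordinates you already have.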
If you want to see the list from which the visualization is generated, you can run his query yourself in BigQuery:
SELECT actor_attributes.location, COUNT(*) num_users
FROM <2012 GitHub snapshot table>
WHERE (actor_attributes.location IS NOT NULL) AND
(actor_attributes.location != '')
GROUP BY actor_attributes.location
HAVING (COUNT(*) >= 1)
ORDER BY num_users DESC;
Using this, I pulled out the top 100 locations and did some manual cleanup (e.g., “Seattle” and “Seattle, WA” are the same place). To stay consistent with Aasen’s approach, I also ignored whole countries as locations, as he did. After all that, here are the top 20 cities for GitHub contributors:
San Francisco, CA
New York, NY
Los Angeles, CA
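The manual cleanup behind this list could also be scripted. The Python sketch below shows one way to do it, assuming the query results are available as (location, num_users) pairs. The alias table, the country list, the function name top_cities and the sample rows are all hypothetical, made up for illustration and not taken from the real results.

```python
from collections import Counter

# Hypothetical alias table: maps lowercased profile strings to one canonical city.
ALIASES = {
    "seattle": "Seattle, WA",
    "seattle, wa": "Seattle, WA",
    "san francisco": "San Francisco, CA",
    "san francisco, ca": "San Francisco, CA",
}

# Hypothetical set of whole-country entries to skip.
COUNTRIES = {"germany", "united states", "usa"}


def top_cities(rows, limit=20):
    """Merge aliases, drop whole countries, and return the top cities by user count."""
    totals = Counter()
    for location, num_users in rows:
        # Normalize case and whitespace before looking anything up.
        key = " ".join(location.lower().split())
        if key in COUNTRIES:
            continue
        # Unknown locations keep their original spelling.
        totals[ALIASES.get(key, location.strip())] += num_users
    return totals.most_common(limit)


# Illustrative rows only, not real query output.
rows = [
    ("Seattle", 30),
    ("Seattle, WA", 20),
    ("Germany", 80),
    ("San Francisco, CA", 60),
    ("san francisco", 5),
]
print(top_cities(rows))
```

An explicit alias table keeps each merge decision visible and easy to review, which matters because free-text profile locations are too messy to normalize reliably with string rules alone.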
It’s a pretty diverse list, spanning the globe. Well, spanning half the globe (the Northern Hemisphere) mostly. I don’t see any real shockers on there and can’t think of any obvious cities that I would have expected to see that didn’t crack the top 20.
Do you know of interesting or fun uses of the GitHub Archive data? Please share them here in the comments.
Read more of Phil Johnson's #Tech blog and follow the latest IT news at ITworld.