Where in the world are GitHub users?

Using the GitHub Archive to see which cities commit the most code

Image credit: Lane Aasen
Visualizing the GitHub contributors by city

The GitHub Archive is a large source of interesting data on software development activity and the developers behind it all. Lots of people (including myself) have mined it to learn all sorts of things about what programmers are up to. Since the archive includes some basic information on the developers committing code to GitHub, such as company and location, you can also use it to learn about developers themselves, at least to a certain extent.

Recently, developer Lane Aasen used the GitHub data in one of the more creative ways I’ve seen. He created a fun data visualization of the locations of GitHub contributors. I must say it is mesmerizingly fun to spin that globe around.

Aasen made his code available, of course, on GitHub, so you can grab it and play with it yourself. Using Google BigQuery to query a snapshot of GitHub data from 2012, he gets the top 1,000 locations, geocodes them using the Google Geocoding API and then plots them all using WebGL Globe.
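
The last step of that pipeline, turning geocoded counts into something the globe can plot, can be sketched in a few lines of Python. This is only an illustration, not Aasen's actual code: the helper name, the sample coordinates and the counts are all made up, and it assumes the flat latitude, longitude, magnitude series layout described in the WebGL Globe README, with magnitudes scaled to the 0 to 1 range.

```python
import json


def to_globe_series(name, geocoded_counts):
    """Flatten {(lat, lng): count} into a WebGL Globe style series.

    Hypothetical helper: assumes the globe reads a flat list of
    lat, lng, magnitude triples, with magnitude scaled to 0..1.
    """
    if not geocoded_counts:
        return [name, []]
    peak = max(geocoded_counts.values())
    flat = []
    # Largest locations first, so the output order is predictable.
    ordered = sorted(geocoded_counts.items(), key=lambda kv: -kv[1])
    for (lat, lng), count in ordered:
        flat.extend([lat, lng, round(count / peak, 3)])
    return [name, flat]


# Made-up sample data: approximate coordinates and invented counts.
sample = {
    (37.77, -122.42): 400,  # San Francisco
    (51.51, -0.13): 300,    # London
    (48.86, 2.35): 200,     # Paris
}
series = to_globe_series("github_users", sample)
globe_json = json.dumps([series])
```

Scaling by the largest count keeps the tallest bar at full height no matter how many users the top location has, which is one simple way to keep the smaller cities visible.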

If you want to see the list from which the visualization is generated, you can run his query yourself in BigQuery:

SELECT actor_attributes.location,
  COUNT(*) AS num_users
FROM publicdata:samples.github_nested
WHERE (actor_attributes.location IS NOT NULL) AND
  (actor_attributes.location != '')
GROUP BY actor_attributes.location
HAVING (COUNT(*) >= 1)
ORDER BY num_users DESC;

Using this, I pulled out the top 100 locations and just did some manual cleanup (e.g. “Seattle” and “Seattle, WA” are the same place). I also ignored whole countries as locations (as Aasen did) just to be consistent with his approach. Doing all that, here are the top 20 cities for GitHub contributors:

  1. San Francisco, CA
  2. London, UK
  3. Paris, France
  4. New York, NY
  5. Seattle, WA
  6. Tokyo, Japan
  7. Berlin, Germany
  8. Portland, OR
  9. Washington, DC
  10. Chicago, IL
  11. Boston, MA
  12. Beijing, China
  13. Los Angeles, CA
  14. Moscow, Russia
  15. Toronto, Canada
  16. Austin, TX
  17. Stockholm, Sweden
  18. Melbourne, Australia
  19. Yokohama, Japan
  20. Shanghai, China

It’s a pretty diverse list, spanning the globe. Well, spanning half the globe (the Northern Hemisphere) mostly. I don’t see any real shockers on there and can’t think of any obvious cities that I would have expected to see that didn’t crack the top 20.
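
For anyone repeating the cleanup step, merging variants such as “Seattle” and “Seattle, WA” and skipping whole countries could be automated roughly as follows. I did this by hand, so treat the Python below as a sketch under assumptions: the alias and country tables are hypothetical and far from complete, and the counts are invented.

```python
from collections import Counter

# Hypothetical lookup tables; a real cleanup would need many more entries.
ALIASES = {
    "seattle": "Seattle, WA",
    "seattle, wa": "Seattle, WA",
    "sf": "San Francisco, CA",
    "san francisco": "San Francisco, CA",
    "san francisco, ca": "San Francisco, CA",
}
COUNTRIES = {"germany", "japan", "usa", "united kingdom"}


def merge_locations(rows):
    """Combine counts for spellings of the same city and drop countries.

    rows is an iterable of (location, num_users) pairs, like the query output.
    """
    totals = Counter()
    for location, num_users in rows:
        key = " ".join(location.lower().split())  # lowercase, collapse spaces
        if key in COUNTRIES:
            continue  # whole countries are ignored, as in the list above
        totals[ALIASES.get(key, location.strip())] += num_users
    return totals.most_common()  # highest merged count first


# Invented counts, for illustration only.
rows = [("Seattle", 120), ("Seattle, WA", 80), ("Germany", 500),
        ("San Francisco", 300), ("SF", 50), ("Tokyo", 90)]
top = merge_locations(rows)
```

Locations with no alias entry pass through unchanged, so anything the tables miss still shows up in the results and can be fixed by adding a row to the table.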

Do you know of interesting or fun uses of the GitHub Archive data? Please share them here in the comments.

