Where in the world are GitHub users?

Using the GitHub Archive to see which cities commit the most code

By Phil Johnson


Visualizing the GitHub contributors by city

Image credit: Lane Aasen

The GitHub Archive is a large source of interesting data on software development activity and the developers behind it all. Lots of people (me included) have mined it to learn all sorts of things about what programmers are up to. Since the archive includes some basic information on the developers committing code to GitHub, such as company and location, you can also use it to learn about the developers themselves, at least to a certain extent.

Recently, developer Lane Aasen used the GitHub data in one of the more creative ways I’ve seen. He created a fun data visualization of the locations of GitHub contributors. I must say it is mesmerizingly fun to spin that globe around.

Aasen made his code available, of course, on GitHub, so you can grab it and play with it yourself. Using Google BigQuery to query a snapshot of GitHub data from 2012, he gets the top 1,000 locations, geocodes them using the Google Geocoding API and then plots them all using the WebGL Globe.
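To make the last step of that pipeline concrete, here is a minimal Python sketch of how geocoded locations could be packed into the input the WebGL Globe expects: a list of named series, each a flat array of latitude, longitude, magnitude triples. This is not Aasen's code. The function name `to_globe_series`, the series name and the sample counts are all made up for illustration, and scaling each count against the largest one is just one reasonable choice.

```python
import json


def to_globe_series(name, points):
    """Flatten (lat, lng, count) tuples into one WebGL Globe series.

    The globe reads each series as [name, [lat, lng, magnitude, ...]].
    Magnitudes are scaled to the 0..1 range against the largest count.
    """
    if not points:
        return [name, []]
    # Fall back to 1 so an all-zero input does not divide by zero.
    peak = max(count for _, _, count in points) or 1
    flat = []
    for lat, lng, count in points:
        flat.extend([lat, lng, round(count / peak, 3)])
    return [name, flat]


# Placeholder input: approximate coordinates for San Francisco and London,
# with invented user counts (NOT results from the BigQuery query).
sample = [(37.77, -122.42, 200), (51.51, -0.13, 100)]
series = to_globe_series("github_users", sample)

# The globe loads a JSON array of such series.
print(json.dumps([series]))
```

In a real run, the coordinates would come from the geocoding step and the counts from the query results.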

If you want to see the list from which the visualization is generated, you can run his query yourself in BigQuery:

SELECT
  actor_attributes.location,
  COUNT(*) AS num_users
FROM publicdata:samples.github_nested
WHERE
  (actor_attributes.location IS NOT NULL) AND
  (actor_attributes.location != '')
GROUP BY actor_attributes.location
HAVING
  (COUNT(*) >= 1)
ORDER BY num_users DESC;

Using this, I pulled out the top 100 locations and did some manual cleanup (e.g., merging “Seattle” and “Seattle, WA,” which are the same place). I also ignored whole countries as locations (as Aasen did) to be consistent with his approach. After all that, here are the top 20 cities for GitHub contributors:

  1. San Francisco, CA
  2. London, UK
  3. Paris, France
  4. New York, NY
  5. Seattle, WA
  6. Tokyo, Japan
  7. Berlin, Germany
  8. Portland, OR
  9. Washington, DC
  10. Chicago, IL
  11. Boston, MA
  12. Beijing, China
  13. Los Angeles, CA
  14. Moscow, Russia
  15. Toronto, Canada
  16. Austin, TX
  17. Stockholm, Sweden
  18. Melbourne, Australia
  19. Yokohama, Japan
  20. Shanghai, China

It’s a pretty diverse list, spanning the globe. Well, spanning half the globe (the Northern Hemisphere) mostly. I don’t see any real shockers on there and can’t think of any obvious cities that I would have expected to see that didn’t crack the top 20.
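The cleanup behind that list was done by hand, but the same idea can be sketched in a few lines of Python: merge spelling variants of a city, drop entries that are whole countries, then re-rank. Everything here is hypothetical, including the `ALIASES` and `COUNTRIES` tables, the `top_cities` function and the sample counts. A real run would need much larger tables built by eyeballing the query output.

```python
from collections import Counter

# Hypothetical alias table: normalized spelling -> canonical city name.
ALIASES = {
    "seattle": "Seattle, WA",
    "seattle, wa": "Seattle, WA",
}

# Hypothetical set of country-only locations to skip.
COUNTRIES = {"germany", "china", "japan"}


def top_cities(rows, n=20):
    """Merge location variants and return the n largest (city, users) pairs.

    rows is an iterable of (location, num_users) pairs, shaped like the
    output of the BigQuery query.
    """
    totals = Counter()
    for raw, num_users in rows:
        # Lowercase and collapse whitespace so variants compare equal.
        key = " ".join(raw.lower().split())
        if key in COUNTRIES:
            continue
        totals[ALIASES.get(key, raw.strip())] += num_users
    return totals.most_common(n)


# Invented counts, for illustration only.
rows = [
    ("Seattle", 3),
    ("Seattle, WA", 5),
    ("Germany", 10),
    ("Berlin, Germany", 4),
]
ranked = top_cities(rows)
print(ranked)
```

Here the two Seattle entries are combined into one, and the country-only "Germany" row is dropped while "Berlin, Germany" is kept.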

Do you know of interesting or fun uses of the GitHub Archive data? Please share them here in the comments.

Read more of Phil Johnson's #Tech blog and follow the latest IT news at ITworld. Follow Phil on Twitter at @itwphiljohnson. For the latest IT news, analysis and how-tos, follow ITworld on Twitter and Facebook.
