Python developers are the most giving

GitHub Archive data reveals that Python repositories, on average, receive the most pull requests of any programming language

A sign on a wall that reads "Pull"
Image credit: flickr/Ged Carroll
Python developers do a lot of this

GitHub, as you probably know, has quickly become the main (or, at least, the most well-known) repository for open source software. It currently hosts millions of code repositories, written in hundreds of programming languages and supported by several million users. The sheer volume of activity on GitHub reflects the growing popularity of open source software and the commitment of many people to working together to improve code across many different programming languages.

But I wondered recently if some programming languages tended to generate more open source contributions than others. Thanks to Google BigQuery, I was able to poke around raw GitHub Archive data myself to look into this further. Specifically, to try and quantify this, I looked at the average number of pull requests opened per GitHub repository by programming language. I thought that would be a good (but certainly not perfect) proxy for measuring the number of contributions (or attempted contributions) to a code base by someone other than the repository owner.

First, let me present the results.

Bar chart showing the average number of pull requests per GitHub repository by programming language. The results: Python .94, PHP .83, CoffeeScript .78, JavaScript .74, C++ .74, Ruby .68, C .65, Perl .64, Go .64, Java .58, C# .46, Shell .45, Objective-C .41, CSS .29, VimL .20
Image credit: ITworld/Phil Johnson
Python generates the most pull requests, on average, per GitHub repository

Now, here's my methodology:

  • As mentioned, I queried the GitHub Archive using Google BigQuery. At the time, the archive covered (roughly) GitHub activity from March 11, 2011 through March 14, 2014.

  • First, I queried the number of (non-forked) repositories per programming language using the following:

    SELECT repository_language, COUNT(DISTINCT repository_url) AS cnt
    FROM [githubarchive:github.timeline]
    WHERE repository_fork == "false"
    GROUP BY repository_language;

    This gave me results for 150 programming languages covering over 4.1 million repositories (I ignored repositories with no programming language specified), for an average of 27,473 repositories per language.

  • I then queried the number of pull requests opened per programming language using the following:

    SELECT repository_language, COUNT(*) AS cnt
    FROM [githubarchive:github.timeline]
    WHERE repository_fork == "false"
    AND type = "PullRequestEvent" AND payload_action = "opened"
    GROUP BY repository_language;

    Again ignoring repositories without a programming language specified, this gave me a total of just under 2.8 million pull requests across the 150 programming languages, for an average of 20,567 pull requests per language.

  • Overall, for the 4.1 million repositories with a programming language specified, the average number of pull requests opened per repository was .67.
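The last step of the methodology above is a simple division of one query result by the other. Here is a minimal Python sketch of that step, assuming both result sets have been loaded into dictionaries keyed by language. The counts are made-up placeholders, not figures from the GitHub Archive, and the function name `pulls_per_repo` is hypothetical.

```python
# Hypothetical per-language counts standing in for the two query results.
# These numbers are invented for illustration; they are NOT GitHub Archive data.
repo_counts = {"Python": 1000, "Java": 2000, None: 500}  # None = no language specified
pr_counts = {"Python": 940, "Java": 1160, None: 100}


def pulls_per_repo(repo_counts, pr_counts):
    """Return the average number of pull requests opened per repository, by language."""
    averages = {}
    for language, repos in repo_counts.items():
        # Skip repositories with no language specified, as in the article,
        # and guard against dividing by zero.
        if language is None or repos == 0:
            continue
        averages[language] = pr_counts.get(language, 0) / repos
    return averages


averages = pulls_per_repo(repo_counts, pr_counts)

# Overall average across all repositories that have a language specified.
total_repos = sum(n for lang, n in repo_counts.items() if lang is not None)
total_prs = sum(n for lang, n in pr_counts.items() if lang is not None)
overall = total_prs / total_repos

# Print languages from most to fewest pull requests per repository.
for language, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{language}: {avg:.2f}")
print(f"Overall: {overall:.2f}")
```

Note that the overall figure divides total pull requests by total repositories, so languages with more repositories weigh more heavily; it is not the mean of the per-language averages.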

From the results, then, we see that Python repositories are generating the most pull requests, on average, with .94 per repository. It's interesting that, by this measure, the Python community is the most giving, even though the language currently ranks only eighth in popularity in the latest TIOBE rankings. So, the Python community may be smaller than, say, the Java community (#2 on the TIOBE list), but this suggests it's a tighter group.

After Python, we see that PHP (.83), CoffeeScript (.78), JavaScript (.74) and C++ (.74) also generate an above-average number of code contributions. Of these languages, C++ had the highest TIOBE ranking (#4). The top 3 languages on the current TIOBE list all scored below average on the number of pull requests: C (.65 pull requests/repository, #1 on TIOBE), Java (.58, #2) and Objective-C (.41, #3).
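The above/below-average comparison can be reproduced from the chart values and the .67 overall figure. This is a sketch only: the variable names are hypothetical, and the values are the two-decimal roundings published in the chart, so a language sitting very close to the average could fall on either side with unrounded data.

```python
# Average pull requests per repository, as published in the chart above.
chart_values = {
    "Python": 0.94, "PHP": 0.83, "CoffeeScript": 0.78, "JavaScript": 0.74,
    "C++": 0.74, "Ruby": 0.68, "C": 0.65, "Perl": 0.64, "Go": 0.64,
    "Java": 0.58, "C#": 0.46, "Shell": 0.45, "Objective-C": 0.41,
    "CSS": 0.29, "VimL": 0.20,
}

# Overall average across all repositories with a language specified.
OVERALL_AVERAGE = 0.67

# Split the charted languages by whether they exceed the overall average.
above = [lang for lang, value in chart_values.items() if value > OVERALL_AVERAGE]
below = [lang for lang, value in chart_values.items() if value <= OVERALL_AVERAGE]

print("Above average:", ", ".join(above))
print("At or below average:", ", ".join(below))
```

With these rounded figures, Ruby (.68) also lands just above the overall average, by a margin small enough that rounding could account for it.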

Does this really mean that the Python community is more helpful than other language communities? Not necessarily, of course. Using the number of GitHub pull requests as a proxy for measuring outside (non-repository owner) contributions is far from perfect. Pull requests can come from outsiders forking and updating a repository, or from other project owners working on the same project but using a shared repository model. Maybe Python developers are simply more likely to use shared repositories and pull requests as a development methodology.

Still, I think the results are interesting and suggest that, if you want to choose a programming language for a project that has a large, active and helpful community of developers behind it, you could do a lot worse than Python.


Read more of Phil Johnson's #Tech blog and follow the latest IT news at ITworld. Follow Phil on Twitter at @itwphiljohnson. For the latest IT news, analysis and how-tos, follow ITworld on Twitter and Facebook.
