GitHub trends show programming language fragmentation

A new study of GitHub activity over time reveals shifts in the choices developers are making among coding languages

GitHub’s increasing popularity, and the availability of its usage data, make it a good source for examining trends in software development. For example, I’ve used its data in the past to look at things like which languages seem to frustrate coders the most (spoiler: C++) and which languages’ developers give back the most to other open source projects (spoiler: Python).

Image: Fragmentation: Not just for hard drives anymore (credit: flickr/teresatrimm)

Donnie Berkholz, an analyst at RedMonk, recently delved deeply into historical GitHub activity data to examine trends in the usage of different programming languages over time. He looked at activity for repositories by their primary programming language from 2008 to 2013 for a dozen of the top languages: Ruby, JavaScript, Java, PHP, Python, C, C++, C#, Objective-C, Perl, Shell, and CSS. For each language, he gauged usage by year with three metrics: the percentage of new (non-forked) repositories created in that language, the percentage of new issues created for repositories using that language, and the percentage of new GitHub users with that language as their primary language.
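To make the first of those metrics concrete, here is a minimal sketch of how a per-year language share could be computed. The function name, the input shape and the sample rows are all made up for illustration; this is not Berkholz's code or data.

```python
from collections import Counter, defaultdict


def language_share_by_year(repos):
    """Return {year: {language: percent of that year's new repositories}}.

    `repos` is an iterable of (year_created, primary_language) pairs for
    non-forked repositories. The input shape is an assumption for this sketch.
    """
    counts = defaultdict(Counter)
    for year, language in repos:
        counts[year][language] += 1

    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {lang: 100.0 * n / total for lang, n in counter.items()}
    return shares


# Made-up sample rows, not real GitHub data.
sample = [
    (2008, "Ruby"), (2008, "Ruby"), (2008, "JavaScript"), (2008, "Python"),
    (2013, "JavaScript"), (2013, "JavaScript"), (2013, "Java"), (2013, "Ruby"),
]
shares = language_share_by_year(sample)
```

The same calculation would apply to the other two metrics by swapping the input rows for new issues or new users.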

He came up with a whole lot of findings (and lots of interesting charts). Here are the biggest take-aways that caught my attention:

Historically, only five languages have mattered on GitHub: JavaScript, Ruby, Java, PHP and Python

Over time, these five languages have dominated GitHub usage, no matter what metric you use. While their rankings relative to one another have shifted (Ruby dominated early, now JavaScript does), they remain the big players. Berkholz, however, notes that CSS has shown a strong uptick in the last two years.

JavaScript usage shows the greatest growth

JavaScript usage has shown strong and steady growth since 2008, particularly when based on the percentage of new repositories and issues created (though its choice as the primary language of new GitHub users has been declining). Berkholz attributes this partly to the rise of Node.js, but also to the growth of frameworks that use lots of JavaScript and possible misclassification of repositories’ primary languages (more on that below).

Java growth suggests that GitHub is making inroads in the enterprise

Java was the only one of the 12 languages Berkholz looked at that showed steady growth over time in the percentage of new GitHub users choosing it as a primary language. It also showed increases in the percentage of new repositories and issues created over time. Berkholz concludes that this “supports the assertion that GitHub is reaching the enterprise.”

Programming language use is fragmenting

Despite the growth in usage of Java and JavaScript, the GitHub market shares of the remaining 10 languages he considered have shown a steady decline over time. Berkholz found that, since 2009, the percentages of new repositories, new issues and new users claimed by languages outside these top 12 have all steadily increased. This suggests an increasing fragmentation of developers’ choices among programming languages. As Berkholz writes:

“The programming landscape today continues to fragment, and this GitHub data supports that trend over time....”
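One way to picture the fragmentation measure is as the share of activity that falls outside the 12 tracked languages in each year. The sketch below assumes per-year counts keyed by language; the counts are invented for illustration and do not come from Berkholz's analysis.

```python
# The 12 languages named in the article.
TRACKED = {
    "Ruby", "JavaScript", "Java", "PHP", "Python", "C",
    "C++", "C#", "Objective-C", "Perl", "Shell", "CSS",
}


def other_share(counts_by_language, tracked=TRACKED):
    """Percent of items (repos, issues or users) in untracked languages."""
    total = sum(counts_by_language.values())
    other = sum(n for lang, n in counts_by_language.items() if lang not in tracked)
    return 100.0 * other / total


# Hypothetical counts of new repositories per language, per year.
yearly_counts = {
    2009: {"Ruby": 60, "JavaScript": 30, "Go": 10},
    2013: {"Ruby": 30, "JavaScript": 45, "Go": 15, "Scala": 10},
}
trend = {year: other_share(counts) for year, counts in sorted(yearly_counts.items())}
```

A rising value from year to year would be consistent with the fragmentation the article describes.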

As Berkholz and several commenters pointed out, these data have a number of potential problems that should make one think twice before extrapolating from them to the general population of software developers. For one thing, GitHub tags each repository with a primary language based on the number of lines of code. This can lead to misclassification: if a project uses a framework that bundles large JavaScript libraries, which the developer may never touch, the repository may be incorrectly tagged as JavaScript.
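The misclassification problem is easy to reproduce in a toy model. The sketch below picks a primary language by total line count, as the article describes, and shows how bundled libraries can tip the result. The file list, the `vendor/` path and the `ignore_prefixes` parameter are hypothetical; GitHub's actual detection logic is not reproduced here.

```python
from collections import Counter


def primary_language(files, ignore_prefixes=()):
    """Return the language with the most lines of code.

    `files` is an iterable of (path, language, line_count) tuples.
    Paths starting with any of `ignore_prefixes` are skipped.
    """
    totals = Counter()
    for path, language, line_count in files:
        if path.startswith(tuple(ignore_prefixes)):
            continue
        totals[language] += line_count
    return totals.most_common(1)[0][0]


# A hypothetical Python web project that bundles a large JavaScript library.
project = [
    ("app/views.py", "Python", 1200),
    ("app/models.py", "Python", 800),
    ("vendor/js/framework.js", "JavaScript", 15000),
    ("static/site.css", "CSS", 300),
]

naive = primary_language(project)
adjusted = primary_language(project, ignore_prefixes=("vendor/",))
```

Here the naive count labels the project JavaScript, while skipping the bundled code labels it Python, which is closer to what the developer actually wrote.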

Also, Berkholz notes that both Objective-C and C#, the languages used for iOS and Windows development, respectively, are “almost invisible” on GitHub, but clearly they are currently big players in the developer world.

Given all that, take these results with a grain of salt. I still think they are telling a real and interesting story. I encourage you to read Berkholz’s full analysis for yourself. He’s also made some of the raw data available for download, so you can do your own analysis.

Read more of Phil Johnson's #Tech blog and follow the latest IT news at ITworld. Follow Phil on Twitter at @itwphiljohnson. For the latest IT news, analysis and how-tos, follow ITworld on Twitter and Facebook.
