The most WTF-y programming languages

The GitHub Archive reveals which languages seem to baffle developers the most

By Phil Johnson


Some code just makes developers ask "WTF?"

Image credit: flickr/Daikrieg el Jevi

Ranking programming languages based on how much software developers dislike (or hate) them is, obviously, an inexact science. I took a stab at it last week, using votes and comments provided by programmers on popular discussion forums, to come up with a list of the 10 most hated programming languages:

10. Python
9. LabVIEW
8. JavaScript
7. Tcl
6. COBOL
5. C++
4. PHP
3. Java
2. Perl
1. Visual Basic

There’s more than one way to define or measure dislike, though. In the process of researching that piece, I found an interesting approach taken by developer Sammy Larbi a couple of years ago. He used GitHub API data to identify which languages had the most instances of the string “WTF” in their GitHub code repositories. In his case, Objective-C won (er, lost) out.

Larbi’s method, while a bit tongue-in-cheek, is still, I think, an interesting way to try to gauge the pain a language can cause developers. As a follow-up to my list last week, I wanted to generate an updated version of Larbi’s list. Here now, using GitHub data from the last 21 months, are the top 20 GitHub programming languages ranked by the amount of confusion they seem to cause developers.


C++ seems to generate the most WTFs

Image credit: ITworld/Phil Johnson

Here’s the methodology:

  • Where Larbi screen-scraped GitHub search results, I queried the GitHub Archive using Google BigQuery.

  • I first generated a list of the top 20 languages in GitHub based on the number of repositories created since January 1, 2012 (there were about 4.5 million total repositories created during this time, including forks). Here’s the query I used (h/t to Adam Bard):

    SELECT repository_language, count(repository_language) AS repos_by_lang
    FROM [githubarchive:github.timeline]
    WHERE repository_fork == "false"
    AND type == "CreateEvent"
    AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2012-01-01 00:00:00')
    AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2013-09-23 00:00:00')
    GROUP BY repository_language
    ORDER BY repos_by_lang DESC
    LIMIT 100

  • Larbi’s list was based on searching through the actual code for instances of the string “WTF”. I instead looked for stand-alone instances of “WTF” in the comments attached to GitHub commits (the commit messages). The idea was to weed out cases where the string “WTF” was legitimately used in the code, as opposed to being written as a comment indicating confusion. Here is the query I used to count the total WTFs in commit comments by repository language, for commits pushed between 1/1/2012 and 9/23/2013 (there were over 50 million total commits pushed during this time):

    SELECT repository_language, count(*) AS wtf_cnt
    FROM [githubarchive:github.timeline]
    WHERE type == "PushEvent" AND
    REGEXP_MATCH(LOWER(payload_commit_msg), r'wtf[^a-zA-Z0-9]')
    AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2012-01-01 00:00:00')
    AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2013-09-23 00:00:00')
    GROUP BY repository_language
    ORDER BY wtf_cnt DESC
    LIMIT 100

  • Given these two query result sets, I calculated the average number of WTF commit comments per repository for each of the top 20 programming languages, then sorted. Voilà!
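
For readers who want to try the last step outside BigQuery, here is a minimal Python sketch of the approach described above. It is not the code behind the chart. It assumes the two query results have been exported as language-to-count mappings and that REGEXP_MATCH behaves like an unanchored search. The function names (has_wtf, wtf_per_repo) and the sample counts are made up for illustration.

```python
import re

# Same pattern as the BigQuery query, applied to a lowercased message.
# It requires one non-alphanumeric character right after "wtf"; it does
# not check the character before it.
WTF_PATTERN = re.compile(r"wtf[^a-zA-Z0-9]")


def has_wtf(commit_message):
    """Return True if the lowercased message contains a match for the pattern."""
    return WTF_PATTERN.search(commit_message.lower()) is not None


def wtf_per_repo(repos_by_lang, wtf_by_lang, top_n=20):
    """Rank the top_n languages (by repository count) by WTFs per repository."""
    top = sorted(repos_by_lang.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    rates = [(lang, wtf_by_lang.get(lang, 0) / repos) for lang, repos in top if repos]
    return sorted(rates, key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not the article's data.
    repos = {"LangA": 1000, "LangB": 500, "LangC": 10}
    wtfs = {"LangA": 2, "LangB": 3, "LangC": 5}
    for lang, rate in wtf_per_repo(repos, wtfs, top_n=2):
        print(f"{lang}: {rate:.4f}")
```

Note that a language with few repositories (LangC in the sample) is dropped before the rates are computed, which is why a language outside the top 20 by repository count cannot appear in the ranking no matter how high its rate is. Also, because the pattern needs a trailing non-alphanumeric character, a message consisting of just "wtf" with nothing after it would not be counted.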

So, as you can see, using this measure C++ (the #5 most hated language on my list) wins (or loses) here, by a significant margin over the runner-up language, Lua. One of the common complaints about C++ is that it allows programmers to mix object-oriented and procedural code. Maybe that’s causing an inordinate amount of developer head-scratching?

Perl, which was the #2 most hated language and takes a regular beating from developers, was #19 in the WTF measure. Maybe developers who hate it are successfully avoiding it - or they’re so baffled by the zillion ways you can do things that they’re too beaten down to even write “WTF?”

Visual Basic, the #1 most hated language, doesn’t even make this list, since it wasn’t one of the top 20 most common languages in GitHub. In fact, it averaged only 0.0006 WTF commit comments per repository, the same rate as Perl. Again, maybe this just means developers are able to avoid it - or they’ve just given up fighting it.

While this kind of analysis is all just a bit of fun, it may also reveal some truths: for example, that C++ is causing developers to age prematurely.

What languages make you ask - and actually include in code comments or commits - “WTF?”

Read more of Phil Johnson's #Tech blog and follow the latest IT news at ITworld. Follow Phil on Twitter at @itwphiljohnson. For the latest IT news, analysis and how-tos, follow ITworld on Twitter and Facebook.
