If you’re a developer, no matter which programming languages you code in, you’ve no doubt run into - or created - ugly hacks. Whether it’s due to an impending deadline, a lack of knowledge, or simple laziness, sometimes you just have to code something in a way that you know isn’t the best, with, of course, every intention of revisiting it and redoing it properly later. But which programming language tends to lead programmers to create the most down, dirty, and ugly code fixes and implementations? Based on GitHub data, it turns out that C developers are creating the most ugly hacks or, at least, are the most willing to admit to it.
To make things a little more formal, I then decided to control for the number of repositories per language. To do that, I first queried the GitHub Archive using Google BigQuery to find the number of non-forked repositories created per language between January 1, 2013 (inclusive) and May 1, 2015, to base the comparison on relatively recent code (the query I ran is listed at the end of this post). I then reran the search on GitHub, this time using the advanced search options to look for code files containing “ugly hack” in non-forked repositories created between 1/1/13 and 5/1/15. Finally, for each language, I divided the number of code files containing the string “ugly hack” by the number of repositories to get the average number of such files per repository. Below is a chart of how that shook out.
Even when controlling for the number of repositories, C wins the ugly-hackathon by a landslide. C had almost three times as many mentions of ugly hacks per repository as the next language, PHP, and almost 50 times as many as Java, which ranked 12th on this list.
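The per-repository normalization described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used for this post: the function name `hacks_per_repo`, the language labels, and the counts are all hypothetical placeholders rather than the figures behind the chart.

```python
def hacks_per_repo(files_by_lang, repos_by_lang):
    """Return (language, files per repository) pairs, highest rate first."""
    rates = {
        lang: files_by_lang[lang] / repos_by_lang[lang]
        for lang in files_by_lang
        if repos_by_lang.get(lang, 0) > 0  # skip languages with no repository count
    }
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


# Placeholder counts for illustration only; NOT the numbers from this post.
example_files = {"LangA": 30, "LangB": 10, "LangC": 5}
example_repos = {"LangA": 1000, "LangB": 1000, "LangC": 5000}

ranking = hacks_per_repo(example_files, example_repos)
for lang, rate in ranking:
    print(f"{lang}: {rate:.4f} files per repository")
```

Dividing by the repository count is what keeps a language with many repositories from topping the list simply because it has more code on GitHub.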
This approach has a couple of potential flaws. First, a code file may contain the string “ugly hack” because somebody fixed or removed an ugly hack (e.g., "Fixed an ugly hack"), so we’re undoubtedly counting some files that say ugly hack but don’t actually have an ugly hack (anymore). Second, whether a code file has one or many ugly hacks, it only counts once using this measure. We could, then, be undercounting the actual number of ugly hacks that are out there.
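To make both flaws concrete, here is a small Python sketch that compares the file-level measure with a count of every occurrence. The file names and contents are made up for illustration, and the case-insensitive match is an assumption on my part about how the phrase should be matched, not a statement of how GitHub’s search works.

```python
import re

# Assumption: match the phrase regardless of case.
PHRASE = re.compile(r"ugly hack", re.IGNORECASE)


def count_measures(files):
    """files maps a file name to its text.

    Returns (files_with_phrase, total_occurrences).
    """
    files_with_phrase = 0
    total_occurrences = 0
    for text in files.values():
        hits = len(PHRASE.findall(text))
        if hits:
            files_with_phrase += 1  # the file counts once, however many hits it has
        total_occurrences += hits
    return files_with_phrase, total_occurrences


# Hypothetical files, for illustration only.
sample = {
    "a.c": "/* ugly hack: skip header */\n/* another ugly hack */\n",
    "b.c": "/* Fixed an ugly hack */\n",
    "c.c": "int main(void) { return 0; }\n",
}

file_count, occurrence_count = count_measures(sample)
print(file_count, occurrence_count)  # prints: 2 3
```

Here a.c holds two hacks but counts as one file (the undercount), while b.c counts as a file even though its comment says the hack was fixed (the overcount).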
No matter how you slice it, however, C seems to be generating more ugly hacks than any other programming language. Or, looking at it another way, C developers are the most honest about when they code an ugly hack. Either way, there’s no doubt that all of those ugly hacks will soon be fixed. Right?
Google BigQuery query to pull counts of non-forked GitHub repositories, by programming language, created between 1/1/2013 (inclusive) and 5/1/2015:
SELECT repository_language, count(repository_language) AS repos_by_lang
FROM [githubarchive:github.timeline]
WHERE repository_fork == "false"
AND type == "CreateEvent"
AND PARSE_UTC_USEC(repository_created_at) >= PARSE_UTC_USEC('2013-01-01 00:00:00')
AND PARSE_UTC_USEC(repository_created_at) < PARSE_UTC_USEC('2015-05-01 00:00:00')
GROUP BY repository_language
ORDER BY repos_by_lang DESC