No, HealthCare.gov doesn’t require half a billion lines of code

New data confirms that a previous claim about the size of the Obamacare portal’s code base was way off the mark

Screenshot of the HealthCare.gov website. Image credit: REUTERS/Mike Segar
Does this really require 500 million lines of code?

Whether you’re a fan of Obamacare or against it, there’s no denying that it’s a topic that gets people worked up. Even the technical aspects of its implementation can generate a lot of discussion. Take, for instance, the claim made to the New York Times last year by an anonymous specialist who reportedly worked on HealthCare.gov, the federal government’s online marketplace for health care under the Affordable Care Act, that the website consisted of 500 million lines of code.

This claim immediately caught the attention - and derision - of software developers who felt that it was not only unrealistic, but flat-out impossible. Sure, the site is complex, but not that complex and, besides, it just wouldn’t have been possible to write that much code in the amount of time it took to build the site.

To put that 500 million number into perspective, two years ago I did a little research into the number of lines of code behind some well-known software over the years. At the low end, the guidance system for the Apollo 11 spacecraft used 145,000 lines of code. The Mars Curiosity rover uses 2.5 million lines of code. At the upper end, there’s Mac OS X Tiger (version 10.4), which had 86 million lines of code.
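To see how far the 500 million figure sits from those reference points, here is a quick sketch that divides the claimed count by each of the line counts quoted above. The dictionary and the function name `ratio_to_claim` are my own for illustration; the numbers are simply the ones cited in this post.

```python
# Line counts quoted in this post (approximate, as reported).
LINES_OF_CODE = {
    "Apollo 11 guidance system": 145_000,
    "Mars Curiosity rover": 2_500_000,
    "Mac OS X Tiger (10.4)": 86_000_000,
}

# The figure claimed for HealthCare.gov by the anonymous specialist.
CLAIMED = 500_000_000


def ratio_to_claim(lines: int, claimed: int = CLAIMED) -> float:
    """Return how many times larger the claimed count is than `lines`."""
    return claimed / lines


for name, lines in LINES_OF_CODE.items():
    print(f"{name}: the claim is {ratio_to_claim(lines):,.1f}x larger")
```

By this arithmetic, the claimed figure is 200 times the Curiosity rover’s code base and a bit under six times that of OS X Tiger.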

500 million lines of code for a transactional website - more than five times as much code as that behind OS X - just didn’t pass the sniff test. But just how many lines of code does it take to generate HealthCare.gov?

This question came up on Reddit again last week and it appears that we may now have an answer. One commenter who goes by the handle agenaille, and who claimed to have worked on HealthCare.gov as part of the post-launch clean-up crew at the end of 2013, provided counts of the lines of code behind HealthCare.gov, broken down by programming/markup language. There’s no way to know if this person is telling the truth, but the Reddit community certainly seems to believe him or her; one redditor awarded agenaille Reddit gold for the post. For the sake of argument (and fun), let’s assume the numbers provided by this person are correct.

Here’s the breakdown:

You can see that the total lines of code count provided by agenaille is 3.7 million, nowhere near 500 million. Agenaille notes that this doesn’t include code used for administrative tools related to the site. In the end, s/he guesstimates the total lines of code behind HealthCare.gov to be somewhere between 5 and 15 million. Again, way less than half a billion, in any case.

I took the numbers and generated the following chart to demonstrate the percentage breakdown of lines of code by language behind HealthCare.gov:

Pie chart showing the percent of total lines of code behind HealthCare.gov by programming language (excluding blank lines and code comments). Java is 64%, HTML 14%, JavaScript 9%, XSD 4%, XML 4%, CSS 3% and other 2%. Image credit: ITworld/Phil Johnson; Data source: Reddit/agenaille

As you can see, nearly two-thirds (64%) of the code behind HealthCare.gov is Java. Another 14% is HTML markup, followed by JavaScript (9%), XSD (4%), XML (4%) and CSS (3%).
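For a rough sense of scale, the percentages above can be turned back into approximate line counts by applying them to the 3.7 million total. This is only a sketch: the shares are the rounded values from the chart, so the results are ballpark estimates and not the exact per-language counts agenaille posted. The names `approximate_lines`, `TOTAL_LINES` and `SHARE_PERCENT` are mine.

```python
# Total reported by agenaille (excludes blank lines and comments).
TOTAL_LINES = 3_700_000

# Rounded percentage shares taken from the pie chart.
SHARE_PERCENT = {
    "Java": 64,
    "HTML": 14,
    "JavaScript": 9,
    "XSD": 4,
    "XML": 4,
    "CSS": 3,
    "Other": 2,
}


def approximate_lines(total: int, share_percent: dict[str, int]) -> dict[str, int]:
    """Estimate lines per language from a total and rounded percentage shares."""
    return {lang: total * pct // 100 for lang, pct in share_percent.items()}


estimates = approximate_lines(TOTAL_LINES, SHARE_PERCENT)
for lang, lines in estimates.items():
    print(f"{lang}: ~{lines:,} lines")
```

Under these assumptions, Java alone would account for something on the order of 2.4 million lines.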

These data, for what they’re worth, support the belief that HealthCare.gov doesn’t, as many people thought, require anywhere near 500 million lines of code. Still, nearly 4 million lines of code seems like quite a bit for this kind of thing. That’s 8 times as much code as was required for the space shuttle’s primary flight software. So, apparently, running HealthCare.gov is harder than rocket science. Who knew?

Read more of Phil Johnson's #Tech blog and follow the latest IT news at ITworld. Follow Phil on Twitter at @itwphiljohnson. For the latest IT news, analysis and how-tos, follow ITworld on Twitter and Facebook.
