That’s one big repository: Here’s how many lines of code Google has

Image: Counting the rings on a tree. Luckily, Google doesn't have to count its lines of code by hand. (Credit: Sam Beebe, CC BY 2.0)

Google shares stats on the volume of code it manages and, not surprisingly, it’s a whole heck of a lot


Software code, it seems, is all around us today. It’s in obvious places like your computer, tablet, and smartphone and, increasingly, in less obvious places, like your thermostat, refrigerator, and car. But exactly how much code (as in, how many lines) is actually floating around out there? While it’s clearly impossible to ever answer that question, Google recently gave us a little sense of it by providing insight into the sheer volume of source code that it uses to power all of its products and services.

Last week, Google engineering manager Rachel Potvin, speaking at the @Scale conference in San Jose, said that, as of last January, Google’s total code base was 2 billion lines of code. This mammoth collection of code, she explained, spans 9 million source files which take up 86 terabytes of storage. To manage it all, Google created its own home-grown version control system called Piper, to which the company’s 25,000 developers commit 15,000 changes per day.
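To put those figures in more human terms, here is a quick back-of-the-envelope sketch in Python. The inputs are the rounded numbers quoted above; the constant and function names are my own, and the averages are only illustrative, since nothing in the talk says the code or the commits are spread evenly across files or developers.

```python
# Rounded figures as quoted from the @Scale talk; treat them as approximate.
TOTAL_LINES = 2_000_000_000   # lines of code in Google's code base
SOURCE_FILES = 9_000_000      # source files
DEVELOPERS = 25_000           # developers committing to Piper
CHANGES_PER_DAY = 15_000      # changes committed per day


def derived_averages():
    """Return simple averages derived from the reported totals.

    These are plain divisions of the quoted figures (an assumption of
    even distribution), not numbers Google itself reported.
    """
    return {
        "lines_per_file": TOTAL_LINES / SOURCE_FILES,
        "lines_per_developer": TOTAL_LINES / DEVELOPERS,
        "changes_per_developer_per_day": CHANGES_PER_DAY / DEVELOPERS,
    }


if __name__ == "__main__":
    for name, value in derived_averages().items():
        print(f"{name}: {value:,.1f}")
```

In other words, the quoted totals work out to a couple of hundred lines per file on average, and a bit less than one committed change per developer per day.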

No matter how you slice it, 2 billion lines of code is a lot. But how does that stack up to other companies or organizations that have been churning out code for years? Unfortunately, I’m not aware of Microsoft or Apple or other such companies sharing data on their total count of lines of code. However, to get a sense of scale (and, really, just for fun), we can compare the size of Google’s code library to the amount of code used for specific software applications over the years.

Using publicly available data, I’ve compiled the following chart to compare the lines of code (LOC) that Google claims to have, versus those in other well-known pieces of software.

Chart of lines of code in historic pieces of software

A couple of things jump out at me here. First, the size of Google’s code base really does dwarf all of these other applications, some of which are pretty substantial. Basically, Google’s total lines of code are more than an order of magnitude bigger than all of the code bases in the chart combined. In fact, the scale is so much greater that, in order to save you from getting carpal tunnel from having to scroll down to the bottom of the chart, I just lopped out a big section, the part between 90 million and 1.995 billion LOC. Just imagine that blue bar on the far right being about 23 times as tall as the bar to the left of it, the one representing the lines of code in OS X.

Also, clearly, some of the code referenced here is pretty old. For example, the OS X LOC is for version 10.4 (Tiger), which came out in 2005. One would imagine that it has even more than 86 million LOC these days. The same presumably goes for Windows 10 compared with Windows Server 2003 and its 50 million LOC.

Finally, it’s always fun to be reminded just how little code was used in the past for some pretty important applications. Like a mere 145,000 lines to run the guidance software on Apollo spaceflights or the 400,000 needed to run the space shuttle’s primary flight software. Even the Curiosity rover, which is still busy roaming the Martian surface, “only” needs 2.5 million lines of code.
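For anyone who wants to check the scale for themselves, the short Python sketch below divides Google's 2 billion lines by the line counts mentioned in this article. It only covers the figures cited in the text, not every bar in the chart, and the names are mine, chosen for illustration.

```python
# Line counts as cited in this article (rounded, and in some cases dated).
GOOGLE_LOC = 2_000_000_000

OTHER_LOC = {
    "OS X 10.4 (Tiger)": 86_000_000,
    "Windows Server 2003": 50_000_000,
    "Curiosity rover": 2_500_000,
    "Space shuttle primary flight software": 400_000,
    "Apollo guidance software": 145_000,
}


def times_larger(google_loc=GOOGLE_LOC, others=OTHER_LOC):
    """Return how many times larger Google's code base is than each entry."""
    return {name: google_loc / loc for name, loc in others.items()}


if __name__ == "__main__":
    for name, ratio in times_larger().items():
        print(f"Google's code base is about {ratio:,.0f}x the size of {name}")
```

The OS X ratio comes out at roughly 23, which is where the "23 times as tall" figure for the chart's blue bar comes from.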

Anyway, the point is, even though most of us never see it, there really is quite a bit of software code out there.
