New data released this week shows a new center in the open source developer's universe, as GitHub takes its place as the number one forge tool.
The data from Black Duck Software was released in conjunction with analyst firm RedMonk, which examined a subset of Black Duck's commit history database and presented its findings in a June 2 webinar entitled "Survival of the Forges."
The data that RedMonk pulled together was interesting, to say the least. According to RedMonk analyst (and noted Red Sox fan) Stephen O'Grady, the data was pulled from 2.1 million commits made in four big open source forges from January to May of this year: CodePlex, GitHub, Google Code, and SourceForge.
In the webinar, O'Grady highlighted the correlation between the programming languages developers used and the forges that hosted their commits.
[Figure: Code Commits by Language and Forge. Image courtesy RedMonk and Black Duck.]
Two things are readily apparent from this graph. First, it seems that C++, Java, and Python were the top three languages used for these commits in the first part of 2011. Second, GitHub predominates as the forge used most during the same period. When the RedMonk team broke down the same data by number of commits per forge, that predominance became starkly clear:
[Figure: Code Commits by Forge. Image courtesy RedMonk and Black Duck.]
With this data, RedMonk and Black Duck have confirmed what the general feeling in the community has been for some time: SourceForge, the original software forge launched by VA Software, is no longer the center of the open source development universe.
In his blog on the topic, O'Grady elaborated on why he believes GitHub has risen to handle a majority of open source commits.
"This is, in our view, primarily attributable to the social coding approach advocated for and supported by GitHub," O'Grady wrote. "Based on the decentralized version control system Git, which makes branching and thereby forking sufficiently low overhead to incent the behavior, GitHub has changed the way that software is built in public, and attracted substantial attention as a result."
Which is, I think, analyst-speak for "GitHub just works."
One other piece of reasoning that O'Grady highlighted from the data is that GitHub's success doesn't seem to be coming at the expense of SourceForge, but rather Google Code. There's a lot of really sharp analysis in O'Grady's article, and the reader is invited to check it out.
Something I noted in the graphs was how small CodePlex's share of the commits was (around 2 percent). O'Grady's written report did not go into this, but I have to wonder if this is a telling indictment of how much the open source community really mistrusts Microsoft-related projects. You can't blame the youth of the project, either: CodePlex, launched in 2006, is two years older than GitHub.
It is also telling that Google Code is lagging in third place, with less than half of second-place SourceForge's commits. Again, O'Grady suggests there's evidence that GitHub is attracting developers who might otherwise use Google Code, but it seems to me it's also evidence that Google may need to retool how Google Code interfaces with the developer community.
GitHub's majority status gives a lot of credence to the notion of distributed development, which really does fit better with an open source mind-set. But distributed development is not just about a bunch of independent hackers scattered in their basements. Corporations are well and truly adopting distributed development as a way of tapping into the best developer resources, no matter where in the world they are.