tecosystems

DVCS and Git Usage in 2015


For many in the industry today, version control and decentralized version control are assumed to be synonymous. Slides covering the DevOps lifecycle, to take just one example, may or may not call out Git specifically in the version control portion of the stack, but when the slides are actually presented, Git is what is meant in the overwhelming majority of cases. Git, to some degree, is treated as a de facto standard. Cloud platforms leverage Git as a deployment mechanism, and new collaboration tools built on top of Git-based services continue to emerge.

Are these assumptions well founded, however? Is Git the version control monster that it appears to be? To find out, every year around this time we check Open Hub's (formerly Ohloh) dataset to gauge the relative traction of the various version control systems, at least amongst its sampled projects. Because it was built to index public repositories, it gives us insight into their respective usage, at least within its broad dataset. In 2010, when we first examined its data, Open Hub was crawling some 238,000 projects, and Git managed just 11% of them. For this year's snapshot, that number has swelled to over 683,000, or close to 3X as many. And Git is playing a much more significant role today than it did then.

Before we get into the findings, some details on the source and its caveats.

Source

The data in this chart was taken from snapshots of the Open Hub data exposed here.

Objections & Responses

  • “Open Hub data cannot be considered representative of the wider distribution of version control systems”: This is true, and no claims are made here otherwise. While it necessarily omits enterprise adoption, it is believed here that Open Hub's dataset is more likely than a wider sample to be predictive going forward.
  • “Many of the projects Open Hub surveys are dormant”: This is very likely true. But the size of the sample makes it interesting, even if it is potentially limited in specific ways.
  • “Open Hub's sampling has evolved over the years, and now includes repositories and forges it did not previously”: Also true. It also, by definition, includes new projects over time. When we first examined the data, Open Hub surveyed fewer than 300,000 projects. Today it's over 600,000. This is a natural evolution of the survey population, one that reflects evolving developer behaviors.

With those out of the way, let’s look at a few charts.

If we group the various version control systems by category, centralized or decentralized, this is their percentage share. Note that the 2011 figure is an assumption because we don't have hard data for that year, but even over the last four years a trend is apparent. Decentralized tooling has moved from less than one in three projects in 2012 (32%) to closer to one in two in 2015 (43%). That's the good news for DVCS advocates. The bad news is that this share has stagnated in recent years. It was 43% in 2013, actually dipped slightly to 42% in 2014, and returned to 43%, as mentioned, this year.
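For readers who want to reproduce this kind of grouping from per-system counts, a minimal sketch follows. It assumes a hypothetical mapping of five systems to the two categories (Open Hub's own list of supported systems may differ), and the sample counts are invented purely to land on a 43% DVCS share; they are not Open Hub figures.

```python
# Hypothetical mapping of systems to categories; Open Hub's list may differ.
CATEGORY = {
    "git": "decentralized",
    "mercurial": "decentralized",
    "bazaar": "decentralized",
    "subversion": "centralized",
    "cvs": "centralized",
}


def share_by_category(counts: dict[str, int]) -> dict[str, float]:
    """Collapse per-system project counts into centralized/decentralized shares."""
    totals: dict[str, int] = {}
    for system, n in counts.items():
        category = CATEGORY[system.lower()]  # raises KeyError for unmapped systems
        totals[category] = totals.get(category, 0) + n
    grand_total = sum(totals.values())
    return {category: n / grand_total for category, n in totals.items()}


if __name__ == "__main__":
    # Illustrative counts only, chosen so that DVCS comes out at 43%.
    sample = {"git": 38, "mercurial": 4, "bazaar": 1, "subversion": 47, "cvs": 10}
    print(share_by_category(sample))
```

Raising on an unmapped system, rather than silently dropping it, keeps a newly indexed version control system from quietly skewing the totals.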

On the one hand, the flat share suggests that DVCS generally and Git specifically might have plateaued. But the more likely explanation is that this is an artifact of the Open Hub dataset, and our imperfect view of it. It is logical to assume that some portion, possibly a very large one, of the Open Hub surveyed projects are abandoned, and therefore not an accurate reflection of current usage. Many of those, purely as a function of their age, are likely to be centralized projects.

Nor did the Open Hub dataset add many projects in the past calendar year; by our count, it added around 9,671 net new projects, or roughly 1% of the total. This means that even if every newly indexed project were housed in a Git repository, the overall needle wouldn't move much.
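To put a rough number on "wouldn't move much", here is a back-of-the-envelope sketch. It assumes a total of exactly 683,000 projects (the snapshot is "over" that), takes the 2014 DVCS share of 42% from above, and assumes that no existing project switched systems or dropped out of the index. Under those assumptions it computes the ceiling on the DVCS share if every net new project were decentralized.

```python
def max_share_after_additions(prior_share: float, prior_total: int, net_new: int) -> float:
    """Upper bound on one category's share after net_new projects are added.

    Assumes every net new project lands in the category and that no existing
    project switches systems or disappears from the index.
    """
    prior_count = prior_share * prior_total
    return (prior_count + net_new) / (prior_total + net_new)


if __name__ == "__main__":
    total_now = 683_000  # approximation; the snapshot is "over 683,000"
    net_new = 9_671      # net new projects counted for the past year
    prior_total = total_now - net_new
    ceiling = max_share_after_additions(0.42, prior_total, net_new)  # 2014 DVCS share
    print(f"DVCS share ceiling: {ceiling:.1%}")
```

Even in that best case, the DVCS share rises by less than a single percentage point, which is consistent with the flat readings of recent years.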

Over the longer term, however, if we compare each system's share of Open Hub projects in 2010 against 2015, these are the respective losses and gains.

Git, unsurprisingly, is the big winner, and CVS the equally unsurprising loser. Nor has any of the data collected suggested material gains for non-Git platforms. DVCS in general has gained considerably and is now close to parity with centralized systems, and Git is overwhelmingly the most popular choice in that segment.

Determining the current rate of adoption, as opposed to share across the larger body of total projects, will require another dataset or more detailed access to this one. For those who may be curious, we did compare this year's numbers against last year's, but as the largest single change was Git's gain of 0.75 percentage points of share, it didn't offer much in the way of new information. Given that existing projects may change their version control system, we can't simply assume that Git captured 75% of the net new projects.
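Why the inference isn't simple can be sketched as follows. Assume, hypothetically, that no existing project migrated. The share of net new projects Git would then need to have captured to produce a given gain depends on Git's prior share, which isn't stated here. The 0.75-point gain and 9,671 net new projects come from above; the 30% prior share in the example is a placeholder, not an Open Hub figure.

```python
def implied_capture_rate(share_gain: float, prior_share: float,
                         total_now: int, net_new: int) -> float:
    """Fraction of net new projects a system must have captured to gain
    share_gain of total share, assuming no existing project migrated.

    With old = total_now - net_new and g the system's count among net new
    projects, the share moves from prior_share to
    (prior_share * old + g) / total_now; this solves that for g / net_new.
    """
    old = total_now - net_new
    new_share = prior_share + share_gain
    captured = new_share * total_now - prior_share * old
    return captured / net_new


if __name__ == "__main__":
    # 0.0075 and 9_671 are from the post; 0.30 is a hypothetical prior share.
    rate = implied_capture_rate(0.0075, 0.30, 683_000, 9_671)
    print(f"Implied capture of net new projects: {rate:.0%}")
```

A result above 1.0 would mean new projects alone cannot explain the gain, so migrations of existing projects would have to account for part of it. That is exactly the ambiguity that makes a simple reading of the year-over-year numbers unreliable.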

Our annual look at the Open Hub dataset, then, does support the contention that DVCS and Git are effectively mainstream options, but it is insufficiently detailed to prove the hypothesis that Git has become a true juggernaut in current adoption, even if anecdotal evidence reached that conclusion long ago.

3 comments

  1. Why doesn’t the OpenHub data set extract the date of the most recent commit?

    That should be information easily available if they’re able to assess the repository type.

    1. Hi Danno;

      Team Lead of the Open Hub here. We do. If you see a project where that is not the case, please ping us at [email protected] so we can get the project back on track. Thanks so much!

  2. @Danno: we’re just operating off of what’s available at the link; I don’t have the underlying dataset to work with.
