What Black Duck Can Tell Us About GitHub, Language Fragmentation and More

Survival of the Forges

Two things have been self-evident to us at RedMonk for some time. First, that programming language and runtime adoption is becoming more diverse rather than less over time [coverage]. Second, that an increasing proportion of that deployed code is being hosted at GitHub [coverage]. Obvious as these conclusions may be to us, however, it is important that we test them at every opportunity, both to assure ourselves of their continued validity and to help build the case for parties less interested in developer trends and behaviors. Black Duck’s databases offer us one opportunity to do this.

In cooperation with Black Duck, then, we examined a subset of their commit history for a webinar this morning. Specifically, we evaluated the metrics from commits at four mainstream forges – CodePlex, GitHub, Google Code, and Sourceforge. From January through May, 2.1M commits were made. This dataset offers the opportunity for indirect insight on developer behaviors, as it constitutes a high volume record of developer activity over a multi-month period across multiple properties.

To examine the question of runtime fragmentation, for example, we’d look at the proportion of commits per language across the dataset. If our hypothesis is correct, we’d expect to see limited variance between the different programming languages and programming language types, with no clearly dominant platform.

Which is, in fact, what we observe.

Total Commits by Language

The total year-to-date commits recorded by Black Duck are not evenly distributed, but with the exception of C# and Perl, are roughly comparable. When we examine the same data with the additional variable of target forge, we see similar diversity on display.

Language Commits by Forge

There are wider skews within the forge data and definite patterns, but the fragmentation of runtimes is nevertheless apparent for both language and repository.
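For readers who want to run the same kind of check on their own commit logs, here is a minimal sketch of the share calculations described above. The records are invented for illustration and are not Black Duck's data; the `commits` list, the `shares` helper, and the idea of using the largest single-language share as a rough fragmentation signal are all assumptions of this sketch rather than the method behind the charts.

```python
from collections import Counter

# Hypothetical commit records as (forge, language) pairs. The counts are
# invented for illustration and are NOT Black Duck's data.
commits = (
    [("GitHub", "Ruby")] * 4
    + [("GitHub", "Java")] * 2
    + [("Sourceforge", "Java")] * 3
    + [("Google Code", "Python")] * 1
)


def shares(counter):
    """Convert raw counts into fractions of the total."""
    total = sum(counter.values())
    return {key: count / total for key, count in counter.items()}


# Share of all commits per language, and per forge.
by_language = shares(Counter(language for _, language in commits))
by_forge = shares(Counter(forge for forge, _ in commits))

# The largest single-language share is a crude fragmentation signal:
# the lower it is, the less any one language dominates.
top_language, top_share = max(by_language.items(), key=lambda item: item[1])

for language, share in sorted(by_language.items(), key=lambda item: -item[1]):
    print(f"{language:<8} {share:.1%}")
```

The same `shares` helper works for any grouping key, so a language-by-forge breakdown only needs a `Counter` over the full `(forge, language)` pairs.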

As important as the languages developers are choosing is how they leverage them, which in turn is a function of where they host their code. Our belief, as articulated above, is that GitHub is emerging as a massive center of gravity. This is, in our view, primarily attributable to the social coding approach advocated for and supported by GitHub. Based on the decentralized version control system Git, which makes branching and thereby forking sufficiently low overhead to incent the behavior [coverage], GitHub has changed the way that software is built in public, and attracted substantial attention as a result.

What we would expect to see from Black Duck’s data, then, would be a majority share of commits deployed to GitHub. Which, again, is what the data suggests, with GitHub’s share at 54.5%.

Commits by Forge

More interesting, however, is a consideration of the relative volume of commits against the backdrop of forge age.

Commits by Forge, Sorted by Age

GitHub’s substantial lead in commit volume is impressive in light of both its age and the nature of the competition. GitHub is by two years the youngest market player, and has overtaken competitive platforms built by both Google and Microsoft, as well as the namesake of the forge term itself.

Interestingly, the data suggests that GitHub’s traction has come largely at the expense of Google Code. Sourceforge, while now a distant second in commit volume, remains solidly ahead of both CodePlex and Google Code. One explanation for this might be that GitHub is more likely to siphon off Google Code-type developers, as opposed to those who might turn to Sourceforge.

Digging deeper, we observe that GitHub is dominant with dynamic language commits.

Dynamic Language Commits by Forge

Sourceforge, however, maintains a narrow edge with statically typed language assets.

Statically Typed Language Commits by Forge

In spite of this, both the first and second place repositories by commits exhibit substantial diversity amongst their overall commit volume.

Here is Sourceforge’s commit volume, placed by language on a stacked bar chart.

Sourceforge Commits by Language

Its relative strengths in statically typed code – primarily C, C++ and Java – are evident. But so too is the relative spectrum of assets housed at Sourceforge.

GitHub, for its part, is observably strong in dynamic language adoption, including JavaScript, Python and Ruby. It also, however, indicates substantial volumes of commits in C, C++ and Java.

GitHub Commits by Language

For the curious, here are the most popular languages by commit volume for each forge surveyed.

Popular Languages by Forge
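As a rough sketch of how the per-forge rankings and the dynamic versus static split could be derived from commit counts, consider the following. The counts are hypothetical and are not Black Duck's figures. The language classification is an assumption of this sketch: it uses only the examples named in this post (JavaScript, Python and Ruby as dynamic; C, C++ and Java as statically typed), and the names `commit_counts`, `top_languages` and `split_by_typing` are illustrative.

```python
from collections import Counter

# Hypothetical per-forge commit counts by language (illustrative only,
# NOT Black Duck's data).
commit_counts = {
    "GitHub": {"JavaScript": 5, "Ruby": 4, "Java": 3},
    "Sourceforge": {"Java": 6, "C++": 4, "Python": 2},
}

# Assumed classification, limited to the languages this post names as dynamic.
# Anything not listed here is treated as statically typed in this sketch.
DYNAMIC = {"JavaScript", "Python", "Ruby"}


def top_languages(counts, n=2):
    """Return the n languages with the most commits, largest first."""
    return [language for language, _ in Counter(counts).most_common(n)]


def split_by_typing(counts):
    """Sum commit counts into 'dynamic' and 'static' buckets."""
    totals = {"dynamic": 0, "static": 0}
    for language, count in counts.items():
        totals["dynamic" if language in DYNAMIC else "static"] += count
    return totals


top = {forge: top_languages(counts) for forge, counts in commit_counts.items()}
typing = {forge: split_by_typing(counts) for forge, counts in commit_counts.items()}

for forge in commit_counts:
    print(forge, top[forge], typing[forge])
```

Swapping the `DYNAMIC` set for an interpreted versus compiled grouping would change only that one definition, which makes it easy to test how sensitive the conclusions are to the classification chosen.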

With the evidence suggesting that our assertions regarding runtime fragmentation and the importance of GitHub are correct, the logical question is what, in practical terms, this means.

First, heterogeneity is the new norm. Enterprises typically favor simple environments with fewer approved languages and runtimes to manage; the data indicates that instead the developer preference for multiple languages is accelerating, with nearly ten languages showing substantial commit volumes. Enterprises can fight this tide, or embrace it. The outcome is not likely to differ substantially in either case, as the boundaries to technology procurement continue to erode.

Likewise, it is clear that GitHub should be core to your developer relations strategy. This is clear enough that large vendors such as VMware have begun leveraging GitHub as their primary external repository [coverage], both for the visibility it affords and the attendant developmental benefits. But that kind of usage is obvious; less so is the growing trend of using GitHub as a de facto development resume [example]. Rather than attempting to indirectly evaluate candidates’ coding ability via artificial on-site problem sets, employers are increasingly evaluating the work product itself, publicly available on GitHub. Be creative: there are many ways to leverage GitHub. Algorithmic recruitment is but one example.

At a high level, all of the above is further confirmation of our belief that developers are the new kingmakers [coverage]. There’s a reason that Linux, Apache, MySQL, dynamic languages and now GitHub have all become volume success stories. Those that understand that reason will enjoy a competitive advantage over those that do not.

Credit: All source data for the above graphics is courtesy Black Duck.

Disclosure: Black Duck, CodePlex and GitHub are RedMonk clients. Google and Geeknet (Sourceforge) are not.


  1. […] is now the most widely used software hosting service, RedMonk calculated. Between January and May of this year, 1.2 million contributions were posted to GitHub, versus […]

  2. Where’s Launchpad? It has to be bigger than Codeplex…

  3. @RJ Ryan: great question. we were operating off of the dataset that Black Duck had available, but i’ll check with them and see if they have Launchpad data as well.

  4. This post conflates interpreted/compiled languages and statically and dynamically typed languages. It looks like you really want the former.

    e.g. Perl is statically typed but interpreted.

  5. @Kenny Root: that’s absolutely fair. the distinction is probably better argued as compiled vs interpreted, given the nature of the exceptions such as PERL. thanks.

  6. […] the Redmonk analysis is quite interesting, I haven’t seen the actual data behind the BlackDuck study, nor have I […]

  8. […] What Black Duck Can Tell Us About GitHub, Language Fragmentation and More […]

  9. […] “What Black Duck Can Tell Us About GitHub, Language Fragmentation and More” by Stephen O’Grady (2011) […]

  10. […] try to have some fun with statistics. From a recent presentation by Stephen O’Grady from Redmonk, Github’s growth is almost […]
