The team at Ohloh worked with me to organize a data hackfest at OSCON 2012, and we pulled together a great dataset that included licensing data for all open-source projects in Ohloh that had any commits in the past year. After working with Ohloh data for my recent post on language expressiveness, I wanted to explore it in some different ways to see what else might emerge, and licensing seemed like an angle worth examining more deeply.
My colleague Steve has posted about permissive vs copyleft licensing a number of times, but we’ve never done quantitative research into licensing choice to prove the extent to which any shifts are happening, the time frames involved, and the potential variations within different programming-language communities.
Approach: Classification, history, and languages
Using the Ohloh data for 57,930 active projects as of July 2012, I classified each of the top 30 open-source licenses into one of three categories: permissive (e.g. BSD, Apache), limited (e.g. LGPL, MPL, EPL), or copyleft (e.g. GPL, AGPL). This three-category classification accounts for more than 90% of all projects with specified licenses, so it should be representative. Only 17,549 projects could be classified, because a vast number of projects either have no license or have one that Ohloh was unable to detect. Limited licensing is quite rare, hovering around 2%–3% of projects with licenses, so for the purposes of this post, we will focus on permissive and copyleft licensing.
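As a rough illustration of the classification step, here is a minimal Python sketch. The lookup table contains only the example licenses named above, not the full top-30 mapping, and the function names and sample list are hypothetical rather than anything from the Ohloh dataset or its API.

```python
from collections import Counter

# Category assignments taken from the examples named in the post. The full
# top-30 mapping is not reproduced here, so this table is deliberately partial.
LICENSE_CATEGORY = {
    "BSD": "permissive",
    "Apache": "permissive",
    "LGPL": "limited",
    "MPL": "limited",
    "EPL": "limited",
    "GPL": "copyleft",
    "AGPL": "copyleft",
}


def classify(license_name):
    """Return the category for a license, or None if it is missing or unmapped."""
    if license_name is None:
        return None
    return LICENSE_CATEGORY.get(license_name)


def category_counts(licenses):
    """Count classified projects per category, skipping unclassified ones."""
    return Counter(c for c in map(classify, licenses) if c is not None)


# Made-up sample for illustration only; these are not Ohloh records.
sample = ["GPL", "Apache", None, "BSD", "LGPL", "Unknown", "GPL"]
counts = category_counts(sample)
```

Projects with no detected license (`None`) or a license outside the table simply fall out of the counts, which mirrors how unclassified projects are excluded from the totals here.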
To attempt to identify historical shifts, I separated projects into buckets based on the date of their first commit. Since license changes between permissive and copyleft are quite rare, this should be a reasonable approach to examining trends over time.
Since I hypothesized that programming language might also play a role, I further split each year’s bucket by language. Here, I’m going to focus on the 11 most popular languages according to our rankings, as well as the total across all languages regardless of popularity. To reduce noise, any data point with 5 or fewer projects across permissive and copyleft combined is not shown.
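The bucketing and noise filter described above could be sketched in Python roughly as follows. The record layout (a tuple of first-commit year, language, and category), the function names, and the sample data are all assumptions made for illustration; the original analysis may have been structured quite differently.

```python
from collections import defaultdict

MIN_PROJECTS = 5  # cells with this many or fewer projects are dropped


def bucket_counts(projects):
    """Count projects per (year, language) cell, split by license category.

    `projects` is an iterable of (first_commit_year, language, category)
    tuples; this layout is an assumption, not the Ohloh export format.
    Limited-license projects are ignored, as in the post.
    """
    cells = defaultdict(lambda: {"permissive": 0, "copyleft": 0})
    for year, language, category in projects:
        if category in ("permissive", "copyleft"):
            cells[(year, language)][category] += 1
    return dict(cells)


def drop_small_cells(cells, min_projects=MIN_PROJECTS):
    """Keep only cells whose permissive + copyleft total exceeds min_projects."""
    return {
        key: c
        for key, c in cells.items()
        if c["permissive"] + c["copyleft"] > min_projects
    }


# Made-up sample for illustration only; these are not Ohloh records.
projects = (
    [(2011, "Ruby", "permissive")] * 6
    + [(2011, "Ruby", "copyleft")] * 1
    + [(2011, "Perl", "copyleft")] * 3
    + [(2011, "Perl", "limited")] * 4
)
kept = drop_small_cells(bucket_counts(projects))
```

In this toy sample the Perl cell holds only three permissive-or-copyleft projects, so it is filtered out, while the Ruby cell with seven survives.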
Results: A clear trend toward permissiveness
I’m showing the data as a ratio between permissive and copyleft licensing to account for changes in absolute numbers of projects over time. Any number above 1 indicates a bias toward permissive licensing, while any number below 1 indicates a bias toward copyleft.
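For readers who want to relate this ratio to percentages, a small Python sketch follows. The function names are hypothetical, the counts in the usage lines are made up, and the share calculation assumes only permissive and copyleft projects are counted (limited licenses are left out, as elsewhere in this post).

```python
def permissive_ratio(permissive, copyleft):
    """Ratio of permissive to copyleft project counts; None when undefined."""
    if copyleft == 0:
        return None
    return permissive / copyleft


def describe(ratio):
    """Translate a ratio into the bias it indicates."""
    if ratio is None:
        return "undefined"
    if ratio > 1:
        return "permissive bias"
    if ratio < 1:
        return "copyleft bias"
    return "balanced"


def copyleft_share(ratio):
    """Fraction of permissive-plus-copyleft projects that are copyleft.

    With P permissive and C copyleft projects, ratio = P / C, so the
    copyleft share is C / (P + C) = 1 / (1 + ratio).
    """
    return 1.0 / (1.0 + ratio)


# Made-up counts for illustration only.
ratio = permissive_ratio(40, 60)
label = describe(ratio)
share = copyleft_share(ratio)
```

Under these assumptions, a ratio of about 0.67 corresponds to a copyleft share of 60%, and a ratio of 1 corresponds to an even split.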
Remarkably, every single language shows an upward trend, starting either in favor of copyleft or near equilibrium and shifting upward in a more permissive direction. The overall total, shown as a thick black line, further supports and clarifies this trend since the individual languages can be rather noisy.
Two languages of particular note are the two extremes: Ruby on the permissive side and Perl on the copyleft side. While most languages cluster relatively tightly, Ruby rises far above them with a very clear and strengthening shift toward permissive licensing: 2x in favor of permissive in 2010, 6x in 2011, and 11x in 2012. At the other extreme, Perl shows a roughly 2x–3x bias in favor of copyleft, which is distinctly below the nearest neighbor, C++, but not nearly as large a divergence from the primary cluster as Ruby shows.
The shift toward permissive open-source licensing has been dramatic over the past decade. Since 2010, the trend has reached the point where a new open-source project is more likely to be permissive than copyleft. Although there are language-specific effects, especially in the case of Ruby, the overall movement is clear. Outside the extremes, new projects in even the most copyleft-biased language (C++) in 2012 were given copyleft licenses less than 60% of the time.
Disclosure: Black Duck Software (which owns Ohloh) is a client.