The asymmetry in open source technologies’ ability to penetrate the enterprise datacenter is not difficult to understand. Questions of maturity aside, different product categories carry different risk profiles, a major factor for enterprises that are more afraid of making the wrong choice than interested in making the right one. Red Hat’s sustained growth, for example, indicates that open source operating systems face minimal friction in adoption. Non-relational databases, on the other hand, while widely used, aren’t yet trusted by corporate buyers in quite the same way, Hadoop being a notable exception.
One area that hasn’t endured the same level of skepticism is open source configuration management software. While there are many options, system administrators and developers are favoring Chef and Puppet at the expense of competing solutions. But with that success comes an obvious question: which do I pick, and why?
Although discussions of the platforms’ relative technical merits can be interesting – the comments on this HN thread display the usual range of opinions on the subject – we’re typically more interested in usage patterns. Quality of implementation is an important consideration in technology selection, but history amply demonstrates that technically inferior solutions can, and often do, outperform their competitors. Because there is no single canonical source for usage, we instead examine a variety of proxy metrics, looking for patterns that indicate a broader narrative at work. Here’s a non-comprehensive run through some of the metrics that we regularly evaluate.
Update: I spoke with Opscode’s Jesse Robbins, who pointed out that Chef’s under-representation on Debian is likely due in part to the fact that Opscode runs and manages its own repositories, and in part to its recommendation of deployment via the RubyGems package manager over Debian’s apt-get. So consider yourselves caveated; the charts otherwise remain untouched.
Each running instance of the Linux distribution Debian can phone home telemetry on the packages installed on the system. This service, called Popularity Contest, provides insight into the relative adoption rates of various software packages within the subset of the Debian community that has elected to self-report application information. This graph, then, reflects adoption of Chef relative to Puppet within that community.
As you can see, Puppet substantially outperforms Chef in this context. Part of this is that Puppet is the older project by four years, and has thus had additional time to build adoption. If we look more closely at the data, however, there are indications that adoption may also be a function of packaging. In mid-July of last year, Opscode (the company behind Chef) made updated packages available for Debian and related distributions. Almost immediately thereafter, according to Debian Popularity Contest data, Chef adoption on that platform spiked.
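For readers who want to derive a relative figure like the one charted, here is a minimal sketch; it is not the method behind this post’s chart. It assumes Popularity Contest’s per-package ranking layout of whitespace-separated `rank name inst vote old recent no-files (maintainer)` columns, assumes the Debian package names are `chef` and `puppet`, and uses hypothetical helper names. The sample counts are invented placeholders.

```python
from typing import Dict


def parse_popcon(text: str) -> Dict[str, int]:
    """Map package name -> 'inst' count from popcon-style ranking lines.

    Assumed layout (an assumption for this sketch):
    rank name inst vote old recent no-files (maintainer)
    Comment lines starting with '#' and malformed lines are skipped.
    """
    installs: Dict[str, int] = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        installs[fields[1]] = int(fields[2])
    return installs


def relative_adoption(installs: Dict[str, int], pkg: str, baseline: str) -> float:
    """Reported installations of `pkg` as a fraction of those of `baseline`."""
    return installs[pkg] / installs[baseline]


# Invented placeholder counts, NOT real Popularity Contest data.
sample = """#rank name inst vote old recent no-files (maintainer)
1 puppet 400 120 250 30 0 (Example Maintainer)
2 chef 100 40 50 10 0 (Example Maintainer)
"""
print(relative_adoption(parse_popcon(sample), "chef", "puppet"))  # 0.25
```

Running the same calculation over successive snapshots would yield a time series of Chef’s adoption relative to Puppet’s.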
More interestingly, the Chef adoption enabled by the new packages may have led to a transient decline in reported Puppet adoption. If we examine the three-month period of Puppet adoption beginning in July, the impact on the overall trajectory is apparent.
From a macro perspective, the data indicates that Puppet remains more broadly adopted than Chef within the Debian community. But Chef is growing, and the evidence does seem to confirm that growth for one is negatively correlated with growth for the other.
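The negative-correlation claim can be checked, roughly, by correlating period-over-period changes in the two packages’ reported installs rather than the raw totals. The following is a minimal sketch under that assumption; the helper names are hypothetical and no real Popularity Contest series is included.

```python
import math
from typing import List, Sequence


def deltas(series: Sequence[float]) -> List[float]:
    """Period-over-period changes, e.g. monthly growth in reported installs."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def growth_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Correlate the growth (not the levels) of two install-count series."""
    return pearson(deltas(a), deltas(b))
```

Differencing first matters: two series that both trend upward tend to correlate positively in their levels simply because of the shared trend, so correlating the changes is closer to the question being asked. A result near -1 would support the claim; a result near 0 would not.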
GitHub is one of the largest and most important developer communities in existence, so we track individual project performance there closely. With author-backed repositories available for both Chef and Puppet, it’s possible to compare the two projects in a basic fashion. GitHub gives Puppet a slight edge in total contributors, 121 to 118. Puppet also saw more pageviews on GitHub over the past 90 days, 30,735 to 22,361.
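Expressed as relative leads, those two GitHub figures differ considerably: Puppet’s contributor edge is about 2.5%, while its pageview edge is about 37%. The arithmetic is shown below using only the numbers quoted above; `lead` is a hypothetical helper name.

```python
def lead(a: int, b: int) -> float:
    """Fractional lead of a over b, e.g. 0.10 means a is 10% ahead."""
    return a / b - 1


# Figures quoted above for the Puppet and Chef repositories on GitHub.
puppet = {"contributors": 121, "pageviews_90d": 30735}
chef = {"contributors": 118, "pageviews_90d": 22361}

for metric in puppet:
    print(f"{metric}: Puppet leads by {lead(puppet[metric], chef[metric]):.1%}")
# contributors: Puppet leads by 2.5%
# pageviews_90d: Puppet leads by 37.4%
```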
But in metrics relating to explicit interest in the project – specifically the numbers of forks and watchers per project – Chef outperformed Puppet.
The results from GitHub, then, are inconclusive. Implicit metrics like pageviews point one way, explicit metrics like forks another. There is no clear winner in this category.
On Hacker News, neither Chef nor Puppet is dominant in discussion. In fact, according to the Hacker News search APIs, mentions of one tend to closely track mentions of its counterpart.
The correlation is unsurprising; the timeframe is not. The returned data shows Chef with substantial traction in Hacker News discussion well in advance of its 2009 availability. Because Chef’s mailing list archives only date back to 2009, this suggests artifacts in the returned data.
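One way to keep such artifacts out of a mention count is to discard any search hit dated before the project existed, for example stories that use the word “chef” in its culinary sense. The sketch below assumes the hits have already been fetched and reduced to `datetime` timestamps; `monthly_mentions` is a hypothetical helper, and the start of 2009 is used as the cutoff only because the post gives just the year.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def monthly_mentions(timestamps: Iterable[datetime], release: datetime) -> Counter:
    """Count mentions per 'YYYY-MM', ignoring anything dated before `release`.

    Mentions that predate the project's availability cannot refer to it,
    so they are treated as search artifacts and dropped.
    """
    return Counter(ts.strftime("%Y-%m") for ts in timestamps if ts >= release)


# Assumed cutoff: the post only says Chef became available in 2009.
CHEF_CUTOFF = datetime(2009, 1, 1)
```

The same filter can be applied to both projects’ hits with their respective cutoffs before comparing monthly counts.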
One concept shared by Chef and Puppet is reusable scripts that implement common patterns: in Chef these are called cookbooks, in Puppet, modules. While it’s impossible to effectively measure the number of scripts per project – their library size, so to speak – we can first evaluate how many each company hosts itself. Second, we can imperfectly infer community size by comparing query results for both projects on GitHub, since it is, not surprisingly, common practice to host cookbooks and modules there.
As you can see, Opscode currently hosts approximately thirty percent more scripts than Puppet Labs does, though it is not entirely clear to what extent either vendor focuses its attention on these sites versus on GitHub. Comparing general GitHub query results, meanwhile, we see that the lead changes hands.
As measured by general GitHub queries, Puppet enjoys a slight (~10%) advantage in returns over Chef. It’s a rough metric because it includes all matching repos, but as a broad indicator of traction it has some utility.
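A query of this kind can be approximated programmatically. The sketch below assumes GitHub’s current REST repository-search endpoint, which reports a `total_count` field; this is not necessarily how the figures above were gathered. Like the original metric, a bare keyword query matches every repository mentioning the term. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SEARCH_URL = "https://api.github.com/search/repositories"


def search_url(query: str) -> str:
    """Build a GitHub repository-search URL for a plain keyword query."""
    return f"{SEARCH_URL}?{urlencode({'q': query})}"


def total_count(payload: str) -> int:
    """Extract the overall match count from a search response body."""
    return int(json.loads(payload)["total_count"])


def repo_count(query: str) -> int:
    """Fetch the number of repositories matching `query`.

    Requires network access; unauthenticated search requests are
    heavily rate limited.
    """
    req = Request(search_url(query), headers={"Accept": "application/vnd.github+json"})
    with urlopen(req, timeout=10) as resp:
        return total_count(resp.read().decode("utf-8"))
```

With network access, `repo_count("puppet") / repo_count("chef") - 1` expresses Puppet’s advantage in the same form as the ~10% figure above.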
Ultimately, the data we’ve looked at – aside from the Debian usage information – doesn’t make a decisive case for either platform. Puppet backers can take heart in its dominance on Debian as well as its GitHub pageview lead and repository traction. Chef advocates, on the other hand, may take comfort in the fact that a project four years younger is outperforming more mature competition on some metrics. For our part, we have (full disclosure) worked with and admire both companies.
And while there is understandably friction between the two communities at times given their functional overlap, there is likely more than enough oxygen to support both projects indefinitely. Apart from the community numbers discussed above, both can point to impressive customer rosters and partner bases. Given the size and scope of the opportunity, however, as well as the legitimate traction behind both projects, leadership of the category may ultimately go not to whoever creates the best software but to whoever can best leverage the data it generates. With software valuations in decline, data should increasingly be a product focus.
In the meantime, it will be interesting to watch these projects compete with each other moving forward.
Afterword: In case anyone’s curious, we did look at StackOverflow metrics as well, but the differences were slight enough that we omitted them from the above.