Donnie Berkholz's Story of Data

The breakout of Ansible, and the state of config-management communities

By dberkholz | April 2, 2015

TL;DR:

Chef is dev-biased, Puppet is ops-biased
Ansible is growing like crazy
CFEngine activity is minimal
But … Docker Docker Docker

In February, I gave a talk at cfgmgmtcamp on trends in configuration-management communities. I wanted to post the data and provide a bit more context than I did on Slideshare.

My goal was to examine a variety of community metrics across configuration-management frameworks to provide an update on the work that Steve did back in 2013.

For starters, here’s a look at the development communities for the core software. While this ignores third-party modules, it does say a lot about the amount of change to the core codebases:

It’s worth noting that in Salt, everything is done via pull requests, even from existing developers, so that number is a bit inflated. However, there’s a pretty clear correlation between age of the framework and activity in the core. CFEngine released 1.0 in 1993 and it’s fairly slow today; Puppet and Chef date to the mid-’00s and they’re in the middle; while Salt and Ansible are just a few years old and remain quite active in the core.

But it’s hard to get a feel for trends without plotting this over time, so I did:

Please note that the scales are different for Salt due in part to the inflated PR numbers. Again the numbers are not terribly surprising, with a shrinking CFEngine community, Puppet and Chef holding relatively static, and Salt and Ansible both growing. However, Ansible has grown to around 200 forks a month while Salt grew to around 100 a month. This indicates a significant difference in activity between the two that’s also largely supported by stars and PRs.
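
The post does not describe the tooling behind these charts. Purely as an illustration, monthly figures like the fork counts above could be tallied from a list of event timestamps along the following lines. The function name `monthly_counts` and the sample timestamps are made up for this sketch; they are not the author's data or method.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(timestamps):
    """Tally ISO 8601 timestamps into per-month counts keyed by 'YYYY-MM'."""
    counts = Counter()
    for ts in timestamps:
        # Timestamps ending in 'Z' are rewritten with an explicit UTC offset
        # so that datetime.fromisoformat accepts them on older Python 3 versions.
        dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        counts[dt.strftime("%Y-%m")] += 1
    # Sort by month key so the result reads chronologically.
    return dict(sorted(counts.items()))


# Hypothetical sample data, not taken from any real repository.
sample = [
    "2014-11-03T10:15:00Z",
    "2014-11-20T08:00:00Z",
    "2014-12-01T23:59:59Z",
]
result = monthly_counts(sample)
print(result)
```

Plotting the resulting per-month values for each project on a shared time axis would give the kind of trend line discussed here, with the same caveat that differing workflows (such as Salt's pull-request-only process) skew the raw counts.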

However, core development is not necessarily reflective of the entire community, so the next data source I examined was mailing-list activity on the development list:

In keeping with the other data, over the course of 2014 CFEngine lagged behind while Ansible charged ahead, with the others largely holding steady in the middle. There is a potential downward trend with Puppet to keep an eye on, although it’s unclear whether that will remain the case given the amount of noise in this data.

The next data source I looked at was the IRC community. This is the first source that lends some support to the anecdotal saying that Puppet is for ops and Chef is for developers, as IRC tends to be a more old-school chat tool. It’s otherwise broadly in line with the others:

In contrast, for a developer-leaning audience I took a look at Hacker News. This has potential artifacts for Salt (due to salted password hashing), but that doesn’t appear to be a major issue. While the reason for the downward trend in many frameworks over the past couple of years is unclear, what’s absolutely clear is the growth in Ansible activity and the relative dearth of CFEngine conversation. In addition, Chef has a slight advantage over Puppet in this developer-heavy audience.

Finally, I did a comparison across Stack Overflow (a developer discussion forum) and Server Fault (an ops discussion forum), both of which are hosted on Stack Exchange. Intriguingly, the long-term trend showed that development-related discussion tends toward Chef while ops-related discussion tends toward Puppet, again supporting that differentiation.

However, it’s worth setting some broader context. Let’s compare all of this to Docker:

All this debate about configuration management may be dwarfed in the bigger picture by a move toward containers rather than configuration management. While the future of broader adoption is unclear, the dominant interest in containers among many leading-edge communities is inarguable.

Disclosures: Chef and AnsibleWorks are clients. Puppet has been. Docker, CFEngine, and SaltStack are not.

15 comments

Jonathan Davila says:

April 2, 2015 at 11:13 am

I’d be curious to know what specific repository(ies) you took into account for each product for the GitHub numbers.

1. Donnie Berkholz says:
  
  April 2, 2015 at 11:17 am
  
  Just the sole core repository for each of them: cfengine/core, puppetlabs/puppet, chef/chef, saltstack/salt, ansible/ansible, docker/docker. Due to architectural differences no perfect comparison is possible, but that’s why we use more than one metric to assess trends and state.
  
Who’s Contributing to Configuration Management Projects? – tecosystems says:

April 2, 2015 at 4:21 pm

[…] my colleague having updated some of the basic community metrics from Hacker News and Stack Overflow that we track on the […]

Noah Gibbs says:

April 3, 2015 at 9:42 am

I keep hearing this talk about how containers are going to replace configuration management. Do we seriously think that Dockerfiles are going to just be written as bash scripts forever?

(Sorry, bit of a pet peeve.)

1. Donnie Berkholz says:
  
  April 3, 2015 at 9:46 am
  
  While I personally find it abhorrent and flaky, people seem to be doing it.
  
2. Greg DeKoenigsberg says:
  
  April 3, 2015 at 10:26 am
  
  Funny you should mention — wrote a blog post on this yesterday. Many are happy with Dockerfiles, especially for the simplest containers that aren’t part of a larger workflow — but many aren’t.
  
  There are clear benefits to using CM to build containers, getting the best of both worlds to an extent. At Ansible, we are seeing this *a lot*.
  
  http://www.ansible.com/blog/ansible-and-containers-why-and-how
  
  1. Noah Gibbs says:
    
    April 3, 2015 at 11:28 am
    
    That makes a lot of sense.
    
    One thing you might see eventually is Dockerfiles (and similar) as back ends for config management systems. I don’t know of anybody doing it yet, but it seems like a clear next step.
    
    I don’t know the Ansible code at all, but in Chef you’d handle it a bit like how Windows and Unix get handled. Of course, that’s much easier said than done, so you may see a new config management system growing around it, just to avoid having to refactor Chef/Ansible/Puppet/Salt that much.
    
    It’s gonna be a *big* change to any existing config system to output Dockerfiles — you can’t just check if a file exists, for instance, you also have to see if you expected a previous command or resource to create it.
    
    The advantage would be pretty huge, though — you wouldn’t need to run the config management system in the container at all, not even via SSH. Nothing required to be installed in the container, no overhead, no leftover files or continuing runs, no SSHD or client necessary…
    
    Dunno. Maybe.
    
    Glad to see you guys are thinking about this!
    
    1. Greg DeKoenigsberg says:
      
      April 3, 2015 at 11:32 am
      
      Pretty much everyone in the CM space has to think about it, right? 🙂
      
      Honestly, though, it’s less about us thinking about it, and more about our users thinking about it and driving the adoption patterns.
      
      1. Noah Gibbs says:
        
        April 3, 2015 at 11:34 am
        
        Yeah. I worry a little about that, because there’s a really obvious way to hack it (treat Docker like a VM) that seems like a bad pattern :-/
      2. Greg DeKoenigsberg says:
        
        April 3, 2015 at 11:38 am
        
        Is it a bad pattern? Maybe, maybe not. Solomon himself says that there are a lot of ways to use containers, and there’s no need to be prescriptive. Maybe the ideal way is thin containers and microservices, but there’s nothing inherently wrong with thick containers that are more VM-like; they’re just different trade-offs. And if thick VM-like containers are just an onramp for users as they learn the ins and outs of containerization, that’s fine too.
      3. Noah Gibbs says:
        
        April 4, 2015 at 7:27 pm
        
        Sure. But thick, VM-like containers are an easy hack from our existing tools. That alone guarantees that they’ll be overused. The primary (paying) customers of things like config management tools are big Enterprises, who generally want/like big honking hard-to-use tools — see CfEngine, Puppet, Chef for obvious examples, but Ansible and Salt have some of the same bias.
        
        In other words, all the bad reasons are pushing hard for thick, high-resource-usage containers. Which means that just saying “oh, we’ll do that because it’s easy” guarantees a result where end users get bad results until throwing off the “yoke” of config management tools which are, in fact, doing what they’re doing for all the wrong reasons.
        
        It’s been hard for config management (which is a good, sane idea) to make this much progress. I’m not looking forward to the counter-revolution against it. But the existing tools companies seem to be working as hard as they can to ensure that the counter-revolution will be vicious, justified, and come soon.
Michael Hausenblas says:

April 3, 2015 at 12:42 pm

Awesome analysis, much appreciated Donnie! Allow me a tiny bit of nitpicking: it would really help if you’d always use the same colour code for a certain technology (sometimes changes here, making it harder than necessary to follow along).

BTW, in the intro you mentioned Docker prominently but besides the last figure I didn’t find much discussion about it. Related: how will the Datacenter Operating System change the overall setup? Mesosphere, CoreOS, etc.

KUTGW!

Cheers,
Michael

Florian Heigl says:

April 11, 2015 at 1:54 pm

The problem with ansible related discussions is that, due to the kinda cobbled language body, half of the discussions feel like they are about “how can I say this”. Plus, yes, of course since it is easier there is a lot more influx to it in general.

For the record:
I use ansible in a number of places and in different scenarios and CFEngine elsewhere where goals and demands went a lot higher.

Regarding docker/vagrant provisioning scripts. Both max out at a level that is ridiculous … low.
Kubernetes / NixOS stuff is more valid. Docker’s functionality is a joke you can only believe in if you don’t know shit about administration anyway.

Todd Gamble says:

July 9, 2015 at 5:07 pm

Any idea what event in November 2014 seemed to have correlation across most of the mailing lists (oddly negative on cfengine)?

1. Donnie Berkholz says:
  
  July 10, 2015 at 10:28 am
  
  One possible reason for the positive spikes is that it’s Thanksgiving, and a bunch of open-source types are college students, and they have extra free time that month. CFEngine, by contrast, is probably mostly used in existing long-term deployments whose employees have vacation then. All speculation of course.
  