My friends at Weaveworks have started describing how they manage infrastructure as “GitOps” – which is to say, operations by pull request (PR). I have come across other organisations doing similar things. GitHub is now so central to how modern teams work that it’s natural it would become a management hub for systems automation, serving as both a system of record and a system of engagement for configuration as code platforms such as Ansible, Chef, and Puppet, and/or infrastructure as code tools like CloudFormation and Terraform.
In case you’re not already using GitHub and are not used to the vernacular, a pull request is used to inform others about changes you’ve pushed to a repository, generally once you’ve completed a feature or piece of code, as part of the approval process. But developer teams have begun to use pull requests to automate infrastructure workflows. Using Git creates an audit trail of human-readable systems changes – which Weaveworks describes as a “source of the truth.” A diff between the repository and the running system means not all changes have been propagated yet.
So what are the characteristics of GitOps as practiced by Weaveworks?
- Our provisioning of AWS resources and deployment of k8s is declarative
- Our entire system state is under version control and described in a single Git repository
- Operational changes are made by pull request (plus build & release pipelines)
- Diff tools detect any divergence and notify us via Slack alerts; and sync tools enable convergence
- Rollback and audit logs are also provided via Git
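To make the diff step in the list above concrete, here is a minimal sketch in Python of comparing the state described in Git with the state observed in a running system. It is not Weaveworks' tooling: the function name `diff_state` and the dictionary layout are assumptions made purely for illustration, and a real tool would read the desired state from files in the repository and the actual state from the platform's API.

```python
from typing import Any


def diff_state(desired: dict[str, Any], actual: dict[str, Any], path: str = "") -> list[str]:
    """Return human-readable differences between desired and actual state.

    Hypothetical helper: both arguments are plain nested dictionaries.
    """
    differences: list[str] = []
    for key in sorted(desired.keys() | actual.keys()):
        here = f"{path}.{key}" if path else key
        if key not in actual:
            differences.append(f"{here}: missing (want {desired[key]!r})")
        elif key not in desired:
            differences.append(f"{here}: unexpected (have {actual[key]!r})")
        elif isinstance(desired[key], dict) and isinstance(actual[key], dict):
            # Recurse so that nested settings are reported with a dotted path.
            differences.extend(diff_state(desired[key], actual[key], here))
        elif desired[key] != actual[key]:
            differences.append(f"{here}: want {desired[key]!r}, have {actual[key]!r}")
    return differences


if __name__ == "__main__":
    # Illustrative data only: what Git says versus what is running.
    in_git = {"web": {"image": "web:1.2", "replicas": 3}}
    running = {"web": {"image": "web:1.1", "replicas": 3}}
    for line in diff_state(in_git, running):
        print(line)
```

An empty result means the running system matches the repository. Anything else is the kind of divergence that could be posted to a chat channel as an alert.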
In tech we’re seeing a generational shift from mutable, imperative infrastructure to immutable, declarative infrastructure. That’s where GitOps makes sense. In that context Weaveworks uses Terraform to provision and maintain VMs, networking, security groups, ELBs, RDS instances, DynamoDB tables, S3 buckets, and IAM roles, and Ansible for managing Kubernetes components including the Docker engine, the Kubelet, the Kube-proxy, the API Server, the Scheduler, the Controller Manager, and Etcd. Weaveworks wrote a tool called kubediff to show the difference between desired and actual state in Kubernetes.
In this post I have, perhaps unfairly, been using GitHub as shorthand for Git-based source code management. There are of course other platforms – notably Atlassian Bitbucket and GitLab are worth a mention here. It should not surprise us that Atlassian’s Kubernetes team has also established a GitOps approach, although it came up with a different name – BDDA (pronounced ‘Buddha’) for “Build-Diff, Deploy-Apply”.
Atlassian’s design goals were:
- Ensure that we can reprovision things as easily as possible (preferably reasonably quickly as well)
- Use declarative configuration rather than imperative
- Allow us to use automated testing, with all relevant information stored in a central repo
In terms of declarative configuration, as Nick Young says:
“the reconciliation loop at the heart of Kubernetes is built around the idea of declaring the desired state of things and then waiting for the system to bring the world into line with the desired state.”
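The reconciliation idea in that quote can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions, not Kubernetes controller code: the names `reconcile` and `run_until_converged` are hypothetical, and state is modelled as a simple mapping from service name to replica count.

```python
def reconcile(desired: dict[str, int], actual: dict[str, int]) -> list[str]:
    """Run one pass: move `actual` toward `desired` and return the actions taken."""
    actions: list[str] = []
    for name, replicas in desired.items():
        if actual.get(name) != replicas:
            actions.append(f"set {name} to {replicas} replicas")
            actual[name] = replicas
    # Copy the keys first, because entries are deleted while iterating.
    for name in list(actual):
        if name not in desired:
            actions.append(f"remove {name}")
            del actual[name]
    return actions


def run_until_converged(desired: dict[str, int], actual: dict[str, int], max_passes: int = 5) -> int:
    """Repeat reconcile passes until one finds nothing to do; return the pass count."""
    for passes in range(1, max_passes + 1):
        if not reconcile(desired, actual):
            return passes
    raise RuntimeError("state did not converge")
```

The point of the design is that nobody issues step-by-step commands. You change the declared state, in this model by merging a pull request, and the loop keeps acting until a pass finds no remaining difference.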
Version control with Git gives us an audit trail for Observe, Orient, Decide and Act loops in software development for declarative infrastructure. Which brings us to other aspects of DevOps approaches in developer-led teams. GitOps only makes sense in conjunction with other disciplines, notably around chat-based team collaboration (ChatOps) and deep Observability.
Charity Majors of Honeycomb has been doing excellent work recently distinguishing between “monitoring” and “observability”. Systems need to be engineered to be understandable, explorable, and self-explanatory in the kind of automated, declarative world of continuous change that Docker and Kubernetes define.
ChatOps, using chat interfaces to interact with team mates and systems, is related. Observability is a team sport. Developers basically live in 3 places – their chosen text editor or IDE, Slack and GitHub. You need to be able to identify problems and issues, work together to identify the best fix, and then put it into production. Now we have a potential name for the infrastructure diff part of the equation. I am not sure “GitOps” is perfect, but it’s catchy. Please let me know if you’re doing anything similar, and what you call it.
Full disclosure: Docker is a client. Weaveworks is in a coworking space I run. All opinions are my own.