As enterprises commit to building microservices to enable greater software development velocity, the market for infrastructure to manage and enable those microservices is beginning to mature.
HashiCorp is ready to be a platform supplier for these tools, building on its existing beachheads, most notably Consul. That was pretty much my hot take from HashiConf EU in Amsterdam last month. The event was held in Westergasfabriek, an old Amsterdam gasworks – an appropriate venue given the focus on repurposing, refactoring and building on legacy infrastructure as well as new platforms.

HashiCorp is pragmatic – it builds cool products that also work with uncool platforms. The Cloud Operating Model can apply whether you’re running VMs, containers, bare metal or mainframes. While everyone else is telling you to rehost on Kubernetes, HashiCorp remembers its roots: its earliest tool, Vagrant, was built specifically for managing virtual machines. While everyone else is still trying to work out whether hybrid is a thing, whether multicloud is a thing, whether they should support containers or Lambdas, HashiCorp gets on with it, delivering products that span cloud and on-prem, flattening networks with a Cloud Operating Model and effective, well-thought-out service interfaces. Enter the multiverse.
Regarding the conference itself – the aforementioned venue was excellent. The catering was extremely high quality, and the coffee service was the best I have ever had at a tech conference other than my own – thanks, Bitter and Real. Arguably even the gorgeous coffee truck was a metaphor for refreshing your legacy environments – the 1971 Citroën HY van has been retooled as an electric vehicle, which is kind of wonderful. The truck even has solar panels – this post tells you more, including details of their coffee machines and equipment. The folks at Bitter and Real are fantastic. Please seek them out for your own events.
All of these details matter, in an industry crowded with tech events. Create lovely experiences, with plenty of breaks, and crisp story-telling, and you definitely stand out.
HashiCorp has an enviable ability to explain complex things in a straightforward way – what struck me most at the event was that HashiCorp yet again did a better job of explaining service mesh than any other vendor. Start with first principles, then use cases, then features.
Service mesh as architecture pattern
Service mesh introduces “sidecars” into service topologies, where the logic for monitoring and controlling communications between each microservice runs in a sidecar alongside the microservice itself. The sidecar manages traffic and provides consistency for observability, security and routing.
Services should use logical names for routing, rather than being hard-wired to network addresses. With modern applications, chances are high you’re running multiple regions and multiple VPCs. But Kubernetes needs an overlay network, and as soon as the topology becomes more complex – connecting to a VM or another Kubernetes cluster – the developer potentially has to worry about complex, manual network configuration. It’s possible for two different Kubernetes clusters to use the same IP space, so you might have two pods, in two clusters, with identical IP addresses. Not good.
The more complex the service topology, the harder the networking problems become – and this is even more true in topologies that include microservices that don’t run on Kubernetes. That “multi” story again. That’s the role Consul fills: it automatically routes network traffic, handles concerns such as firewall rules, and provides end-to-end encryption, across multiple platforms.
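To make the logical-naming idea concrete, here is a minimal sketch of a Consul service definition with a Connect sidecar. The service and upstream names (`web`, `billing`) and ports are illustrative, and exact fields vary by Consul version:

```hcl
# Register a "web" service with a Connect sidecar proxy.
# The application dials its "billing" upstream on localhost:9191;
# the sidecar resolves the logical name and routes the traffic,
# wherever "billing" actually runs.
service {
  name = "web"
  port = 8080

  connect {
    sidecar_service {
      proxy {
        upstreams = [
          {
            destination_name = "billing"
            local_bind_port  = 9191
          }
        ]
      }
    }
  }
}
```

The application only ever talks to localhost; discovery, routing and encryption are the sidecar’s problem, not the developer’s.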
HashiCorp co-founder Mitchell Hashimoto claimed in his keynote: “It makes the network seem flat from a developer perspective.”
Design and operating model
HashiCorp’s design philosophy is important here. The goal of Consul is to unify everything, so that it “works everywhere”, exposing features in a consistent way in any environment. It also needs to “integrate and feel natural” – thus, for example, supporting Helm, the Kubernetes package manager.
As co-founder Armon Dadgar explained in a blog post last year:
“Our fundamental belief is that technology will continue to march forward and innovate and evolve. And yet, workflows mostly stay the same. What I mean by that is we still have to provision our application — at some point we were provisioning a mainframe, then we were provisioning bare metal, then we were provisioning VMs, now we might be provisioning containers in the cloud. So, the specific thing that we are provisioning has changed, but the fact that we have to provision and manage the lifecycle hasn’t. Core workflow is fundamental.”
The Cloud Operating Model as defined by HashiCorp is a set of disciplines and workflows spanning four pillars, each of which maps to an IT function – provision (operations), secure (security), connect (networking), and run (development). HashiCorp tooling maps to this view of the world like so:
One strength of this portfolio is that each product maps to a particular buyer and budget. In the enterprise software business it is useful to have products that customers know how to buy and that salespeople know how to sell.
Terraform is very widely used, accounting for a non-trivial amount of AWS infrastructure provisioning globally. Vault, too, has a dedicated buyer in security, and has seen widespread deployment among customers doing distributed software deployment. HashiCorp now sees Consul as the next key area to tackle. Nomad, which is aimed at the most mature IT organisations, is by nature a smaller market. Most enterprise organisations are still primarily running legacy processes and infrastructure.
Consul as beachhead
The Day One HashiConf keynote primarily focused on enhancements to Consul and the Consul Connect service mesh. As Hashimoto explained, routing must be represented in terms of logical, rather than physical, services. This is especially true with event-based architectures: with serverless functions such as Lambdas, parts of the service may effectively not even exist until the service is actually invoked. Distributed services are dynamic and ephemeral. Networks need to be logical constructs for naming, authorisation, and routing.
Consider use cases like canarying, where you send some traffic from version 1 of a service to version 2 before moving the rest of the traffic over.
“I don’t want my web server to have to know about all of that.”
Or indeed your application developer – they should ideally be focusing on the app, not the network plumbing. Hashimoto said there are two ways to think about Consul, given it’s designed to solve the service networking challenge:
- As a traditional service discovery and management platform
- As a service mesh
The real news from HashiConf was that HashiCorp is using Envoy as a proxy or “mesh gateway” within a service mesh, managed and configured by Consul. This is another big win for Envoy, which is now the industry standard sidecar for service mesh architectures. Note that AWS also adopted Envoy for AWS App Mesh.
Consul 1.6 offers full Layer 7 services, with HTTP routing and traffic splitting. It has an auto-join provider for third-party services to connect to Kubernetes, with catalog sync to Consul. HashiCorp is also further integrating Consul and Nomad. For example, Nomad 0.10, announced in the HashiConf EU 2019 keynote, introduces shared network namespaces so that you don’t have to manually deploy sidecar proxies. If you’re deploying Redis, for example, Nomad will automatically wire up and connect the sidecar.
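The Layer 7 traffic splitting is driven by configuration entries. Here is a hedged sketch of a canary split using the `service-resolver` and `service-splitter` entries introduced in Consul 1.6 – the service name, subsets and weights are illustrative:

```hcl
# service-resolver: define two subsets of "web" by service metadata.
Kind = "service-resolver"
Name = "web"

Subsets = {
  v1 = { Filter = "Service.Meta.version == v1" }
  v2 = { Filter = "Service.Meta.version == v2" }
}
```

```hcl
# service-splitter: send 10% of traffic to the v2 canary.
Kind = "service-splitter"
Name = "web"

Splits = [
  { Weight = 90, ServiceSubset = "v1" },
  { Weight = 10, ServiceSubset = "v2" },
]
```

Each entry is applied centrally (for example with `consul config write`), so shifting weights from 90/10 towards 0/100 requires no change to application code.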
Conway’s Law for infrastructure
The next speaker was Paul Banks, Consul Engineering Lead at HashiCorp, who did an excellent job of further explaining service mesh in a great talk titled Multi-cloud Service Mesh Networking For Humans (I highly recommend you watch it). Banks started, as many good tech talks do, with Conway’s Law:
“Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations.”
Banks made the distinction between service mesh as architectural pattern and service mesh as feature set. Service mesh features include dynamic routing, service identity, resiliency patterns (circuit breakers, retries), and consistent observability. Banks said that in the multi-everything world you’re going to have many organisations with different tech teams working in different tech stacks, with different CI/CD, security approaches and so on. You want to decouple and separate concerns, and avoid duplication of effort – abstracting reliable communication, such as retries, away from the different application teams.
“Service mesh is like Conway’s Law for infrastructure”
The architectural approach is to have dynamic traffic management, dynamic runtime configuration, observability, and a central API – and, most importantly, to make it useful enough that it becomes “the default path for delivering apps in your organisation”.
“Infrastructure teams can build things on top of the network, like sophisticated rollout mechanisms, automate incident responses, and things like that that can then be provided as a service to applications without affecting application code. You can’t expect teams to go through laborious canarying processes, unless you make it easy.
“Progressive delivery is a kind of umbrella term for a bunch of practices that have been around for quite a while. I would imagine the vast majority of this audience have done at least one of these before”.
I must admit I was pretty excited that Banks used Progressive Delivery as a way to build on his narrative. RedMonk has had scores of conversations with vendors about the service mesh architectural pattern over the last couple of years. The lack of clarity in most of them, combined with aggressive cheerleading, led me to coin the term Progressive Delivery, because I felt we needed a better language of use cases to explain the value of service mesh patterns and architectures. Organisations need to reduce risk when they do rollouts. You need dynamic configuration in your infrastructure, you need consistent observability, and you need an API for management and configuration. Banks said organisations like Netflix and LinkedIn were releasing papers about practices that fall under the umbrella of progressive delivery.
Another key use case for service mesh is security: use mutual Transport Layer Security (TLS) instead of IP-based firewalls, for example, to manage encryption centrally across all of your services.
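In Consul, authorization between services is expressed as “intentions” – allow/deny rules between logical service names rather than IP rules. At the time of the talk intentions were managed with the `consul intention` CLI; later Consul releases added a `service-intentions` config entry, sketched here with hypothetical service names:

```hcl
# Deny by default; explicitly allow web -> db.
# (The service-intentions config entry arrived in later Consul
# releases; in the Consul 1.6 era the equivalent was e.g.
#   consul intention create -allow web db)
Kind = "service-intentions"
Name = "db"

Sources = [
  { Name = "web", Action = "allow" },
  { Name = "*",   Action = "deny" },
]
```

Because identity is tied to the service name and enforced with mutual TLS certificates at the sidecar, the same rule holds wherever `web` and `db` happen to be scheduled.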
Organisations have to continually improve the reliability of services as apps become more complex. We need to establish and automate best practices on how to do things – for example short timeouts, limited retries and rate limiting, to prevent cascading failures. Service mesh can provide those features, for example by setting a policy that apps need rate limiting.
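Some of these policies can be encoded centrally in Consul’s `service-router` config entry. A sketch, with an illustrative service name, timeout and retry count (rate limiting itself would be handled elsewhere, e.g. in the proxy layer):

```hcl
# Central policy for calls to "billing": short request timeout and
# a small number of retries on connection failure, so a struggling
# instance doesn't trigger a cascading failure upstream.
Kind = "service-router"
Name = "billing"

Routes = [
  {
    Match = { HTTP = { PathPrefix = "/" } }

    Destination = {
      RequestTimeout        = "3s"
      NumRetries            = 2
      RetryOnConnectFailure = true
    }
  }
]
```

The point is the operating model: the policy lives in the mesh, not in each application team’s codebase.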
Finally Banks talked about observability.
“You need a holistic view so that you can understand, debug, and innovate. Service mesh can help because you get consistent metrics and tooling that you can enable across all systems. You can enable tracing… We’re seeing tools so you can automate canarying and autoscaling. Control systems need input and output, with consistent data. You need consistent data about who’s performing what action, and how it’s performing. A service mesh gives you that. You need to complete that feedback loop, with a centralised API and control plane.”
Banks then demonstrated dynamic traffic routing with a hilarious canary deployment demo. The demo showed a web service routing 100% of its traffic to a v1 environment, then moving some traffic to v2, with Grafana for visualisation. Banks wanted a visual, dynamic way to approach the demo, rather than curl and the command line.
“I am very proud to announce that I finally found a use for the Touch Bar on my Mac.”
Progressive delivery by Touch Bar made for a great, funny, engaging demo.
Thoughts on the landscape
It’s 2019, so apparently everyone has a service mesh. Istio has been the most hyped – it has solid corporate backing from IBM, Google, Pivotal and SAP. These companies now need to do a better job of nailing use cases. Usability is another area that needs work. Tetrate is a startup focused on making Istio easier to use, founded by some of the project’s principals. Aspen Mesh is another Istio distro.
Envoy is the default proxy in Istio, and it has its own momentum. It has the backing of Amazon Web Services and now HashiCorp.
Linkerd is another service mesh option, developed by Buoyant. The founding team of William Morgan and Oliver Gould have production experience on their side, having done as much as anyone to popularise the service mesh pattern based on their experiences running microservices at scale at Twitter. They have some impressive high scale customer names – including Chase, Comcast, Expedia, and Walmart. Also engineers like the product, which is helpful.
At a glance, HashiCorp has some strong advantages in the service mesh space. It tells the best story on service mesh – an evolutionary narrative based on extending Consul networking and integrating with legacy environments, rather than rehosting everything on Kubernetes. HashiCorp identifies buyers and sells to them effectively. Consul sells to the networking organisational function, which is now being tasked with managing software-defined networks, with service routing using a Layer 7 model.
Consul Connect is looking like a market maker.
disclosure: Aspen Mesh, AWS, Google, IBM, Pivotal, and Red Hat are all clients. This post is independent of any client relationships however.