The industry has centered around Kubernetes as the abstraction layer that promises portability. Helm charts are great for packaging, but are they enough? Senior RedMonk Analyst Rachel Stephens talks with Replicated co-founder and CTO Marc Campbell about the challenges of shipping software into heterogeneous environments.
Learn more about Replicated at https://www.replicated.com/
This was a RedMonk video, sponsored by Replicated.
Transcript
Rachel Stephens: Hi everyone. Welcome to RedMonk Conversations. I’m Rachel Stephens. I’m a senior analyst with RedMonk and today I’m really excited because we are here to talk about what it means to deliver software to people. And I think a lot of the times in this day and age we think about software living in the cloud — and a lot of the software does — but a ton of software still lives in VPCs, on prem, in even air gapped environments. And it’s a challenge to get software to these people sometimes. And our guest today is Marc Campbell, who I think is uniquely situated to actually talk about some of these challenges. Marc, would you please tell us a little bit about yourself, about the company you work for — Replicated — and the problem that you all are aiming to solve?
Marc Campbell: Sure. Yeah, happy to. So my name is Marc Campbell. I’m the co-founder and CTO of a company called Replicated. We’ve been around for a little over eight years working with ISVs, with software vendors who want to distribute their software to some of their largest enterprises to, as you mentioned, Rachel, customers that want to run it inside their controlled environments, whether that’s Bare Metal or a VPC. You know, we’ll get into a lot of those details, I’m sure. But yeah, we’ve been working really closely with a lot of large software vendors who wanted to ship into their enterprise customers and helping them get that delivery.
Rachel: Yeah. And so lots of challenges there. I think one of the things that really strikes me about the world in which you all operate is just the heterogeneity of things. So you have a whole different set of just what the networking looks like, what hardware people are shipping to, what software they are running on that hardware, be it operating systems or all the way up the stack in terms of what’s running there. And the environments themselves can look so, so different that it, I think, probably creates a fairly significant challenge that we can all imagine. But I think at the same time, we as an industry have also in the last half decade or so really coalesced around how can we start to standardize away some of that heterogeneity. And the place where we landed in a lot of cases is Kubernetes. So Kubernetes has kind of been this promise to help bridge us across these different environments and provide some of that portability. And the problems there are: 1) standing Kubernetes up is no small feat for a lot of companies, and 2) it doesn’t necessarily solve all the problems, as you all have heard. So I would love to talk to you just about what the state of the market looks like and what the tooling looks like in your experience and in your customers’ experiences.
Marc: Yeah, I think there’s a couple of interesting things there. You know, like the difference in end customer environments does continue to create challenges. Kubernetes and Linux, really, in the Linux kernel have created these common APIs and abstractions in theory where we can just say, Great, I wrote my application and I wrote it commonly as a Helm chart. It’s one of the tools that we see a lot in the market today. But I can distribute it to my customers as long as they can bring Kubernetes in. Like really most cloud providers, Kubernetes is a commodity. Most cloud providers have it as a managed service. It should be pretty easy to get Kubernetes for anybody, just like it’s as easy to get a Linux server. The challenge is that these Kubernetes environments, they have slight differences. There is a common API for here’s how a Kubernetes application can work, but a security team and an operations team manages it and maybe they’ll put a service mesh on or different policies on or, you know, on Linux itself, there might be iptables and firewall rules and different policies around what can and can’t run and pipelines that have to go through in order to get container images into these secure environments. And so this creates — every end customer has these processes and these policies in place. And even the environment that they’re actually running on is a little bit different from one to the next. And it creates a long tail where every end customer has the potential to create a new set of challenges that you haven’t seen before. You wrote a vanilla Kubernetes application and then you’re testing it and you’re running it as a SaaS service on EKS or GKE, and it’s working really, really great. And you go to the first customer and that first customer is running on a locked down, secure, air gapped OpenShift environment, which is Kubernetes compatible. But it turns out that there’s likely going to be some challenges just to get it into that environment.
Rachel: And I think on top of that, it seems like as a person who is providing the software into these environments, it would be really hard to 1) even know what all of these different things are up front so that you can understand and test upfront, but also just to be able to monitor how things are going on the back end. It seems like it’s a challenge in two places.
Marc: Yeah, I mean, I think to your point, that’s one of the biggest challenges that you see too. You start distributing it to, you know, the first, the second, and then the tenth and twentieth enterprise customers and you want to know how up to date they all are. Are these installations actually running? You need some sort of operational assurances and almost like telemetry to know whether those instances are actually successful. And that’s not something that Helm or Kubernetes really provide in their base tooling. They’re not really truly designed around this third party delivery mechanism and giving you the confidence that you need to both test for compatibility with all these different environments, know if it’s successful, have they actually adopted it? How do you know that you’re making it configurable in a way that’s actually compatible with them? Or, realistically, when something does go wrong, and it will go wrong, you know, customers might be running a version of Kubernetes that’s out of date or too new or something like this. How are you going to be able to support that when they’re in a disconnected environment? So a lot of that’s what Replicated helps provide. But whether you use Replicated or not, it’s just a problem that you really need to think about when you’re distributing to many, many end customers.
Rachel: Yeah. I have so many questions on all of these great things that you’ve just talked about. So I’m going to break it down in chunks. Okay. So chunk number one that I want to dive into more with you is that concept of visibility. And in particular, I think it’s one of the things that we see a lot, especially when you’re thinking about a customer environment and especially an air gapped environment. I really want to dive in there with you. So, how can we actually see what’s happening in these on prem environments? Because how the telemetry is going to be used is one thing, but also just how do we get the telemetry in the first place?
Marc: Yeah, you’re right. And a lot of these end customers that are running in secured or air gapped environments often are running that way not because they intentionally don’t want to share telemetry, but because they have different compliance reasons that require them to keep all that data internal. So one of the things that we actually realized is, you know, if the customer hasn’t downloaded the latest version of your application, they haven’t installed that version of the application. So what we’re able to do is really look at how the application has been running and be able to make some general telemetry observations around that. Have we delivered that version? Has the customer requested the update to the application or not? And that gives you, as the software vendor, some kind of higher degree of confidence whether or not they’re actually running it. In online, fully connected Kubernetes clusters that end customers are running, we might be able to know definitively that, yes, that version was installed at this time and it successfully switched to the new version of the pods. Everything is actually running. For disconnected ones, we’re able to put some breadcrumbs together to show whether we can definitively say it wasn’t installed or it possibly has been.
Rachel: Gotcha. So when you’re talking about kind of these breadcrumbs and assembling them together, does Replicated have a holistic way that you think about how you can do this?
Marc: We do. Yeah. So again, tying it really into a couple of different metrics around when updates were delivered or when you’re helping support a customer and they request what we call a support bundle, which is just a single archive of some operational characteristics of the application. Is it running? Is it not running? What version is running? We’re able to take these and aggregate them and show you the status of the application across all of your different customers. And so when you ship a new version, you’re able to understand adoption and success. Is that application version and update being successfully applied, not being applied, or being applied and causing failures? But yeah, across the board we’re able to collect that information as much as we can.
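As a rough illustration of the breadcrumb idea described above, here is a minimal Python sketch of how a vendor might classify each customer for a given release and then roll the results up across customers. It is not Replicated's implementation: the `Breadcrumbs` fields, the `InstallStatus` values, and both function names are hypothetical, and the only rules it encodes are the ones stated in the conversation (a release that was never downloaded cannot have been installed, and only a connected cluster can confirm what is running).

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class InstallStatus(Enum):
    CONFIRMED = "confirmed"              # a connected cluster reported this version
    POSSIBLY_INSTALLED = "possibly"      # downloaded, but nothing more is known
    NOT_INSTALLED = "not_installed"      # definitely not running this version


@dataclass
class Breadcrumbs:
    """Hypothetical signals held for one customer and one release."""
    downloaded: bool                         # did the customer fetch the release?
    reported_version: Optional[str] = None   # set only for connected clusters


def infer_status(target_version: str, crumbs: Breadcrumbs) -> InstallStatus:
    # Connected cluster: the reported version settles the question either way.
    if crumbs.reported_version is not None:
        if crumbs.reported_version == target_version:
            return InstallStatus.CONFIRMED
        return InstallStatus.NOT_INSTALLED
    # Disconnected cluster: a release that was never downloaded cannot be installed.
    if not crumbs.downloaded:
        return InstallStatus.NOT_INSTALLED
    # Downloaded but unreported: the most that can be said is "possibly".
    return InstallStatus.POSSIBLY_INSTALLED


def summarize(target_version: str, customers: Dict[str, Breadcrumbs]) -> Counter:
    """Count customers per status for one release."""
    return Counter(infer_status(target_version, c) for c in customers.values())
```

The three-valued result is deliberate: collapsing "possibly installed" into either of the other two states would overstate what a disconnected environment can actually tell the vendor.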
Rachel: Gotcha. And when you’re thinking about these metrics and how you gather them, have you found any patterns in terms of what a high-success team looks like, or, in the DORA metric language, what they used to call an elite team? So like, have you found any kind of patterns in terms of what makes good software delivery?
Marc: I mean, it’s really the same DORA style metrics that you would look at for delivering a SaaS application. You know, the most successful vendors who ship their software into on prem environments have confidence and are shipping regularly into the on prem environments. You don’t want to fork your code or have a different release cadence or ship on a really, really slow cadence. You should be able to ship updates as regularly as you’re shipping software to your multi-tenant SaaS service. And then the same KPIs: adoption of those, the success of installation, is your team able to ship daily or a couple times a day or monthly, whatever it is that you’re shipping to your SaaS service. Can you match that? And that’s really the key metric.
Rachel: And would you say that most of your customers are delivering both SaaS and on prem versions of their software, or is it kind of a mix for you all?
Marc: It’s definitely a mix for us. There’s definitely a lot of folks who are shipping multi-tenant SaaS and they need to ship an on prem version for compliance reasons. But then there’s folks who have shipped on prem software, traditional on prem software, whether it’s a VM, an OVA or a jar file with a database. And now they’ve adopted Kubernetes and microservices, and that creates a layer of complexity to manage it. But it also creates — the team has more modern technology and they’re able to operate at a higher velocity. So we can actually bridge that so that they can treat it like the way they were treating the traditional on prem, but they’re shipping Kubernetes applications into their end customer environments now.
Rachel: So when we’re talking about these different ways in which people are approaching their business and how they’re delivering their software, it feels like it would be really challenging to test it. And we’ve talked about that when we were doing our opening in terms of what are some of the challenges that people are facing right now in terms of delivering software? And what it feels like is that we just have this combinatorial problem of how can we figure out all of the different ways in which our apps are going to be accessed. And I think we see this a lot in terms of mobile device testing. So it’s like there’s a mobile app across devices and operating systems and versions and all these things. And so I’ve experienced it from that world, and it feels a little bit like you’re tackling the enterprise version of that, where we’re trying to ship it to all of these different customer environments and sometimes even our own environments. How does this portability challenge tie into the testing world?
Marc: Yeah, I mean, it ties in exactly as you just described. That is a key focus that Replicated is actually providing. Kubernetes, like I mentioned earlier, does not create this totally reliable abstraction where you can say, Oh, my application works on this distribution of Kubernetes, so therefore it’s going to work on any distribution. And end customers often have to support the Kubernetes cluster, so they’re going to come with opinions about how to manage it, what other tools are installed, what else is there. So in addition to you testing your application, one of the things that Replicated does provide, we call it right now compatibility matrix, where we can actually take customer representative environments and spin these up really quickly in your CI process. So if you have 50 different customers and they’re running 30 different distributions of Kubernetes in different versions, you can, in your CI process, test your application on those exact distributions, those exact versions and try to get the highest degree of confidence that you were able to verify both that the application installation works and maybe that the upgrade from the version that they’re actually running to the newest version works before shipping it to them, so that you’re not ultimately learning and troubleshooting problems because your large enterprise end customer’s OpenShift cluster is not working anymore.
Rachel: Definitely. You definitely saw into the future and saw my next question, which is where does this sit in terms of the SDLC? And it sounds like it’s in the CI. I don’t know how much you lean into the shifting left thing, but it feels like this is making an earlier discovery of the problems. Versus having your customer success team, who is trying to install something on site, be in charge of troubleshooting, this is trying to catch it earlier.
Marc: Yeah, I mean, I think in some of the best enterprise software distributions, the end customer is the one who’s actually performing the upgrade cycle. You don’t need to have your customer success team connecting in and managing that. So you want high confidence that it’s going to work. Like neither you nor the end customer want to troubleshoot why the upgrade failed and why this one pod is crash looping or whatever the technical problem is that you’re actually running into. So it does shift that left. I think, you know, you might say our application works in SaaS, so therefore it’s going to work in our on prem environments. That’s, you know, not the best, most solid, deep testing strategy that you could have. You might be able to say, Oh, well, we’ll test it in 1 or 2 different end customer scenarios. That’s going to be more confidence. But what we actually think is ultimately the real confidence you need is what we’re calling canary testing, where you can actually create true customer representative environments down to the patch version of Kubernetes, down to the networking overlay that’s there, the cloud provider, the instance type and everything, and actually say we’ve run through this installation or this upgrade on exactly this type of environment that the end customer is running on. And it worked. So we shifted it left into our CI process. Therefore, when the end customer goes to run it, there’s not going to be any surprises during installation time.
Rachel: How manual is it to create this testing matrix then?
Marc: Yeah. So at Replicated we’re able to tie that back into some of that early telemetry that we were talking about before, where we know the end customer is running this version of your application and they might be running it on EKS or on this version of OpenShift and it’s this many nodes. We have no idea, you know, about the data that’s in the application or anything, but we see a little bit of that operational telemetry. And so based on that, we’re able to just dynamically generate that matrix so that it’s not a manual process. If your customer is connected (not an air gap customer, but online and connected to the Internet when they’re running it) and they upgrade their version of Kubernetes, the next version of your application that you test will test against the newest version of Kubernetes that they’re actually running. So it’ll automatically match.
Rachel: I see. So the insights product and then the telemetry from your insights ties into your testing matrix, that makes good sense.
Marc: Exactly.
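To make the telemetry-to-matrix step discussed above concrete, here is a minimal Python sketch, assuming the vendor already holds one reported environment per connected customer. It is not how Replicated's compatibility matrix is built: the `Environment` fields, the `build_test_matrix` function, and the customer names and versions in the example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Environment:
    """Hypothetical operational telemetry for one customer cluster."""
    distribution: str   # e.g. "eks" or "openshift"
    k8s_version: str    # full version string, down to the patch level


def build_test_matrix(customer_envs: Dict[str, Environment]) -> List[Environment]:
    """Collapse per-customer environments into the unique set CI should test.

    Several customers on the same distribution and version yield one entry.
    Sorting is only for a stable, reproducible ordering between CI runs.
    """
    unique = set(customer_envs.values())
    return sorted(unique, key=lambda e: (e.distribution, e.k8s_version))


# Illustrative data only: three customers, two distinct environments.
reported = {
    "customer-a": Environment("eks", "1.27.4"),
    "customer-b": Environment("openshift", "4.13.0"),
    "customer-c": Environment("eks", "1.27.4"),
}
matrix = build_test_matrix(reported)

# When a connected customer upgrades, the next rebuild picks it up.
reported["customer-c"] = Environment("eks", "1.28.1")
matrix_after_upgrade = build_test_matrix(reported)
```

Rebuilding the matrix from the latest reports on every CI run, rather than maintaining it by hand, is what lets the tested environments track what customers actually run.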
Rachel: Wonderful. So then my last question is, we’re talking about Kubernetes. And I know you said that Kubernetes should be easy to get, and I think that’s true in theory. I think in practice it’s maybe a little bit more challenging for some organizations. So I’m wondering just about this gap between people who already are successfully using Kubernetes and maybe the folks who are not quite there yet. Does Replicated serve both? When we’re talking about heterogeneity in our environments, is Kubernetes an assumed abstraction layer for you all or not?
Marc: Sort of. We assume it as the packaging mechanism right now.
Rachel: Okay.
Marc: So if you’re a software vendor and you want to ship your software into end customer environments, package it as a Kubernetes application. Specifically package it as a Helm chart, and then you’re going to be able to meet any of the end customer demands first. Like you mentioned, many end customers can get access to Kubernetes. It is relatively commoditized. Most cloud providers have a managed Kubernetes offering. That doesn’t mean it’s trivial and click a button. It can actually be a lot of work and you have to maintain it. But a lot of orgs do have teams that have Kubernetes expertise and they’ll be happy to consume a Helm chart. In fact, it’s often a preferred method to consume third party software now. But Replicated does also bridge that with some of the premium plan offerings that we actually have, where we can package an embedded version of Kubernetes with your application. So if you do have a customer who might not have that Kubernetes expertise or not be running in a cloud provider that can provide managed Kubernetes, but they can spin up some Linux VMs, we can also still get the application running and it’s still the Helm chart distribution. We just take a little bit more of that surface area at Replicated and we’re able to provide the Kubernetes distribution itself also.
Rachel: So you have various means of packaging for people so you can help with the packaging or you can let people use their own installing process.
Marc: Yeah, exactly. Vendors should always package as a Helm chart or as Kubernetes manifests. And the end customer, you know, should always bring Linux VMs, ideally bring Kubernetes clusters. We think the most successful are going to be the vendor who can really, or rather the end customer who can really draw that line and say, I have the operational expertise to manage the machines here in the cluster, but I need the vendor’s application managed on top of that.
Rachel: So this is not a gotcha question, but why wouldn’t someone just want to use Helm versus using what Replicated is offering?
Marc: That’s a great question. So with Helm, Helm is a good packaging tool, but it doesn’t provide any kind of insights. It doesn’t provide any update notifications. Replicated provides release channels. We provide air gap packages where you can take the entire application as a tar.gz archive, cryptographically signed. We can provide an expirable license with license entitlements on top of that, so if you have private images that your application uses, you don’t need to go manage access to all those images and distribute image pull secrets and keys. Replicated can provide all of that out of the box. Helm is a great packaging tool. Where Helm does not deliver, we actually kind of bridge that a little bit on top of Helm, not any kind of replacement of Helm, but on top of Helm. When you’re talking about commercial enterprise software delivery, there’s a different layer there that Helm just doesn’t have an interest in solving, and it’s a layer that we just provide on top of it.
Rachel: So Marc, thank you so much for your time. I really enjoyed the conversation today. If I were going to sum up where we went with our discussion, I would say that your take on the world is that to successfully scale the delivery of software to heterogeneous environments, you have to have plans around packaging that software, having some degree of insights in and around the telemetry of how it’s being installed and used and then being able to test in advance so that people can successfully install it. But I don’t want to put words in your mouth. So would love to hear just what is your elevator pitch on what Replicated is doing and how you envision software delivery?
Marc: Yeah. I mean, I think that’s a really well said description of it. I think I would say if you want to ship your application into on prem, into enterprise customer environments, package it as a Helm chart and if you have one customer to ship it to, that’s great, you’re going to be able to solve those problems. But as soon as you want to start scaling the delivery of that to multiple customers, you’re going to find the heterogeneous environments that you’re referring to and all the challenges around testing, supportability, telemetry, observability, and all this are really going to become more and more critical as you get more customers. And that’s what Replicated provides. We provide preflight checks and compatibility testing matrices and that observability pushed into your systems and things like this. And again, it’s not important when you have one, maybe two customers, but as soon as you want to start scaling on prem customers, it becomes really, really critical and it becomes more important. So start with Helm, and then Replicated can provide that on top.
Rachel: Wonderful. Well, Marc, thank you again for your time today. This has been a wonderful conversation. If any of our viewers want to hear more about Replicated and what you all are doing, where should they go?
Marc: Replicated.com.
Rachel: Wonderful. Have a good day.
Marc: Thanks.