A RedMonk Conversation: Leadership Challenges with Infrastructure Automation

In this conversation, RedMonk senior analyst Kelly Fitzpatrick and Rob Hirschfeld (CEO/Founder of RackN) discuss their top takeaways from a recent NYC technology roundtable on the challenges tech leaders face with infrastructure automation. Some challenges discussed: making automation repeatable/reusable; dealing with increasing complexity (and how this can be a cultural as well as technological challenge); and the ongoing battle with organizational silos. Also of interest: why some organizations that have moved workloads to the cloud are moving them back to bare metal.

This was a RedMonk video, sponsored by RackN.

Resources

5 Steps to a vendor-neutral infrastructure
Rob’s talk on LLMs: Are LLMs Leading DevOps into a Tech Debt Trap?
DevOps: Tools Can Lead Culture Change (by RedMonk’s Rachel Stephens)
Securing Bare Metal Infrastructure: 5 questions you must ask
Platform engineering is more than a portal
Learn the bare metal booting basics in 30 minutes or less

Transcript

Kelly Fitzpatrick: Hello, this is Kelly Fitzpatrick, Senior Analyst with RedMonk, here with a RedMonk Conversation on Infrastructure Automation. With me today is Rob from RackN. Rob, can you say a little bit about who you are and a little bit about RackN?

Rob Hirschfeld: I’d be happy to, Kelly. Thank you for the introduction. I’m Rob Hirschfeld, CEO and co-founder of RackN. I have been in the infrastructure and automation industry for over 25 years now. And specifically with RackN, we’ve really taken the vision that I’ve had for how we build and manage large scale infrastructure and tried to make it repeatable and usable to a much broader audience. So RackN’s focus is building an infrastructure automation product that allows people to basically leverage the best of automation controls and management at the infrastructure, for us, bare metal and up, that is being used anywhere in the world.

Kelly: And automation is definitely one of my favorite topics. But then you throw in repeatable and I’m just, this is even extra better. So to give our audience some context, a few weeks ago, I led a technology roundtable in New York City that was co-sponsored by RackN, on leadership challenges with infrastructure automation. Our goal was to get some very savvy tech leaders in a room, so heads of engineering, DevOps, etcetera, and dig into some of the real problems that we know they are facing around infrastructure and automation. We did this kind of Chatham House Rules style, so we will not be sharing any specific individual names or organizations. But what we did want to share were some of the key takeaways from this very productive conversation. So to kick us off, Rob, what is one of your top takeaways?

Rob: Wow. It’s interesting. One of the observations I think that’s useful to start with is that we deal with a lot of financial institutions and banks. And people imagine the banks as sort of being behind the curve. But the banks that we’re dealing with are leading technology, they’re doing processes and using technology in really cutting edge ways. So getting to be in the room and hear what they’re doing and what they’re facing was absolutely incredible. The thing that was most surprising to me was that one of our leaders actually turned around and said, “It’s not that I don’t have enough automation.” He was actually saying “I have too much automation” and that blew my mind. I usually think of helping people automate that never have enough automation. And for him to say there was too much — that took some untangling.

Kelly: Yeah. And I think we have too much of everything these days in some ways. Like we use the word sprawl all the time, like API sprawl, data sprawl when you talk about microservices. And automation sprawl is probably also real. And I think it also gets into — one of the topics that we talked about and I think our attendees really focused on as well, was this idea of complexity. Right? And complexity can come into the conversation in so many ways. So complexity of infrastructure, of how we’re building applications, of permissions, of history, especially within larger, more long storied enterprises. But to your mind, what was your take on complexity as we saw it play out in the conversation?

Rob: It’s interesting because I think, I know this and often forget it at the same time. A lot of automation is glue and when we think about glue, it’s sticking things together, but it’s sticking together all of these bits and pieces that are constantly changing. So automating one department or one server, one operating system, those are all moving targets. They’re constantly being changed. The risk profile changes, right? You find a bug or a security flaw comes out and all of a sudden you have to run around and patch everything and fix it and change it. And so that glue is much more brittle for pieces. And this is one of the things that our leader was talking about in this. He was saying within a department, they will automate things, but it’s so fragile when they do that and they’re so afraid of somebody else using their automation or making calls against it. They don’t even publish APIs in a lot of cases because a department’s worried that the automation backing that API isn’t resilient enough and people are going to hit an API and not get a reliable result. And so they keep building these silos. And this really came out, I think as we discussed it, that the automation was part of a symptom of their organizational structure where it was so hard for them to have automation that multiple people could use or that they could maintain effectively. They just built moats around every department and then that kept them from automating across departments.

Kelly: Yeah, very much. I want to get back to this idea of silos in a bit, but there’s something that also came out of what you just said around automation and folks not wanting to reuse it. Whereas, one of the things at RackN that you’re trying to do is like, how do you make automation repeatable and reusable? So what is your take on that? Are there any first steps towards thinking about how to solve the problem of people being afraid of repeatable automation?

Rob: Oh my goodness. And they really are. One of my favorite KPIs that we promote is this idea of an automation half life where people have this, you know, they have automation. And if you say, would you run that automation in two months or six months or a year? Most people, they get nervous if the automation, if they haven’t managed it or changed it within a couple of weeks. And so you end up with this really interesting complexity challenge around how the automation works. What we did at RackN is we actually start with a mission and this really comes back to RackN’s mission, which is making it so people can reuse automation. Everybody uses the same servers and the same operating systems, the same platforms. You would think that we’d have a lot of reuse and commonality across the industry, but we end up using all those bits and pieces in different ways. And so when we founded RackN, we actually started with this vision of how do we make it possible to have the automation work shared and reusable. The consequence has been we’ve built what we call an infrastructure pipeline. Some people call them a change workflow. It’s very similar concepts. But as we build those, we build them in abstracted ways and we build safeties and protections and guards into those workflows so that the workflow itself has the right abstraction points.

Rob: And this is the benefit of being a product company, right? Treating automation as software, not as a script. So a lot of times when people build automation inside their organization, they solve their one problem for their one thing and then they move on because they don’t have a lot of time. What we do, sort of fanatically at the design level, is we make sure that when we build automation, it’s going to work across all of our customers. And that attention to detail, having an architecture that is designed for that really transforms how that works. The unfortunate consequence is that when you look at that automation — and this is the paradox of complexity, right, where that automation looks more complex because it has safeguards and protections and things that you’re doing. And one of the things that we found in researching complexity, because there is actually complexity research, is that a lot of the complexity here is actually safeguards and protections and resilience. And if you start taking those things out, then you actually make the systems simpler on the surface, but more brittle, more fragile like we’ve been talking about. And so you do have to add in these safeguards. But once they’re in, they’re much more productive for everybody. You just have to take the time to learn it.

Kelly: Yeah. And I know that you talked specifically about having a product that can be used across different organizations. But going back to this idea of silos within an organization, you think about enterprise — especially a lot of large banks — and there are these silos even within just the tech side of things. And, you know, how do you see that playing out? How do you see that play out to the folks that you’re talking to? And how do you talk to them about getting around — and not just in terms of like — here’s a tool that will help you do it, but how does that translate to some type of addressing some of the more cultural issues at hand?

Rob: And it was funny because I was asking some of the questions we had to our leaders at the table. We’re trying to drive to exactly this type of challenge. And a lot of it comes down to them being confident that their systems will work. Right? It’s, you know, if they had a system where they felt like it was reliable, then they could have an API. But some of it requires them to have business leadership that says, Oh, we actually want to connect these teams or processes together. The good news is just like CI/CD systems came through and 1 or 2 departments could collaborate and then you could add in the other pieces. It doesn’t take every department signing on board to get going. If you could actually build one, even one department with stronger APIs — this was a point from the table, actually. If you build one department that has more solid APIs and they start being more consumable by upstream and downstream departments and they start asking for better APIs, there is a bit of a contagion from that perspective, but it’s a much more outward facing model than their traditional comment. The one that was amazing to me that we heard on the roundtable was, Yeah, we have departments that would rather get a ticket because it gives them more control and they don’t have to expose their inner workings than providing an API or an automated process because then they have to show how they do something. And that’s just deep organizational behavior.

Kelly: Yeah. And that fear of giving up control, or not even just giving up control, but letting people see what is going on and powering things behind the scenes… Like, I never want to have to do a ticket again if I don’t have to do a ticket, yet there are folks who prefer that kind of option. And then I think for me, one of the other things that came out was, we talk about automation. And increasingly now we talk about AI in this way and the different fears from different parts of the organizations, and that tech leaders have different fears around automation and AI than individual contributors. What are your thoughts on that, especially based on what our tech leaders were saying?

Rob: Well, this to me is one of the challenges with AI because we’re very excited about AI generating a lot of code and automation. And I get excited about that too. But if you think about it, and we talked about this at the table, it’s like, well, if your AI is generating more automation and you already have too much automation, that doesn’t sound like a good match. And that was one of those moments in which everybody sat back and was like, Oh yeah, maybe this AI generating a whole bunch of stuff isn’t going to solve our velocity problems because our velocity problems are actually from having more automation than we need, not from having the automation work together better. And so I do think that there’s no doubt AI is very disruptive. All of our customers are talking about it. It’s changing the way they’re consuming infrastructure and the way they think about building infrastructure. So it’s impacting it both on what they do and how they think about what they’re building. But at the same time, I do think we need to be much better at using these tools to make better assessment across our organization, not just make that silo wheel spin faster, but how do we actually use it to improve cross-team collaboration? And we didn’t have a lot of great answers for that at the moment, but it was one of those like, Hey, we do need to fix this problem, not just use AI to run faster into the holes we’re already in.

Kelly: And I feel if we already had all the answers to that, we would be retiring on the fortunes that we had amassed from that. So another topic that came up that for me was really interesting was the idea of repatriation. So we talk about, you know, folks moving to the cloud. Repatriation is the boomeranging back to on prem, even on some bare metal. As RackN is kind of all about, you know, doing things including bare metal. What are your thoughts on that?

Rob: You can’t repatriate without bare metal. So we’re very excited to see people who are looking at their AWS bills or cloud bills and especially in the light of, oh, we’re going to be training models and building bigger infrastructures, storing more data, right? All the knobs are getting turned up in how much infrastructure people consume. And the cloud costs are not coming down commensurately with that increase. And so there really is a need for people to look back at, can I economically pull back. And what we know from working with our customers is that if you build from the bare metal up in very cloud like ways and provide — because our best customers are the ones who are the best cloud consumers, which surprises people, the ones who understand API driven infrastructure just take our product and they get incredible ROIs from it because we’re driving cloud behaviors everywhere. The opportunity here of being able to have an infrastructure that you control and has that API driven consistency all the way from the bottom up, it is transformative. And I think that we see the repatriation not just as, oh, we’re going to save you money on infrastructure, but actually that we’re going to let companies reestablish control of their supply chains, that they’re going to be able to have real alternatives in how they look at infrastructure. And for every business we deal with, that amount of control is, you know, they gave it up unwillingly when they moved to the cloud and they start remembering. Wait a second. I actually can control when I take patches and versions and what things are getting built and how I control these resources and even depreciating them and capital expenses for them. Those are very real business aspects and impacts.

Kelly: Yeah, very, very much. And I think one of the important things out of what you’re talking about is we often think of bare metal as the antithesis of cloud. But when you’re talking about, okay, you’re building something on top of bare metal with the eye towards it being very cloud like… or we use the term cloud native to describe as a way of doing things as opposed to just the public cloud or just Kubernetes or things like that. So I think expanding that idea of what building software in 2023 is like.

Rob: It’s a funny idea because we are starting to see a lot of interest in Kubernetes on premises, but omitting the virtualization layer. We provide dynamic bare metal and you can eliminate the whole virtualization layer from that. Change the profile of the machines you buy, eliminate really expensive shared storage infrastructures. There are ways in which you can take cloud like behaviors. And if you’re freed from some of the traditional enterprise architectures, you can create incredibly cost effective, high performance infrastructure. If you all of a sudden are like, Oh, I’m not afraid of the bare metal anymore, it really does transform how you look at it. And the reality is if you were coming back, and coming back is funny because some of it’s going back to colos and traditional data centers, but some of it’s edge and building more items in situ for things and being able to have the type of enterprise control and IT involvement in edge infrastructure which traditionally has been all sort of cobbled together as one siloed tech. We’re always back to silos at a time.

Kelly: Yeah, I think so much of tech is dealing with silos, avoiding the silos. It’s, I don’t know, silo driven development sounds bad, but dealing with silos is probably something that most folks need to do more than they would like.

Rob: In some ways, the whole platform engineering trend that we see is silo driven development. It’s sort of saying, you know, Hey, I just want to work on my one thing. I want everything else to go away. Behind that handy give me a whatever button that we’re building for these portals, there is actually a lot of operations work getting done. And so we definitely need to keep note of how do we improve that efficiency. Because I think what you’re saying is if we’re not careful, we just build what we need and we keep building deeper and deeper silos. And that’s expensive.

Kelly: And we end up with un-reusable scripts as automation instead of something that is more durable and more versatile.

Rob: Very short term thinking from that perspective. But that’s sort of… we’re always go, go, go, and that sort of ends up with these, you know, “I can make a decision with my credit card and get it done.” And what we see when our customers get very, very successful is when they step back and they start asking systems questions. The ROI on systems thinking is enormous, and the KPIs that our customers see from that type of choice are remarkable, where they break down those silo walls. But that is a cross organizational leader. And even within the silos, the leaders in the silos have to understand that there’s a benefit to working on somebody else’s priorities. That’s hard.

Kelly: Yeah. So I think tech leaders and ways they can try to break down silos within their organizations. I think that’s a great note to end on. Rob, thank you so much for joining us today. I’m glad we got to recap everything we learned in New York City.

Rob: Me too, Kelly, I appreciate it. Great questions. And really, I wish more people had been able to participate live in the event instead of just seeing the recap, because what a great room and your leadership in that room was really fantastic. So thank you.

Kelly: And hopefully we can make this an annual event.

Rob: We’re planning on it.
