A RedMonk Conversation: AI and Observability (with Elastic)

With so much buzz (and noise) in both the observability and generative AI spaces, how should SRE and Ops teams navigate emerging standards and promises of potentially transformative tooling? Join RedMonk’s Kelly Fitzpatrick and Elastic’s Gagan Singh (VP of Marketing) as they talk about the present state and potential future of AI and observability amidst increasingly complex systems.

This episode also kicks off a special mini-series of RedMonk Conversations that focus on emerging trends in AI and how they intersect with various areas of interest to the tech industry.

This was a RedMonk video sponsored by Elastic.

Transcript

Kelly Fitzpatrick: Hello and welcome. This is Kelly Fitzpatrick with RedMonk here to kick off a RedMonk Conversation miniseries of sorts on artificial intelligence because we have been hearing and speaking about it so much. As you can imagine, in this inaugural episode, we will be talking about AI and observability. With me today is Gagan Singh from Elastic. Gagan, can you please tell our audience a little bit about who you are and what you do?

Gagan Singh: Thanks, Kelly. Great to be here. My name is Gagan Singh. I lead the Product Marketing for Observability here at Elastic and really excited to be here.

Kelly: And thank you so much for joining me. So jumping into the observability part of our conversation, when folks in the tech industry hear Elastic, they often think Elasticsearch or maybe other components of the ELK stack. You know, Logstash, Kibana, and now Beats. When people hear Elastic, observability is, however, probably not as top of mind. To your mind, what should folks know about Elastic’s take on the observability space?

Gagan: Right. Yeah, so that’s an amazing question. And so you know, one of the things is, Elastic got its start with the ELK stack and that’s what we’re known for across the world. And you know, we’re one of the most downloaded solutions out there in the industry. And so as you may be aware, ELK stack stands for Elasticsearch, Logstash and Kibana. But we’ve been really making great strides over the years, and Elastic has been focused on the observability solution. And the big reason for that is we feel that ultimately, observability is a data problem and that’s what Elastic is really, really awesome at, geared to do and solve with all types of data. And so we’ve been making rapid innovations on observability and you’ll see that we have a full stack observability solution which a lot of our customers and prospects are surprised to hear about.

Kelly: And I like that assertion that observability is a data problem and it’s one where data is increasing everywhere, but especially around observability.

Gagan: Right. We think that it’s been really beneficial to us because ultimately you’ll see how observability has come into play, how it has evolved. And if you go back a little bit historically, and think through it, we started initially with the world of data centers where updates were not as frequent, where people knew what to look for. So they used tools, right? You know, whether you had Nagios or other open source tools, or for that matter, any other tool that you might have. And everything was being done in silos. And then with the advent of cloud, that got really, really complex and the silos started to break down a little bit and operations teams and vendors in the industry realized that operations teams needed something that connected the dots together. And so Observability was born, which was really trying to bring the different signals together, metrics, logs, traces in a unified view. So I think that’s been a really, really interesting trend. And obviously we continue to see an evolution there as well, which we can talk about in a little bit here.

Kelly: Yeah. So let’s jump into the AI part of things, talking about evolution. To your mind, where does AI fit in Elastic’s portfolio? And what does it mean for Elastic’s observability capabilities?

Gagan: Right. I mean, I think like I said, Elastic has been, you know, for us it’s all about data. It’s being able to derive insights from data. And we’ve been at machine learning for a number of years. We’ve been working on it. We provide extensive machine learning capabilities in our platform and we’ve been really lucky to be able to leverage and for users to take those capabilities and apply them, for example, both in the observability as well as the security domain. And the way we see it headed is that with the advent of, let’s say, generative AI coming into the picture, there’s obviously much more usage in practice. And we believe that we’re going to move more and more towards the AI powered observability where, we think from an Elastic standpoint, we’re really well positioned and that’s what we’ve been looking towards and working towards.

Kelly: And I like that articulation very much, observability as kind of powered by AI. Because you talk about the relationship between these two technologies or entities and how they run into each other. I very much like that you are setting out what that relationship is.

Gagan: And I think the way we’ve been thinking about it is, there’s been a lot of talk about AI ops for a number of years. And that’s really being able to apply all the machine learning on the data to be able to reduce the noise in terms of the alert storms that people may get, and to be more and more proactive. We feel that obviously we have the capabilities from an AI ops perspective, but then complementing that with the generative AI is where it enables… let’s say level one operations teams, level two operations teams, you don’t have to be an expert at every error message that’s being spit out by the different systems. You can actually have an interactive dialogue with it and get all the information that you need. And so that’s where we feel the whole combination of AI ops and generative AI will be really the next area where we’ll be seeing the breakthrough and the turbocharging of AI ops, quite honestly.

Kelly: Yeah. And I like this kind of vision, if you will, simply because it kind of boils down to these very concrete, “What does this mean for people who are working in this field?”

Gagan: Yeah. And so, you know, I think historically, if you look at the teams, Operations and SRE teams and DevOps teams, and so on, there’s been a lot of training around how you can build dashboards, how you can look at dashboards. And while that may be very informative, and there’s definitely a place for it, is there a way where, at the time you need the data, you’re actually able to bring that information together? Or being able to be proactive about it. And it’s not just being able to apply machine learning to a single data type. You know, it’s like metrics, logs, traces, any sort of business KPIs, all those capabilities you can bring in, and you should be able to apply machine learning to those, right? And that’s really, really important.

Kelly: Yeah, very, very much so. And so another question. To my mind, I think one of the very tricky things when you’re talking about AI and observability is that there has been a ubiquity about AI to the point where we have seen so much AI washing, almost like everybody wants to just stick a sticker on something that says “now with AI.” And then before generative AI really took over everybody’s imagination, we kind of saw that with observability as well, and that people were just like, we do observability now. How do people sort through all of this noise to see what really is going on with AI and observability?

Gagan: That really has been an area of confusion for folks in the industry and really trying to understand where’s the hype, where’s the reality? And ultimately, my suggestion is that as teams are considering the solutions, be able to look at a couple of areas. One is, are your solutions able to ingest the type of data, the scale of data that you’re actually looking for? How easy is it for you to be able to ingest that data? And then what do you actually do with that data? Are you able to, for example, apply all sorts of machine learning to it? When you talk about, for example, generative AI, there’s this aspect of, okay, how do you — you know, you can definitely link to things like OpenAI, right? And other public capabilities that exist out there. But how do you then use — but that can cause a lot of hallucinations as well. So how do you apply some of your private data and proprietary data where it makes sense to your operations team? So those are important things to ask, consider, like how do you personalize any of that? Are you able to apply — does your machine learning or generative AI learn based on every time it has an interaction? Does it build the models a little bit more? Is it able to leverage your data? And then to be able to then solve the problem, to be able to give you that context is really, really important.

Kelly: And I think for the job of, say, an SRE, context is always important.

Gagan: I believe that those are big areas to sort of consider, to look at. And I would say when you’re looking at error conditions or things of that sort, you have to be able to say, okay, what does this mean in my context? That’s an important question. How does the tool that you’re looking at really solve your problem? Is it able to leverage your runbooks for any sort of remediation and so on? Those are things that SRE teams have to continue to think about and look at when they’re evaluating vendors, when it comes to observability, when it comes to AI and generative AI and how it solves their day to day problems.

Kelly: I think that is very good advice. So we’ve talked a little bit about where we see AI and observability now. To your mind, where does observability go from here? Like what is in observability’s future? And I know AI is part of that, but maybe not all of it, right?

Gagan: So I think definitely, that’s one area. So the way we think about it, I would say, and the way we talk about it in the market is that, like I said before, we were in the monitoring world in the earlier days, and that evolved into the observability world, which based on the evolution of the infrastructure, the applications, the cloud native and so on, that continued to happen in the market. But now we are moving into more and more an AI powered observability world, right? Where it’s about generative AI, it’s about being able to leverage AI with AI ops and so on. But in addition to that, we also feel that OpenTelemetry is going to play a big part in this evolution in the market. Customers are looking increasingly for observability vendors and solutions to be open, to be able to leverage these open standards to make sure that their data ingestion, their data ingestion architectures are all fairly flexible. They’re able to get the maximum benefit out of the data that they’re able to get, right? And that’s one of the things that we feel is going to be really, really a big evolution. And if you look at it from an Elastic standpoint, we had a while back contributed Elastic Common Schema to OpenTelemetry and really the big intention there with, obviously working very closely with the OTel SemConv, was to make sure that there’s a common schema that helps break down the data silos of the different types of information coming in. And that applies to both observability and security.

So that’s something that the OTel world is also very interested in. We feel the next thing is really the standardization of where Elastic, for example, has been supporting OTel cloud native and natively, which means that you could use an Elastic agent or you can use an OTel agent or an SDK to just transparently be able to see all the information related to traces in Elastic itself. And the next step is really where we are converging on a single collection architecture and ingestion architecture, which is based on OTel. And that applies for both observability and security because we feel that, you know, both of these are a big data problem. Another aspect I would talk about here in terms of the evolution we see is that there’s been metrics, logs, traces for a long time. But we believe that the profiling signal through continuous profiling is really important as well because it also bridges the gap between what you get before you can do, let’s say, APM related instrumentation, but provides SREs very deep insights in terms of the code, the code optimization, the usage of the infrastructure, the applications, the functions and what’s optimal and what’s not, especially these days when there’s a lot of focus on OpEx and cloud costs and so on. So you want your costs to be optimized while still delivering the right performance.

Kelly: Yeah. And I think another note that we’ve heard around profiling is also cloud cost, which more and more organizations care about in this day and age, and also sustainability. It’s like if you can just use less resources, you’re using less resources, which is kind of good.

Gagan: Yeah, absolutely. I mean, going green is a big initiative for every enterprise, every customer out there. And so we feel that that is something that really, really helps deliver on that promise for customers. And so we’re really excited about it. And so, yeah, we think that the future for Observability really lies in a couple of these big areas, which really are around AI, around being open and flexible in terms of your architecture, and around customers owning their own data.

Kelly: So final question, in the future, do we have a world where observability has zero dashboards?

Gagan: Yeah. I mean, I believe that that could be the case. But, I think at the end of the day, there is a value. So the way I would respond to that question is to say that ultimately there’s going to be some level of automation, there’s going to be some level of human interaction. That will always stay there. So a lot of the data can be automated, can be, you know, leveraged using machine learning and all those capabilities. And I believe for those you don’t need dashboards. You want to be just notified of what the problem is, what, you know, the solution considers to be the root cause of the problem because we understand the environment really well. But, when it comes to any sort of remediation, being able to look at the information, there are always going to be areas of opportunity where humans are interacting with that data. And I believe that’s when things like dashboards and all sorts of reports come in because, you know, you want to look at certain trends over a period of time that can be only visualized. You want to be able to do some sort of reporting. Maybe you are, you know, goaled on SLAs that you’re able to deliver, which the other person is going to look at. So I think, yes, there will be increasingly less reliance on solving problems through dashboards or identifying problems through dashboards. And that will get more and more automated. But, solving the problems, remediating, might still involve looking at visual elements, which would be the dashboards.

Kelly: I will take that. I will take a more optimized use of dashboards as opposed to dashboards everywhere. It’s like dashboards all the way down, I feel like these days.

Gagan: Right, because that doesn’t solve any problem. And I think the other thing I would mention there really quickly is that oftentimes you have operations teams who are looking at dashboards. They’re looking at, for example, CPU going up and down and all those other things. But the bigger question to ask is ultimately you can have a number, but is your business operating the way you want it? Are the business transactions going through? So what if our CPU is at 90%? Does it really matter? The big question that users, the operations teams, have to start considering more and more is how they connect the operational data to the business data. And again, it’s a data problem, but it’s about being able to connect the dots, being able to see what the trends are, what the anomalies are. You definitely want to be notified when there’s a business problem. Whether your CPU and memory are running high or low, it doesn’t matter. If your business is not performing, something’s wrong.

Kelly: So we are at time, but this has been a great conversation. Gagan, thank you so much for joining me today.

Gagan: Thanks, Kelly. It’s been great having a discussion around ML, around generative AI as well as, you know, OpenTelemetry. Those are big concepts that are being talked about in industry and I’m really, really excited about the future of observability.
