In this RedMonk conversation, Matt Klein, Co-founder and CTO of bitdrift, chats with Kate Holterhoff, senior analyst at RedMonk, about mobile observability’s challenges. They discuss how hard mobile observability is compared to server-side observability, the impact of privacy controls on data collection, and the cultural divide within organizations between mobile and backend engineers. Matt shares his perspective on OpenTelemetry, emphasizing that while it helps establish a baseline for telemetry, it’s “not going to solve your observability woes,” particularly as they relate to broader issues of data overload and vendor lock-in.

Transcript

Kate Holterhoff (00:10)
Hello and welcome to this RedMonk conversation. My name is Kate Holterhoff, senior analyst at RedMonk. And with me today, I have Matt Klein, co-founder and CTO of bitdrift. Matt is an alumnus of Lyft, Twitter, Amazon, Raytheon, and Microsoft, among others. He’s also been deeply involved in the CNCF as a board member and is the creator of the service proxy Envoy. Matt, thanks so much for joining me on the MonkCast.

Matt Klein (00:33)
Thank you for having me.

Kate Holterhoff (00:34)
All right, so do you wanna add anything to my introduction or does that about cover the highlights? Cause I could also mention that you changed your job title to plumber everywhere on your LinkedIn profile, which made researching your background super exciting.

Matt Klein (00:48)
That is a feature, not a bug. Actually, I found out many years ago that if I changed my job title to plumber, I stopped getting recruiting spam. So yeah, it was mostly self-preservation.

Kate Holterhoff (00:50)
OK. All right, fair enough. So I invited Matt on to chat with me about two of his recent posts that I think intersect with some of the research that I’ve been doing at RedMonk on front-end observability, the sort of general brokenness of mobile development, and, most pertinently to what you’re doing at bitdrift, I would say, mobile observability. I’m interested in digging into that, so I wanna start with your post, “Why Does No One Talk About Mobile Observability?” Can you give us a quick abstract about that post?

Matt Klein (01:33)
Yeah, I guess the short version is that from a larger cloud native perspective, we’ve obviously been doing observability for a long time. People understand that on the server, you have your logs, your metrics, and your traces. And from a reliability perspective, there’s a lot of people that understand the importance of that.

If you look at mobile in general, I would say that mobile is 15 or 20 years behind what we have on server, meaning most mobile engineers are used to having a crash reporting solution at best. But if you look at how they understand the actual operation of their applications, there aren’t that many tools out there, meaning how do you get metrics live from your mobile applications? How do you get logs live from your mobile applications?

And before we dive into why this is hard and why no one talks about it, I’d like to start by saying that it’s actually really frustrating that that’s the case. Because if you look at the way that people use applications these days, almost everyone accesses these web applications through their phone. That is the primary avenue by which people use the applications that we build today.

Obviously people access through web as well, but I think most would agree that app-based access is really the primary mechanism that people use these days. And what’s frustrating to me, just as an application developer, is that the user experience that people have in their app is actually the only thing that matters.

And one of the most frustrating things for me, again, coming at it more from an operations perspective, is how many times in my career we’ve talked about having a 99.99% success rate on our server infrastructure, and then some application is returning 200 HTTP OK with some JSON, and that JSON is crashing the app.

So, you know, the application’s actual success rate is 0% and there are a lot of customers that are super frustrated. So I’ve always found it frustrating that we invest so much in server-side observability, and I’m not saying that isn’t important. It’s obviously extremely important.

But I would argue that the observability that matters is actually what people are experiencing on that true edge, what they’re experiencing on their phone or within the web browser. To answer your question around why does no one talk about it, the quick version is that no one talks about it because it’s really hard.

If you look at server observability, again, I’m not saying server observability isn’t difficult, and people have built a lot of amazing systems, but there are many things about server observability that are substantially simpler. One of those is that on server, we typically do 10, 15, 20, 30 deploys per day at large companies, maybe more. So if you find that you’re missing a metric or a log or something like that, you just add it and you deploy.

If you look at mobile, the standard release cadence for most mobile applications is two weeks at best, maybe longer. You’ll do an update, you’ll submit it to the app store, maybe it gets approved after a week. Then obviously there’s a long tail of updates. So to get an app version out to a customer could take a month.

So think about the way people approach observability on server, where they can say, oh, I’m missing this information, let me go and add this metric or log line. And they can do that within 30 minutes. 30 minutes on server, one month on mobile. So start thinking about that cadence of just, how do I solve this issue? It’s the equivalent of printf-style debugging, but you’re doing it over a one-month period versus a 30-minute period.

So that’s the first part. The second part is that on server, although obviously failure happens, we really live in a world with server infrastructure where it’s pretty reliable, right? You know, typically, the network is reliable, it’s not oversubscribed.

The containers and the virtual machines where we run these applications, they’re fairly reliable. Failures don’t happen that often. It’s not to say that they don’t happen, but they don’t happen that often. Whereas on mobile, it’s really the inverse. On mobile, the network is not reliable. You’re constantly losing networking. You’d think we live in this modern era of excellent networking, but the reality is that you’re constantly in cellular dead spots, you know, where you’re losing connectivity. So mobile applications have to be extremely resilient to network problems.

That’s part one. And then part two, you know, the mobile operating systems where we’re running these applications, they are particularly aggressive, for probably fairly obvious reasons, about protecting the performance of that device.

And the way that they do that is that when applications move to the background, they might be suspended. If an application is using too much CPU or memory, it might be terminated. And application developers have very little control over this. So that’s a long way of saying that in mobile, you really have to be prepared for the fact that your applications can be terminated really at any time. So the bar for resilience of your observability output is a lot higher.

So again, on server, you end up having all of these potential issues, but they’re fairly rare. And then on mobile, they just happen all the time. So the way that you have to protect and code against this, making sure that you get all of that information that’s happening on mobile back to your infrastructure, is much more difficult.

And then I guess the last thing that I would add is on mobile, you’re dealing with such a heterogeneous environment, right? On server, maybe you have some containers that run on x86 and some that run on ARM, but for the most part, your containers are fairly similar. On mobile, you’re running on hundreds of different devices, tens of different operating system versions, lots of different app versions in the wild. Users may have granted applications different permissions.

So it’s a very tricky environment, where things are really not the same. So, you know, there are certain things that are simpler about mobile in the sense that you’re dealing typically with a single-user environment versus a multi-tenant environment.

But almost everything else is substantially more complicated. So when we try to bring observability, when we try to bring all of the goodness that we have on server to mobile, it ends up being very complicated.
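
The gap Matt describes between a healthy server dashboard and a broken app can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Response` type, the `ride_id` and `price` fields, and the rule that the app needs every required field to be non-null are all hypothetical, not taken from any real system discussed here.

```python
import json
from dataclasses import dataclass


@dataclass
class Response:
    status: int  # HTTP status code, which is all the server dashboard counts
    body: str    # raw response body that the app has to parse and render


def server_success_rate(responses):
    """Fraction of responses with a 2xx status: the server's view."""
    ok = sum(1 for r in responses if 200 <= r.status < 300)
    return ok / len(responses)


def client_success_rate(responses, required_fields):
    """Fraction of responses a (hypothetical) app could actually use."""
    usable = 0
    for r in responses:
        if not 200 <= r.status < 300:
            continue
        try:
            payload = json.loads(r.body)
        except json.JSONDecodeError:
            continue  # malformed JSON: nothing the app can render
        # Assumption: the app fails if any required field is missing or null.
        if isinstance(payload, dict) and all(
            payload.get(field) is not None for field in required_fields
        ):
            usable += 1
    return usable / len(responses)


# Hypothetical incident: every response is 200 OK, but "price" is null.
responses = [Response(200, '{"ride_id": 7, "price": null}') for _ in range(1000)]

server_rate = server_success_rate(responses)
client_rate = client_success_rate(responses, ["ride_id", "price"])
print(f"server view: {server_rate:.2%}, client view: {client_rate:.2%}")
```

The two functions look at the same responses and disagree completely, which is the point of the example: only the second number reflects what the user experienced, and it can only be computed with information from the client.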

Kate Holterhoff (08:19)
I think it’s just a really good abstract. I mean, I’ve written a little bit about what makes mobile development broken. And I think that your post actually does a phenomenal job of pulling out the technical reasons why we’ve landed in this condition. I mean, there’s some overlap in the things that we’re seeing. What I really enjoyed about your post was the granular detail.

Matt Klein (08:21)
The last thing that I was going to add about mobile that I think might be counterintuitive for a lot of your listeners is network bandwidth. We already talked about how the network on mobile is lossy. That’s fine, right? But what a lot of people don’t understand is that the amount of data that applications use really, really matters. And again, you might be wondering, well, aren’t most people on unlimited data plans these days? How much could it actually matter?

The reality, and I actually wish that there were more published scientific studies on this, is that almost every large web property that I know of has done internal studies, and they’ve come to this conclusion, which, again, might be obvious in hindsight: the more data you send and receive from your mobile applications, the higher the latency to get API responses from your server. And what nearly every web property has measured is that there is a material decrease in conversion, whatever conversion means for your app. If you’re a shopping app, it might be fewer purchases. If you’re a social media app, it might be fewer people tapping on things, and that leads to fewer ads shown.

Every web property has shown that higher latency, meaning the longer it takes for things to load and for data to come back from the server, decreases conversion. And for large web properties, that can be a really huge decrease in potential revenue. And what a lot of properties have also shown, which I think is fascinating, and I’m obviously biased, this leads us directly into why I think bitdrift is interesting, is that for a lot of applications, if you actually look at the breakdown of the data that’s sent and received, a greater portion of the data ends up being analytics and observability data than actual application data. So what ends up happening is that these applications are sending or receiving a ton of data. They’re trying to get that observability going.

There’s more analytics data than application data. The analytics data is drowning out the application data and leading to decreased performance. And then at that point, you’re actually decreasing the conversion in your app. So that was the last thing that I wanted to say, which is that mobile observability is really hard for all these reasons. But the last and possibly the biggest one is that just the act of sending the data can affect the rest of the application, right?

Which is pretty hard to wrap your head around because on server, again, people don’t tend to think that way. Like, yes, obviously, you know, gathering logs and metrics takes CPU, but there’s typically ample network bandwidth, right? People don’t typically think about those things. Whereas on mobile, just the act of collecting all this information, potentially all the time, can lead to substantial problems.

Kate Holterhoff (11:36)
Yeah, it’s a huge deal. And I’m glad you are emphasizing it here. So again, you have all these excellent examples of what makes mobile development just so challenging and these unique affordances that we have that we don’t see with the server. One that I think is worth talking about a little bit more in detail is the issue of privacy controls. And you mentioned that they’re evolving all the time. And I think this one’s particularly interesting, I mean, following the news.

You’ve got the Apple Store and the Google Store, they set the rules about what can be published. And so you’re beholden to this platform. And PWAs might be a solution for bypassing this, but that doesn’t seem to work for a lot of folks. So I guess I’m curious, could you dig in at all to the state of privacy when it comes to meeting the demands of these platforms?

Matt Klein (12:21)
Yeah, I’m going to be perfectly honest. I am not an expert in this space, so I will do the best that I can. Just to touch on your first point around the web apps, there was a period of time where I think a lot of people were looking to go to these progressive web apps. But for reasons probably around performance, at least what we’re seeing is that almost everyone that moved to web apps is now moving back to native. And again, I suspect a lot of that is just that with the web apps, it’s very difficult to get that native look and feel that people have come to expect from their Android and their iOS applications.

So I just wanted to point that out: we still hear obviously about people that are doing web apps and want to do web app observability, but nearly everyone that I know of is moving back to native, or they’re looking at cross-platform solutions like Flutter and React Native. So I think that’s one side. And then on the privacy side of things, I think, you know, I…

I personally applaud the mobile operating systems for caring so much about privacy. I think from an end user perspective, it’s fantastic. I think from an application perspective and an SDK perspective, it ends up getting very complicated. And I don’t have a ton to say in terms of the specifics around the different controls. I’ll just say that in every release, the operating systems tend to get stricter, right, around what the applications can do, what data they can see.

And to your point, I think that does have a lot of implications, not only from an application development perspective, but from an observability perspective. Because if you’re trying to understand things on a per-unique-user basis, or you’re trying to understand what cellular provider they’re using or other pieces of data that the operating system might actually be hiding now, it can be a bit more complicated to understand some of that breakdown. So I think the privacy bar continues to march upward. And I think from an application development and an observability SDK provider perspective, we just have to keep an eye on that. And we have to keep helping people get the information that they need.

Kate Holterhoff (14:47)
Right. I certainly enjoyed your point that any time you ask your users if you can track their location, it’s a huge pain. I dislike that deeply. And so you’re making those decisions ahead of time, like, is it so important for me to have this telemetry data that I need to bother my users by asking for these permissions? This is a calculation that you need to make ahead of time that maybe we didn’t need to make a year ago. Suddenly iOS is requiring it. I’m not sure if Android does something similar, I’m assuming.

Matt Klein (15:14)
It does, yeah. And one example of where we face that, again, as a mobile observability provider, is that one of the feature sets that people like in this space is what we call session replay. They like the ability to understand what a user has been doing, and that can involve button presses. But one of the main things that it typically offers is some kind of screenshot support.

Kate Holterhoff (15:15)
Yeah.

Matt Klein (15:40)
And, you know, from a company perspective, we tend to take a very privacy-forward stance, mostly because that’s something that we personally believe in. But based on what you’re saying, which is true, the industry marches forward and the privacy bar gets higher and higher and higher. So, you know, what we decided very early on is that for our session replay, we don’t take screenshots. We actually do a wireframe representation, which is very privacy-forward.

But it is interesting because even with the wireframes, which by definition, because they’re wireframes, don’t contain tags, don’t contain pixels, and are very focused on capturing no PII (personally identifiable information), we still have customers that ask us for actual pixel screenshots.

And it’s just a great example of how there’s always this tension between the privacy bar, making sure that we don’t capture PII, making sure obviously that the companies that we work with can adhere to GDPR and all of the different requirements that are out there. But yet when people are trying to debug these issues, they obviously want all the information. They want all of the HTTP request and response bodies. They want all of the pixels. So, you know, trying to work within what the customer wants, what the end user wants from a privacy perspective, and what the mobile operating system providers want ends up being very tricky.

Kate Holterhoff (17:02)
I can see that. OK, so let’s shift gears here a little bit and talk about the developer experience. Because I think that the situation for mobile developers is, I don’t know, at a point of transition. At the risk of being a little spicy here, I feel like sometimes mobile development teams are treated as second-class citizens when it comes to the back-end, front-end divide here. So yeah, let’s talk about, would you say that it’s sort of a…

Matt Klein (17:21)
I agree. Yeah. I agree.

Kate Holterhoff (17:26)
cultural problem too, the fact that we don’t have the observability tooling, resources, and attention placed on mobile that we do on the backend?

Matt Klein (17:37)
You’re asking a really interesting question, which to be honest, I don’t know the answer to. I don’t think the industry knows the answer to it. I can start by telling you what I see, which is that at every company that I’ve worked at, and again, prior to being a vendor, I was an end user. I was working, as you said, on a lot of these large internet applications.

Kate Holterhoff (17:46)
Okay.

Matt Klein (18:03)
And every company that I’ve either worked at or I’ve worked with, the mobile engineers and the server engineers rarely talk to each other. And it is perplexing because obviously their work is so intertwined. And it’s not that they never talk. I mean, it’s more that I think that as an industry, as we’ve moved more towards service-oriented architectures, microservice architectures, we’ve moved towards having these strongly typed APIs in a lot of organizations, whether you’re defining your APIs using OpenAPI or Protobuf or something else.

You have this API definition. It gets compiled to Swift and Kotlin that your mobile engineers are calling. It gets compiled to Go or Java or something else that your server engineers are working on. And if nothing goes wrong, they each write to the API and they don’t really ever have to talk to each other.

And obviously from an end user customer perspective, that’s suboptimal because we obviously would want the users of our applications to have the best possible experience. And that obviously spans all of the code that runs on the client all the way to the backend. But if, like we talked about at the beginning of the show, mobile observability is so important, if it’s possibly the most important thing to understand in real time what the user is experiencing, why don’t mobile engineers have access to these tools, or why culturally, you know, do they not typically reach for these tools? And I…

I honestly can only come back to the fact that it is incredibly hard to do well. So I think that mobile engineers have lived without these tools for so long, meaning they’ve obviously had access to crash reporting tools. But if you look again at most modern organizations, this is also what kills me, is that for most mature apps, I don’t actually know what the number is on a per-company basis, but most companies will tell you that actual crashes are 0.1% of their problems. You know, it’s like 99.9% of sessions are crash-free. So yes, I’m not saying that crashes aren’t important, but there are so many other potential functionality issues, whether that be just things that are broken or slow APIs or typical observability things that you would take for granted on server. They’re just not available on mobile.

And, you know, I can only think that it is because of the challenge of building these tools well, number one. And number two, I think that as we have advanced on the server side and provided better tools, from a budgeting perspective, and I hate to bring in the budget conversation given that we’re having a technical conversation, but that’s just the world that we live in, I think in most organizations server-side observability is bundled into the infra budget, right? So when people are paying for AWS or their other huge-ticket items, and again, we can have a whole other conversation about the cost of observability, that’s a whole separate topic, it’s typically bundled into a very large spend and it’s owned by one budgetary side of things.

And what I see, at least in most organizations, and I’m not saying that it’s right or correct, but I think it is the way that it is at most companies, is that mobile tools tend to roll up into a different budget. They tend to roll up into the mobile org or into the product org or something along those lines, and the budgets tend to be smaller. So I wish I had a better answer for you. I think we’re all searching for that answer and I don’t think it’s a great answer, but I think it honestly comes down to just that the tools have historically not been very good or they’re very challenging to build. That’s one. And two, I think that the teams have been separated and the budgets have been separated.

Kate Holterhoff (22:07)
OK, well, I think that is actually a very good answer. And I appreciate your perspective as a vendor, because one of the questions I’ve been trying to answer when I’ve been looking at these front-end observability products is who the user is, right? Because it doesn’t seem to be front-end engineers for front-end observability all the time. And I’m wondering if it’s the same for mobile observability. So yeah, can you talk about who is actually using bitdrift?

Matt Klein (22:28)
Yeah. So I mean, right now we are squarely selling to mobile engineers. And that is, I’ll be honest with you, both good and bad. It is good because we’re obviously trying to solve problems that mobile engineers and product engineers are having. I mean, again, like what we see

Kate Holterhoff (22:37)
Okay.

Matt Klein (22:53)
from the companies that we work with is, like I was saying before, that crash reporting is a relatively mature thing within this space. But what most mature mobile organizations will tell you is that, you know, most of their sessions are crash-free, yet they’re getting an endless stream of reports from their users. You know, that could be this feature is broken, this thing is slow, this thing doesn’t work, or we’re not converting our payments at a high enough rate, but we don’t actually know where people are clicking. So there are all these things. So we tend to focus on, I like to say, the 99.9% of other issues. And that’s where I think most of these tools fall short. To further answer your question though.

Where I would like to be eventually is that I think the information that we show, meaning the high-level success rate, the high-level user journey paths, all of those things, I think that these ultimately should appeal to site reliability engineers who are working across the organization, right? Who want to make sure that the entire application is reliable, not just the server side. And again, I think that it’s not that people are not doing it on purpose. I just think that as an industry we’ve really segmented things. If you look at what SREs do, I think most SREs would tell you that they are server-focused, right? They are not end-to-end focused. I personally think that’s wrong. I personally think that as an industry we have to evolve that, but that is where we are now.

So where I’d like to be is I’d like to have a tool that obviously spans both server and mobile and that is important to reliability engineers who are looking across the stack. I also think that the tool is important for product managers who might want to understand where things are running. I think executives might want to understand what the actual end user success rate is. And I think that if you look at the most mature applications, I’m talking like Facebook level, that level of application, they’ve invested a tremendous amount in building one-off tools to allow them to understand all of this from an end user perspective.

But 99% of other smaller organizations have not developed all of this tooling. And I think, again, what we would like to do is not only help people understand all of the things that are not crashes, because there are tons of those things, but also really get people the real-time analysis, the real-time observability that people take for granted on server and that we would love to have on mobile. Because again, wouldn’t you like to know that you just deployed and suddenly you’re having all these problems within your app? I mean, that to me is what we should be striving for.

Kate Holterhoff (25:39)
And I think that’s a good point for us to bring in our second big topic, which is OpenTelemetry. So you wrote another post called “Reality Check: OpenTelemetry Is Not Going to Solve Your Observability Woes.” Again, I found it super interesting. OTel is something I’ve been following as well. And they’ve also been very involved in the front-end observability conversation writ large. Like, how are we going to bring in all the folks working on these applications? Let’s hear a quick synopsis of what your major argument is there.

Matt Klein (26:05)
Sure. Yeah, so the first thing is that I think that OpenTelemetry is a project and a word that is a wrapper for actually a bunch of different sub-projects. And I think a lot of people are confused about that. So before giving my answer, I’d like to start by just briefly explaining what OpenTelemetry is. And OpenTelemetry is really composed of three major pieces that are related, but they’re also different.

The first part is a set of SDKs, meaning language-specific SDKs, that allow you to both easily emit telemetry from your application, so logs, traces, metrics, but also give you out-of-the-box telemetry. So they’re built into common libraries and serving frameworks. They’re going to give you whatever stock metrics and spans for your HTTP requests and all of those things. So that’s the user-focused side of things.

Then there’s another aspect of OpenTelemetry, which is the transport protocol. It’s called OTLP. It’s the OpenTelemetry Protocol. And that is a vendor-neutral transport mechanism that allows these SDKs to transport telemetry to backends.

And the idea is that if vendors all implement this underlying protocol, you can switch vendors without having to worry about it. There’s at least theoretically no lock-in. We can come back to that. I think that’s a lie. But at least that’s the idea. And then the third part is a massive effort, which is called the OpenTelemetry Collector.

And this is really a telemetry observability pipeline system that allows you to accept logs, metrics, traces into this tool and then potentially transform them and send them to a bunch of different backends. So maybe you’re sending it to Splunk or Honeycomb or to some OpenSearch system or something along those lines. So these are all related, but they’re really very different pieces and they have different customers, meaning the OpenTelemetry Collector is really for your infra team who’s trying to move things around. The protocols are really for the vendors who might want to interoperate a bit easier. And then the language APIs, these are for the end users. These are for the people that want to instrument their applications. So those are the different pieces of OpenTelemetry.

When I said that OpenTelemetry is not going to solve your observability woes, what I was really referring to is that we’ve been doing observability pretty much the same way, without too many changes, for 20 or 30 years. Meaning we emit logs, we send them to some backend where they all get ingested, and then we can query them later. Same for metrics, same for traces.

And if you look at the broader industry right now, there’s obviously a lot of talk about how much people are spending on observability. And what I think most companies will tell you, if they’re willing to admit it, is that, you know, 90, 95% or more of observability data that’s written is never read. I mean, not by a human, not by a machine. That creates a really interesting situation where we obviously have all of these people that are coming in and they’re sending all of this information, and then they’re not getting that value out of it. They’re spending a lot and then they’re not actually using that data.

And what my argument was with OpenTelemetry is that OpenTelemetry hasn’t really changed that paradigm. It hasn’t changed the fact that we send all this data to the backend systems. And potentially it’s made it worse, because what’s ended up happening is we’re now auto-instrumenting all these applications and we’re sending more and more data. So we’re sending more data that we have to store. We’re not using most of it. And then we’re paying more to do it.

And then the last thing that I would say about OpenTelemetry is that, at least theoretically, the idea behind the common protocol is that there wouldn’t be lock-in. And I’m saying this as a vendor: I would argue that the lock-in is still there, because most people are locked in by the beautiful dashboard. They’re locked in by the query engine. They’re locked in by all the other stuff, and OpenTelemetry doesn’t specify any of those things.

So it’s more that I think what OpenTelemetry does very, very well is it helps people get a better baseline of telemetry. Meaning it helps them auto-instrument their applications and get them to a better base state, but it makes almost everything else worse. Meaning they send more data by default. I think they’re largely still locked into whatever vendor solution they’re using. So I think it’s a great step forward in terms of bringing up the baseline bar of how we monitor systems. But all the other cost problems, the lock-in problems, all of those things, I don’t think it actually improves anything.
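
The collector role Matt describes (accept telemetry, optionally transform it, and fan it out to several backends) can be pictured with a toy pipeline in plain Python. This is a rough sketch under stated assumptions and does not use the real OpenTelemetry SDK, OTLP, or Collector configuration: `ToyPipeline`, `drop_debug`, `redact_user_id`, and the dictionary record shape are all hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List, Optional

# Stand-in for a vendor-neutral record format (the role a shared protocol plays).
Record = Dict[str, object]
Processor = Callable[[Record], Optional[Record]]
Exporter = Callable[[Record], None]


class ToyPipeline:
    """Hypothetical receive -> process -> export pipeline, not the real Collector."""

    def __init__(self, processors: List[Processor], exporters: List[Exporter]):
        self.processors = processors
        self.exporters = exporters

    def receive(self, record: Record) -> None:
        current: Optional[Record] = dict(record)
        for process in self.processors:
            current = process(current)
            if current is None:
                return  # a processor dropped the record
        for export in self.exporters:
            export(dict(current))  # each backend gets its own copy


def drop_debug(record: Record) -> Optional[Record]:
    """Transform step: discard debug-level records."""
    return None if record.get("level") == "debug" else record


def redact_user_id(record: Record) -> Optional[Record]:
    """Transform step: strip a (hypothetical) user_id field."""
    return {k: v for k, v in record.items() if k != "user_id"}


# Two in-memory lists stand in for two different vendor backends.
backend_a: List[Record] = []
backend_b: List[Record] = []
pipeline = ToyPipeline(
    processors=[drop_debug, redact_user_id],
    exporters=[backend_a.append, backend_b.append],
)

for rec in [
    {"level": "info", "msg": "request served", "user_id": 42},
    {"level": "debug", "msg": "cache miss", "user_id": 42},
    {"level": "error", "msg": "upstream timeout", "user_id": 7},
]:
    pipeline.receive(rec)
```

Swapping `backend_a` for a different exporter changes nothing upstream, which is the portability a common format offers. The sketch also hints at Matt's caveat: nothing here says anything about dashboards or query engines, which is where he argues the lock-in actually lives.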

Kate Holterhoff (31:21)
So I’m interested in your relationship with OpenTelemetry as a project of the CNCF. So you have deep ties to the CNCF, you were a member. Were you on the board for OpenTelemetry at all?

Matt Klein (31:34)
Um, so I, I, I was on, I was, was I on the board? Yes. I, I actually, no, I actually think I, yes, I was briefly. Um, I was on the technical oversight committee and I think I was on the board for a year.

Kate Holterhoff (31:39)
Okay.

Matt Klein (31:47)
I should know this. I actually don’t remember. Um, yes, I was, I was there, um, during the migration from like OpenCensus to OpenTracing to OpenTelemetry. And I don’t, I don’t want to make it seem like I’m

Kate Holterhoff (31:49)
That’s okay.

Matt Klein (32:02)
opposed to OpenTelemetry, I’m not. I think it is an important project and I think it has actually made it easier for people to instrument their applications. I think there’s lots of benefits of the project. I just think that there are vendors in the space that would like us to believe that by adopting OpenTelemetry, it’s going to make things magically better in terms of getting more value out of our telemetry or getting more value out of our observability spend.

And my main argument is that I don’t think it fixes those things. I think that, and again, I’m obviously biased, but I think that we have to take a fundamentally different approach to how we think about observability. And a lot of that, in my opinion, is making how we send telemetry a lot more dynamic, meaning making it so that we have the knobs to turn things off and on. The end goal being, A, that we’re using a lot more of the telemetry that we output, so we’re getting much better return on investment.

But the byproduct of making things more dynamic and getting better return on investment is that now when I want to, I can turn the dial to the max. Like I can get a thousand times the data when I actually need it. Whereas a lot of times people are being very frugal about what they send. They’re saying, you know, I don’t want to send all this new thing for all my applications. It’s going to cost me so much money. So let me think about, do I need this log line? Do I need this metric?

It makes it very difficult for people to get the right value. I don’t think the incentives are aligned properly. And I think that we’re training people to be wary of sending the data that they might need because it will cost too much. And then they end up not having the data that they need to actually solve the problems.
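
[Editor’s note: the "knobs" Matt describes can be pictured with a small sketch. This is only an illustration under stated assumptions, not bitdrift’s protocol and not an OpenTelemetry API. The class `DynamicTelemetry`, its `apply_config` and `emit` methods, and the stream name `network_debug` are all hypothetical. Each stream is off by default, and a runtime configuration push can turn it on or up without shipping a new build.]

```python
from dataclasses import dataclass, field


@dataclass
class DynamicTelemetry:
    """Hypothetical client-side gate; names here are illustrative assumptions."""

    # Stream name -> N: 0 means off, 1 means send everything, N means 1 in N.
    every_n: dict[str, int] = field(default_factory=dict)
    # Stand-in for a network upload: events that would have been sent.
    sent: list[tuple[str, str]] = field(default_factory=list)
    _seen: dict[str, int] = field(default_factory=dict)

    def apply_config(self, new_settings: dict[str, int]) -> None:
        """Simulate a remote config push that changes the knobs at runtime."""
        self.every_n.update(new_settings)

    def emit(self, stream: str, message: str) -> bool:
        """Send the event only if the stream's current setting allows it."""
        n = self.every_n.get(stream, 0)  # unknown streams default to off
        if n == 0:
            return False
        count = self._seen.get(stream, 0) + 1
        self._seen[stream] = count
        if count % n == 0:
            self.sent.append((stream, message))
            return True
        return False


telemetry = DynamicTelemetry()
telemetry.emit("network_debug", "request started")  # dropped: stream is off
telemetry.apply_config({"network_debug": 1})        # turn the dial to the max
telemetry.emit("network_debug", "request retried")  # now sent
```

[The design point is that the cost decision moves from code-review time ("do I need this log line?") to run time: the log call stays in the application, and whether it produces any data is decided by configuration.]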

Kate Holterhoff (33:54)
I see. I thought your assessment was fair. And you even title it as OpenTelemetry. It’s more than one thing. There’s three components to it. And that, it’s not a band-aid for observability. There’s a little bit more involved. So it made sense to me. So I would say that it did feel like you were panning OTel. What I was interested in is you really emphasize the idea that standards are important.

Matt Klein (34:07)
Sure. Yeah.

Kate Holterhoff (34:18)
And we should take time making sure that we don’t just have this bloat of everyone having proprietary protocols and SDKs and ways of doing observability, that it does make sense for us to come together as an industry and make some decisions about how we’re going to communicate. So to me, it seems like OTel is the beginnings of a path to that, not the end. Then at the end of the article, you actually give some suggestions about how to go about that. So would you say that’s fair, that the one thing that we can salvage, if nothing else, is that standards are good and we might pursue those?

Matt Klein (34:53)
Yes, of course. You know, before people listen to this and say, well, you know, Matt said that, bitdrift doesn’t do any standards, you know, I will say that I’m of two minds, which is that, yes, standards are important. They definitely move things forward. At the same time, I think that we should be willing to experiment with things outside the standard in order to move the industry forward faster.

So I did write about this in the post, which is that the reason that bitdrift itself doesn’t use OTLP today is that OTLP doesn’t include any of the dynamic mechanisms that bitdrift brings to market. So by definition, bitdrift cannot use OTLP. We had to develop our own protocol.

That’s not to say that I would not ultimately like to figure out how to bring some of these dynamic observability mechanisms back into OpenTelemetry because I think that would improve the overall state of the industry. So I think what you said is correct. I absolutely agree. I just want to be honest before people jump on me, which is that I do think that at times it makes sense to experiment and see where we can get.

Kate Holterhoff (36:07)
You invented observability 3.0. You love it when people jump on you. You thrive in these situations.

Matt Klein (36:10)
Sure, sure, of course. Yeah, yeah, that’s true. Yeah, that’s true. Yeah.

Kate Holterhoff (36:16)
OK. All right. So when I have spoken with folks, especially folks at the top of the stack who would theoretically be buyers for mobile observability, it seems like there are not a lot of folks in the OTel community who actually work in the front-end domain that are mobile developers. Do you see that as well? Do you think that maybe that could be a path towards making OTel something that bitdrift would be able to use as a protocol?

Matt Klein (36:39)
Yeah, sorry. So like I said, just to come back, there’s two different pieces, at least of what we’re talking about now. There’s the SDKs, meaning there’s SDKs that are running on the client. I definitely give, there’s a company called Embrace, they’re a mobile observability company. I give them credit. They have done a bunch of work trying to improve the OpenTelemetry libraries that are running on mobile.

We as a company, we absolutely support those libraries. We will integrate with those libraries. We’ll take the telemetry that’s generated by the client libraries, and we will ingest that into bitdrift. That’s different than the transport protocol, the one where we want to do this dynamic turning things off and on. It would be a huge lift right now for us to use the OTLP protocol just because it doesn’t support any of the types of things that we’re doing.

So I think what you’re saying is correct. I think that there’s a lot more work done on the auto instrumentation side on server. Again, it’s not super surprising because there’s a lot more companies, there’s a lot more money there right now. I do think that some people are starting to dabble on making the client side libraries better.

We as a company, medium, long term, we will absolutely contribute to that. Like absolutely would love to see that better. I think the part that is trickier is, could we come up with an industry standard way of doing some of the dynamic telemetry production that bitdrift does? That would require some pretty fundamental changes to OTLP. So that’s where I’m less sure.

In terms of collaborating with folks who are working on the Swift and the Kotlin libraries that people are embedding in their applications, absolutely yes.

Kate Holterhoff (38:25)
Okay, all right. I think that’s probably a good point for us to wrap up here. So before we go, how can folks hear more from you? What are your favorite social channels, your blog? Talk to us about how folks can stay up to date.

Matt Klein (38:38)
These days mostly LinkedIn and Bluesky and I do have a website, mattklein123.dev where I occasionally post blogs. So would love to chat with any and all.

Kate Holterhoff (38:49)
Fantastic. Sounds great. All right, I’ve really enjoyed speaking with you, Matt. My name is Kate Holterhoff, senior analyst at RedMonk. If you enjoyed this conversation, please like, subscribe, and review the MonkCast on your podcast platform of choice. If you are watching us on RedMonk’s YouTube channel, please like, subscribe, and engage with us in the comments.

A RedMonk Conversation: Matt Klein on why Mobile Observability Lags Behind Server Observability
