In this RedMonk Conversation, Bob Quillin, CEO and co-founder of ControlTheory, chats with Rachel Stephens, Research Director at RedMonk. They discuss ControlTheory’s launch and the observability landscape, including the importance of feedback loops and how ControlTheory aims to democratize instrumentation through OpenTelemetry. Bob shares insights on integrating with existing observability tools and the target audience for their solutions, emphasizing the need for more intelligent data management in the cloud era.
Transcript
Rachel Stephens (00:12)
Hi everyone, welcome to RedMonk Conversations. I am Rachel Stephens with RedMonk, and with me today I have Bob Quillin. Bob, you are off on a new adventure now. I’m so excited to hear about it, so why don’t you please introduce yourself and tell us all what you are up to.
Bob Quillin (00:26)
Great. Good to be here, Rachel. Great to see you. Yeah, so Bob Quillin, CEO and co-founder at ControlTheory. We just launched the company a few weeks ago at KubeCon in London, something we’ve been working on for a little over a year now. So it’s a little work in progress that just came out of stealth, and it’s an exciting time. It’s the four co-founders that did StackEngine, and StackEngine, you know, came out of the Kubernetes and container world,
the CNCF ecosystem, back in 2013, 2014 or so. And yeah, we’ve seen a similar track potentially happening with OpenTelemetry, so we can talk all about that and where it’s going. But we’re excited to really look at helping people regain control of their observability, which is kind of the ControlTheory theme, focusing on some pain points today but also looking at where the technology is going. So it’s fun to get the actual original startup founders back together, the band back together, so to speak. Makes it kind of fun.
Rachel Stephens (01:30)
Yeah, always fun to get the band back together. So four co-founders, anyone else? Or is it just the four?
Bob Quillin (01:35)
It’s the four of us now, and we have a few engineers, so it’s still less than 10 people, still nice and compact. We announced $5 million in funding from Silverton, so we can start growing a little bit. Silverton’s an Austin-based venture firm, and they’ve funded, you know (I’ve been here in Austin almost 13, 14 years now), Hyper9, CopperEgg, StackEngine, and now ControlTheory. So it’s the fourth startup coming through in Austin,
and we’ve worked with them a long time, and they’re really good partners. This is also part of getting the whole band back together and getting our ecosystem plugged in. Some people moved on, but there are a lot of folks here looking to do what’s next, coming out of COVID and into the whole AI era. A lot of energy, a lot of fun, a lot of potential, we think.
Rachel Stephens (02:24)
It’s fun. When people believe in either a problem space or the team, or ideally both, enough that they’ve decided to start a company, I love to hear that origin story. So tell me. You kind of mentioned the StackEngine team, right? Tell me about the problem space. There are a lot of observability companies out there, so what is the problem space that ControlTheory has seen? Why are you tackling it? Tell me what’s going on.
Bob Quillin (02:48)
Yeah. So, you know, we were looking at a variety of different kinds of problems as we started to think about getting our team back together. We’ve all been in the observability space for a while. We had worked together at CopperEgg back in 2012, a little startup that competed with Datadog, who’s now a big dog. We were acquired kind of early in that process and watched them grow and be, you know, wildly successful, a great company.
And as we watched observability over the last few years, something we know and love, having done a lot of work on APM and monitoring and logging, up and down the stack, we really thought that it hadn’t evolved sufficiently. It had kind of flattened out from a technology innovation perspective, a lot of stagnation. And we started talking to customers and people we know in that broader circle, close to a hundred customers over the last couple of years,
and really found that not just the technology had stagnated, but the costs and other pain points had rocketed up. It actually surprised us how much people were paying for observability: six figures, seven figures, easily eating 20, 30, 40% of a cloud bill. It’s a big chunk of their spend now. So we said, this is a problem that’s super interesting. We know
the technology kind of needs to change. If you look right now, there are a lot of, you know, fat dumb pipes that go into a big data lake. You pay for ingestion of all that data; they want you to take all the data in, ingest it, index it, and then retain it, and you pay for all three of those elements. We thought there’s just got to be a better way to do this, both technologically and economically. We came up with this idea of, you know, more intelligence, more feedback loops in the system
to actually help be more intelligent about what gets sent where. That was the initial idea, and it got the ControlTheory idea going, because control theory is about both observability and controllability; that’s actually part of the science and math of control theory. We dusted off our old college mathematics books, looked at control theory, and thought about it a little more deeply. So that’s where the name of the company came from. And as we also looked at where the
technology and the evolution were going, we saw OpenTelemetry begin to emerge over the last few years and mature to the point that it really reached a nice critical mass. We heard over and over that people were at least experimenting with OpenTelemetry early on, and then, as we went to trade shows and talked to bigger companies, many of them had actually adopted it at full scale. So there’s a wide spectrum of maturity, and they’re seeing a lot of good success with
just the first level of OpenTelemetry, which is collection and instrumentation. So we saw that as a foundation for starting to democratize and unsettle things, a building block of what’s coming next, a disruption element, if you will. And we thought that first disruption domino to fall was OpenTelemetry coming in,
democratizing how people collect, how they instrument. And from there, we started looking at how you then potentially build a control layer on top of that. Kind of similar, if you think about the Kubernetes world, to building a control layer and orchestrating on top of containers. We had been through that process with StackEngine, and it reminded us of what we went through there. We were actually acquired by Oracle Cloud, and we built out their managed Kubernetes service.
And that’s all about data planes and control planes and having a very scalable architecture, a set of patterns that are very scalable and repeatable. So we said, let’s look at a control plane on top of OpenTelemetry and apply that technology, as it’s coming to fruition now, to start solving some of these problems around cost control and operational control, really helping people control what they have now, and then hopefully move that into the future and solve some of these newer problems going forward. So hopefully that’s a long origin story for a very short question.
Rachel Stephens (06:56)
Tell me more about the feedback loop. Like, where in the process is the feedback loop? Is it coming at the ingestion point? After the ingestion point? Where are you trying to apply that?
Bob Quillin (07:05)
Yeah, you know, from a strategic perspective, we look across the whole supply chain for observability: from collection, where it comes out of source code, into transit and telemetry, and all the way out into analysis. And actually, really early on, we started looking at whether we could go on the developer side and identify what needs to be instrumented, be smarter about the instrumentation, like going to the source. But as we looked at the market, the pain point and the need were really right here in the middle,
where we could start solving a problem and then work out to the two ends, toward the source code and out to the analysis side. One of the problems with the telemetry pipelines in the market right now, and they’ve been out for a couple of years, is that they’re very proprietary. They’re using their own proprietary systems; they didn’t really have the advantage of where OpenTelemetry is now and going forward. So we’ve leveraged OpenTelemetry, and that opens up the collection and the control process
and allows us to keep the data layer in the customer’s environment; we’re just the control layer on top of it. We’re using the upstream, standard contrib collector, and we’re not actually changing that at all, so it’s very much a standards approach. And then we can signal in and start having intelligence about how to set pipelines, how to set filters, understanding what the thresholds are, when to maybe open up telemetry and when to turn it back down. And that adaptive element, that feedback loop,
can also take feedback from other systems. Maybe it’s a root cause analysis system that wants more data. A lot of AI systems are very data hungry now, and they’re looking for more data that’s curated in a very specific way. So instead of those systems going out and creating their own data, which a lot of them have to do, we can actually deliver that data to them on demand, and they can feed back into us when they need more data or when they need less. Likewise, you can feed that back into the source code side too.
That’s kind of the strategic view. Really the most interesting part right now is helping people with just that: how do you take OpenTelemetry and make it more adaptive and easier to use, setting the right filters, the right way to de-duplicate, picking up what kind of logs are coming in. Are they debug logs? Where’s the spike coming from? And having the intelligence and controls that adapt on top of that. It’s like auto-tuning your OpenTelemetry system. And that auto-tuning concept gets a lot of people excited, because
they have OpenTelemetry and are just using it as an agent or a collector, but they’re not really taking advantage of the control layer just yet. And that’s, I think, the next layer of disruption: now that we’ve got instrumentation and collection democratized, let’s get control of distribution and transformation. Then we’re starting to build a stack that can work all the way up to solving root cause analysis, doing more diagnostics, and tackling some of the higher-level issues, working our way up the stack from that layer. So does that make sense?
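To make that auto-tuning idea concrete, here is a minimal sketch, in Python, of a single proportional feedback step that nudges a trace sampling ratio toward a telemetry volume budget. This is only an illustration of the feedback-loop concept described above, not ControlTheory’s actual algorithm; the function name, budget figures, and clamping bounds are hypothetical.

```python
def retune_sample_ratio(current_ratio: float,
                        observed_spans_per_sec: float,
                        target_spans_per_sec: float) -> float:
    """One proportional feedback step: tighten sampling when telemetry
    volume spikes above the budget, open it back up when volume falls."""
    if observed_spans_per_sec <= 0:
        return 1.0  # nothing flowing, so sample everything
    desired = current_ratio * (target_spans_per_sec / observed_spans_per_sec)
    return min(1.0, max(0.001, desired))  # clamp to sane bounds

# Sampling at 50% while seeing 10,000 spans/s against a 2,000 spans/s budget:
print(retune_sample_ratio(0.5, 10_000, 2_000))  # -> 0.1
```

A real control loop would run a step like this periodically against metrics reported by the collectors and push the new ratio back out as configuration, which is the "signal in" part described above.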
Rachel Stephens (09:56)
It might be optimistic to say that instrumentation is democratized.
Bob Quillin (10:04)
Well, let’s just say that it has the potential to be democratized.
Rachel Stephens (10:10)
There, that I would agree with.
Bob Quillin (10:12)
Okay, yeah. I don’t want to push too far ahead, but what’s nice about OpenTelemetry is that there are so many sources already built in that you could use anything. We can go into any environment: it could be your syslogs, it could be your Datadog, it could be Grafana, Prometheus, whatever you’re using already. There’s a bunch of sources that have already been defined, so we don’t have to build those. We get to leverage that community’s work, and that
allows us to go into any environment. So you don’t really have to be using OpenTelemetry to take advantage of this architecture just yet. But it’s a way to begin to bring it in and use it at a more aggressive, more aspirational level too. And we see a lot of folks starting to use it in more greenfield environments, a lot of new AI systems, et cetera, when they’re building out a new stack.
A lot of the Kubernetes systems are using the OpenTelemetry collector in there, because it’s actually very nicely built and instrumented already. And all the vendors are supporting it on the analysis side, the observability side. But yeah, there’s a lot of work to do there. I think people do understand that they want to try to break free from vendor lock-in and cost issues, and the first place to start is using OpenTelemetry, finding ways to break out of that lock-in.
If I’m using just a Grafana or a Datadog collector, I’m kind of locked into those guys, and they know that; that’s kind of the idea. It makes things very simple, very easy, and that’s OK. But if you want to be able to break out of that, you can start using the OpenTelemetry Collector side by side and begin to find ways to have more flexibility in the future too. And we’re seeing that in a variety of different use cases that are pretty interesting.
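As a rough sketch of what that vendor-neutral, side-by-side setup looks like from the application’s point of view, here is the OpenTelemetry Python SDK emitting OTLP to a local collector; the backend behind the collector can then be swapped without touching application code. The endpoint is a placeholder for wherever your collector runs.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Send spans to a local collector over OTLP; the collector decides
# whether they go on to Datadog, Grafana, Honeycomb, or anywhere else.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo")
with tracer.start_as_current_span("checkout"):
    pass  # application work happens here
```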
Rachel Stephens (11:57)
Yeah, I love that side-by-side approach: you kind of trade off the simplicity for the flexibility, get yourself migrated, and then you can start to see where you go from there.
Bob Quillin (12:07)
Yeah, and the migration word is pretty critical there; good that you point that out. There are so many migrations and transformations and consolidations going on right now. We see people consolidating to one vendor, maybe moving everything into Datadog and getting rid of a Splunk. Maybe they’re actually taking traces, which is a really exciting area people are trying to use better; application performance management really should be built on tracing. Maybe they’re bringing in Honeycomb on the side to do tracing,
because maybe it costs too much to do that through Datadog, or maybe the functionality isn’t where they want it. But having that flexibility within your control plane, to know where the data should go and be smarter about distributing it, means you have the control to bring it in, consolidate it, or redistribute it to where you want to put it. So it puts you back in control. And the more you get that instrumentation set up, get the control plane set up, the better position you’re in to
not have all that intelligence stuck in your analysis system, your big observability tools; it can be redistributed back to the edge, closer to the code itself. And eventually, hopefully, it gets back into the code, where the problem starts in the first place.
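To sketch that redistribution in code: the OpenTelemetry Python SDK lets you attach one span processor per destination, so the same spans can fan out to, say, a tracing specialist plus the incumbent vendor. The endpoints here are hypothetical, and in practice this routing usually lives in a collector pipeline rather than in-process.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider()
# Fan the same spans out to two destinations (placeholder gateway endpoints).
for endpoint in ("otel-gw-tracing.internal:4317", "otel-gw-datadog.internal:4317"):
    provider.add_span_processor(
        BatchSpanProcessor(OTLPSpanExporter(endpoint=endpoint, insecure=True))
    )
trace.set_tracer_provider(provider)
```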
Rachel Stephens (13:16)
Gotcha. And as you’re coming into the market, do you foresee your tools sitting alongside all of these observability vendors and helping manage all of them, or are you coming in as a replacement?
Bob Quillin (13:28)
Yeah, the former, I should say; that’s been our approach. We think we can actually make it observably better, so you don’t have to get rid of your existing tool. That’s the key point; thanks for pointing that out. We came in with that philosophy: we don’t want to be another dashboard. In fact, our vision is that maybe three, five, seven years from now, there’ll be no more dashboards. The hope is that we don’t have to have any dashboards; the answer is just there, you can have conversations, and the world changes.
But between now and then there’s a lot of work to do, and that’s part of what we’re investing in the infrastructure and the stack to do. For now, use what you have; you have the flexibility to bring in new tools and redistribute. A lot of folks are looking at: I’ve got all these logs going up into my observability vendor. Maybe I just take the logs I want to keep and store them in AWS S3 or some other data lake, like ClickHouse,
and only send my most important logs up to Datadog, for example. I could put them into cold storage and then rehydrate when I want to; I want to be able to find a spike when something happens and understand where it came from. And that’s another key point. We have this idea of metametrics, which became pretty clear as we talked to customers about, hey, let’s fix your observability. They’re like, well, I don’t know what the problem is. I just know my bill is too high.
We have to go in and help them through that discovery process: where is it coming from, what’s the origination, what’s the attribution, how is it split up, where are the logs coming from, why are the traces so high? Maybe your custom metrics have a lot of cardinality; that’s a big cost factor for Datadog. 30, 40% of some people’s cost is around the cardinality, the dimensionality, of the metrics they’re tracking. We can actually detect that problem and control it,
and help them put processes and transformations in place to control and manage it, so you only send what you want, when you need it. So there are lots of really cool little use cases in there, but it’s all about more intelligence at that layer: moving it out of the centralized element into something distributed, closer to the edge, more intelligent, more adaptive, with the feedback loops to support that. And it solves real problems. A customer said to us yesterday, so I give you $5 and you save me 10? I can pitch that to management today, because right now we’re looking to save money. So yeah, that’s a pretty good starting point.
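The cardinality problem has a concrete handle even down at the SDK level. Here is a minimal sketch using the OpenTelemetry Python SDK’s View API: only an allow-list of attribute keys survives aggregation, so a high-cardinality key like a user ID never becomes a new time series. The instrument and attribute names are hypothetical.

```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)
from opentelemetry.sdk.metrics.view import View

# Keep only low-cardinality attributes on this instrument; anything
# else (user IDs, request IDs, ...) is dropped before aggregation.
view = View(
    instrument_name="http.server.requests",
    attribute_keys={"http.method", "http.status_code"},
)

provider = MeterProvider(
    metric_readers=[PeriodicExportingMetricReader(ConsoleMetricExporter())],
    views=[view],
)
metrics.set_meter_provider(provider)

counter = metrics.get_meter("demo").create_counter("http.server.requests")
# "user.id" would explode cardinality; the View strips it before export.
counter.add(1, {"http.method": "GET", "http.status_code": 200, "user.id": "u-123"})
```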
Rachel Stephens (15:51)
And who’s your ideal user here? Is it that you want the developers to be tracing things? Is this an ops team play? Like, who’s using it?
Bob Quillin (15:58)
Yeah, it’s definitely a platform engineering, SRE, ops team play. We have great success talking to CTOs and engineering leaders, folks who understand the budget and are the ones trying to implement strategies that fix the problem. From there we work our way to the platform engineers and SREs who actually have to implement it. They’re usually the team that gets whipsawed back and forth: okay, shut down the logs, the bill’s
too high this month; turn it back up, we’ve got an event, a problem, an incident we’ve got to solve. That goes back and forth all the time; it’s kind of a ping-pong event. But folks who have Datadog, Grafana, a larger observability solution, hundreds of engineers, usually six- and seven-figure observability bills, are starting to feel that pain. That could even be the mid- to large-sized startups we’re talking to here; friends in Austin are like,
how’s your Datadog bill? They go, oh my God, it wasn’t bad last year, but it just shot up. Now it’s five figures, it’s moving to six figures very easily, and soon it’s going to be seven. So it really escalates pretty quickly. And we talk to a lot of folks who are starting to use OTel, starting to use OpenTelemetry, and folks further along the maturity curve who are now saying, I’ve got thousands of OpenTelemetry collectors, but I want to begin to find ways to
manage that fleet, get the pipelines set up, do the configurations, adapt that as we go around those feedback loops we talked about, and start using all the intelligence that’s designed into OpenTelemetry, versus just using it as an agent, which is the starting point. It’s a good place to start: you democratize your collection. Now let’s get smarter about how we actually use it, and distribute and filter and transform and do all the cool things that are possible to make observability better and more efficient. And then hopefully solve problems, get rid of the noise, raise the signal, and get to the issue faster.
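One of the simplest filter-and-transform wins mentioned here, keeping debug logs out of the export path, can be approximated even without a collector. A sketch using Python’s standard logging module, with a handler standing in for a log forwarder:

```python
import logging

root = logging.getLogger()
root.setLevel(logging.DEBUG)        # the app still records everything locally

shipper = logging.StreamHandler()   # stand-in for an exporter/forwarder
shipper.setLevel(logging.INFO)      # ...but only INFO and above gets shipped
root.addHandler(shipper)

root.debug("noisy internal detail") # filtered out of the shipped stream
root.info("checkout completed")     # shipped
```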
Rachel Stephens (17:44)
Well, this all sounds great. So if somebody out there is excited and wants to check you out, where should they go?
Bob Quillin (17:49)
Yep, so controltheory.com: tons of blogs. We have some things that are very high level, but also things that go straight into, this is how you do detailed sampling with your traces, this is how you hook into a Datadog agent. So there are some good hands-on elements there for all the platform engineers, but also some higher-level strategies for how to work through an OpenTelemetry deployment. But yeah, controltheory.com; reach out. We’re looking forward to talking to more folks. We’re out there building up our customer base and excited to get involved with the whole ecosystem.
Rachel Stephens (18:21)
Well, Bob, thank you so much for your time. It was great chatting with you.
Bob Quillin (18:24)
Yeah, thanks Rachel. Good to see you.
Rachel Stephens (18:26)
And if you enjoyed this conversation, please like and subscribe.