“Data Is What I Can Control”: Why Your AI Strategy Starts at the Data Layer with Boris Bialek



In this RedMonk Conversation, Rachel Stephens sits down with Boris Bialek, VP of Industries and Global Field CTO at MongoDB, to explore why data remains the unshakable foundation beneath the fast-moving world of AI. Boris paints a vivid picture of an industry where every sector—from banking to automotive to insurance—is racing to adopt AI simultaneously, often betting on the “whole racetrack” at once. The conversation traces the rapid evolution from chatbots to RAG architectures to MCP servers and agentic systems, all within roughly 18 months, and asks: what should enterprises actually anchor to when the layers above keep shifting? Boris makes the case that while LLMs, frameworks, and protocols will come and go, your data is the one asset you truly own and control. Along the way, they dig into real-world use cases like predictive maintenance on factory floors and intelligent airline customer service, the role of real-time vectorization and embedded models (including MongoDB’s acquisition of Voyage AI), and the critical importance of encryption and governance when machines start talking to machines. The takeaway: building a strong, unified data platform isn’t just a database decision — it’s the strategic foundation for enterprise AI transformation.

This RedMonk conversation is sponsored by MongoDB


Transcript

Rachel Stephens (00:04)
Hello everyone and welcome to RedMonk Conversations. I’m Rachel Stephens. I am the research director with RedMonk. And with me today, I am very excited to introduce you to Boris Bialek. Boris is the VP of Industries and a global field CTO for MongoDB. Boris, can you give us a quick introduction to who you are and what you do with Mongo?

Boris Bialek (00:24)
Thanks first Rachel for having me here. It’s really nice to be on your podcast.

And I’ve been with MongoDB for seven years and I’m responsible for industries, which sounds very exciting, which is everything from financial services, insurance to connected cars. And my team is working on use cases, integrating the bleeding and leading and whatever edge into MongoDB and into the solutions of our clients. So we’re all use case people. We’re not really data people, but we’re working a lot with data, obviously. So it’s an exciting job to have.

Rachel Stephens (00:57)
That sounds exciting, and I'm sure you've seen a lot of exciting use cases in the last couple of years as AI has come onto the scene. So I'm excited to dive into that with you and hear more. Our conversation today is about what we as a technology industry are doing in an era where this technology is changing and emerging so quickly. It feels nascent; at the same time, it also feels inevitable.

Things are becoming increasingly not optional, but it's hard to understand where to make investments. And so we're here to help people untangle that and figure out where they're going. And before I go on: when I say nascent, I think one of the things people can think when you say bleeding edge or nascent is that it can feel untested, unproven, unsure of what it's going to be.

One of the things about AI that we are seeing is that nascent doesn't necessarily mean wait on the sidelines in this case, because the market is moving and this AI narrative is building momentum. It is building in terms of the skill sets that people need to be developing. And even immature or newer technologies can become mandatory parts of our discourse, and they can influence how we're expected to compete, behave, and build. So that's what we're seeing from the outside view.

But what are you seeing as someone who’s working in all of these industries and with people who are trying to build in these use cases? How are people making smart investments in an environment where the technology is just changing and emerging so rapidly?

Boris Bialek (02:29)
Yeah, I think the most amazing thing is, I've been in this industry with software for the last 35 years, and I've never seen every single area affected at the same time. You normally see certain industries moving faster; retail is very aggressive, and honestly the regulated industries are normally laggards. This time everybody's involved and engaged, from the biggest banks to automotive to insurance and everything in between. And this is what

I personally find so amazing. And the way things emerge is, as you point out, you cannot afford to wait; on the other side, you don't want to bet on the wrong horse. So people try to bet literally on the whole racetrack and try to figure out how to do that. And that leads to some very funny moments and discussions on the solutioning side, on the use case side, where we came out of this very early experimental stage. I compare this normally to the early days of the internet when people installed the

Rachel Stephens (03:09)
Yes.

Boris Bialek (03:28)
the

Apache HTTP server. And it's the same thing you see right now: people install single modules. And I dealt with one client who had 14 different vendors in one single, very, very narrow use case solution. And while there was maybe good reason to do that, I asked him, how do you ever want to productize this? And he looked at me like, well, we haven't

got to that stage yet. On the other side, I showed him a competitor who just went live with pretty much the same use case. And the dude was kind of scared, because people try to experiment, but the move from experimentation and a single use case to an enterprise-wide, stable movement in AI is a big jump. And I compare this normally to when people say, we have an AI strategy, we have a chatbot.

And then I said, dude, you may need to transform your company a little bit more than just introducing a chatbot. And that's normally the starting point of the discussions. And that change is, as you point out, people trying to bet on a horse, and they try to buy the whole racetrack.

Rachel Stephens (04:38)
That's a great metaphor, I think, for how people are feeling right now, because sometimes it does feel like a gamble, and we're not sure where we're supposed to be making our investments. We're not sure what is actually going to be the technology that's foundational. I think one of the things that we have seen, though, is that data is one of those strong foundations that people need to both invest in and build upon, making sure that, AI or not, a data foundation is important. So let's talk more about

data and how people are building this all out.

Boris Bialek (05:13)
People start to really

realize data is really theirs. They can talk about which LLM is nicer, which color they prefer, but at the end they start to realize data is what they own. Data is their control point. And for a lot of clients it's important to still have some control. It's a normal human, I think, or normal business kind of behavior: you want to have the feeling that you're somewhere in charge of your own destiny.

They realize data is where destiny is. And on the other side, data is also how you can shape your destiny, by enriching data and bringing more data together in what we call systems of action, where you bring live systems together and have these live interactions with the consumer, which could be an internal user or an external client in any form. But you want to have these real-time capabilities, and if you want real-time capabilities, you need to get control of your data somewhere in

one spot. It's not good enough anymore to have 10 data sources and say, well, somehow I get to them. I asked, well, how long does this take? About 25 hours since the last ETL. And then I said, well, 25 hours sounds really interesting for a client who's giving you 50 milliseconds.

That's normally the starting point for really funky discussions. But that's where data and ownership, compliance, security, encryption, all these classical things have still not gone away. And we still see those. And that is what I see every day; in the automotive space, nobody wants to get their car started by somebody other than themselves.

Rachel Stephens (06:52)
Yeah, that's fair.

One of the things that you touch on here is the broad shape of data. It can be moving fast, it can be moving in batches, it can have a lot of different modalities, it can live at the edge or in the core; it can go a lot of different ways. And so trying to figure out a platform strategy across all of that can be really important. And I think

it also ties into what capabilities your platform needs to have. So after LLMs emerged on the scene, we saw vector databases as a standalone category explode. And then almost immediately, we also saw the database world start to incorporate vector capabilities into existing databases.

And there are still a lot of people who might need a standalone vector database, but in a lot of cases we found that users can just use a unified approach and that will meet their needs. And that's not unique to vectors; we can have a whole side conversation around general-purpose databases. But I guess, what are you seeing in terms of market demand for

platforms and a unified ability to access data versus specific capabilities, and has that changed or become more specific in the world of AI?

Boris Bialek (08:09)
And with AI it becomes more

and more explicit, I think. Before it was more nuanced. I had discussions with people on graph pieces, on geospatial data, and they were always, wow, you have geospatial in your data, that's pretty cool. And it started out originally with text search and the Lucene engines we started to build upon. And people start to realize: I actually want that integration, I want the speed, I want the automatism out of it, but I don't want to lose the capabilities of a good product.

And that's the same thing we see with vector databases and vectorization. We have these strong vector capabilities inside the MongoDB platform, but you are not limited to "I stash something inside of some blob and call it storing a vector," because this is what a lot of people do. The advantage of the original vector databases was that they actually work with vectors, and so do we with MongoDB in our vector capabilities. And then there's the vectorization itself.

As you probably know, it's exactly one year ago that we acquired Voyage AI, which became part of the MongoDB family, and this is the second part of the vectorization. It doesn't help you to have a vector store if you can't create embeddings in real time.

If I talk right now to Rachel and Rachel would like to know something right now, it doesn't help me to vectorize this next week when I do an update and ETL the data out. It's about what Rachel's desire is right now. What does she want? What is the interaction with Rachel? And to do this as an agentic system, as an agent (let's assume I'm actually not a human person; I am, trust me), if I were an agentic system, then you start to get exactly into that point. For

those real-time capabilities, I need to reflect the sentences you gave me upfront and be able to structure some answers out of them (and yes, I am a human being, just to reconfirm). And when we want to talk about that, then the agentic system needs to memorize what the answer back is, what the

nuances of this discussion are, and what the outcome is. And if I want to do this in real time, I need these real-time, in-memory capabilities: the decision, the act model. That is where an embedded vector system is absolutely a prerequisite. And as I said, there are many data types. I've seen a lot of systems which have six or seven different data types.

And by data types, I mean really graph, geospatial, vectorization, text search; the list is really, really long. And this is right now what honestly makes it fun to work at MongoDB. So, yes, why do I still work

at my age? When you see that, it is just fascinating to bring all these different things together, bring them to life, and build out of that a data foundation where a client can decide: tomorrow I switch my LLM, and two weeks later I switch my framework. But the data are mine, in my format, in my understanding, in my control, with my security context and my encryption.
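The real-time vectorization flow Boris describes (embed documents the moment they are written, embed the query at request time, then search by similarity) can be sketched roughly like this. It is a toy illustration, not MongoDB's or Voyage AI's actual API; the fixed vocabulary here stands in for a real embedding model:

```python
import math

# Toy stand-in for a real embedding model such as Voyage AI's.
# A fixed vocabulary keeps the sketch self-contained and runnable.
VOCAB = ["flight", "late", "delayed", "options", "rebooking",
         "baggage", "allowance", "economy"]

def embed(text: str) -> list[float]:
    """Bag-of-words vector over VOCAB, normalized to unit length."""
    words = text.lower().replace(",", " ").split()
    vec = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

store: list[dict] = []

def insert_doc(text: str) -> None:
    # Documents are vectorized when they are written,
    # not in a weekly ETL batch.
    store.append({"text": text, "vector": embed(text)})

def query(text: str, k: int = 1) -> list[str]:
    qv = embed(text)  # embed the question at request time
    ranked = sorted(store, key=lambda d: cosine(qv, d["vector"]),
                    reverse=True)
    return [d["text"] for d in ranked[:k]]

insert_doc("flight delayed rebooking options for frequent flyers")
insert_doc("baggage allowance for economy tickets")
print(query("my flight is late, what are my options"))
```

In a real deployment the embed call would hit a model service and the search would run against a vector index rather than a Python list; the shape of the flow, not the toy math, is the point.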

Rachel Stephens (11:22)
See, you already started to answer my next question. But one of the things that I hear a lot in my role (and, sorry, at RedMonk we work with vendors up and down the stack, every single vendor in this space): just like you talked about how people are moving across all of the industries, everyone across the vendor space is also moving this way, and AI is getting incorporated everywhere. From the silicon all the way up to the top of the application.

Boris Bialek (11:50)
Yep.

Rachel Stephens (11:52)
Every place is having AI touch it. And I think one of the things that's really intimidating as a buyer of this technology is trying to figure out which part of the stack is the platform that truly matters for having AI integrated. And so, from your perspective,

what makes the data layer uniquely positioned in this market?

Boris Bialek (12:18)
The data layer is really the only strong control point where you have ownership of your infrastructure and ownership over the outcomes. Your data layer allows you to monitor and to track what answers are coming back and how they are coming back. You can cache answers as well. This is a big thing right now: people don't want to send rote questions to an LLM. Why should I go to an LLM for the question, my flight is late, what are my options? That one is a rote request with a rote response.

It's typical chatbot behavior. I can go to an LLM, but the LLM still takes me so much time. If I have the information stored, I can go directly out. But that's my data. I know what the answer is. I know how I would like to reply to these rote requests. And a large airline probably has hundreds of thousands of exactly that question, in multiple forms and notations. But at the end, it's all the same question. And that's the same thing: everything comes down to my data. I need to understand:

Boris is stuck in Atlanta, there's a snowstorm, I can't get him out to Chicago, there's a snowstorm there too, and I need to do something, and Boris is a frequent flyer, so what are my options? At that time you need to have your data, your decision points, at your fingertips. You cannot then start to think, let's go back to the transaction system and look for the ticket number; with the ticket number I can identify Boris's flyer status, and with that flyer status I

may want to make a decision. This is not possible. I will be very upset; I'm sitting in a snowstorm. So at that point, I would like an intelligent answer: Boris, I feel really bad that you're sitting in a snowstorm. That sentence alone shows emotion from a chatbot, and an action showing that the chatbot understands the situation beyond "my flight in Atlanta is delayed."

So all of these things come down to one thing: control of your data, understanding of your data, and the flexibility to apply the data to different agentic use cases. When you think about it, what did I just do? I had a chatbot involved that talks to me.

I'm talking to the agentic system to try to make a decision about what options we can offer. And I probably have a supervising agent over the whole system, trying to make sure that the system doesn't make funny jokes about me being stuck in a snowstorm. So when we see this whole picture, what is the only stable component in it? The data.
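The caching idea Boris sketches for rote questions can be illustrated in a few lines of Python. All names here are hypothetical; a production system would normalize questions by semantic similarity (for example, via embeddings) rather than by word sorting:

```python
# Answer rote questions ("my flight is late, what are my options?")
# from your own stored data instead of paying LLM latency every time.

def normalize(question: str) -> str:
    # Collapse the "multiple forms and notations" of the same question
    # into one key. Word sorting is a crude stand-in for semantic matching.
    return " ".join(sorted(
        question.lower().replace("?", "").replace(",", "").split()))

cache: dict[str, str] = {}
calls = {"llm": 0}

def slow_llm(question: str) -> str:
    # Stand-in for a real LLM call.
    return f"LLM answer for: {question}"

def answer(question: str) -> str:
    key = normalize(question)
    if key not in cache:
        calls["llm"] += 1          # only a cache miss reaches the LLM
        cache[key] = slow_llm(question)
    return cache[key]

answer("My flight is late, what are my options?")
answer("what are my options, my flight is late")  # same question, rephrased
print(calls["llm"])  # prints 1
```

The point is the control it gives you: the cached answer is your data, served at your latency, and you decide what it says.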

Rachel Stephens (14:50)
I think one of the things that stands out to me in your answer here is that when you're talking about the data and controlling the data, it does not sound like you're talking specifically about just the database; it's more like data pipelines and a full platform. Is that how you're thinking about it at Mongo?

Boris Bialek (15:06)
Correct, correct. It is really a platform discussion, and that's why I mentioned Voyage AI. Voyage AI gives you the models to integrate. The vector search gives you the vectorization. Our MCP server is the interface into the LLMs. So when you look at all of these things, you suddenly have a data platform in this picture which allows you to act with agents at ease, in a very, very simple environment. And to be honest, the best use case I see lately is manufacturing

predictive maintenance. Everybody has talked about predictive maintenance for the last 30 years, since fuzzy logic and measurements; it goes a long way back. But by now we can make this very easy. We can build these agentic systems,

a complete workflow engine which interprets data, analyzes that maybe the temperature of this piece is not okay and we need a replacement, puts a work order in, orders the spare part, and ensures the spare part and the work order arrive at the right spot at the right time. These were all human systems, to be honest, and the people who did these jobs didn't like them much. They were literally putting paper together, stapling it, putting it in a folder, and sending it somewhere outbound. And today

a computer can do that. So it gives a lot of benefits in that regard. But as you can see, what is the core of it? It's this data platform driving all of these pieces interacting together and driving the results the systems are expected to deliver. Without hallucination, please. Because my data are clean. I know what my data are. My data are not hallucinating. Hallucinating happens somewhere else, not in my data.
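The workflow Boris outlines (interpret sensor data, flag the anomaly, raise a work order, order the spare part) can be sketched as below. The threshold, field names, and the diagnosis are illustrative assumptions, not a real MongoDB workflow engine:

```python
# Minimal predictive-maintenance sketch: turn raw temperature
# readings into work orders with the spare part ordered automatically.

MAX_TEMP_C = 80.0  # hypothetical threshold for "running really hot"

def check_machines(readings: dict[str, float]) -> list[dict]:
    """Interpret readings and emit a work order per anomalous machine."""
    orders = []
    for machine, temp in readings.items():
        if temp > MAX_TEMP_C:
            orders.append({
                "machine": machine,
                "issue": f"temperature {temp}C exceeds {MAX_TEMP_C}C",
                "action": "replace air filter",   # illustrative diagnosis
                "spare_part_ordered": True,       # no stapled paper needed
            })
    return orders

readings = {"machine-10": 62.5, "machine-12": 91.0}
work_orders = check_machines(readings)
print(work_orders[0]["machine"])  # prints machine-12
```

A real agentic system would route the interpretation through an LLM and persist the work order; the sketch only shows the data-driven shape of the loop.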

Rachel Stephens (16:48)
I want to dive in more, because you talked about your MCP server. But this market has evolved so quickly in the last 18 months. We've talked about: here come LLMs, and everybody is amazed and impressed by the chatting ability. Then we went into RAG architecture mode. Then the MCP servers came on the scene. Now everyone's all about agentic AI. All of this happened in 18 to 24 months, I would say.

What do we do when we can't necessarily predict what form of AI integration is coming next? That is: what are the core elements of a data foundation that you as a customer need to be focusing on, so that no matter what comes next, you're ready to go?

Boris Bialek (17:34)
I think it's really fair to take a look at how this started out, exactly as you said: first came the chatbots, then came RAG, then came MCP. There's a natural evolution from a single point solution where people built small, sampled summary functions. And LLMs are great; I love to use them myself to summarize emails and get fast information. That's great. But this is all a very small, very narrow point solution. What agentic systems and MCP servers and all of the upcoming frameworks

which sit on top are about (and there are by now very specific protocols, and there's so much cool stuff happening) is a digital transformation of the enterprise. We're not talking anymore about the point solution, the chatbot. I love chatbots, I admit it, I'm a total fan. But beyond the chatbot, when you take a look: I want to transform my company, I want to transform how I interact with my customers, I want to transform my underwriting, I want to transform how my car is driving.

So at that point people go, wow, but yes, that's what it is about. And when we take a look at those things, agentic systems are just the next step, because they come from this outside-in thinking, the way a human works. You look at a bigger problem: what do I try to solve? Then you break it into smaller problems, solve each smaller problem, and bring it together into an answer. And that is what agentic is all about. But…

again, it may sound self-serving since I work for MongoDB, but it's all about the data at the end. It's all about my decisions, and my agents are based upon what I give them to work with and what knowledge and memory, short-term and long-term, they're able to build up. And then I can look at the bigger picture: what do I try to solve? And back to the predictive maintenance case:

you try to solve for a factory which has issues with machines. There are 100 machines in a hall. A human takes exactly that view: he looks at the whole room. What is each machine? What are each machine's parameters? And in the same way an agentic system will work. But what is the fundament for this?

These 100 machines deliver data. These data need to be interpreted. I need to drive them to an LLM to make statements so that a human understands them again and says, by the way, machine number 12 looks like it's running really hot versus machine number 10. That is an interaction with an LLM. All of this is based upon the data I'm collecting on the factory floor.

So everything comes down to data, and this is the slightly boring part: database people again say data is the center of the world. They loved to say that in the ERP days; if you look back 30 years, ERP vendors said data is gold, then data became the new oil. But we are back to the same situation: the data is what I can control, and the data is what I make my decisions upon. Everything above it, to be honest, will keep evolving.

I’m not thinking that we’re at the end of the agentic platforms yet.

Rachel Stephens (20:40)
I think that's fair. And those statements just resonate with me. I can't tell you how many times in my enterprise life I worked with somebody who was doing a single-source-of-truth endeavor to get all of the data aligned and get everybody into the right place. And it's a hard process, and it's a cultural process, for a lot of teams to get that all aligned.

Boris Bialek (21:04)
A bank just

told me last week, whenever we run a project and we have 50 databases today, after the single source of truth, we have 51.

And that sounds so mean and so disgraceful, but they didn't mean it as such. What they tried to say was that they're looking for a way to get the data activated. Because it's not that they only need a single source of truth; that's kind of a statement of direction. What they want is data in context, connected. And this is where the document model becomes important with MongoDB: we're able to enrich and enhance data in context. And when you say

context, ha: MCP, the Model Context Protocol. That's where it comes in directly, and that's where the linkage is. You bring data from multiple sources together into MongoDB, with the context we build inside of each data set.

Whether it's about Boris and the airplanes or Boris on the manufacturing floor, I can send this as a single data set right into my LLM to get an interpretation out based on my data, and I can look at the result I get back and validate it against what I actually see, whether it makes sense. This way I can even avoid hallucinations. And then what else do I need?

I need embeddings, I need to build vectors, I need to have the comparison. Why is machine number 10 better than machine number 12? What are the problems? Oh, by the way, the air filter is broken; should I order the spare part automatically for you? Click yes. Okay, the work order is out; when the part is available, the part will be replaced. That is what you want as a result. All driven by your data on your shop floor.
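The "data in context" pattern Boris describes, where records from several sources are merged into one document and handed to the LLM as a single payload, might look like this in outline. Source names and fields are hypothetical:

```python
import json

# Gather related records from separate systems into one context
# document, then serialize it as a single self-contained prompt payload.

crm = {"customer": "Boris", "status": "frequent flyer"}       # CRM record
ops = {"flight": "ATL-ORD", "state": "delayed",
       "cause": "snowstorm"}                                  # ops record

def build_context(customer: dict, flight: dict) -> str:
    """Merge sources into one document the LLM can consume in one call."""
    doc = {
        "customer": customer,
        "flight": flight,
        "task": "propose rebooking options with an empathetic tone",
    }
    return json.dumps(doc)

context = build_context(crm, ops)
print("snowstorm" in context)  # prints True
```

Because everything the model needs travels in one document, the answer is grounded in your own data rather than in whatever the model remembers.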

Rachel Stephens (22:49)
Absolutely. So I think one of the things that people have been struggling with, I guess for all of time but especially in the AI era, is compliance, governance, security, guardrails. All of those things are so important when you are talking about your own data. How does that fit into how people are building these systems in the AI world?

Boris Bialek (23:15)
The biggest part is that your agentic systems and your agents act, and they will act with your data. So you need to ensure that whatever agent has access to data gets only the data it should see. And for that, with MongoDB we have our Queryable Encryption, which ensures that an agent can see only the data it is allowed to see, in the context it is allowed to see it. And this is one of the cool features we have in MongoDB, which allows you to have

Agent A see parts of the story without seeing the whole book. Because as soon as you have agentic systems involved, you have machines talking to machines. And you want to make sure that there's not a bad machine, a bad-actor machine, on the other end of the protocol, which maybe has different meanings than the agent understands, because at the end it's machine talking to machine. And the second part of the discussion, obviously, when we talk about machines talking to machines:

Most systems are designed for human users. Specifically, I see this in the insurance space, in underwriting. Underwriting was a very human activity. Now you need to rethink how you do it, because now machines are doing the underwriting and the API work.

It's not about APIs. It's about the data, the data format, and the understanding and enrichment of the data in real time, what the system is doing. And so a complete rethinking is necessary, from a UI-driven human approach to a human-supervision approach, in a system where 90 percent of the cases run automatically, 5 percent get a little bit of supervision, and another 5 percent need a lot of attention.

Putting these kinds of pieces together is where the real art and the enterprise transformation of AI starts right now. And encryption is a key part of it.

You mentioned guardrails. Obviously all kinds of audit trails too: who's touching the data, which agent made the decision, based on which input data. You see the word data, data, data all the way. That is pretty much where the story goes right now.
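This is not MongoDB's Queryable Encryption, which enforces access cryptographically at the field level; it is only a minimal sketch of the access principle Boris describes, where each agent sees the fields its role permits and nothing more:

```python
# Each agent's scope lists the fields it may see. Agent A sees
# parts of the story, never the whole book. Names are illustrative.

AGENT_SCOPES: dict[str, set[str]] = {
    "rebooking-agent": {"name", "flight", "status"},
    "billing-agent": {"name", "payment_method"},
}

record = {
    "name": "Boris",
    "flight": "ATL-ORD",
    "status": "frequent flyer",
    "payment_method": "card-1234",
}

def view_for(agent: str, doc: dict) -> dict:
    """Project the document down to the fields this agent is allowed."""
    allowed = AGENT_SCOPES[agent]
    return {k: v for k, v in doc.items() if k in allowed}

print(view_for("rebooking-agent", record))  # no payment data visible
```

In the real feature the database never decrypts fields outside an agent's keys, so even a compromised agent cannot read the whole book; this sketch only shows the scoping idea.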

Rachel Stephens (25:27)
Yeah. I think that sums up our conversation pretty well: it's data, data, data, and in a world that is moving very fast, the data layer is the constant.

Boris Bialek (25:34)
So you asked me what I'm hearing from other vendors, but the data is really, well, maybe not the new snake oil or the new oil, but data is really the fundament, whatever frameworks I'm utilizing. Right now I want to quote the CTO of a very large retailer who, after we implemented the solution, said: so Boris, we are done, it's obsolete.

And I was completely shocked. I was so proud of what we achieved. And he said, Boris, it's obsolete because it's live. And I was like, huh. And I felt really heartbroken. But then the next sentence actually made me think. He said: but the data is in Mongo. That's my fundament. I can replace whatever LLM, I can replace the frameworks on top, but the data are mine. And I don't want to change those and redo those, because that effort is too expensive.

And that made me think, and that's a very, very good line.

Rachel Stephens (26:30)
Yeah, that is, I think, a great way to think of it: what is the foundation that we can build on, and what can we abstract away to make it easier for the things that need to move more quickly to change, while we keep a solid foundation.

Boris Bialek (26:42)
And the speed of things: I started out with the situation where we have a dozen different components to build a single point solution. One of the things I start to see is people realizing they don't want a dozen components to build one single solution.

They need the data foundation, they need the vectorization, they need the embedders, and all of that we deliver out of one hand, in a reliable form, in a transparent fashion. We have our vector search, as you know, in preview for on-premises as well, which specifically in Europe is becoming very, very important for a lot of people. And on top of that you still need the LLM, you need the MCP, which we deliver as well, and then you're off to the races and you can transform your company.

Rachel Stephens (27:29)
Well, Boris, thank you so much for your time today. This has been such a fun discussion. If people want to learn more about what Mongo is doing, where should we send them?

Boris Bialek (27:39)
The best place to learn is our solution library, which is part of our documentation. So if you go to the MongoDB Docs and you look at the left side, you see the solution library. You'll find very practical use cases there, all of the things that we talked about: the whole code, GitHub repositories, LLM prompting, everything your heart desires. And obviously, on MongoDB.com we have a lot of examples right off the homepage for our AI offerings.

Rachel Stephens (28:07)
Wonderful. Thank you so much, Boris.

Boris Bialek (28:09)
Thank you for having me.
