A RedMonk Conversation: Transactions at Scale, Bonkers Numbers



In this RedMonk Conversation James Governor catches up with Boris Bialek, Field CTO at MongoDB, on industry solutions and scalability. Given that the company’s story includes one of the most well known memes in database history – “MongoDB is web scale” – we felt it might be time to take another look. Bialek has been working on database scale for decades, including work on IBM’s venerable relational database DB2, so he’s seen it all. He’s now enjoying life at MongoDB working on customer problems at scale, and that’s what the show is about.

The conversation looks at ACID compliance and transactionality, something MongoDB isn’t known for, but which it has supported for some time. It explores the complexity of managing ecommerce financial transactions at scale during Black Friday events. From luxury brands to real-time data management in cars, Boris highlights MongoDB’s real-time integration functionality. The discussion also touches on scalability and optimization in database systems for real-time decision-making. What are your thoughts on MongoDB’s real-time transaction support and Boris’s insights on industry solutions? Share your thoughts in the comments below.

Rather listen to this conversation as a podcast?


Transcript:

James Governor: Hi, this is James Governor, co-founder of RedMonk. We’re here for another RedMonk Conversation. This is a good one. I’m excited today that we’ve got Boris Bialek from MongoDB here. He’s one of the Field CTOs at the company. Are you the only Field CTO or are you —

Boris Bialek: No no no no no. One is not enough these days anymore.

James: One is never — so let’s have some more field CTOs!

Boris: Yeah, my area is industry solutions. So I’m working mostly with large scalable industry things: FSI, manufacturing, retail and everything in between.

James: Okay, so Field CTO, you’re getting out there and actually working with the customers. Is that right?

Boris: Correct. Correct. Yeah. I’m a really dangerous person. That means I work with clients. I actually implement what I talk about, and I normally see the outcomes as well, good and sometimes bad.

James: Okay. Okay. Well, let’s talk all about the bad. Well, we’ll talk a bit about both. But here’s the thing. I’ve known you for quite a while. I knew you in the good old days of working at IBM with DB2 technology, lots and lots of claims about how scalable that database was at the time. You’ve ended up at MongoDB, and here’s the thing: you’re someone that has been working at the cutting edge of database scale for many years now, decades of experience, I guess. And MongoDB, I mean, it’s got a sort of reputation. We even had that joke, “it’s web scale,” sort of the joke that it actually didn’t scale. There have been a lot of people that came away thinking, oh, MongoDB, yeah, no, it’s not as good as these other databases. I think the reason I wanted to talk to you today is you have some examples. We’ve been chatting, and I think maybe you’re not just a tiny little putt-putt VW, and in fact a bit of a high performance engine now. So let’s talk about that, about scale. Real scale. And I think you’ve got some good examples that we can talk about, where MongoDB, frankly, is showing some impressive throughput.

Boris: Absolutely. And the interesting part, the thing I’m always stumbling over, is when people ask me, oh, you’re NoSQL, so you’re not ACID compliant, are you? That’s normally the starting point of an interesting discussion, because the first thing is MongoDB is ACID compliant: each document gets logged, with beautiful log files and all the things surrounding them that people really appreciate in the database space. But as soon as you start logging, you start discussing the old problems again. How can you log enough transactions? How does transaction behavior happen? And one of the most amazing ones was the work with Temenos. What we did there was last year, it’s been now a year ago, 2023. Can you imagine that?

James: What’s Temenos?

Boris: Temenos is a core banking platform. And for somebody who’s been working in databases as long as me, never trust a benchmark you haven’t done yourself is a standard saying. And vendor benchmarks specifically are famous for that. Whereas when a third party is using your–

James: You’re saying that we couldn’t trust TPC-C and its pomp? What do we–

Boris: Oh, sure. Remember the 0.1 cent per transaction claims that depended on a single hard drive for a database? No, but going back to the point, there’s some truth to it. You want to see some real workloads, and there are not that many comparable workloads out there where people put a system through its paces. And when Tony Coleman, the CTO of Temenos, asked us, are you guys ready?, we took basically the work from the last years and put it through a real high powered test. So, long story short: 150,000 banking transactions per second. Now people say, oh man, we had billions of transactions an hour.

James: I thought you weren’t allowed to bank on MongoDB, right?

Boris: Yeah, that’s exactly the funny part, right. And when you look at banking, each banking transaction suddenly becomes 20 database transactions involving 6 or 8 different, what we call, collections. In the old lingo you could have called them tables; it’s not fully comparable. So then we’re talking actually multi-document transactions. Hello. Like multi-table joins, as some people may remember from the old days. And when you take it to this benchmark load, we are doubling performance versus earlier systems in the relational space, with things like code paths coming into play. Transparency of data, real JSON data, which are really–

James: And these are financial transactions. This isn’t like a business transaction in the sense of customer service or some other…

Boris: Correct. This is like opening an account, which consists of creating the account owner, creating the account, creating the balance system. So there’s a lot involved in that… or processing a mortgage, or things like that. These are quite complex things which happen in the background. And that is the part that became exciting for me when we did this test. The assumption, by the way, was a really funny number again: 100 million clients with 200 million accounts, something like that. So that’s a pretty decent size of bank. And the thinking was, after that one we can do any system in the transactional space. And this is the scale we’re talking about today. The fun part was we used only a single MongoDB partition. We didn’t even get into sharding. We had one single MongoDB. We could have scaled this out to 100 partitions. We have clients out there with 1,000 shards in a system.
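
For readers who want to picture what a multi-document transaction like this looks like from a driver, here is a minimal sketch in Python with PyMongo; the collection and field names are illustrative, not Temenos’s actual schema. For scale, taking the figures at face value, 150,000 banking transactions per second at roughly 20 database transactions each works out to around 3 million database transactions per second, on the order of 10 billion an hour, which is where “billions of transactions an hour” comes from.

```python
# Minimal sketch of the "open an account" flow as one ACID multi-document
# transaction in PyMongo. Collection and field names are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")  # transactions require a replica set
db = client.banking

with client.start_session() as session:
    # The block commits on clean exit and aborts if an exception is raised.
    with session.start_transaction():
        owner = db.owners.insert_one(
            {"name": "James Governor", "kyc_status": "verified"}, session=session
        )
        account = db.accounts.insert_one(
            {"owner_id": owner.inserted_id, "type": "checking", "currency": "GBP"},
            session=session,
        )
        db.balances.insert_one(
            {"account_id": account.inserted_id, "available": 0, "pending": 0},
            session=session,
        )
```

Either all three documents become visible together or none of them do, which is the ACID property Boris is pointing at.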

James: Okay. So let’s talk about some of those really big scale examples that you’ve got because I think some of the numbers are pretty impressive, not necessarily what people expect.

Boris: Yeah. You see, by now, clients are in the infamous petabyte size. You and I remember the days when people said, I’m in the terabyte club, and people were really, really excited. But today a petabyte is a little bit of a different discussion, size-wise. And we see that people get really, really into large scale data. The interesting part is these are transactional systems. These are not in the data warehouse space, where even in the old days people built really funky large data warehouses. We’re talking about transactional systems running millions of transactions, interpreting data on the fly in real time and making decisions out of it on single data sets. So this is the thing where people get a little bit lost, and I always call it the hoodies and the coats. You have the classic lab coats in the data science department who come out with great spreadsheets at the end saying, 42. And what we are doing is really the transaction out there. You have a transaction, something hot running, somebody is ordering shoes, and the system tells them, dude, you really want size nine? What about nine and a half? Because accidentally we know, out of 100,000 orders, that you should be nine and a half and not nine, based on your other purchase behavior. That is, well, it sounds a little bit more creepy than it is. Saves you a return. Happier client, happier shoe.

James: Okay, okay.

Boris: And now we’re talking millions of those in parallel. In large systems, people don’t even realize how many retailers rely on us. And if you take a look, there’s a very large partner of ours, commercetools. They’re providing the infrastructure for some of the largest luxury brands in the world, and Verizon.com is one of their reference clients, if you check out their website. Now imagine people ordering the new iPhone. We’re suddenly talking spikes and unbelievable loads happening. Everybody wants the latest order. We handle all of that with them as an ISV, but the underlying database for these kinds of solutions is MongoDB, which is quite amazing to see. And in the way we’re working today, transactions per second translates into baskets per second, mortgages per second, requests per second. All very industry driven. As you know, I’m the industry dude.

James: So I think one of the other areas where you’ve been doing some work is in and around automotive, which is a slightly different use case. Obviously that’s not so transactional. But again, when you’re looking at telemetry and data collection, there are some pretty interesting data points in that space as well, right?

Boris: Oh, absolutely. And the funny part is it becomes very transactional today. Historically, when people collected data from cars, you drove the car to the garage and then came the question: what’s your mileage today? And that was the data point they had. And then they maybe looked it up and said, oh, that model needs, uh, we actually have a callback, we need to update your software. That was already the luxury version. Today we do this in real time. At MongoDB we have demos running. We just did a demo at CES together with COVESA, the vehicle vendors working together in a software alliance, and we showed how we can run an integration of MongoDB inside a car. Think about managing electrical batteries in real time. Suddenly the transaction density becomes really, really high. Think about braking systems. How many transactions do you need to check for an optimized braking system, for automatic braking behavior?

James: There’s a lot of people out there that are not going to be — they’re going to be feeling a little bit weird about MongoDB controlling their braking systems.

Boris: Yeah, it’s not MongoDB controlling it, right? It’s just about tracking it. And it wants to tell you: by the way, your brake pad is really gone, and I will slow you down on the highway. You’re not driving 220 on the I-10 in Los Angeles anymore? No. I’ll slowly guide you to the exit and, oh, by the way, the next repair shop is already informed that your brake pad is worn and can get the repair ready. These are life changing situations, right?

James: What are the automotive companies that are linked to COVESA?

Boris: COVESA really includes everybody in the automotive world by now. If you look up the website, it ranges from BMW to the Japanese vendors to Ford. Everybody’s in there. Not everybody is using MongoDB, to be very clear. We’re just one option, obviously, but we have a lot of partners out there who are doing cool stuff, and when you check out some of the presentations, the right things are happening. And the data amounts are huge: we can generate terabytes of data in a car. By the hour. So we suddenly say, well, even if you pre-condense data. Take the braking system: we don’t really care what happened in the last hour. What you care about is the long term view. So that’s time series data, classic time series data, in manufacturing or in vehicles. And we can deal with that data in real time at large volume, then connect to the cloud and make decisions. Maybe, oh, you want to open the car? Yes, you can do that too, with your cell phone. And there is actually a lot of interesting communication in that. Is your cell phone connected to the cloud, or do you have only a Bluetooth connection straight to the car, which then needs to authenticate over different security mechanisms? You see where I’m going? There’s a lot of data behind this, and all this data makes this stuff really, really exciting.
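
The natural way to model this kind of telemetry in MongoDB is a time series collection, supported since MongoDB 5.0. Here is a minimal sketch in PyMongo; the collection name and telemetry fields are hypothetical, not from the COVESA demo.

```python
# Sketch: storing vehicle telemetry in a MongoDB time series collection
# (MongoDB 5.0+). Collection name and payload fields are hypothetical.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client.vehicles

# Created once; MongoDB buckets documents by time and metadata internally,
# which is what makes long-term views over high-rate data cheap to query.
db.create_collection(
    "telemetry",
    timeseries={"timeField": "ts", "metaField": "vehicle", "granularity": "seconds"},
)

db.telemetry.insert_one({
    "ts": datetime.now(timezone.utc),
    "vehicle": {"vin": "TESTVIN0000000001", "model": "hypothetical-ev"},
    "brake_pad_wear_pct": 87.5,  # the "your brake pad is really gone" signal
    "battery_temp_c": 31.2,
})
```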

James: Okay, so talking of exciting, and excited: Boris, when we spoke a couple of months back, in London at a MongoDB event, you were basically coming out with high scale numbers after high scale numbers. Give me the most excited you’ve been in terms of throughput, in terms of some of the scalability characteristics. You know, some of the stuff that you found most exciting in the work you’ve done as a Field CTO in the past, I don’t know, couple of years.

Boris: The most exciting ones are the smallest ones as well. Partially because, back to your car example, we’re not telling the vendor, oh, by the way, we put a really cool data center in your trunk. That’s really not a selling argument. We’re talking very, very small embedded SoCs, systems on a chip, which run MongoDB on one side and run these transaction workloads we talk about in the thousands. And then let’s move to the complete other side, the back end, where we consolidate information from millions of cars into large workloads on the cloud, workloads at global scale, a globally distributed database for, let’s say, 10-20 million cars. So that is the range we see. And then the most exciting workloads are really the large transactional banking systems we’re running these days. It’s amazing to see what people can do with MongoDB, specifically in trading and trade settlement. And we’re not necessarily talking high frequency trading. People get really, really confused about that one. Those systems are in memory and normally not really persisted, but sooner or later those trades become a real trade and need to be persisted in a system. And that’s where MongoDB shines by now. Think about something very simple. Each document can be a financial instrument. You don’t need hundreds or thousands of different tables. You can write an ETF purchase, which you just did yesterday for your pension fund. And today you want to buy MongoDB stock because you think we are a really, really great suggestion.

So when you take a look at that picture — or maybe you want to buy somebody else. Microsoft. Buy Microsoft, they’re just a trillion dollar company. So if we look at that picture, all these things historically were different tables, very large scale, very complex. We can write this into a single collection, hundreds of millions of documents, in one system distributed globally. And because, James, you have a great business, and because I can’t use the real bank’s name, we’re using you for this one. Imagine you have Tokyo, London and New York to cover. Now we can build a distributed system which runs one database over all three locations, so that everybody has all the data. The trades in New York happen first and get committed in New York, fully redundant, fully recoverable, with multiple nodes. But then they have a copy in London and a copy in Tokyo, which nowadays maybe arrives with a runtime delay. We still can’t optimize away the speed of light in the optics, but that’s all the delay we have. And this is way better than before: oh yeah, at the end of the day we copy the data. End of day versus subsecond. Think about counterparty credit risk. Think about the Lehman moment, right? Make sure you don’t pay out the money one second after they’re bankrupt. Things like that can become very exciting. And those are the systems at large scale which we are running when we talk about hundreds of millions of transactions an hour.
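
As a rough sketch of what “committed in New York, copies in London and Tokyo” looks like from the application side, here is the PyMongo pattern of writing with a majority write concern and reading from the nearest replica. The hostnames, database and collection names are hypothetical.

```python
# Sketch: one replica set spanning New York, London and Tokyo. Writes are
# acknowledged once a majority of members have them; reads can be served
# by whichever member is closest. All names here are hypothetical.
from pymongo import MongoClient
from pymongo.read_preferences import ReadPreference
from pymongo.write_concern import WriteConcern

client = MongoClient(
    "mongodb://ny1.example.net,lon1.example.net,tok1.example.net/?replicaSet=trading"
)

# Durable write: commits against a majority of nodes before returning.
trades_w = client.trading.get_collection(
    "trades", write_concern=WriteConcern("majority")
)
trades_w.insert_one({"instrument": "ETF-XYZ", "qty": 100, "side": "buy"})

# Latency-sensitive read: routed to the nearest member (London for a
# London caller), at the cost of possibly slightly stale data.
trades_r = client.trading.get_collection(
    "trades", read_preference=ReadPreference.NEAREST
)
recent = trades_r.find_one({"instrument": "ETF-XYZ"})
```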

James: Okay, so hundreds of millions of transactions per hour, financial transactions, and that’s the sort of throughput that you’re increasingly seeing in the customers that you’re working with.

Boris: Correct. And when you think about it: you were talking about a system on a chip in a car, now you’re talking financial transactions on globally distributed clusters… that is the piece which makes it so exciting. And this ranges from things like post-trade settlement, to things like risk computation, how much does somebody owe me, to mortgage onboarding and figuring out, hey, James wants to get a mortgage. Oh, look, we have all the data about James because he’s a long standing client of ours. So instead of making him print out 500 pages of documents, why don’t we use the information we have and say: James, you’re a highly valued client, you get two percentage points off your next mortgage. Which would be amazing.

James: My mortgage is locked in. And that was the one financial thing that I’ve done quite sensibly recently. Took a ten year. I’m very happy with my mortgage at the moment; I got my rate before interest rates spiked, so that was good news. But at some point I will probably need to do this again. Okay. So I think that was the key thing that I really wanted to talk to, this question of scale. Just to finish up, Boris: there are different dimensions of scale. Obviously there’s throughput, and there’s reliability and availability. What are the key dimensions of scale as you see it? And how are MongoDB and the Mongo storage engine optimized for these different characteristics today? Where do you think those sweet spots are? What are the optimizations you can do? As you said, I mean, we both remember that you always had decision support as opposed to transaction processing, different things. What, from an engineering perspective, does scale mean in 2024? Where are we going from here, and how can you enable optimizations accordingly?

Boris: And this is really the question. And it’s a good one, because availability today is table stakes. When we started, there were failover systems and people talked about what failover time they had. And then you saw some companies down over weekends because certain database vendors, luckily not mine at the time, could not recover in time, or the backups were damaged, and stuff like that. We are talking now about zero downtime, zero tolerance actually from the clients. When you think about it, you cannot take an online store offline because you run a maintenance window on your back end. So MongoDB is by default a cluster system. So we always have, normally, in the larger systems — not in development, you can have a local system on your laptop, no problem. But when you talk about running 100 million financial transactions, you may want to have redundancy built in from the beginning. So you start with three nodes. And my joke normally is: one is active, one is working and the third one is on vacation. And that’s pretty much the picture here. But the key part is all nodes are active. You have what we call a primary, but the other two nodes are delivering read performance.

And now on top of that we can scale this out. We can say, well, honestly, we had this example about the multi-distributed environment. Geographically we have three nodes in New York, maybe two in Tokyo and two in London, because we don’t even want to risk London getting completely shut down, and London is maybe even distributed over two data centers. So suddenly we’re talking about seven nodes inside one of these partitions. And now we multiply this by three, because we have a London prime, a Tokyo prime and a New York prime. So you suddenly have this orchestration. These kinds of systems, this is managed inside of MongoDB. You as a person don’t need to understand it. You don’t need to plan for it. The developer doesn’t need to deal with it anymore. You and I remember times when people needed to fine tune which partition all the data lived on. Mongo takes care of it and understands what the nearest option is: New York it is, and the application goes first to New York for the New York transactions. Done. This kind of simplicity is table stakes today. But the next part is, obviously, now I want to scale out.
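
What Boris describes, pinning each region’s trades to a primary in that region while keeping one logical database, maps to MongoDB’s zone sharding. Here is a hedged sketch of the setup commands issued through PyMongo; the shard, zone, database and key names are hypothetical.

```python
# Sketch: zone sharding so documents with region "NY" live on the shard
# whose primary sits in New York. All names are hypothetical; this must
# run against the mongos of a sharded cluster.
from bson import MaxKey, MinKey
from pymongo import MongoClient

client = MongoClient("mongodb://mongos.example.net:27017")
admin = client.admin

admin.command("enableSharding", "trading")
admin.command("shardCollection", "trading.trades",
              key={"region": 1, "trade_id": 1})

# Associate each shard with a geographic zone...
admin.command("addShardToZone", "shard-ny", zone="NY")
admin.command("addShardToZone", "shard-lon", zone="LON")

# ...and pin the NY key range to the NY zone, so those documents are
# stored and committed on the New York hardware first.
admin.command("updateZoneKeyRange", "trading.trades",
              min={"region": "NY", "trade_id": MinKey()},
              max={"region": "NY", "trade_id": MaxKey()},
              zone="NY")
```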

I want to scale in size. The example with Temenos, where we went to a bigger system. Or you want to say, you know what, I don’t want all these big boxes. There are economic reasons as well: smaller systems are cheaper in comparison. So give me 20 shards, distribute my data over 20 multiplied by the 3 or 5 nodes. And now comes the next level. We want real time decision making. I want real time search on all my data. I don’t want to be dependent on an export to a search engine. All of this is integrated in MongoDB. We call this workload isolated nodes: you can run search in real time on this data. Now we’re getting back to my comparison about the hoodies, right? Developers want real time decision making. If you have a new product, you don’t want to wait: oh yeah, the next update of the search index comes in 24 hours. You lose 24 hours of sales. Worst case, the thing is sold out before you have it in the search index. No, you need this integrated in real time. And the next part is obviously GenAI. How can we have a podcast without mentioning GenAI, James?
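
The integrated search Boris mentions is, in Atlas terms, Atlas Search, which runs queries against live operational data on search-capable nodes rather than against an exported index. A minimal hedged sketch, assuming an Atlas cluster where a search index named “default” has already been defined on a hypothetical store.products collection:

```python
# Sketch: full-text search on live data via the Atlas Search $search
# aggregation stage. Assumes a search index named "default" exists on
# the (hypothetical) store.products collection.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://cluster0.example.mongodb.net")
products = client.store.products

results = products.aggregate([
    {"$search": {
        "index": "default",
        "text": {"query": "running shoes", "path": ["name", "description"]},
    }},
    {"$limit": 5},
    {"$project": {"_id": 0, "name": 1, "price": 1}},
])
for doc in results:
    print(doc)
```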

James: We’ve done pretty well. We’ve got through quite a few minutes before we got to that, before the marketing hat came on. But you know.

Boris: Yeah, it’s not marketing, it’s the same thing again: the lab coats with the data science versus the hoodies. And the hoodie side wants to have GenAI in real time. I want to suggest the right shoe size for you. I want to make sure that the mortgage is correctly processed for you and that you don’t need to deal with all the 500 pages. All of these are real time applications. Now we have things like vector search integrated in MongoDB in real time, so you can actually work on your live data. Think about the financial transaction: you build a vector over it and analyze its behavior in real time, and maybe the impact on the counterparty credit risk, the Lehman discussion again. When you do these kinds of things, this is what makes Mongo different: we have all these things integrated. And this is what makes me, as somebody who’s really big on transactional processing, so happy to talk about it, because you can do things without copies. As soon as you have ETL, we both know you lose minutes or hours. Real time. And not only, oh yeah, I have maybe Kafka there and I can… You can do that, and there is nothing wrong with that design for many solutions. But when you talk real time, the transaction comes in and I need to make a decision based on this transaction now. That is what MongoDB can give you. And that’s where the scalability, and the reasons people do this, come in. And that includes GenAI, vector search, or the integration with an LLM for the shoe size.
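
For the curious, here is a hedged sketch of what vector search over transactions could look like with the Atlas Vector Search $vectorSearch stage. The index name, collection, and embed() helper are hypothetical, and Atlas requires a vector index on the embedding field to be defined beforehand.

```python
# Sketch: similarity search over transaction embeddings with Atlas
# Vector Search. Index, collection and embed() are hypothetical; a
# vector index on "embedding" must already exist in Atlas.
from pymongo import MongoClient

client = MongoClient("mongodb+srv://cluster0.example.mongodb.net")
txns = client.risk.transactions

def embed(text: str) -> list[float]:
    # Stand-in for whatever embedding model you use (an LLM API, say);
    # the dimension must match the vector index definition.
    return [0.0] * 1536

query_vector = embed("large unsecured exposure to a failing counterparty")

similar = txns.aggregate([
    {"$vectorSearch": {
        "index": "txn_vector_index",
        "path": "embedding",
        "queryVector": query_vector,
        "numCandidates": 200,
        "limit": 10,
    }},
    {"$project": {"counterparty": 1, "amount": 1,
                  "score": {"$meta": "vectorSearchScore"}}},
])
```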

James: Okay, there we go. So I think that’s enough. That’s enough! I got it! Here’s the thing, folks. Basically, this was the key point of why I wanted to record this with Boris: Boris is an old time transaction guy. He’s been about transactions his whole career. I think a lot of you don’t necessarily think of MongoDB and transactions. And I just thought, rather than all of the other things we could talk about, developer experience, GenAI, any of that stuff, let’s just talk about the basics of scale and transactions, and where MongoDB is. As you can see, Boris is pretty excited about this stuff. So that’s all to the good. Boris, thanks for joining us. That’s another RedMonk Conversation. And if you’re interested in this content, if you’d like to see more, please subscribe and share this stuff. You know we are trying to foster a conversation, and we’re always interested in engagement with the community. So thanks for joining us, thanks, Boris, and I’ll see you all soon.

Boris: Thank you, James. Bye bye.

