MongoDB’s Erik Hatcher (Staff Developer Advocate, Atlas Search) joins RedMonk’s Kelly Fitzpatrick for an introduction to MongoDB Atlas Search: a full-text search engine integrated into MongoDB’s fully-managed developer data platform (Atlas). After discussing some of the advantages of leveraging search on a data platform like Atlas (unified developer experience and API, seamless synchronization, reduced operational overhead), Erik jumps right into a demo of Atlas Search using platform-supplied sample datasets (we use a film dataset) and walking through the tools and steps a Java developer might use.
This was a RedMonk video sponsored by MongoDB.
Resources
- Atlas Search
  - Learn more about Atlas Search: https://mdb.link/atlas-search-lp-redmonk
  - All Atlas Search tutorials: https://mdb.link/atlas-search-tutorials-redmonk
  - Get started with Atlas Search in Java (as demonstrated in the video): https://mdb.link/atlas-search-java-redmonk
  - Landing page for MongoDB Atlas: https://mdb.link/atlas-lp-redmonk
- General MongoDB learning and developer resources
  - Overview of MongoDB and the Document Model (free MongoDB University course): https://mdb.link/document-model-redmonk
  - Check out other courses offered through MongoDB University: https://mdb.link/mongo-university-redmonk
  - MongoDB’s Developer Center: https://mdb.link/dev-center-redmonk
Transcript
Kelly Fitzpatrick: Hello and welcome. This is Kelly Fitzpatrick with RedMonk here with a video on: What is MongoDB Atlas Search, How to help users find what they need. With me today is Erik from MongoDB. Erik, would you mind saying a little bit about who you are and what you do?
Erik Hatcher: Hi there. Thanks, Kelly. My name is Erik Hatcher. I am a Staff Developer Advocate at MongoDB, specifically on the Atlas Search platform. And my background has, for the last whole bunch of decades, been focused on search, specifically things built up and around Apache Lucene. So that’s been my expertise over the years.
Kelly: And I love that. Not only do we have someone who is a search expert to talk about Atlas Search, but you have a proper title, Staff Developer Advocate.
Erik: I try to downplay it because I don’t really know. I don’t feel like, you know, I haven’t even reached my year tenure. I’m about to reach that. So I’m very proud to be at MongoDB. It’s a great company, great people, and so definitely proud of it. But, you know, I’m certainly not worthy.
Kelly: [Laughs] Well, thank you for joining us today. And MongoDB, I have been following them for quite a while. But for folks out there who don’t know, MongoDB is probably best known for its document model database. Document stores emerged quite a few years ago as part of this whole NoSQL movement that sought alternatives to relational databases. If you are new to the document model and you want to learn more about it, we’ll put some resources in the show notes. Atlas is MongoDB’s fully managed data platform, and one of the great things about data platforms is that as a developer, it can save you a lot of time having to stitch together different types of data stores. One of them of course, being Search, which is where Atlas Search comes in. And Erik, I’m going to turn things over to you because you are like 10X the expert on anything having to do with search. I mean like 100X times the expert than me.
Erik: Well, thank you so much, Kelly, and thank you so much for that introduction. You basically just covered my next two slides. So I have this slide here. It’s kind of busy, but it shows you the breadth of what MongoDB Atlas provides as far as what we call a developer data platform. This encompasses all the kind of things that you need when you put your data in somewhere. Backups, replication, scalability, GraphQL, authentication, you know, rules and triggers and, you know, multi-cloud support, all these kinds of things that are difficult for us as developers to really think about when we’re trying to build an application. So Atlas takes a lot of that pain away from you as a developer and unifies it all into one kind of aggregation pipeline semantics, which we’ll see as we talk about the search platform aspect of it.
Kelly: Yeah. And I know I covered a little bit of this slide, but the visualization of it I think is really, really great. All the things that I don’t want to have to worry about as a developer are on this slide.
Erik: And, you know, frankly, I’m a search person, so all of this stuff is awesome. But I focus on the search aspect of it. I am thankful that there is the whole thing there that makes the search nice and easy and just, you know, scalable and all that kind of stuff because of the whole Atlas platform. So there we go.
So carry on, right? So as Kelly mentioned, also the main thrust of what’s made MongoDB special over the years is what’s called the document data model. And it’s not just a physical thing. It’s kind of a mindset thing as well, how you model your data as what we call documents, where all things related to a particular entity or a thing are co-located in this document structure, often represented as what you see on the right here as just a JSON document structure with name-value pairs. Compare that to the relational model where you turn things into different tables and have foreign keys and these types of things. Totally a reasonable way to do things and a reasonable way to represent things. But MongoDB Atlas represents it as this document data model, which segues very nicely into Search because what you feed the search engine itself are documents. And documents, like we saw on the previous slide here, fit very nicely into a search engine because that’s effectively what the search engine underneath of Atlas Search actually calls what you put into it: a document. So it is a very natural one-to-one kind of model there. So in terms of Atlas Search, what you can do is enable Atlas Search for your collection and we’ll see that in the next section on the How To.
And what that will do is provide you a full text search engine on top of your data for whatever collections that you opt in on this. And it provides all of these kind of features you see here, everything that’s full text search related. So the main ones to me are the rich query language, being able to filter and facet, have fuzzy, inexact search, but with relevancy ranking going on in there. So the better matches to a particular query come to the top. And other supporting features such as multi-language support and highlighting, so the query terms you see get highlighted in context of the document itself or the fields that are matching that particular query and so on. Synonyms, more-like-this similarity… And yeah, so that’s the features that Atlas Search itself supports. So the way that Atlas Search does its thing is in contrast to how you may have built or tacked on a search engine to your data in the past where you are attaching a third party search engine to your data. What we do in Atlas Search is your data is in Atlas, so we receive synchronization notifications of your content as it changes. As you add new content, as you update content, we keep the search index in sync automatically, in under a second in almost every case. So as you modify your data, your search index is staying in sync with your changes. And again, it’s all under one unified interface through the Atlas, the MongoDB query language interface, where you specify an aggregation pipeline and go through a search pipeline within that one interface that a developer speaks to.
Kelly: And I really like the term synchronization tax that you use on this slide because I think it very much speaks to the extra time or extra effort that is put into learning other technologies. And that kind of bolted-on search scenario I think is a really good one because it’s not just learning it, it’s setting it up and then maintaining it and making everything update when it needs to.
Erik: And interestingly, as I was thinking about this presentation over the last couple of days and thinking about the document model, actually one of the things that you do in the synchronization tax that you end up paying is turning your content, wherever it may reside, say a relational database, into documents. So you end up forming the document model to hand to the search engine anyway. So that is the synchronization tax that you end up paying, is to documentify your domain, effectively, whatever you want back from the search index.
Kelly: I think the word “documentify” is something that is going to get added into my dictionary as well.
Erik: Yeah. And that’s the first utterance of it. We’ll see if that exists elsewhere in the universe. And underneath the covers of this whole thing is this thing called Apache Lucene. So the next slide is going to show the Atlas Search architecture. This architecture, I’m not going to go into all the arrows that are showing how a user makes a query and that happens and how content comes in and gets synchronized. All those things are processes that we know as developers need to happen. That’s what Atlas Search is doing behind the scenes. So at the heart and soul of what Atlas Search is, is this very popular search engine library called Lucene. It comes from the Apache Software Foundation. It is Apache software licensed. It’s embedded in all kinds of search engines that people have been using for decades. And we’ve put it inside of Atlas and the Atlas Search process here and speak to its API. So we benefit from all of the goodness of the relevancy ranking, the rich query language and the indexing performance, and the flexibility with multilingual and all that kind of stuff comes from this underlying Lucene library.
And, you know, I didn’t put my timer on. We had the seven minute thing and I was trying to make sure we did the right elevator pitch for this thing. And really I just want to close, in terms of the part of What Is Atlas Search, with this: Atlas Search is Lucene at the heart of it, and it brings to bear all of the goodness that Atlas, the developer data platform, brings as well. So you get the reliability, the clustering, the scalability. It’s just really just a one click, turn it on for your particular collection type of thing. And there’s a lot of customizability if you want to fine tune how the search stuff works. That gets into a lot of gory details of how full text and inverted indexes work, but that’s all kind of possible there, even though the one click easy path makes things very nice and powerful without giving a lot of thought to those things. So again, Atlas Search benefits from Atlas, the platform, and Atlas Search provides immense searchability/findability for your content that you’re putting into the Atlas platform.
Kelly: So thank you Erik. That was a great What Is MongoDB Atlas Search in the larger context. I believe it’s time for a demo and I know that one of my questions is always like what type of application is this suited for? And I know that you are going to actually, in the demo, walk us through a bunch of sample scenarios.
Erik: Yeah. So that’s a great point. And in fact, it is really — as a search enthusiast, a search professional over the last bunch of years — one of the things that is necessary is to ground the stuff into reality. Like we can say we want full text search and all of this magical relevancy ranking. But really what does that mean in terms of the usability? What are we doing with it within our applications? And that is going to drive backwards towards the types of features and capabilities and configurations that we’re going to turn on. So I like to think about application development, not from the bottom up of the configuration, up to the application, but more from the application and the usability back down into let’s configure the search engine for those needs.
Kelly: What does the application have to do as opposed to like, let us just start with all of the things and see what we can build.
Erik: And so in this particular case, and we’ll just segue into the demo and Kelly, you just, feel free to drive our direction in which way you like here. But my canned demo is to show you what you do with Atlas and in terms of turning on Atlas Search, configuring it and then building an application effectively on top of the Atlas Search API. And so the way we do this for easy demoability and just to have a common data set that many people know and it’s not going to change over time, even though of course new movies are going to come out, but we’re going to use the movies dataset. And the reason that makes a lot of sense here is that we actually have the facility to load sample data sets into Atlas Search and Atlas very easily. And so when you click this load sample data set, I’ve already done that, so I’m not going to do that now. And it takes a couple of minutes, if even that. And then it gives you all of these things that are called sample underscore here. These are all these different databases that come with the sample. It’s not a lot of data, but it’s kind of diverse. And in this particular example, we’re going to focus on movies. So I guess if we wanted to have a faux but not unrealistic application here would be, you know, we’re back to Blockbuster partying like 1999. And we are providing a movie search service. And this kind of models the Netflix data, we call it the “mflix” dataset here. So there’s a movie database. It’s got right here, 21,349 movies. So it’s not every movie in existence. And actually the data set cuts off in the 2010s somewhere. So it’s not the current movies. So we definitely could do some searches that are going to end in some dead ends if we’re thinking that we’re going to get some more recent movies because they’re just not in this database.
Kelly: So and just to clarify, these are sample sets that any user trying out MongoDB Atlas Search can access. So probably not good if you want to do anything with like a Barbenheimer cast search or anything like that. But some slightly older movies would work out.
Erik: That’s right. Yeah. I mean, it’s a limited set of movies. These are just going to be the popular movies from, and we’ll see the range of years actually, as I show off the indexing feature of Atlas Search. So through here, what we can do is say, go to this search service over here, and then what it’s going to do is drive us through some wizards here. I’ve already set up a search index for the movies collection here, but what that would look like to set it up for a new one, if you didn’t have any, there would be a big create search index button right in the middle of the screen. Here we can create one. We can say we’re going to just use the visual editor and just use the click, click, click real easy path here. And we come down here and we pick the movies collection and we already have a default thing so we can say new_index and just use the defaults here and just click and I don’t even have to think about all these. There are a lot of settings up here that are worthy of drilling into, but it’s beyond the scope of a seven minute thing. We just want to click this. And when we do that, this little robot here goes behind the scenes and is setting up a change stream listener and going through that process that was in that diagram that we just showed. I’ll just bounce back to that so we can have that while it’s doing its thing.
It is doing the initial synchronization and a change stream listener here and bouncing to the Lucene index. So it is indexing all of our movies as it spins right here. And it’s probably done already. And you can see already it just changed to initial sync. So it’s already kicked into process here. And we can watch the status here. So it sets up automatically for us. And this is all available. I skipped over this as I jumped into the Atlas UI here. As a developer, you can create a free Atlas account and have access to Atlas Search in this manner and that will give you behind the scenes a primary and two secondaries for a replicated index structure. So you actually have more search horsepower than mere mortals know what to do with by just doing a few clicks like this. So this actually gives you a very powerful index behind the scenes here, just with a few clicks. And to leverage that, and we’ll just do that through the one that I’ve already created here, you can click this query button here and do your queries for Keanu Reeves if you want here and hit the search button and that will give you back particular movies that he was in as probably in the cast here.
So here he is in the cast, right? So this is a basic search tester interface that’s built into the Atlas UI. It gives you the code behind the scenes. If you click this right here, you can copy and paste that code over into your application, or you can do some more sophisticated things like go over to MongoDB Compass, which is a MongoDB product that provides, in this case, an Apple Mac interface to MongoDB and to Atlas. So I’ve got this thing connected to my M10 instance, and I’m doing searches basically through code here and just playing around to see what the end results are. So I’m doing a search for Keanu Reeves in this case. And I’m faceting across various year buckets and across all genres. And I want ten of these back. I’m just kind of going over this code without going into gory detail of it, because there is a lot of kind of gory detail here. But the basic search stage is effectively this right here, what you see. And it provides back these ten documents and it provides back facets. So I can see across all the movies how many movies Keanu Reeves acted in that were drama? How many were action movies? And so that’s the data I get back. Now we need to turn that into an application. I don’t know if this is a good time or if you have any questions, Kelly. I’ll pause before I jump into the application side of things.
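The exact query Erik runs is not visible in the transcript, but the shape he describes (a text search on the cast, facets over year buckets and genres, ten documents back) can be sketched with plain Java collections. Everything specific here is an assumption for illustration: the index name `default`, the facet names `yearFacet` and `genresFacet`, the decade boundaries, and the field names `cast`, `year`, and `genres`. The helper class itself is hypothetical and is not part of the MongoDB Java driver; it only prints the pipeline as JSON.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: builds the *shape* of a $search facet pipeline with plain Java
// collections. It does not talk to MongoDB and uses no driver classes.
class FacetPipelineSketch {

    // Helper: build an insertion-ordered map from alternating key/value arguments.
    static Map<String, Object> doc(Object... kv) {
        Map<String, Object> m = new LinkedHashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], kv[i + 1]);
        }
        return m;
    }

    // Assumed pipeline: text search on "cast", two facets, then a $limit stage.
    static List<Map<String, Object>> pipeline(String query, int limit) {
        Map<String, Object> operator = doc("text", doc("query", query, "path", "cast"));
        Map<String, Object> facets = doc(
            "yearFacet", doc("type", "number", "path", "year",
                             "boundaries", List.of(1980, 1990, 2000, 2010, 2020)),
            "genresFacet", doc("type", "string", "path", "genres"));
        Map<String, Object> search = doc("$search",
            doc("index", "default", "facet", doc("operator", operator, "facets", facets)));
        return List.of(search, doc("$limit", limit));
    }

    // Minimal JSON rendering for maps, lists, strings, and numbers.
    static String toJson(Object value) {
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            String sep = "";
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                sb.append(sep).append('"').append(e.getKey()).append("\": ")
                  .append(toJson(e.getValue()));
                sep = ", ";
            }
            return sb.append("}").toString();
        }
        if (value instanceof List) {
            StringBuilder sb = new StringBuilder("[");
            String sep = "";
            for (Object item : (List<?>) value) {
                sb.append(sep).append(toJson(item));
                sep = ", ";
            }
            return sb.append("]").toString();
        }
        if (value instanceof String) {
            return "\"" + ((String) value).replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        System.out.println(toJson(pipeline("Keanu Reeves", 10)));
    }
}
```

In a real application the same stages would normally be built with the driver's own types, or pasted from the export feature Erik shows next; the printed JSON is only meant to make the structure of the search stage easier to see.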
Kelly: No. And I think having the search tester in the UI and then being able to start there and then pop out to other whatever tool works best for you is a really neat aspect of this. So more of a comment than a question.
Erik: And related, so I couldn’t have planned your questions better because actually that’s a perfect question for what I had intended to show here, and that is this export to language. So as I’m just tinkering around, I’ve done this query here and I’m seeing the output. And now as a developer, I want to take this query and turn it into code so I can come over here and export this to various languages. And I’m coding in Java here. So I literally just hit this copy button and I’ve done this twice with two different queries that I’m going to show you. And gone over into my code editor and pasted that code that it gave me back literally right here and right here. Two different queries. I could have done my code a little bit cleaner without duplication, but just copy and pasted a couple of times and that’s going to give me this facet view that I’m going to show you in just a second. So let’s just think through as a developer, we’re going to try out a few different queries. So this is just a graphical and a Java application way to show a couple of different queries. So I’ve got the query RedMonk and I’m sending it to the aggregation pipeline in a couple of different ways. In like, I don’t know, three, four different ways. So we’re going to send RedMonk as a query over and I’ve got this kind of silly animation here that you’re going to see. And this is just to kind of show you more graphically what happens when you do this query RedMonk using basically the kind of query that gets generated from the search tester and this is what the results look like. So whether these are good or bad, it’s really hard to say without exploring, you know, what queries there are that are best for that, and I don’t have the movie expertise to tell you that.
Kelly: I’m definitely learning about movies with both Red and Monk in the title that I did not know about before.
Erik: And so that’s again, you’re very good at the segue thing. So one thing that I do next is do the query Red or Monk in just the title. So you get to pick and choose. This last query goes across all fields and excuse the kind of weird animation there. I’m doing it across all fields in this particular case and in the next case I’m just doing it against the title. So really now we’re doing Red or Monk in the title. So that changes at least the relevancy order of the results. And in this case, I’m changing this to be a phrase, so this means this needs to be Red Monk literally as a consecutive couple of terms in either the title, cast, or plot fields. So that’s what this particular query is doing. And as you can see, there were no documents found in that particular case. So, and sorry, this was a bad segue from me, this is what we call facets. So facets are the ability to bucketize. And I think that probably is a word that’s been out there: to bucketize is to put the documents into groups. And in this case, I’m doing it against two different fields. I’m doing it against the year field that the movie was released, and I’m doing it against the genre field. I actually don’t show that it says genre or year, but that’s what it is. So this is against all 21,349 movies. 48 of them came out in the 1920s, not just 1920, but 1920s. I should have put an S beside that. And this is the count, and kind of a ratio of this count to this count right here in terms of the bar graph. Not very pretty, but I just cobbled this together this morning to show graphically how we could do this. And this is showing the genres across all movies and what their genre breakdown is here. Does that make sense, Kelly?
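The idea of bucketizing can be sketched independently of any search engine. The following is a minimal, in-memory illustration of counting movies per decade and per genre; the `Movie` class is a hypothetical stand-in for the fields Erik mentions, and in Atlas Search these counts are computed by the search engine rather than by application code like this.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of faceting: group documents into buckets and count them.
class FacetBuckets {

    // Hypothetical minimal movie record: release year plus a list of genres.
    static final class Movie {
        final int year;
        final List<String> genres;

        Movie(int year, List<String> genres) {
            this.year = year;
            this.genres = genres;
        }
    }

    // Count movies per decade, keyed by the first year of the decade (1999 -> 1990).
    static Map<Integer, Integer> byDecade(List<Movie> movies) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (Movie m : movies) {
            counts.merge((m.year / 10) * 10, 1, Integer::sum);
        }
        return counts;
    }

    // Count movies per genre; a movie with several genres is counted once in each.
    static Map<String, Integer> byGenre(List<Movie> movies) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Movie m : movies) {
            for (String genre : m.genres) {
                counts.merge(genre, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Movie> movies = List.of(
            new Movie(1989, List.of("Comedy")),
            new Movie(1994, List.of("Action", "Drama")),
            new Movie(1999, List.of("Action")));
        System.out.println(byDecade(movies));
        System.out.println(byGenre(movies));
    }
}
```

Because a movie can carry several genres, the genre counts can add up to more than the number of movies, which is also why a genre breakdown does not have to sum to the total.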
Kelly: No, it does. And yes, Bucketize is in fact a word. Or if it’s not I’ve decided it is a word.
Erik: Documentify and Bucketize. We’re on it. So now we’re going to do, and this is very much like your favorite shopping systems that are more than likely powered by Lucene as well, is when you do a query, you also get the results back, and the facets change. So in this case, and you can’t really see the query itself very well, but in the title bar of this window here, it says Movie facets cast Keanu Reeves. So I’ve done a query for cast Keanu Reeves and we can go look at the code of that. And he wasn’t in any movies until the 1980s, so he only shows up in movies in this year range here. And he was in these different types of movies out of those 27. So 16 of those were drama movies.
Kelly: Also fun to learn how many drama movies Keanu Reeves was in.
Erik: Yeah, he’s quite prolific. I like to use him as an example because he just kind of seems like a good example.
Kelly: Very much so.
Erik: So, yeah, that is effectively the usability and showing the end to end where me as a developer, I can go click, click, click in the UI or configure it through an API. We have configuration API with Atlas Search as well. And now you’re enabled for search and now to plug it in and use it is pretty straightforward. So it took me just this morning to build this slide right here. So I didn’t really prettify it. I just wanted to show the data that was behind the scenes of what you see basically right here. So this is the data for Keanu Reeves. Like literally 16 movies are drama, and it’s all in this JSON response structure right here. And as a developer, all I had to do was do this and then come over here to my code and do this sort of thing. And then there’s a little bit of code over here, where I’m sorry to go into code here, but you know, I’m looking at the meta element and I’m looking at the facets and I’m getting the year facet and the genre facet and I’m looping through and it’s a little bit of hard coded stuff where I’m looping through the years and then I’ll loop through the genres and I’ll draw a rectangle, right? So I draw the, you know, drama, parentheses, ten, or whatever it is, and then draw a rectangle that is the ratio across the screen that looks like that. So pretty straightforward to go from data to visualization that way.
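The drawing step Erik describes, a label with the count in parentheses followed by a rectangle sized by the ratio of that count to the total, can be approximated in text. This is a sketch under assumptions, not his code: the class and method names are hypothetical, and `#` characters stand in for the rectangles he draws on screen.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Text-mode sketch of the facet bar chart: "label (count)" plus a proportional bar.
class FacetBars {

    // Bar length is count/total scaled to maxWidth characters, rounded to nearest.
    static String bar(String label, int count, int total, int maxWidth) {
        int width = total == 0 ? 0 : (int) Math.round((double) count / total * maxWidth);
        return label + " (" + count + ") " + "#".repeat(width);
    }

    // Render every bucket on its own line, in the order the map provides.
    static String render(Map<String, Integer> buckets, int total, int maxWidth) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : buckets.entrySet()) {
            out.append(bar(e.getKey(), e.getValue(), total, maxWidth)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // 16 drama movies out of 27 results, as mentioned in the demo;
        // the action count here is made up for illustration.
        Map<String, Integer> genres = new LinkedHashMap<>();
        genres.put("drama", 16);
        genres.put("action", 9);
        System.out.print(render(genres, 27, 40));
    }
}
```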
Kelly: Yeah, very much. I just foresee this as being extremely useful, especially for developers who are just learning how to deal with search and incorporate that into their applications.
Erik: Yeah, yeah, for sure. And the nice thing about Atlas Search is making it quite easy. I don’t want to sugarcoat that too much. Like the ease is pretty cool and you can go pretty far with just some clicks, but let’s not fool ourselves: we do need to do a little bit of pondering about our configuration. So let me just show you why that’s kind of relevant and important here. Behind the scenes is a configuration file that we can use that gets generated from the visual editor where we just click, click, click and just accept the defaults. This is the one I’ve customized a little bit. And rather than just doing naive, let’s just tokenize our text as if whitespace is the separator and that’s it. And we’re going to lowercase and that sort of thing where we’re building an index of the words in our text. I want to be able to do things that are relevant to our particular language. So the plots of the movies in our database are always in English. So what we do is we turn the English analyzer on that one instead of the standard. The English one is actually a little bit smarter about English words in that it will stem them. So if you have, for example, the word search and the word searching and searches, the stemmer in this English analyzer will turn all of those into the word search in the inverted index. And the same logic applies, the same process applies when a query is sent to that particular field, it will apply that same stemming. So if someone searches for searching, they still find words that say searches or searched or search. So there is some very useful and necessary language capabilities there, that you do kind of have to peel the covers off a little bit to take advantage of these things. But the power is really there for that kind of clever searchability.
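The point that the same stemming is applied both when indexing and when querying can be shown with a toy inverted index. The suffix stripping below is deliberately naive and is not the algorithm the English analyzer in Lucene uses; it is only a stand-in so the index-time and query-time symmetry is visible. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: stemmed term -> ids of the documents containing it.
class ToyStemmedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Naive suffix stripping for illustration only (not a real English stemmer).
    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : new String[] {"ing", "es", "ed", "s"}) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Index time: split on non-word characters, stem each token, record the document id.
    void add(int docId, String text) {
        for (String token : text.split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(stem(token), k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Query time: apply the same stemming before looking the term up.
    Set<Integer> search(String queryTerm) {
        return postings.getOrDefault(stem(queryTerm), Set.of());
    }

    public static void main(String[] args) {
        ToyStemmedIndex index = new ToyStemmedIndex();
        index.add(1, "He searches the archive");
        index.add(2, "Still searching");
        System.out.println(index.search("searched"));
    }
}
```

If the query term were looked up without stemming, "searched" would match neither document, since only the stemmed form "search" is stored in the index.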
Kelly: Yeah, absolutely. So I know you probably could run this demo for an hour, but we are kind of getting close to time. Any other last pieces of this that you think people should actually see?
Erik: Yeah. Again, Kelly, thank you for the great segue. So the one final thing that I wanted to demonstrate, and it is one of the reasons that I would say makes Atlas Search a really compelling platform, is that we have this thing called query analytics. And first of all, what I’m going to do is, it doesn’t really matter, but I have another slide here that’s just going through some random strings of queries and it’s just doing it as fast as you’re seeing it go by here. And I’ll let that run for a second and then I’ll just show you what that generates on the back end here. So if you come over here under the search section and then you go to your query analytics, you can get these query analytics where you can see graphs of the top terms that have been searched for, and you also get a graph and data about queries that get no results. And that’s an important one because people searched in my application. And of course these are all queries that I just generated. Some of these say what they are, but that’s not really the query that’s behind them because I was debugging my thing. But anyway, this particular query where it’s Matrix Reloaded without a space in there, there’s techniques that I can do in more advanced configurations where I can make this be resilient to whether it’s a space there or not a space there. Right now, it’s necessary. It’s relevant that there’s no space there and that makes it so it doesn’t match anything. So those are the types of things as a search engineer, you have these challenges and this query analytics gives you the ability to see what are the top queries that are coming through. And my thing was just generating queries very rapidly. So I’m trying to generate some data to have some more interesting graphs here over time.
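The transcript does not say which technique Erik has in mind for making Matrix Reloaded match with or without the space. One simple idea, sketched here purely as an illustration and not as what Atlas Search does, is to compare a collapsed form of the query (lowercased, with everything except letters and digits removed) against a collapsed form of each title. The class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy lookup that ignores case, spaces, and punctuation when matching titles.
class CollapsedTitleLookup {
    private final List<String> titles = new ArrayList<>();

    // Lowercase the text and drop everything except letters and digits.
    static String collapse(String text) {
        return text.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
    }

    void add(String title) {
        titles.add(title);
    }

    // Return every title whose collapsed form contains the collapsed query.
    List<String> find(String query) {
        String key = collapse(query);
        List<String> hits = new ArrayList<>();
        if (key.isEmpty()) {
            return hits;
        }
        for (String title : titles) {
            if (collapse(title).contains(key)) {
                hits.add(title);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        CollapsedTitleLookup lookup = new CollapsedTitleLookup();
        lookup.add("The Matrix Reloaded");
        lookup.add("Speed");
        System.out.println(lookup.find("MatrixReloaded"));
    }
}
```

The trade-off is that substring matching on collapsed text can match across word boundaries in ways a user would not expect, and a linear scan does not scale; in a search engine this kind of resilience would normally come from how the field is analyzed and indexed.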
Kelly: Yeah. And the thing I like about this part, so talk about the How To, how to help users find what they need. You know, how can you refine the way that you’re processing search so that when someone does not put the space in Matrix Reloaded, they can still find their Keanu Reeves movie.
Erik: And the thing is that people implement search on their websites and they don’t pay attention to those details. And as a search engineer, this wasn’t our pet query, so we never saw it. Our boss isn’t checking it. And so these queries end up being users getting the not found. There’s no documents found. And that’s just, you know, horrifying if you’re trying to sell stuff online. So people pay very close attention to these types of reports here. It’s a really, you know, there’s a lot of dollars behind that one. Right? It could be, for a particular query. And if nobody’s finding anything, we need to at least do our best and show them something that’s close to being related, either based off their persona or any demographics or all these other tricks that you would play to show people stuff that they might want to click on. But for sure we can do better with our actual lexical searches on this one.
Kelly: Yeah, absolutely. So one last call for anything else that we need to show people in the demo. And if not, I believe you have a couple of resources that people can check out.
Erik: Yeah, let me just go to that. Yeah, thank you for doing the demo. Thanks for the time to do that. I had fun actually, kind of building it and refining that over the last few days.
Kelly: I mean, thank you. You did all the work. I just got to sit here and watch you demonstrate some very cool stuff.
Erik: You just ask great questions and actually just led the way. So we have some resources here. We have a developer center. This is where I am employed as a Developer Advocate, under this /developer at mongodb.com. And then we have MongoDB University as well. And if you want to scan that barcode, that QR code, it goes to this link right here, which is a short link that takes you to the Atlas Search homepage at the moment and allows you to kind of go from there and create a free account and just click right through. So you can just kind of, let us show you here. And you can just drive straight through and try free and next thing you know, you’re in a wizard where you’re signing up. But once you sign up and get a free account, you can create your Atlas in any of the clouds that are out there. You could upsize it when you need to go more production with it. But it gives the developer the chance to try this stuff out without any credit card or anything like that.
Kelly: So cool. Thank you, Erik, for taking the time to kind of show us MongoDB Atlas Search and then also leaving us with these great resources.
Erik: Yeah, my pleasure. Thank you so much, Kelly, for having me on this What is How to.