OpenAI will be a winner, but not the only one. A concept you’ll be hearing a lot more about is Retrieval Augmentation – in terms of improving models – and we cover that in the conversation too. So dive in: watch the video and tell me what you think, here or on YouTube. In the meantime I will leave you with a story from deepset about a gentleman in his 80s who runs a legal publishing firm in Germany. He called deepset just before Christmas last year to insist on a meeting before the end of the year to discuss ChatGPT’s potential implications for his business, and how he could do something similar without giving his own information away. ChatGPT only launched on November 30th 2022. That’s the scale of the challenge, and the opportunity.
This was a RedMonk video, sponsored by deepset.
Rather listen to this conversation as a podcast?
Transcript
James Governor: Hi, this is James Governor, co-founder of RedMonk, and we’re here for another RedMonk Conversation. We’re talking about NLP and LLMs and all that good stuff that we’re all getting really, really excited about as an industry. Luckily enough, today I have somebody that’s been involved in AI and machine learning for some time: Malte Pietsch, the co-founder of deepset. And we’re going to talk a bit about what’s going on from an industry perspective, and probably from an enterprise adoption perspective. So welcome, Malte. Good to see you.
Malte Pietsch: Hi, James. Good to see you. Thanks for having me.
James: So let’s jump off. Here’s the thing that is really worth thinking about, I think, which is the explosion of interest, certainly in large language models. I mean, two years ago, they were a thing. Now they’re THE thing. And I think that’s been part of this whole change, but I had a brilliant story, told to me by Rachel Stephens, one of my colleagues, yesterday. And she said, I have not yet tried ChatGPT. But my mother has. And I said, pardon? So her mother… basically, they had a built-in microwave oven — built in, of a particular size, into a cabinet. And when it broke, they didn’t know what to do. Google could never help them find the correct measurements that would fit into this space, so they’d sort of given up hope. Well, there was one they found, but the microwave oven that fitted was $1,300. And her mother said, “I would rather not have a microwave oven than pay $1,300 for one.” And if you know Rachel, and you know her affinity for spreadsheets, and her affinity for compliance and finance, I can definitely see a family resemblance. Anyway, Mother said no way. Then one day she hears about ChatGPT, gets herself an account, and is lucky enough to get access — as Rachel says, “I keep on not being able to do that.” Within half an hour she made a query. It didn’t come up with what she needed at first, but then she came up with some term that involved the type of cabinet or enclosure, and sure enough, something popped out. They’ve ordered a new microwave for, I think, about $200, $250. And everyone’s happy.
Malte: Oh, wow.
James: But to live in a world where a technology that was considered esoteric even six months ago is now something that our mothers are using. And by the way, I hate the “it’s so easy even my mother could use it” trope. That’s not the point of the story. The point is, you know, leading edge adopters are coming from all sorts of places. I think you’ve got some interesting stories about that: the sorts of people that are asking questions about their business. One of the ways I’ve been thinking about AI is Here Comes Everybody, and that was a brilliant example of it. So why do you think — what’s happening now that has changed the game? What is it that’s making people — why is AI more tangible now?
Malte: Yeah, I mean it’s really crazy, what kind of development we saw over the last year. And I think, yeah, to some degree there were also advancements on the technology side. So yes, models became better, performance became better. I still think that’s not really the reason for what we’ve seen in the last months — this awareness, this explosion of interest. It’s not because GPT or any other model just became better on the performance side. I think it’s mostly because it became easier to experience what it means to have NLP, to have this kind of language interaction. And this I think is something that OpenAI did in a very great way, having a really easy interface that you can explore, so you can try out, you can experience this kind of technology. So it became easy for Rachel’s mom to try it out. And the same thing happened with my mom. Like, I think I’ve tried to explain to her what I’m doing for five years now? Never really succeeded. But now she comes over at Christmas and says, “Oh yeah, yeah, I tried this ChatGPT and I read in the newspaper about it — is this what you’re doing?” And yeah, pretty much. So yeah, I think it just really helps people to understand, to touch it, basically, right? If you talk about, I don’t know, some code, some models, some APIs, that’s very abstract for most users. But at the moment —
James: You’ve got a word for that, you’ve got a crafty little word for that, haven’t you? That making it more —
Malte: Yeah. More haptic: like you can really touch it, you can almost feel how NLP works. It’s not something you just read about — you really experience it firsthand. And I think these haptics changed, I would say, on the user experience side: you can really play around with it, you can spark your own imagination about what this might mean for your own business, for your own use cases. I think the haptics also changed on the workflow side for developers. So it’s way easier to build a fast demo, spin something up, share it with colleagues. And I think all of that is accelerating the interest right now. We have some other stories like this — many people reach out to us these days. One example that stuck with us was a few days before Christmas. You can imagine a family business in Germany, more than 100 years old, really a market leader in their segment, reinvented themselves quite a few times over the last 100 years. And there was this owner, very seasoned, professional, not a technologist but really an expert in his field, probably in his 80s.
Malte: And he was reaching out to us and saying basically, “Hey guys, I saw ChatGPT and this seems really revolutionary… this will disrupt my business. It feels to me like the 2000s, when the Internet came. I need to act. What should I do? How can I get this into my product, into my offering?” And we started this conversation. And for us, I think, it was interesting because when we started deepset five years ago, in the early days, it was a lot about talking to early adopters, to technologists. But now these kinds of conversations happen more often where it’s business owners or managers seeing, hey, okay, this is what we can do. It sparks the interest and they do this translation to: okay, what does it mean for my business? How will it impact me? And I think this is what changed in the last year.
James: Sense of urgency too. Didn’t he demand that your co-founder, who had just had a baby — and it was like a couple of days before Christmas — had to be in a different city within two days… there was, you know.
Malte: Yeah. He was basically already on parental leave because the baby was just born. Two days before Christmas, everyone was just busy finishing stuff before the holidays here. But then you have this opportunity, and of course you jump on it. And yeah, it was definitely worth it, and interesting to explore with this person how and where it makes sense in their product.
James: Yeah. Love that story, love that story. So, it’s a much broader funnel in terms of interest. I think maybe there’s an assumption now that everything works just like ChatGPT does, which is good, but not always good, because ChatGPT does have a tendency to what they call “hallucinate,” where it will basically just tell you something with a strong sort of confidence. Oh yeah, this must be right, because this computer has gone out over a huge data set and has been able to pull this together. The text sounds really convincing, but it’s nonsense. And apparently in businesses like legal or finance with contracts, or medical and so on, hallucination is not considered quite such a good thing.
Malte: Yeah, I mean, this is a big problem. And as I said, with all this awareness, first of all, there’s also a lot of noise. You need to separate: okay, what is now really possible? What is meaningful, what really creates value in the end? And then, yeah, I think it’s a lot about assessing risks and failure modes. And I would say the most prominent one is for sure hallucinations. The big problem is really that LLMs are wrong from time to time, and then they’re confident about it, so it’s not very easy to spot. Like, the model will come up with a lot of arguments, a very well formulated explanation of why this and this and that happened. So for example, if you ask GPT whether Silicon Valley Bank collapsed, it will most likely say no, or make up some weird story about how it collapsed in the 2009 financial crisis, and bring up a lot of arguments. We had this case, I think, last week in a demo. So I think people —
James: I mean, given that we’re in an industry of tech people, the idea that they would say things and make strong arguments for them whilst being completely wrong — it doesn’t make them any different from us as far as I can see. But, yeah, there are sectors where that’s a problem.
Malte: And yeah, it just makes them very hard to spot, right? So you basically can’t really trust these models — you have to double check every prediction that comes from them. And it depends a lot on your use case. I mean, if it’s, I don’t know, doing a first draft of a contract, maybe that’s similar to a junior lawyer doing it and then you still check it? Perfectly fine. But what if you have a customer-facing application? Will your customers really spot these hallucinations and mistakes? And this is where we are right now investing a lot of time, working with customers to figure out how you can spot them and how to actually reduce these hallucinations.
James: Okay. So would you have like a toolset for doing that with your customers? Is that what you’re establishing?
Malte: Yeah. I mean, in the industry there are already, I would say, a few options, a few tools that you can use. One that has played out very well for us, and that we have good experience with, is retrieval augmentation. So basically the idea is this. Take a plain vanilla LLM: there you would just put in your text, your prompt, like what you see in ChatGPT. Just the text. And then the model would query its own internal knowledge, its own parameters, and spit out the answer. The idea of retrieval augmentation is that you connect your language model to some dataset, some database — let’s say a document store. And when you ask a question, the model first retrieves relevant pieces of information, relevant documents, and then uses those as a foundation for generating the answer. So everything the LLM generates is based on your documents, on a subset of these documents. And this really helps to ground the language model in facts, in the actual information in those documents. It also helps with explainability, because you can always trace back and say: oh, this is the answer that got generated — what documents were used for that?
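To make that concrete, here is a minimal, framework-agnostic sketch of the retrieval augmentation flow Malte describes. The word-overlap retriever and the `call_llm` stub are toy placeholders (a real system would use embeddings, a vector database, and an actual model API), not deepset’s or Haystack’s implementation.

```python
from typing import List

# Toy "document store": a real system would hold your own documents
# in a database or vector store.
DOCUMENTS = [
    "Retrieval augmentation grounds an LLM's answers in your own documents.",
    "Haystack is an open source NLP framework from deepset.",
    "Hallucinations are confident but wrong model outputs.",
]

def retrieve(query: str, docs: List[str], top_k: int = 2) -> List[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    words = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: len(words & set(d.lower().split())), reverse=True)
    return ranked[:top_k]

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (OpenAI's API, a local model, etc.)."""
    return "<answer generated from the retrieved context>"

def answer(query: str) -> str:
    context = retrieve(query, DOCUMENTS)
    # The retrieved documents are injected into the prompt, so the model
    # generates from them rather than only from its own parameters. Keeping
    # `context` around also lets you trace which documents produced the answer.
    prompt = (
        "Answer the question using only the context below.\n"
        "Context:\n" + "\n".join(f"- {d}" for d in context) + "\n"
        f"Question: {query}\nAnswer:"
    )
    return call_llm(prompt)

print(answer("What is retrieval augmentation?"))
```

The key point, as Malte notes, is that the retrieved documents both ground the generation and give you a trail back to the sources.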
James: Okay. But on the retrieval augmentation side, though, certainly one of the potential concerns is: hang on a minute – if we all just spend all of our time making OpenAI better, then what’s our advantage? If I’m, you know, the German gentleman in his 80s thinking he’s got an information business, he doesn’t necessarily want to throw all of that into a third party platform. So in terms of retrieval augmentation, what is the concern, or what approaches can people take in order to maintain, or at least have a hope of, some kind of moat?
Malte: Yeah, I mean, we are mainly talking to and working with enterprise customers, and I would say everything around data sovereignty and data privacy is obviously a big theme. It is for retrieval augmentation, where you have your documents connected to the LLM. But it’s also, I would say, any more general use case where it might be just querying the OpenAI REST API. In all of these cases, there are use cases where the data is critical, where the queries you ask these models are critical, and you simply cannot send them to an external provider like OpenAI — often they can’t even leave your network. So yeah —
James: From a legal perspective, they can’t.
Malte: Yes, simply from a legal perspective. And I mean, yes, there are definitely use cases in the enterprise where this is possible. But what we usually see is that there is always a mix, and for some use cases it’s simply not possible; it’s under restrictions, the data is highly, highly sensitive. I talked about one example already. There are plenty of others, from private equity firms, from banks, from aircraft manufacturers, who simply say: hey, we know we have a few use cases where OpenAI is perfectly fine. But actually there are others, maybe more interesting ones, closer to our core business. And for those, no, this won’t happen, we can’t give out that data. So I would say the data underlying a use case always dictates what kind of model you can choose. And this is, I think, the situation we are in right now for enterprise use cases. It’s also what makes me believe that the future won’t be a monopoly, a model monopoly, let’s say. Because if you think back to maybe December, January, February, at least I felt like, wow, OpenAI is going to dominate the world, or at least dominate the model game. I think by now the dust has settled a bit, and at least I can see it a bit more clearly now, my perspective on it.
James: Well it’s funny because there’s more dust, I think. I mean, there’s just so much going on every week. There’s more stuff. I mean, you know, if we look at what’s been happening with the local open source models, like LLaMA and so on, there’s just so much going on.
Malte: Yes. Yes. It’s insane to follow, but also very cool to see, especially on the open source side right now, how fast and how powerful the open source community is as well. I think they really picked up the pace after ChatGPT was released, and a lot of research labs, but also companies, just started training models, creating alternatives to this closed source, proprietary model of OpenAI. So we have LLaMA, which you already mentioned. There’s Alpaca from Stanford, which got released based on LLaMA. There’s Dolly from Databricks, also a very small model, but recreating most of the GPT experience. And just two days ago there was Vicuna, released from another group from UC Berkeley, Stanford, CMU, I think. So there’s a lot of development right now, and I’m sure there are even more research labs and teams working on training similar models. And I think the interesting part is that training many of these models is not as expensive as one might think. So some of the ones I mentioned are based on LLaMA, and then the important part, the fine-tuning part, just cost around $300, $400. It’s not the same budget you need for training such —
James: So now OpenAI is going to be disrupted. Is that what you’re telling us?
Malte: No, I think it will stay. I think they are definitely on a good track. But I think this market may be a bit like the cloud market: there will be different segments, and players will find, I think, their spot. And I don’t think that OpenAI will just rule all of them, because they can’t cover all these enterprise requirements that we discussed, like data sovereignty. So I think that’s not what they will ultimately rule. As a company, you have good reasons to be more multi-model, similar to multi-cloud, and be kind of agnostic and say: okay, we have maybe some use cases that will go to OpenAI, it makes sense there. For others, maybe performance is better with other models, or for data reasons we go for something more local. This is how I would see the future one year from now.
James: So we haven’t really spoken at all about deepset; we’ve talked about the market as a whole. You came out of the world of natural language processing. That’s the thing that you’ve really been working on, with an open source project called Haystack. Where does that fit into this world? If I’m a developer or, you know, an 88 year old German businessman, why should I come to your website? Why should I find out what you’re doing? What’s your role in this world? I’d love it slightly from a business perspective, but definitely from a developer perspective: where and why should I bother trying to find you?
Malte: Yeah, so we have both products: Haystack on the open source side, deepset Cloud on the commercial side. Haystack is basically targeting what we call pragmatic builders. It’s a developer framework written in Python, and we’re not targeting researchers, but rather people who say: hey, I want to build this application, I want to get NLP into my product. And it basically helps them to connect all the underlying technologies that you need for that. So to really ship NLP, you need models, yes, but there’s also much more: databases, vector DBs for example. You need to assemble pipelines to take your raw documents, pre-process them, clean them, put them in this database. You need pipelines or nodes (things coming up now, like agents) to really deal with your query, resolve it and give the answer. So it’s all about assembling those underlying technologies into a real NLP application. This, I would say, is the core of Haystack: you can do all of that in a very fast way, and you don’t need to place any early technology bets. So you can start building a prototype with, I don’t know, vector DB A, and then later, when you move to production, you realize: oh, maybe I need to switch to another vector database. The interface won’t change, your application logic won’t change; you can easily swap them out.
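Here’s a hypothetical sketch of that design idea: components sit behind a stable interface, so a prototype document store can later be swapped for a production vector database without touching application logic. The names below (`DocumentStore`, `InMemoryStore`, `build_search`) are illustrative inventions, not Haystack’s actual API.

```python
from typing import List, Protocol

class DocumentStore(Protocol):
    """Stable interface the application codes against."""
    def write(self, docs: List[str]) -> None: ...
    def query(self, text: str, top_k: int) -> List[str]: ...

class InMemoryStore:
    """Prototype store; a production vector database client could
    implement the same interface and be dropped in unchanged."""
    def __init__(self) -> None:
        self._docs: List[str] = []

    def write(self, docs: List[str]) -> None:
        self._docs.extend(docs)

    def query(self, text: str, top_k: int) -> List[str]:
        # Toy ranking by word overlap; a real backend would use
        # embeddings or full-text search.
        words = set(text.lower().split())
        ranked = sorted(self._docs,
                        key=lambda d: len(words & set(d.lower().split())),
                        reverse=True)
        return ranked[:top_k]

def build_search(store: DocumentStore):
    """Application logic depends only on the interface, not the backend."""
    def search(query: str) -> List[str]:
        return store.query(query, top_k=3)
    return search

store = InMemoryStore()
store.write([
    "Pipelines assemble preprocessing, retrieval and generation.",
    "Swapping the document store should not change application code.",
])
search = build_search(store)
print(search("swap the document store"))
```

Swapping `InMemoryStore` for another class that satisfies `DocumentStore` leaves `build_search` and everything above it untouched, which is the “no early technology bets” point.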
James: What you’re effectively saying is you see yourselves as a good place to get started, because you don’t lock people into a particular technology, vis-à-vis database, model and so on.
Malte: Yes, you get started quickly, and you can really focus on building your application rather than writing all this glue code from, I don’t know, one technology component to another. So the focus is really on what you want to build with NLP. And that’s basically the open source side, Haystack. deepset Cloud is our commercial end-to-end developer platform: not just a Python framework, but really a developer platform targeting enterprise use cases. And there it’s a lot about collaboration. If you think about developing traditional software — I think it has changed a lot over the last 20 years, from waterfall-ish processes to more iterative, more agile ones, allowing collaboration between different types of developers — the same thing is needed for NLP applications. We are often still in a quite waterfall-y development flow, and with deepset Cloud we basically break this and say: hey, here’s how you can build a fast demo in an hour. You can share it with end users to really collect feedback and understand if this is going in the right direction. You can collaborate between data scientists, DevOps, and other engineers to really ship fast and iterate on it.
James: Okay. So there you go, back to the beginning. What you’re saying is deepset Cloud: haptic AI! You’ve got that iteration, the ability to feel it, touch it and so on. Okay, good. We’re going to be doing a webinar as well, at which I think we’re going to, you know, wear our suits and ties. Well, not really. But yeah, that was great. Thank you very much. It was good to get into some of the technical stuff there at the end. Malte Pietsch, thank you so much for joining us. That’s another RedMonk Conversation.
Malte: Thank you, James.
James: Thank you.