Milos Rusic, Malte Pietsch, and Timo Möller founded deepset in 2018 in Berlin. Inspired by the launch of Google’s seminal natural language processing (NLP) model – Bidirectional Encoder Representations from Transformers (BERT) – the trio trained a BERT model in German, and soon after created an enterprise services company to address semantic search opportunities at European industrial giants such as Airbus, Siemens and BaFin, the financial regulatory authority for Germany.
As companies needed help and hand-holding in addressing the opportunities of AI, deepset led with consulting, training and solution engineering. The company became a more product-focused concern when it created and open sourced the Haystack Python framework, but now it’s moving to a clearer focus on selling product rather than services with its deepset Cloud platform – in order to take advantage of the cloud/AI flywheel.
Today, while startups and tech giants alike are scrambling to address the large language model (LLM) opportunity, deepset can credibly claim 5 years of experience building tools to address the opportunities of AI-driven natural language processing.
deepset’s vision statement
We help enterprise product teams successfully build and launch NLP-powered applications. There’s still a huge disconnect — when implementing NLP — between the data science and the application development teams, not to mention the challenges facing enterprise product owners. The NLP product lifecycle often resembles a long-cycle ‘waterfall’ approach, doesn’t align with modern application development, and is still very far from being ‘agile’. We’ve put years of our field expertise and know-how into deepset Cloud so that product teams can define their application architecture, start building, and show results quickly to the business users — minimizing the risks of failure, greatly improving time-to-results, and optimizing the costs of implementation.
- 50 employees
- double-digit customer count
- $44M raised to date (excluding pre-seed and seed rounds):
- Series A – April 2022: $14M led by GV with Harpoon Ventures and Acequia Capital
- Series B – August 2023: $30M led by Balderton Capital with participation from GV and Harpoon Ventures
- Initial sales focus in EMEA, now targeting sales in the USA and growing headcount accordingly
deepset Cloud is an enterprise-featured SaaS for building custom NLP-powered applications. The managed platform supports enterprise features for search, content summarisation and generation, and Retrieval-Augmented Generation (RAG), a technique that grounds a model’s output in retrieved source documents in order to reduce AI “hallucinations” – that is, wrong answers confidently stated by an LLM-based system.
RAG is all the rage in the fast-moving LLM world right now, potentially offering the means to reduce the risk of “hallucinations”, one of LLMs’ biggest drawbacks. It allows you to blend both generative and retrieval-based approaches. Information retrieval fetches information from datasets in order to provide relevant responses, while generative models – such as Generative Pre-trained Transformer (GPT) models – generate text from the model’s own weights: creative, perhaps, but sometimes deeply wrong. RAG, then, is likely to be a critically important approach in safer, more trusted AI; deepset is working with customers using RAG to improve the NLP products they’re building. Enterprises want to use their own datasets to augment LLM models.
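The retrieve-then-generate flow described above can be sketched in a few lines. This is an illustrative toy, not deepset Cloud or Haystack code: the term-overlap scoring and the prompt template are placeholder assumptions standing in for a real retriever and a real LLM call.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive term overlap with the query (a stand-in
    for a real sparse or dense retriever)."""
    query_terms = tokens(query)
    ranked = sorted(documents, key=lambda d: len(query_terms & tokens(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, context: list[str]) -> str:
    """Ground the generative step in retrieved text to curb hallucination."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\nQuestion: {query}"

docs = [
    "deepset was founded in 2018 in Berlin.",
    "Haystack is an open source NLP framework.",
    "BERT was released by Google.",
]
query = "When was deepset founded?"
prompt = build_prompt(query, retrieve(query, docs))
```

The point is the shape of the technique: the model is asked to answer from retrieved enterprise data rather than from whatever its weights happen to contain.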
deepset Cloud also offers the ability to extract named entities or recurring information from unstructured text data, a compelling use case for enterprises with large document corpora. Aimed at developers creating AI-driven pipelines and applications, the pitch is all about simplicity. The story according to deepset:
You’ll have your first prototype at breakfast, feedback by lunch, and an integrated NLP service before dinner. The deepset Cloud platform solves problems like infrastructure, evaluation, demo UI, and feedback mechanisms so that you can focus on what truly drives your business forward.
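The entity-extraction use case mentioned above can be made concrete with a small sketch. deepset Cloud uses trained NLP models for this; the regex patterns here are purely illustrative assumptions that show the input/output shape of pulling recurring fields out of unstructured text.

```python
import re

# Hypothetical field patterns: ISO dates and dollar amounts such as "$14M".
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "amount": re.compile(r"\$\d+(?:\.\d{2})?[MBK]?"),
}

def extract_entities(text: str) -> dict[str, list[str]]:
    """Return every match for each labelled pattern in the text."""
    return {label: pattern.findall(text) for label, pattern in PATTERNS.items()}

contract = "Effective 2022-04-01, the fee is $14M, rising to $30M on 2023-08-01."
entities = extract_entities(contract)
```

A model-based extractor would generalise far beyond fixed patterns, but the output – structured records mined from a document corpus – is the same idea.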
deepset Cloud helps customers that want to adopt a multi-model approach rather than relying solely on, say, OpenAI. The platform can be used to compare different models such as GPT-4, Llama 2 or Claude, allowing enterprises to experiment with different models and compare results.
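The multi-model comparison idea reduces to a simple harness: run one query against several interchangeable backends and collect the answers side by side. The callables below are placeholders of my own devising – in practice each would wrap a hosted model such as GPT-4, Llama 2 or Claude – and this is not the deepset Cloud API.

```python
from typing import Callable

def compare_models(query: str, models: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Run the same query through each named model backend."""
    return {name: model(query) for name, model in models.items()}

# Stand-in backends; real ones would call hosted model endpoints.
results = compare_models(
    "Summarise the contract.",
    {
        "model-a": lambda q: f"model-a answer to: {q}",
        "model-b": lambda q: f"model-b answer to: {q}",
    },
)
```

Keeping the backend behind a uniform callable interface is what lets an enterprise swap models and compare outputs without rewriting the application.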
With deepset Cloud, customers can manage access with multi-factor authentication and single sign-on, with proprietary company data resident in their own virtual private cloud (VPC).
The AI market has frankly become hot to the point of absurdity in 2023. Every tech pitch is an LLM pitch, and it can be hard to be heard amid the noise. OpenAI, Hugging Face and LangChain have all gained a great deal of attention, but there is room for enterprise-class, trusted LLM plays. As such, deepset sees its primary competition in hyperscale cloud providers such as Amazon Web Services, Google Cloud, and Microsoft Azure.
The enterprise market for LLM technology is itself becoming increasingly crowded, as vendors strive to address concerns about data privacy, the risk of hallucinations, automated bias and so on. Microsoft, for example, offers a combination of Microsoft Azure OpenAI and Azure Cognitive Search to allow customers to “safely” train their own OpenAI models based on their own data. Meanwhile OpenAI in August launched ChatGPT Enterprise, which claims to offer “enterprise-grade security and privacy”. From a customer perspective, however, there is a lot of fear, concern and uncertainty around sharing data or information or text archives with OpenAI.
That’s one reason Microsoft has committed that inputs, outputs, embeddings and so on are not available to other customers and are not used to improve OpenAI models, or any Microsoft or third-party products or services. Similarly, Salesforce launched what it calls the Einstein GPT Trust Layer to help prevent LLMs from retaining sensitive customer data.
Go to Market
deepset has always worked directly with large enterprise customers. It already has significant experience working with the largest enterprises and public sector organisations to deliver natural language processing apps, and as such is in a decent position to allay concerns about leaks, hallucinations and so on. While deepset offers a cloud platform, it’s happy to work with enterprises that want to host and/or run their own platforms.
Per the list of enterprise features in the “Product Information” section above, deepset Cloud is not just “hosted Haystack.” However, deepset leverages interest in its Haystack open source project to drive product sales, and the framework is worth dwelling on because it helps to define deepset’s approach, particularly its focus on a development process more akin to agile than waterfall. Haystack is designed for software engineers rather than data scientists. deepset is focused on allowing developers to “just build”, while taking advantage of AI and LLM technology as it emerges (at a furious rate), and on letting customers build NLP-enabled products and applications with iterative feedback loops. Haystack has more than 10k GitHub stars and 1,400 forks – clearly a healthy project – and deepset even claims de facto standard status for it. Haystack doesn’t compete directly with Hugging Face, and actually takes advantage of the Hugging Face model hub to enable faster feedback loops.
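The composable-pipeline idea behind Haystack – chain independent steps so developers can swap components and iterate quickly – can be sketched conceptually. To be clear, this mimics the shape of a Haystack query pipeline but is not the actual Haystack API; the class and step names are assumptions for illustration.

```python
class Pipeline:
    """Toy pipeline: run named steps in order, each transforming the payload."""

    def __init__(self):
        self.steps = []

    def add_step(self, name, fn):
        self.steps.append((name, fn))
        return self  # allow chaining

    def run(self, payload):
        for name, fn in self.steps:
            payload = fn(payload)
        return payload

# A query flows through clean -> retrieve -> read; any step can be swapped.
pipeline = (
    Pipeline()
    .add_step("clean", lambda q: q.strip().lower())
    .add_step("retrieve", lambda q: {"query": q, "docs": ["doc-1", "doc-2"]})
    .add_step("read", lambda state: {**state, "answer": f"answer from {state['docs'][0]}"})
)
result = pipeline.run("  What is Haystack?  ")
```

Because each stage is an interchangeable unit, a developer can replace, say, the retriever with a different model and re-run the pipeline – the iterative feedback loop deepset emphasises.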
In terms of vertical industry use cases, deepset is initially focusing on finance, legal, insurance, and public sector. Text extraction and classification are at a premium in these industries.
deepset plans to establish a “pincer movement” of top-down and bottom-up sales: selling to the business owners it identifies as empowered to build teams for AI applications, while also appealing directly to developers as influencers and builders.
Partnerships & Ecosystem
The complexity and fast moving nature of the space means that deepset is focusing on partnerships, notably channel partners like Atos and Evotek, cloud marketplaces including AWS, and tech GTM partners Doppler and Snyk.
Disclosure: deepset is a RedMonk client, but this is an independent piece of research that has not been commissioned by any entity. AWS, GitHub, Google Cloud, Microsoft, and Salesforce are also RedMonk clients.