tecosystems

A Return to the General Purpose Database


Pendulum

A little over fifteen years ago, Adam Bosworth – then with Google, and formerly of BEA and Microsoft – noticed something interesting. For all that they represented the state of the art, the leading database vendors of the time – all of which were relational, of course – were no longer delivering what the market wanted. While their raw transactional performance and other typical criteria of importance to traditional technology executives continued to improve, their ability to handle ad-hoc schema changes or partition themselves across fleets of machines had not meaningfully changed.

Which meant that the best databases the market had to offer were neither developer-friendly nor equipped to handle a shift to the scale-out architectures that were already standard within large web shops and would become exponentially more popular with the advent of cloud infrastructure two years later.

This realization became more common over time, and just five years after Bosworth’s post was published there was an event called NoSQL 2009 which featured presentations from teams building “Hypertable, HBase, Voldemort, Dynomite, and Cassandra.” The NoSQL era, for all intents and purposes, was underway.

The term NoSQL itself was perhaps not the best choice, not least because basically every surviving database project that once prided itself on not having a query language would go on to add one – many of them explicitly and deliberately SQL-like. But as a shorthand descriptor for non-relational databases it worked well enough. The decade-plus of the NoSQL era that followed is well known history at this point. Where once relational databases were, with rare exceptions such as BerkeleyDB, the general purpose data engine behind all manner of applications and workloads, the database market exploded into half a dozen or more categories, each of which had multiple competitive projects.

Relational remained a major category, to be sure, but instead of being the only category, it became one of several. Enterprises continued to buy relational databases, but increasingly also bought document databases, graph databases, columnar databases, in-memory databases, key-value stores, streaming platforms, search engines and more. The era of a single category of general purpose databases gave way to a time of specialization, with database types selected based on workload and need. The old three-tier model in which the data tier was always a relational database exploded into multiple database types used alongside one another, often backing the same, single application in separate layers.

The affinity developers had and have for these more specialized database tools created enormous commercial opportunities. While many of the original NoSQL projects faltered, at times in spite of some inspired technical vision – think Riak – multiple database vendors have emerged from the original ill-named NoSQL category.

MongoDB went public four years ago this month, in October of 2017; Elastic followed a year later in October of 2018; Confluent went public last June, and Couchbase in July. Snowflake, for its part – which conflates hardware and software to the degree that they’re inseparable – had an offering that was arguably the largest ever for a software company. And many vendors that haven’t gone public yet remain popular, viable commercial entities. Neo4J raised $325M in June on a $2B valuation, the same valuation Redis Labs received when it raised $310M in April. Datastax, meanwhile, has been rumored to be headed to the public market for a few years now, and the list goes on: QuestDB ($2.3M), SingleStore ($80M) and TimescaleDB ($40M) have all taken money recently.

It’s not notable or surprising, therefore, that NoSQL companies emerged to meet demand and were rewarded by the market for doing so. What is interesting, however, is how many of these once-specialized database providers are – as expected – gradually moving back towards more general purpose models.

This is driven by equal parts opportunism and customer demand. Each additional workload that a vendor can target, clearly, is a net new revenue opportunity. But it’s not simply a matter of vendors refusing to leave money on the table.

In many cases, enterprises are pushing for these functional expansions out of a desire to avoid context switching between different datastores, because they want the ability to perform analytics on a dataset in place without having to migrate it, because they want to consolidate the sheer volume of vendors they deal with, or some combination of all of the above.

Developers, for their part, are looking at this as something of an opportunity to pave over some of the gaps in their experience. While no one wants to return to a world where the only realistic option is relational storage, the overhead today of having to learn and interact with multiple databases has become more burden than boon.

In any event, it is apparent that many datastores that were once specialized are becoming less so. A few examples:

  • Datastax, which commercializes Cassandra, the wide-column store, has recently introduced competitive streaming capabilities via the Pulsar project.
  • Elastic, the driver of the Elasticsearch project, has expanded into adjacent markets like observability and security.
  • MongoDB, once strictly a document datastore, added ACID transactional guarantees a while back and now offers data lake and time series capabilities, not to mention charts and other analytical functionality.
  • Redis Labs, the company behind Redis, a popular in-memory database, now offers modules for graph, JSON, search, time series and more.
  • SingleStore, which began its life as MemSQL, an in-memory row store, has been rebranded to reflect its ambitions as a single datastore capable of handling relational data alongside graph and time series data as well as JSON documents.

It’s worth noting, of course, that just as the specialized datastores are gravitating back in the direction of general purpose platforms, the general purpose platforms have been adding their own specialized capabilities – most notably in PostgreSQL with its ability to handle JSON, time series and GIS workloads in addition to traditional relational usage.
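
To make the PostgreSQL point concrete, below is a minimal sketch – in Python, via psycopg2 – of document-style and time-series-style queries running against a single, ordinary Postgres table. The events table, its columns and the connection string are illustrative assumptions rather than anything prescribed, and the GIS case mentioned above would similarly rely on the PostGIS extension.

```python
# Minimal sketch: one Postgres table serving document-style (JSONB) and
# time-series-style (timestamp aggregation) queries. The "events" table,
# its columns and the DSN below are hypothetical, for illustration only.
import psycopg2

conn = psycopg2.connect("dbname=demo user=demo")  # assumed local instance
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id      serial PRIMARY KEY,
            ts      timestamptz NOT NULL DEFAULT now(),
            payload jsonb NOT NULL
        )
    """)
    # Document-style: store a schemaless JSON payload...
    cur.execute(
        "INSERT INTO events (payload) VALUES (%s)",
        ('{"user": "alice", "action": "login", "region": "us-east"}',),
    )
    # ...then filter on a field inside it with the ->> extraction operator.
    cur.execute(
        "SELECT payload->>'user' FROM events WHERE payload->>'action' = %s",
        ("login",),
    )
    print(cur.fetchall())
    # Time-series-style: bucket rows by hour with plain SQL aggregation;
    # extensions like TimescaleDB build on exactly this model.
    cur.execute("""
        SELECT date_trunc('hour', ts) AS bucket, count(*)
        FROM events
        GROUP BY bucket
        ORDER BY bucket
    """)
    print(cur.fetchall())
conn.close()
```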

There are many questions that remain about this long anticipated shift in the market – among them: will this trend accelerate due to competitive pressures? What impacts will this have on database packaging, and by extension, adoption? What will the dynamics be of a market in which developers and enterprises are offered specialized primitives from large clouds versus independent general purpose database platforms? And will the various specialized database markets be asymmetrically vulnerable to this kind of intrusion from adjacent competitors?

But what does not seem arguable is that the pendulum in the database market, having spent the last decade-plus swinging away from general purpose workloads, has clearly changed direction and is now headed back towards them at a rate and pace yet to be determined.

Disclosure: Couchbase, Datastax, MongoDB, Neo4J, QuestDB, Redis Labs and SingleStore are RedMonk clients. Confluent, Elastic, Snowflake and TimescaleDB are not currently clients.

6 comments

  1. The pendulum metaphor is apt. It might be illuminating to consider NoSQL’s poor cousin, NewSQL, which never caught attention like NoSQL did. [I think MemSQL might have worn that mantle once.] Fair to say the fates of NewSQL and NoSQL have merged under the banner of the globally distributed database, a place where Google Spanner deserves a shout out.

  2. Disclaimer: I work for Couchbase.

    Oh boy, I could spend hours talking about that. In fact, I have a talk exactly about this topic.
    I think on the surface it looks like databases are converging, but they aren’t. They are just covering small use cases that in the past would have forced you to buy/use a new database.

    For instance, Postgres extended its support for JSONB, which is awesome for small scenarios, but it clearly can’t get anywhere close to being used as a document database. Couchbase added support for transactions, but it is not designed to run all your financial operations, etc. So the movement looks correct to me: you can pick a specialized datastore for the core requirement of your application and leverage the same infrastructure for some additional use cases. I think it is all about TCO. But if you overuse stuff that is not part of the core, you will eventually need to migrate to something else.

  3. Oh honey, NoSQL and time series databases are not new things, were not new things 15 years ago, and even before relational databases were a thing, there was THE database that could do it all! The legacy of MUMPS/ANSI-M/MSM and so many commercial implementations lives on in Intersystems IRIS (formerly known as Cache & M-11) and the public domain GT.M.

  4. […] A Return to the General Purpose Database 💥 […]

  5. I think a lot of app architects were choosing tech like NoSQL datastores simply because they were trendy and we all like the new shiny thing. But what gets underestimated is how much ramp-up time is required to go from a rudimentary understanding to being able to successfully implement and support this new thing in production. Not that we shouldn’t utilize alt db technologies that are purpose built for specific scenarios. App cache? Yeah, I’m going Redis, not a traditional rdbms (although technically, yes, SQL Server for example can be used as an in-memory cache). But I think there’s gotta be a strong and very specific use case – versus taking 15 minutes to stand up a Postgres db (just as an example) that everyone already knows how to use and will meet the reqs with no issue.

    On the flip side, you want to let your team try new things and keep up to date on new tech… so there’s that angle too.

  6. Legacy databases also added capabilities in the areas mentioned. They just weren’t considered sexy by fresh grads out of school. Robber-baron licensing fees didn’t help either. Makes me wonder, though, if the new databases will end up looking like the old databases as time goes by, as the amount of clutter builds up and support costs rise, and presto, we are back at square one.
