
A Return to the General Purpose Database



A little over fifteen years ago, Adam Bosworth – then with Google, and formerly of BEA and Microsoft – noticed something interesting. For all that they represented the state of the art, the leading database vendors of the time – all of which were relational, of course – were no longer delivering what the market wanted. While their raw transactional performance and other typical criteria of importance to traditional technology executives continued to improve, their ability to handle ad-hoc schema changes or partition themselves across fleets of machines had not meaningfully changed.

Which meant that the best databases the market had to offer were neither developer friendly nor equipped to handle a shift to the scale out architectures that were already standard within large web shops and would become exponentially more popular with the advent of cloud infrastructure two years later.

This realization became more common over time, and just five years after Bosworth’s post was published came an event called NoSQL 2009, which featured presentations from teams building “Hypertable, HBase, Voldemort, Dynomite, and Cassandra.” The NoSQL era, for all intents and purposes, was underway.

The term NoSQL itself was perhaps not the best choice, not least because virtually every surviving database project that once prided itself on not having a query language would go on to add one – many of them explicitly and deliberately SQL-like. But as a shorthand descriptor for non-relational databases it worked well enough. The next decade-plus of the NoSQL era is well known history at this point. Where once relational databases were, with rare exceptions such as BerkeleyDB, the general purpose data engine behind all manner of applications and workloads, the database market exploded into half a dozen or more categories, each with multiple competitive projects.
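
As an illustration of how deliberately SQL-like some of these languages became, consider CQL, the query language Cassandra adopted. Below is a minimal sketch using the DataStax Python driver; the keyspace and table names are hypothetical.

    # Minimal sketch: Cassandra's CQL reads almost exactly like SQL.
    # Assumes a local Cassandra node; the "app" keyspace and "users"
    # table are hypothetical examples.
    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("app")

    # This statement would be equally at home in a relational database,
    # though CQL restricts filtering to key and indexed columns.
    rows = session.execute("SELECT id, name, email FROM users WHERE id = %s", (42,))
    for row in rows:
        print(row.name, row.email)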

Relational remained a major category, to be sure, but instead of being the only category, it became one of several. Enterprises continued to buy relational databases, but increasingly also bought document databases, graph databases, columnar databases, in memory databases, key value stores, streaming platforms, search engines and more. The era of a single category of general purpose databases gave way to a time of specialization, with database types selected based on workload and need. The old three tier model in which the data tier was always a relational database fractured into one in which multiple database types are used alongside one another, often backing the same application in separate layers.

The affinity developers had and have for these more specialized database tools created enormous commercial opportunities. While many of the original NoSQL projects faltered, at times in spite of inspired technical vision – think Riak – multiple database vendors have emerged from the ill-named NoSQL category.

MongoDB went public four years ago this month, in October of 2017; Elastic followed a year later in October of 2018; Confluent went public last June, and Couchbase in July. Snowflake, for its part, had an offering in September of 2020 that was arguably the largest ever for a software company. And many vendors that haven’t gone public yet remain popular, viable commercial entities. Neo4J raised $325M in June on a $2B valuation, the same valuation Redis Labs received when it raised $310M in April. Datastax, meanwhile, has been rumored to be headed to the public market for a few years now, and the list goes on: QuestDB ($2.3M), SingleStore ($80M) and TimescaleDB ($40M) have all taken money recently.

It’s not notable or surprising, therefore, that NoSQL companies emerged to meet this demand and were rewarded by the market for doing so. What is interesting, however, is how many of these once specialized database providers are – as expected – gradually moving back towards more general purpose models.

This is driven by equal parts opportunism and customer demand. Each additional workload that a vendor can target, clearly, is a net new revenue opportunity. But it’s not simply a matter of vendors refusing to leave money on the table.

In many cases, enterprises are pushing for these functional expansions because they want to avoid context switching between different datastores, because they want the ability to perform things like analytics on a dataset in place without having to migrate it, because they want to consolidate the sheer volume of vendors they deal with, or some combination of all of the above.

Developers, for their part, are looking at this as something of an opportunity to pave over some of the gaps in their experience. While no one wants to return to a world where the only realistic option is relational storage, the overhead today of having to learn and interact with multiple databases has become more burden than boon.

In any event, it is apparent that many datastores that were once specialized are becoming less so. A few examples:

  • Datastax, which commercializes Cassandra, the wide column store, has recently introduced competitive streaming capabilities via the Pulsar project.
  • Elastic, the driver of the Elasticsearch project, has expanded into adjacent markets like observability and security.
  • MongoDB, once strictly a document datastore, added multi-document ACID transactional guarantees a while back – a brief sketch follows this list – and now offers data lake and time series capabilities, not to mention charts and other analytical functionality.
  • Redis Labs, the company behind Redis, a popular in memory database, now offers modules for graph, JSON, search, time series and more.
  • SingleStore, which began its life as MemSQL, an in memory row store, has been rebranded to reflect its ambitions as a single datastore capable of handling relational data alongside graph and time series data as well as JSON documents.
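
To make the MongoDB item above concrete, here is a minimal sketch of a multi-document ACID transaction using the official pymongo driver. The connection string and the database and collection names are hypothetical, and transactions require a replica set deployment rather than a standalone server.

    # Minimal sketch: a multi-document ACID transaction in MongoDB via
    # pymongo. Assumes a replica set; the names below are hypothetical.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
    db = client.shop

    with client.start_session() as session:
        with session.start_transaction():
            # Both writes commit atomically, or neither does.
            db.orders.insert_one({"item": "widget", "qty": 1}, session=session)
            db.inventory.update_one(
                {"item": "widget"}, {"$inc": {"qty": -1}}, session=session
            )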

It’s worth noting, of course, that just as the specialized datastores are gravitating back in the direction of general purpose platforms, the general purpose platforms have been adding their own specialized capabilities – most notably in PostgreSQL with its ability to handle JSON, time series and GIS workloads in addition to traditional relational usage.
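
As one small illustration from the relational side, the sketch below stores and queries a JSON document in a PostgreSQL jsonb column via the psycopg2 driver; the connection string and table are hypothetical.

    # Minimal sketch: PostgreSQL handling document-style JSON alongside
    # relational data. The DSN and table below are hypothetical.
    import json
    import psycopg2

    conn = psycopg2.connect("dbname=app")
    cur = conn.cursor()

    cur.execute(
        "CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, payload jsonb)"
    )
    cur.execute(
        "INSERT INTO events (payload) VALUES (%s)",
        (json.dumps({"type": "login", "user": "alice"}),),
    )
    conn.commit()

    # The ->> operator extracts a JSON field as text, queryable like any column.
    cur.execute(
        "SELECT payload->>'user' FROM events WHERE payload->>'type' = %s",
        ("login",),
    )
    print(cur.fetchall())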

There are many questions that remain about this long anticipated shift in the market – among them: will this trend accelerate due to competitive pressures? What impacts will this have on database packaging, and by extension, adoption? What will the dynamics be of a market in which developers and enterprises are offered specialized primitives from large clouds versus independent general purpose database platforms? And will the various specialized database markets be asymmetrically vulnerable to this kind of intrusion from adjacent competitors?

But what does not seem arguable is that the pendulum in the database market, having spent the last decade plus swinging away from general purpose workloads, has clearly changed direction and is now headed back towards them at a rate and pace yet to be determined.

Disclosure: Couchbase, Datastax, MongoDB, Neo4J, QuestDB, Redis Labs and SingleStore are RedMonk clients. Confluent, Elastic, Snowflake and TimescaleDB are not currently clients.