It’s hard to remember now, but a decade ago the idea of non-relational databases was a foreign one. Outside of successful and widely adopted alternatives such as Berkeley DB, the word database could generally be assumed to mean relational database. When we wrote about the possibility of non-relational alternatives eleven years ago last March, the general reaction was a shrug, consternation or both.
As developers increasingly took control of the decision-making processes around technology selection, however, they looked outside the enterprise to the likes of Google for architectural inspiration, and non-relational databases first emerged and then exploded. From a consolidated handful of enterprise-oriented relational databases which are still the backbone for millions of existing applications, the database market added a wide variety of new specialized database types: columnar, distributed storage and processing, document, graph, in-memory, key-value and more.
Each of these categories began with the creation of specialized engines that excelled at a particular task, but that also involved tradeoffs traditional database buyers were unfamiliar with. Hadoop’s Map Reduce, for example, was less accessible to traditional DBAs (at least until companies such as Facebook wrote SQL-like interfaces such as Hive), but it could attack larger scale datasets than was practical with traditional relational databases, and it could do so far more efficiently.
The database market today, then, looks very different than the database market of a decade ago. The traditional relational databases are all still around, but they are increasingly one of many databases employed in a given business rather than the database employed.
Just as it was clear a decade ago that the market would be expanded, however, it is equally apparent today that the database market is poised for change. Functionally, we will continue to see a steady, even accelerating evolution of new approaches – fueled in large part by the release or replication of technologies developed at companies occupying the bleeding edge of web scale. Strategically, however, the available evidence suggests we should look for two major shifts in the market.
A Return to General Purpose Datastores
By necessity, most of the major emergent non-relational database platforms of the last decade – projects such as Cassandra, Hadoop, MongoDB or Redis – were specialized in their design. In order to compete with the incumbent general purpose relational database platforms, their focus was asymmetric. Of all of the technology categories, database buyers have perhaps the least tolerance for risk. Which means that to justify using something other than the tried and true relational database technologies that had evolved and been improved over decades, alternatives couldn’t just be a little bit faster or a little bit more accessible: they had to be an order of magnitude improvement or more.
This, plus their built-from-scratch nature, inevitably produced a host of new database software that was highly differentiated from the traditional relational databases in approach, scale and function. Which is how we ended up with a database market composed of half a dozen or more relatively distinct categories.
Inevitably, however, these specialized platforms will seek to become less specialized over time. Much as lightweight, developer-friendly MySQL steadily added features such as stored procedures and triggers due to enterprise demand, many of today’s vertical non-relational stores will trend back towards their general purpose, relational ancestors.
In many ways, this transition has been underway for some time.
- The category, for example, that once defined itself by its universal lack of a structured query language has been, in many cases, working to add that functionality. Software once born out of frustration with databases shackled to SQL, in other words, has been compelled over time to add query languages that look very much like SQL.
- Or look back to MongoDB’s acquisition of WiredTiger, a storage engine built by some of the architects who once built Berkeley DB. In many respects, the WiredTiger acquisition was a crucial step for MongoDB, because it allowed the database once prized for its ease of use and accessibility to add features such as better write performance, compression and document-level locking. The kinds of features, again, which enterprises look for.
- Cloudera, for its part, began as a company with a single mission: applying Map Reduce and HDFS to data problems of unusual size and complexity. Today, its message is much broader, encompassing not just the traditional batch workloads but search, streaming and SQL on top of those. Much more general purpose.
This trend will continue. As enterprises have acclimated to more complex functional markets, their willingness to purchase commercial solutions or support in specialized categories has ticked up. This is an incentive both for players in the market and for players in nearby or adjacent markets. Importantly, open source has also lowered barriers to entry between these markets, as some of the core developmental costs can be offset and because in some cases necessary integration work between projects has already been performed.
The key question facing the market around this development, however, concerns developers. There is little question, historically, that enterprise buyers have preferred to consolidate purchases amongst the fewest possible number of suppliers. Which means that all-in-one, general purpose offerings will be welcome to CIOs and other purchasing agents. Developers, however, have historically favored more specialized, single purpose offerings over general purpose alternatives.
Which means that while the trend towards general purpose commercial datastores is seemingly inevitable, its outcome is not. It will be important for commercial vendors making such a transition to ensure that their developer engagement and narrative are at least as strong as the one-throat-to-choke message they can present to buyers. Because otherwise, as the market performance of general purpose relational databases suggests, even perfect buyer messaging cannot make up for a lack of developer interest and adoption.
The Rise of as-a-Service Offerings
If the developer appetite for general purpose non-relational on premises solutions is uncertain, however, the interest in Database-as-a-Service offerings is not. This has been evident for some time, as the two fastest growing services in the history of Amazon’s fast growing Web Services business are Redshift (data warehouse-as-a-service) and Aurora (MySQL-compatible proprietary database). Merger and acquisition activity in the space has likewise been steady: IBM, after previously acquiring Cloudant, bought Compose. Elastic purchased Found. Both of which follow acquisitions such as CenturyLink/Orchestrate.io and Rackspace/ObjectRocket. And of course there is organic development. MongoDB’s Atlas service, for example, is very likely the shape of things to come. As are updates such as Cloudera’s 5.8 drop which added Impala support for AWS’s S3.
As-a-service offerings have limitations relative to on premises alternatives: they have less of a track record, latency between the database and compute tiers can be an issue, and it can be difficult to migrate large scale datastores to the cloud if only because of network limitations. But the advantages of instant-on, pay-as-you-go services that allow developers to make the database and everything that comes with it someone else’s problem have proven to be more than attractive enough to offset those and other concerns. Convenience, as ever, will trump just about everything more often than it won’t. Faced with the prospect of a fraught selection of the appropriate database, scaling it as required, protecting it and keeping it backed up, many developers are opting out.
Further, while as-a-service databases are compelling enough as stand-alone options, they also stand to benefit from general market adoption of cloud services. If your workloads are on premises, DB-as-a-service offerings are significantly disadvantaged, but with so much growth coming from cloud at the expense of on premises alternatives, the growth opportunities for hosted databases are substantial. This is true for base IaaS, but even more true for cloud services operating at higher levels of abstraction such as PaaS or serverless. If you’re content to outsource the infrastructure for your application, you’re more likely to do so for your database as well.
The Net
Given that the database market is subject to the same market forces as other enterprise categories, on premises software both specialized and general purpose is likely to be a tightening market over time. There is a great deal of revenue to be had in the category, without question, but it will be more difficult to obtain as there is more competition generally, more competition from open source specifically, and increasingly direct competition between on premises and service-based alternatives. These price pressures are one reason vendors are increasingly moving back towards general purpose datastores from specialized roots: the broader the functional capabilities, the wider the addressable market, at least in theory.
Even long time database incumbents, however, are scrambling to develop or acquire their way into service-based businesses because that is where much of the growth will occur. Whether adopted as more convenient stand-alone alternatives to on premises databases or deployed in conjunction with other cloud infrastructure, DBaaS offerings are attractive to both developers and their employers, if for very different reasons. Importantly, this is the case in spite of the fact that the DBaaS market is in its infancy; many popular databases are not yet available as services, and those that are don’t yet have the choice of providers that they ultimately will. Which implies that the DBaaS market has been successful, to a degree, in spite of itself.
From a provider perspective, then, a choice is implied. The existing spend on on premises relational solutions is measured in tens of billions of dollars, which is why many database providers today still regard their primary market competition as Oracle, even if they’re selling non-relational solutions. Vendors focused on trajectory, however, tend to see Amazon as the more important target, given that the most common report when talking to purveyors of on premises software is that a significant percentage of their existing customers are already in the cloud and most of those are on Amazon.
Implied choice or not, however, a legitimate market approach is also not to choose. On premises providers in most cases will need to follow the lead of players like MongoDB, because competing with DBaaS players without an as-a-service option is a non-starter. Worse, competitors that operate as-a-service businesses have an enormous intelligence advantage over purely on premises competitors, thanks to the difference in available operational telemetry. But neither should on premises vendors deprecate their existing business; instead, they can differentiate from pure play as-a-service options by offering customers their choice of running in an existing datacenter or in the cloud.
What is clear, however, is that a status quo approach in this market is one that will lead to diminishing returns over time. Choose your path carefully.
Disclosure: Amazon, CenturyLink, Cloudera, MongoDB and Oracle are RedMonk customers. Rackspace is not.