The database has been the center of gravity in data architecture for decades. It owned the storage, the query layer, and the interface.
That center of gravity is changing.
This erosion is structural, and it’s happening from multiple directions at once: cheaper and more reliable object storage, open standards like Apache Iceberg, commoditizing database engines, and AI-driven interfaces that bypass the database entirely. These advances collectively move the database from being the fulcrum of the data landscape to a component within it.
Hypothesis 1: The database engine is already not the point of differentiation
Before I joined RedMonk, I was a database administrator. I’ve watched the database market go through cycles of specialization and consolidation for my entire career. The current cycle is consolidation, and it’s been underway for a while.
My colleague Stephen wrote about the return to the general purpose database back in 2021, observing that the pendulum had swung back from specialized data stores toward consolidation, a trend that’s been driven by database vendors and enterprises alike.
This is driven by equal parts opportunism and customer demand. Each additional workload a vendor can target is a net new revenue opportunity. But it’s not simply a matter of vendors refusing to leave money on the table.
In many cases, enterprises are pushing for these functional expansions themselves: they don’t want to context switch between different datastores, they want to run analytics on a dataset in place without migrating it, they want to consolidate the sheer volume of vendors they deal with, or some combination of all of the above.
Evidence in the intervening years has reinforced the thesis. For example, vector databases exploded as a category in 2022-2023, and almost immediately the general purpose databases added vector extensions to their products. There is still a market for special purpose vector databases, but in many cases users have found adding vector functionality within their existing database to be “good enough.”
All of this implies: the database engine is already not the point of differentiation for many workloads.
Hypothesis 2: Object storage + Apache Iceberg is a structural shift
Object storage is designed to store massive volumes of unstructured data at low cost. Unlike a database, it doesn’t provide query capabilities or transaction management. It just stores objects reliably, durably, and cheaply. For a long time, that made it useful for backups and data archives but not much else.
The storyline began to shift in 2020 when Amazon S3 introduced strong read-after-write consistency, removing one of the key objections to using object storage as a foundation for data workloads.
And Apache Iceberg takes object storage’s usability story further.
Iceberg is an open table format specification. It adds a metadata layer on top of object storage to give users table-level semantics (schema, partitioning, snapshots, ACID transactions) without requiring a database to manage them. Essentially it allows blob storage to behave like tables rather than just … blobs.
Iceberg makes items in storage addressable by databases, data warehouses, and lakehouses in ways that weren’t previously practical.
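To make that metadata layer concrete, here is a simplified sketch of the shape of an Iceberg-style table metadata document. It’s illustrative only: the field names are drawn from the Iceberg spec, but many required fields are omitted and the locations and IDs are made up.

```python
# Illustrative sketch of an Iceberg-style table metadata document.
# Simplified and incomplete; paths, UUIDs, and snapshot IDs are made up.
import json

table_metadata = {
    "table-uuid": "9c12d441-03fe-4693-9a96-a0705ddf69c1",
    "location": "s3://warehouse/db/events",  # the data lives in object storage
    "schema": {
        "type": "struct",
        "fields": [
            {"id": 1, "name": "id", "type": "long", "required": True},
            {"id": 2, "name": "ts", "type": "timestamp", "required": True},
        ],
    },
    # Partitioning is declared in metadata, not baked into directory layout.
    "partition-spec": [{"name": "ts_day", "transform": "day", "source-id": 2}],
    # Snapshots pin immutable sets of data files, enabling time travel
    # and ACID semantics without a database managing the files.
    "current-snapshot-id": 3051729675574597004,
    "snapshots": [
        {
            "snapshot-id": 3051729675574597004,
            "manifest-list": "s3://warehouse/db/events/metadata/snap-3051.avro",
        }
    ],
}

print(json.dumps(table_metadata, indent=2))
```

Any engine that can read this metadata — a warehouse, a lakehouse, DuckDB — gets the same table-level view of the same files, which is what makes the storage layer shareable.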
Hypothesis 3: Transactions and analytics are colliding
The line between analytic and transaction engines has been blurring for a long time. The OLAP-OLTP distinction was often more about optimization tradeoffs than a hard architectural boundary. But the challenges of combining a write-optimized transactional system with strong consistency guarantees and a read-optimized analytical system built for aggregation were real enough that most organizations ended up duplicating data between separate systems anyway.
The convergence of OLAP and OLTP systems is coming from several angles.
At the simplest level, people frequently choose to make one database work for both types of workloads. PostgreSQL is the obvious example. It’s a transactional database that plenty of organizations have pressed into service for reporting and analytics, even though it isn’t optimized for them.
More architecturally significant is the trend of databases decoupling compute from storage. Traditionally a database bundled the query engine and the data together as one monolithic system, a la Oracle. Some databases have been built (or rearchitected) to separate the two to allow compute and storage to scale independently.
But not all decoupling is created equal in terms of this hypothesis. Some databases decoupled to proprietary storage layers, and others have decoupled to open standards like Iceberg. Once you decouple compute and storage using an open standard, the database starts to look less like a system of record and more like an engine.
Then there are systems natively designed to sit on top of external storage. DuckDB, for instance, can query directly against object stores and Iceberg tables. It doesn’t need to own the data at all.
And then there are the lakehouses, which started as analytical platforms and are acquiring their way into transactional territory. Both Databricks and Snowflake have acquired companies specializing in Postgres. Databricks acquired Neon and launched Lakebase. Snowflake acquired Crunchy Data and launched Snowflake Postgres.
So if I’m summarizing the range of approaches listed here:
- Start with a transactional database and use it for analytics too
- Architect a database so the compute and storage scale independently, but storage technology is proprietary
- Architect a database to decouple storage and compute using open object storage standards
- Start with an analytics platform, acquire a transactional engine and plug it in
The implication of all of this: OLAP and OLTP have been converging for a long time, but significant new investment is going into bringing transactional capabilities into analytics platforms. Plugging proven transactional engines into a platform built on object storage offers the economic benefits of both cheaper storage and the potential to share workloads.
Hypotheses 2 and 3 together imply: the database increasingly does not own the data.
The database is still doing critical work, but it’s increasingly doing that work as a component within a larger architecture rather than as the center of gravity.
Hypothesis 4: An incoming new category
The previous hypotheses are about the database losing ownership of data down to the storage layer. This one is about the database losing ownership of the interface upward to models.
Right now the general public is rightfully skeptical about LLMs’ ability to do math, which can make it seem foolish to combine AI with data and analytics. But the ability to ask questions in natural language is fundamentally reshaping people’s expectations of how they use and interact with technology, and many vendors are exploring how to bring AI query functionality into data backends.
For example, Fundamental just emerged from stealth with $255M in funding and a product called Nexus, which they describe as a Large Tabular Model (LTM). It’s not just querying data; it’s interpreting data via a foundation model purpose-built for structured, tabular information. Unlike LLMs, it’s deterministic.
It’s early. But this is a space that will almost certainly see more investment and evolution.
What all this implies together
Key portions of the historic database value proposition are being disintermediated.
Specifically:
- Standalone database products are increasingly competing not just with other databases but with adjacent categories.
- Standardization of storage formats and the rise of zero-copy data in analytics mean data gravity is less of a barrier to switching than it used to be.
- Specialized query engines and proprietary query languages are not a durable point of value when natural language interpretation of data is on the table.
- The rise of models and data platforms means the database is increasingly not the interface point for humans or agents interacting with data.
Across these shifts, value is moving toward shared storage, control planes, and higher-level semantic layers.
Major caveats
Databases aren’t going anywhere. A few important qualifications:
- Databases are remarkably sticky technologies. It is a massive undertaking to rip and replace a database. While there is possibly a migration story here, most of these hypotheses apply to net-new or greenfield workloads.
- Only a subset of systems will have transactional requirements flexible enough to tolerate the OLAP-OLTP convergence before things break. Plenty of workloads still need the guarantees that a dedicated OLTP system provides.
- The economics of AI systems are unproven at scale, especially compared to traditional indexing and query optimization. A $255M Series A investment in LTMs as a category is a bet, not a proof point.
Conclusion
Databases are architecturally being subsumed. They’re being disintermediated as standalone products, even as they remain essential as components.
Disclosure: Amazon and Oracle are RedMonk clients. Databricks, DuckDB, Fundamental and Snowflake are not. This post represents my own analysis and hypotheses.
