tecosystems

Data Layer Diversity: It’s Not Just Relational Anymore

When I mentioned back in March that relational databases were no longer the only persistence choice in a developer’s toolkit, I got some very interesting offline feedback.[1] A couple of folks doubted the long-term potential of non-relational datastores, contending that while developers weren’t terribly fond of either the relational structure or the SQL used to navigate it, they’ve more or less learned to live with them. Others noted – with some smugness – that introducing multiple repository styles into a single architecture ran contrary to my simplicity bias.

On the face of it, both of those criticisms are well taken: developers choose a platform such as MySQL in droves at least in part because it’s something they’ve learned to live with and is not overly complicated. The adoption numbers for MySQL alone are staggering; as Zack covers here, the Wall Street Journal reported not too long ago that MySQL’s been downloaded around 70 million times – and that number presumably doesn’t include folks like me who download MySQL from a convenient Gentoo mirror and run it on three or four in-house machines. To put that in context, Firefox – a consumer product that’s usable by an order of magnitude more people than MySQL will ever be – has been downloaded somewhere north of 100 million times. And that’s not even discussing increasingly popular projects like Postgres – commercially supported versions of which are available from EnterpriseDB, Pervasive and Sun, among others – or Derby (AKA Cloudscape AKA Sun DB).

So the trend is clear, right? Relational stores are the once and future data persistence layer. Well, not so much yes as no. The success of those projects, to me, is more about the underpenetration in certain segments of the relational database market than the ongoing dominance of that particular mechanism. The reason I say that is simple: I’m speaking with more and more application providers that are having tremendous problems scaling, and are considering increasingly radical alternatives. (That should not – please note – be taken to mean that RDBMS generally or Derby/MySQL/Postgres specifically cannot scale; I’m not one of those “LAMP is nothing more than an on ramp” people. Flickr, Google, LiveJournal and others have successfully proven to me that for certain workloads, LAMP can certainly be made to scale.)

But depending on the type of data, how often it needs to be read, how often it needs to be written and so on, an RDBMS may or may not be the appropriate choice for persistence. It seems crazy to me, in this day and age, but I know a few developers who simply could not find a data engine to suit their needs and were forced to roll their own. This observation was validated for me yesterday when a data management vendor I spoke with informed me that one survey they’d seen indicated that a sizable percentage of embedded customers were indeed actively writing their own data stores.

Besides the wizards out there who are capable of creating their own repositories from scratch, I’m beginning to see a remarkable degree of heterogeneity emerge within the data layer. Rather than choose between relational, non-relational, object oriented and the like, developers are choosing the ‘all of the above’ option. We’re seeing hybridized infrastructures that use, for various intents and purposes, non-relational stores like Berkeley DB, object oriented stores such as db4objects, relational stores such as MySQL and Postgres, and – fascinatingly – file system technologies like MogileFS or ZFS.
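
To make the ‘all of the above’ idea a little more concrete, here is a minimal sketch of a hybrid data layer in Python. Everything in it is an assumption made for illustration rather than anything the projects above ship: the `HybridStore` class and its method names are hypothetical, the standard library’s `sqlite3` stands in for a relational store such as MySQL or Postgres, `dbm` stands in for a key-value store in the spirit of Berkeley DB, and a plain directory stands in for a file system technology like MogileFS.

```python
import dbm
import sqlite3
import tempfile
from pathlib import Path


class HybridStore:
    """Hypothetical data layer that sends each kind of data to a different store."""

    def __init__(self, root: Path):
        root.mkdir(parents=True, exist_ok=True)
        # Relational store (stand-in for MySQL/Postgres): structured records.
        self.sql = sqlite3.connect(str(root / "users.sqlite"))
        self.sql.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )
        # Key-value store (stand-in for Berkeley DB): lookups by a single key.
        self.kv = dbm.open(str(root / "sessions"), "c")
        # Plain files (stand-in for MogileFS): large binary objects.
        self.blob_dir = root / "blobs"
        self.blob_dir.mkdir(exist_ok=True)

    def add_user(self, name: str) -> int:
        with self.sql:  # commits on success, rolls back on error
            cur = self.sql.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return cur.lastrowid

    def user_name(self, user_id: int):
        row = self.sql.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

    def put_session(self, token: str, user_id: int) -> None:
        self.kv[token.encode()] = str(user_id).encode()

    def session_user(self, token: str):
        key = token.encode()
        return int(self.kv[key].decode()) if key in self.kv else None

    def put_blob(self, name: str, data: bytes) -> None:
        (self.blob_dir / name).write_bytes(data)

    def get_blob(self, name: str) -> bytes:
        return (self.blob_dir / name).read_bytes()

    def close(self) -> None:
        self.kv.close()
        self.sql.close()


def demo():
    """Exercise all three stores in a throwaway directory."""
    with tempfile.TemporaryDirectory() as tmp:
        store = HybridStore(Path(tmp) / "data")
        uid = store.add_user("ada")
        store.put_session("token-123", uid)
        store.put_blob("avatar.bin", b"\x00\x01\x02")
        result = (
            store.user_name(store.session_user("token-123")),
            store.session_user("missing"),
            store.get_blob("avatar.bin"),
        )
        store.close()
        return result
```

The application code only ever talks to `HybridStore`, so which engine sits behind each method can change as the workload does. That is roughly the trade being made here: a little more plumbing in exchange for matching each kind of data to a store that suits how it is read and written.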

Perhaps more interesting is that some of the projects and vendors perceive this need and are responding. db4objects, for example, just released a replication product that allows for the bidirectional synchronization of its object oriented database with traditional RDBMS like Oracle and MySQL via Hibernate. That’s not a bad feature to have in today’s heterogeneous data layers.

While the ‘best-of-breed’ style data layer is in many cases old hat to architects of more complex, high scale systems – just ask some of Sleepycat’s customers – I find it remarkable that I’m beginning to see this amongst tiny startups and smaller internal projects. When smaller projects begin contemplating a data layer that incorporates various databases and filesystems, you know that scalability is a serious issue. Developers are still interested in what’s simple, but not at the expense of scalability – and for many, that means something beyond the ‘cram-everything-in-a-relational-DB’ approach.

Disclaimer: db4objects, IBM, Sleepycat, and Sun are RedMonk clients and we’ve done work with Pervasive, while EnterpriseDB, MySQL, and Oracle are not.

[1] Dan Brackett, Mike Champion and an anonymous commenter have some interesting public feedback on the piece as well – see the comments.