tecosystems

Data Layer Diversity: It’s Not Just Relational Anymore


When I mentioned back in March that relational databases were no longer the only persistence choice in a developer’s toolkit, I got some very interesting offline feedback.[1] A couple of folks doubted the long-term potential of non-relational datastores, contending that while developers weren’t terribly fond of either the relational structure or the SQL used to navigate it, they’ve more or less learned to live with both. Others noted – with some smugness – that introducing multiple repository styles into a single architecture ran contrary to my simplicity bias.

On the face of it, both of those criticisms are well taken: developers choose a platform such as MySQL in droves in part, at least, because it’s something they’ve learned to live with and is not overly complicated. The adoption numbers for MySQL alone are staggering; as Zack covers here, the Wall Street Journal reported not too long ago that MySQL’s been downloaded around 70 million times – and that number presumably doesn’t include folks like me who download MySQL from a convenient Gentoo mirror and run it on three or four in-house machines. To put that in context, Firefox – a consumer product that’s usable by an order of magnitude more people than MySQL will ever be – has been downloaded somewhere north of 100 million times. And that’s not even discussing increasingly popular projects like Postgres – a commercially supported version of which is available from EnterpriseDB, Pervasive and Sun, among others – or Derby (AKA Cloudscape AKA Sun DB).

So the trend is clear, right? Relational stores are the once and future data persistence layer. Well, not so much. The success of those projects, to me, is more about the underpenetration in certain segments of the relational database market than the ongoing dominance of that particular mechanism. The reason I say that is simple: I’m speaking with more and more application providers that are having tremendous problems scaling, and are considering increasingly radical alternatives. (That should not – please note – be taken to mean that RDBMS generally or Derby/MySQL/Postgres specifically cannot scale; I’m not one of those “LAMP is nothing more than an on ramp” people. Flickr, Google, LiveJournal and others have successfully proven to me that for certain workloads, LAMP can certainly be made to scale.)

But depending on the type of data, how often the data needs to be read, how often it needs to be written and so on, an RDBMS may or may not be the appropriate choice for persistence. It seems crazy to me, in this day and age, but I know a few developers who simply could not find a data engine to suit their needs and were forced to roll their own. This observation was validated for me yesterday when a data management vendor I spoke with informed me that one survey they’d seen indicated that a sizable percentage of embedded customers were indeed actively writing their own data stores.

Besides the wizards out there who are capable of creating their own repositories from scratch, I’m beginning to see a remarkable degree of heterogeneity emerge within the data layer. Rather than choose between relational, non-relational, object oriented and the like, developers are choosing the ‘all of the above’ option. We’re seeing hybridized infrastructures that use, for various intents and purposes, non-relational stores like Berkeley DB, object oriented stores such as db4objects, relational stores such as MySQL and Postgres, and – fascinatingly – file system technologies like MogileFS or ZFS.
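
To make the ‘all of the above’ option a bit more concrete, here is a rough sketch of what such a hybridized data layer might look like in miniature. It is purely illustrative and not drawn from any of the projects mentioned: the `HybridStore` class and its method names are hypothetical, Python’s standard `dbm` module stands in for a key-value store like Berkeley DB, `sqlite3` stands in for a relational store like MySQL or Postgres, and a plain directory stands in for a file system layer like MogileFS.

```python
import dbm
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional


class HybridStore:
    """Hypothetical toy data layer that routes each kind of data to a different engine."""

    def __init__(self, root: Path) -> None:
        root.mkdir(parents=True, exist_ok=True)
        # Key-value store: an assumed stand-in for something like Berkeley DB.
        self.kv = dbm.open(str(root / "lookups"), "c")
        # Relational store: an assumed stand-in for MySQL or Postgres.
        self.sql = sqlite3.connect(str(root / "relational.db"))
        self.sql.execute(
            "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)"
        )
        # Plain directory: an assumed stand-in for a file system store.
        self.blob_dir = root / "blobs"
        self.blob_dir.mkdir(exist_ok=True)

    def put_lookup(self, key: str, value: str) -> None:
        # Simple key -> value writes bypass the relational engine entirely.
        self.kv[key.encode("utf-8")] = value.encode("utf-8")

    def get_lookup(self, key: str) -> Optional[str]:
        raw_key = key.encode("utf-8")
        if raw_key in self.kv:
            return self.kv[raw_key].decode("utf-8")
        return None

    def add_user(self, user_id: int, name: str) -> None:
        # Structured data that benefits from SQL stays relational.
        with self.sql:  # commits on success, rolls back on error
            self.sql.execute(
                "INSERT INTO users (id, name) VALUES (?, ?)", (user_id, name)
            )

    def user_name(self, user_id: int) -> Optional[str]:
        row = self.sql.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None

    def put_blob(self, name: str, data: bytes) -> Path:
        # Large opaque content goes straight to the file system.
        path = self.blob_dir / name
        path.write_bytes(data)
        return path

    def get_blob(self, name: str) -> bytes:
        return (self.blob_dir / name).read_bytes()

    def close(self) -> None:
        self.kv.close()
        self.sql.close()


def demo():
    """Write one item to each backend and read all three back."""
    with tempfile.TemporaryDirectory() as tmp:
        store = HybridStore(Path(tmp) / "data")
        try:
            store.put_lookup("feed:42", "unread=7")
            store.add_user(1, "alice")
            store.put_blob("avatar-1.bin", b"\x00\x01\x02")
            return (
                store.get_lookup("feed:42"),
                store.user_name(1),
                store.get_blob("avatar-1.bin"),
            )
        finally:
            store.close()


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is the routing, not the engines themselves: each kind of data goes to the store whose read/write profile suits it, at the cost of a more complicated architecture than a single relational database.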

Perhaps more interesting is that some of the projects and vendors perceive this need and are responding. db4objects, for example, just released a replication product that allows for the bidirectional synchronization of its object oriented database with traditional RDBMS like Oracle and MySQL via Hibernate. That’s not a bad feature to have in today’s heterogeneous data layers.

While the ‘best-of-breed’ style data layer is in many cases old hat to architects of more complex, high scale systems – just ask some of Sleepycat’s customers – I find it remarkable that I’m beginning to see this amongst tiny startups and smaller internal projects. When smaller projects begin contemplating a multi-tiered data layer incorporating various databases and filesystems, you know that scalability is a serious issue. Developers are still interested in what’s simple, but not at the expense of scalability – and for many, that means something beyond the ‘cram-everything-in-a-relational-DB’ approach.

Disclaimer: db4objects, IBM, Sleepycat, and Sun are RedMonk clients and we’ve done work with Pervasive, while EnterpriseDB, MySQL, and Oracle are not.

[1] Dan Brackett, Mike Champion and an anonymous commenter have some interesting public feedback on the piece as well – see the comments.

5 comments

  1. Do you have any info on MySQL and other OSS DBs being used as an "embedded" or bundled DB in products? And how that scales?

    The reason I ask is because many of the scaling success stories – "Flickr, Google, LiveJournal" – are SaaS-based apps where scaling can mean "throw more hardware at it." But in the product case (when the DB is embedded/bundled), that's not so much the case.

    As I'm sure you know, I'm not one of those "on ramp" people either: BUT, I haven't come across much about using the LAMP stack (or parts of it like MySQL, etc.) in "shrink-wrapped" products.

  2. Some years ago, Netscape launched an e-commerce store that was a Web server serving some HTML forms and, in the backend, an LDAP server (Netscape Directory Server 3.x).
    I'm not saying that LDAP is the best protocol for it but, since those stores were in production in several sites, it proves that it can be done.

  3. cote: i'd have to dig around, but most of the open source vendors have some embedded success stories. db4objects, in particular, has had a substantial degree of success in that arena.

    Jaime: interesting – what's the advantage of using LDAP in that fashion? the lightweight aspects?

  4. Well, the reasons at the time were cost (Oracle and the like were too expensive), ease of administration (administering an LDAP server was a lot easier than administering a DB), and Netscape's interest in promoting LDAP (it also had a lot of know-how in developing on top of LDAP).
    I don't think most of the reasons still hold today but, in a world where one size does not fit all, this just proves that there are other ways.

  5. Sometimes an RDBMS is just plain overkill. FeedLounge does a very large number of key->value lookups, and the database was just overloaded with the writes.

    Moving some of those reads/writes to a Berkeley DB backend uses almost no CPU, and the database is free for other work.

    FeedLounge is also starting to use MogileFS to offload a large portion of the content processing.

    Since our application is not just 'read mostly', the architecture has to get a bit more complex to deal with it. Tradeoffs for the greater good.

    Note that FeedLounge has always used and will always use an RDBMS for a portion of the system, but sometimes other storage/persistence models are better suited to the job.
