For about a year now, I’ve been making the rounds with vendors arguing a very simple point: that the traditional approach to persistence – cram everything into a relational database – was not appropriate for a substantial number of development challenges. I’m certainly not claiming credit for this insight; in many respects, I was merely the messenger for the developers I was working with, who were desperately canvassing the field of specialized data storage and retrieval products in search of something – anything – that would meet their needs.
In the end, the majority of the relational critics did deemphasize the relational store in favor of alternative or hybrid approaches; some even wrote their own data engines from scratch. It’s not that they – or I – would contend that the relational database doesn’t have a place of importance; it does, and will continue to. The argument, rather, was that the relational store shouldn’t be the only means of storing and manipulating data. The data layer, in other words, was bound to become more heterogeneous than it is today. Seems obvious, I know. But trust me, it wasn’t to lots of folks – and there are many who still don’t buy it.
But the problem was acute enough, and felt by enough high-profile developers, that I expected a response sooner or later. Probably later, if the relational database vendors had anything to do with it. And while there’ve been glimmers here and there indicating that others perceive the nature of the problem and the opportunity it represents – Oracle picking up Sleepycat, for example – there’s been no sea change from an architectural perspective.
Is that poised to change, however? The folks from O’Reilly – with whom I’ve discussed this problem previously, in an interchange with Nat – seem to think so. Tim O’Reilly just ran a terrific series of database-focused entries, entitled Database War Stories; you can go get them here: 1 (Second Life), 2 (Bloglines / memeorandum), 3 (Flickr), 4 (NASA), 5 (craigslist). While it’s difficult to draw any general conclusions from the feedback, as the approaches taken differ significantly, it’s clear that heterogeneity is the rule, not the exception.
While all of the above is interesting, none of it really surprised me, because those stories are similar to ones I’ve heard before. Based on stories like these, I think we’ll be seeing more and more hybrid data layers employed to address less traditional workloads, but the question that’s now occupying me is: what’s next? Most of the approaches described above – leveraging distributed file systems, flat files, and so on – are not revolutionary, but hark back to mainframe-era approaches. Is there room for anything new? I think so, and GData might be the first indication of what that “new” thing looks like.
Late in ’04, Adam Bosworth penned a piece called “Where Have All the Good Databases Gone?” He began it by saying:
About five years ago I started to notice an odd thing. The products that the database vendors were building had less and less to do with what the customers wanted. This is not just an artifact of talking to enterprise customers while at BEA. Google itself (and I’d bet a lot Yahoo too) have similar needs to the ones Federal Express or Morgan Stanley or Ford or others described, quite eloquently to me. So, what is this growing disconnect?
In that entry, Bosworth goes on to describe the three major problems not being adequately solved by traditional commercial databases: 1) Dynamic schema, 2) Dynamic partitioning, and 3) Modern indexing.
Why is this relevant? Because, like Dare Obasanjo and Jeremy Zawodny, I think GData is almost certainly the product of Bosworth’s efforts. While the above entry details some of the problems, it doesn’t go into much depth as to what the solution to those problems might look like. Fortunately, he went into significant detail during a talk he gave at the MySQL Users Conference last year, but to my way of thinking it is this presentation (Powerpoint warning) that actually spells out his vision for a wire protocol, based on standards, that is:
- Massively scalable
- Fully federated
- Completely loosely coupled
- Easy to implement
- An extension of existing web protocols/formats
In case you’re following along at home, GData would appear to be a good start towards those goals. What comes next? It’s too early to say, but when I begin to think about repositories and engines with GData APIs, whether the one Zawodny proposes for MySQL or one the Lucene folks might build, things get very interesting.
Why? Because it could turn the web into a massive, writable repository. It could enable a whole new generation of repository technologies.
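To make that a bit more concrete, here’s a minimal sketch of what a GData-style interaction might look like: reading entries out of an Atom feed over plain HTTP with query parameters, and writing a new entry back in with a POST. The feed URL and the helper names (query_feed, post_entry) are hypothetical, invented purely for illustration; only the query parameters (q, updated-min, max-results) and the Atom-over-HTTP plumbing come from the protocol itself.

```python
# A minimal, hypothetical sketch of a GData-style interaction: an Atom feed
# read over plain HTTP with query parameters, and written to with a POST of
# a new Atom entry. The feed URL and helper names are invented; the query
# parameters (q, updated-min, max-results) are the ones GData describes.
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM)  # serialize new entries with a default namespace
FEED_URL = "http://example.com/gdata/feeds/posts"  # hypothetical endpoint


def query_feed(text, updated_min, max_results=10):
    """Read: fetch matching entries as (title, self-link) tuples."""
    params = urllib.parse.urlencode({
        "q": text,                   # full-text query
        "updated-min": updated_min,  # RFC 3339 timestamp
        "max-results": max_results,
    })
    with urllib.request.urlopen(f"{FEED_URL}?{params}") as resp:
        feed = ET.parse(resp).getroot()
    results = []
    for entry in feed.findall(f"{{{ATOM}}}entry"):
        title = entry.findtext(f"{{{ATOM}}}title")
        link = entry.find(f"{{{ATOM}}}link[@rel='self']")
        results.append((title, link.get("href") if link is not None else None))
    return results


def post_entry(title, content):
    """Write: push a new Atom entry back into the feed."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM}}}content", type="text").text = content
    req = urllib.request.Request(
        FEED_URL,
        data=ET.tostring(entry, encoding="utf-8"),
        headers={"Content-Type": "application/atom+xml"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # a 201 Created would indicate success


if __name__ == "__main__":
    print(query_feed("database", "2006-01-01T00:00:00Z"))
```

The particulars matter less than the shape: any repository that can speak this read/write Atom dialect, whether MySQL, Lucene, or something written from scratch, becomes addressable by the same clients.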
Either way, I’m a believer that the next couple of years should be interesting to follow; the current experimentation in data layer approaches could be nothing compared to what we see when GData and similar protocols become more common.