Everyone seems to have their own problems with the NoSQL term, so here’s mine: it doesn’t mean anything. Not that terms have to be specific to be useful: is the term database really that descriptive?
But the challenge with NoSQL is that the name implies that it means something, and that’s enough for folks new to the space to form opinions on the matter. For better, maybe, but mostly for worse.
The reason NoSQL exists is simple: the long-standing assumption that if persistence is the question, a relational database is the answer. As far back as March 2005, I’ve been skeptical of the sustainability of that assumption and have expected increased acceptance of non-relational datastores. The predicted adoption took a wee bit longer than I anticipated, but it’s here now. This outcome was inevitable.
Not because relational databases are inherently flawed and poised to go the way of the dinosaur: they’re going to be around as long as I’m in this business. Adoption was inevitable because, just as in every other walk of life, there are different tools for different jobs in the technology world. Which brings us to the issue at hand: when “different jobs” refers to any workload where a relational database is a less-than-ideal solution, your bucket is too big, as we see daily from the inquiries.
Lumped into the “NoSQL” bucket right now are tools as diverse as column databases, distributed databases, distributed filesystems, document databases, key-value stores and even graph-oriented databases. Even those categories blur: what’s the difference between a distributed database and a column database?
Exactly.
What do they have in common? Not a lot. Wikipedia implies that it’s about big data, and indeed some of the NoSQL stores scale remarkably well. But some don’t, intentionally. For all the talk of avoiding joins, eventual consistency, non-ACID compliance and such, the real common denominator for NoSQL stores is that they are generally not row/table-oriented and they are mostly SQL-ignorant. Except that, as Brian Aker points out, these distinctions may be nothing more than semantics. And as if that weren’t complicated enough, SQL-like features are periodically being reintroduced via projects such as Pig.
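To make the row/table and SQL distinction concrete, here is a minimal sketch rather than a description of how any particular project works. It stores the same record twice: once in a relational table via Python's built-in `sqlite3` module, and once in a plain dict that stands in, purely as an assumption for illustration, for a key-value or document store. The table name, key format and field names are all hypothetical.

```python
import json
import sqlite3

# Row/table-oriented: a fixed schema declared up front, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "INSERT INTO users (id, name, city) VALUES (?, ?, ?)",
    (1, "Ada", "London"),
)
row = conn.execute(
    "SELECT name, city FROM users WHERE id = ?", (1,)
).fetchone()

# Key-value / document style: an opaque key, a schemaless value, no SQL.
# A plain dict is a stand-in for the store here (an assumption for
# illustration only; real stores add persistence, distribution, etc.).
store = {}
store["user:1"] = json.dumps({"name": "Ada", "city": "London", "tags": ["math"]})
doc = json.loads(store["user:1"])

print(row)
print(doc)
```

Note that the second record carries a `tags` field that the table has no column for. That flexibility is about all the two styles of "not relational" are guaranteed to share.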
Whatever your feelings on whether or not NoSQL is actually about SQL – Michael Stonebraker certainly doesn’t think so – defining an entire category of software by what it doesn’t do rather than what it does seems like a problem.
Which is part of the challenge for projects like Cassandra, CouchDB, Hadoop, HBase, Hypertable, InfiniDB, Memcached, MongoDB, Redis, Riak, Tokyo Cabinet/Tyrant, Voldemort et al. And part of the challenge, frankly, for folks who do what we do.
The good news is that the need for such tools is very real, as we’ve seen with projects like Drizzle, which was forked from MySQL specifically because the design trajectory was not meeting the needs of a certain class of customer. Flawed though it may be, the NoSQL term is being applied to a real and accelerating pattern of adoption, and we’re seeing spikes in interest across the board. Which is why I anticipate strong, though not mainstream, uptake of the tools in 2010.
I wish we had a better designation than NoSQL, but I know better than to try to push that rock up a hill. Besides, some smart folks are only too happy to leave SQL behind. Looking further out, as we see heavier adoption within individual NoSQL software types, the unhelpful umbrella term may yet be retired, but until it is, expect NoSQL to be a trending topic in 2010. Love it or hate it, the term isn’t going anywhere for a while. Even if it doesn’t mean anything.
Disclosure: Basho, a commercial backer of Riak; Cloudera, a commercial backer of Hadoop; and IBM, a commercial backer of CouchDB, Hadoop and Cassandra, are RedMonk customers.