tecosystems

YourSQL, MySQL, and NoSQL: The MySQL Conference Report


“The basic reality is that the risks that scare people and the risks that kill people are very different.” – Peter Sandman, the New York Times via Freakonomics

There has never been a time, in my opinion, that MySQL has faced a more diverse set of threats than at present. Of these, one gets a disproportionate amount of the attention: Oracle’s stewardship, and the implications this has for the future of the database.

This is understandable. For most in the MySQL orbit, Oracle was the enemy, making it a suboptimal home for the most popular open source database on the planet. But the real question facing that community is whether or not Oracle should be the primary concern, or whether it’s focusing on the risks that scare it at the expense of those that could kill it. To explore this question, let’s turn to the Q&A.

Q: Before we begin, do you have anything to disclose?
A: Yes indeed. MySQL and Sun were, prior to the acquisition, RedMonk customers. We’ve also worked with Oracle in the past. Certain MySQL and Oracle competitors such as IBM and Microsoft are also RedMonk customers. So read into the following what you will.

Q: Ok. Let’s start with some of the easy stuff first: how was the conference attendance and such?
A: Pretty good. The show floor was pretty sparse, but that, I’m told, is partially due to the disruption caused by the extended acquisition process of Sun/MySQL by Oracle. This acted to depress sales of booths and so on. The attendee head count was only slightly down, from what I’m told, with the keynotes packed and many of the sessions similarly full.

Q: How about the general product news? Good or bad?
A: Users certainly seem happy. Facebook’s Mark Callaghan said in his keynote that 5.1 had turned out to be a good release – “surprise!” – and Craigslist’s Jeremy Zawodny and Smugmug’s Don MacAskill are both very excited by what they see coming in 5.5.4. So the immediate future looks positive from a development standpoint.

Q: Let’s turn back to the opener at the top. I’m confused: are you arguing that Oracle’s stewardship is not a concern? That we should just trust Oracle?
A: No. I’m arguing that there are greater and more pressing concerns than Oracle’s handling of the project.

Q: Given the fact that Oracle owns the copyright and trademark to the MySQL asset, still employs most of the developers, and is the only entity entitled to dual license the codebase, can there be a more important question than its stewardship?
A: Let’s consider the plausible outcomes:

  1. Oracle intends to keep developing MySQL as a weapon against, among other products, Microsoft’s SQL Server.
  2. Oracle intends to kill the project, and will do so explicitly by limiting or narrowly focusing development.
  3. Oracle intends to kill the project, and will do so implicitly by deliberate indifference.

Personally, I still believe the first outcome is the most likely. I’m also on record as arguing that the second is much less of a problem than people believe, because if Oracle acts publicly in bad faith, this would create – immediately – massive commercial opportunity for a second player or players. The last scenario is clearly the most problematic, as an Oracle that didn’t do anything bad but simply failed to do anything good could leave the community paralyzed and unable to muster a suitable response. Still, I consider this the least likely of the three potential outcomes.

Is Oracle’s handling of the MySQL project important? Quite obviously the answer is yes. Whether or not you agreed with Sun’s valuation of the project at one billion dollars, MySQL is clearly a valuable asset.

It’s my contention, however, that there are more pressing concerns than this facing the MySQL community.

Q: And those would be?
A: What MySQL is to be going forward.

Q: What does that mean?
A: How would you describe the high level trajectory of the MySQL project? A superficial sketch might read something like this: a small, easy to use developer database goes open source, gets major traction in the web stack and becomes in a relatively short period of time the most popular database, period, on the planet. With revenue a challenge, the company makes two decisions important to the future of the project. First, they leverage the dual license model, which trades the ability to incent and consume community contributions for revenue. Second, they target the enterprise relational database market. The good news is that enterprise customers actually want to pay for their software. The bad news is that the enterprise wants features that make the database less relevant and less usable for the web customers, MySQL’s base. MySQL is able to finesse the needs of different markets well enough to turn its sub $100 million revenue stream into a 10X multiple, but the tension between what enterprise users ask for and what everyone else wants is more and more apparent, until some of the developers cry uncle and fork the project.

Q: The fork being Drizzle?
A: Correct. Drizzle can be thought of as a return to MySQL’s roots. A refactoring that repositions the MySQL codebase as the default database for web providers. Except that web providers, not getting what they wanted out of MySQL or the alternatives, were frantically rolling their own datastores and releasing them as open source projects.

Q: Which web providers? And what were they building?
A: Web providers such as Amazon, Facebook, Google and Twitter. Amazon has Dynamo, Google BigTable, Facebook tried to combine the best of both of those in Cassandra, and Twitter’s using that project as well as rolling their own graph database, FlockDB. These developments represent a challenge for MySQL.

Q: Why?
A: Because the projects are frequently directly competitive with MySQL. While NoSQL and relational database advocates may passionately debate the merits of one approach or the other, the fact is that some very smart people working for these properties looked around, found the available relational solutions – MySQL included – wanting and built their own non-relational engines. Twitter, for example, currently runs off of MySQL but has parallel Cassandra nodes running continuously. The latter has advantages for Twitter, most notably that they can do rolling restarts while the MySQL infrastructure cannot be efficiently taken down.

Q: So MySQL is doomed? Facebook et al are wholesale migrating to NoSQL alternatives?
A: Not at all. Callaghan, for example, gave an excellent talk at the MySQL conference about how Facebook uses the relational database to service some tremendously impressive workloads: 170 million reads a second, for example. MySQL still has, and will in all likelihood continue to have, a role to play at the web firms. Few if any of the NoSQL data stores are mature enough to handle the broad range of workloads an RDBMS can, so they are most often used in conjunction with one another or with databases like MySQL. The issue is that MySQL is no longer the default choice for these players; it’s but one choice among many.

Q: Meaning that there’s a potential identity crisis brewing?
A: Exactly. If MySQL isn’t the default database of the web anymore, what is it? It’s not likely to be the default database of the enterprise – the big three are too entrenched, and Postgres is preferred by many for those kinds of workloads. Adding to the problem are the distributions.

Q: What distributions?
A: MySQL has had for a few years now distributions of the original MySQL code that are distinct from one another. Monty Widenius, one of the founders of MySQL, maintains one such over at Monty Program AB called MariaDB. Percona has another, OurDelta one more. Drizzle was mentioned above. One of the points of discussion at this conference was the relative identity needs for these flavors of MySQL. Customers need to understand why they might pick one over another given their individual requirements, and with the exception of Drizzle, which has a fairly clear mandate, this is not currently the case. Confusion about choices is never good for adoption.

Q: Where does the cloud fit into all of this?
A: The cloud’s a challenge for a few reasons. The Platform-as-a-Service cloud platforms act to conflate previously distinct software layers, for one. What database will you use with App Engine, Azure or Force.com, for example? The one they give you, which in none of those cases is a flavor of MySQL. Hosted versions of MySQL such as Amazon’s RDS are certainly available, and have been for years from providers like Rackspace, but the majority of the available PaaS cloud platforms use proprietary databases over MySQL. Second, the cloud poses certain architectural challenges. Drizzle’s approach acts to mitigate these, which probably explains why Rackspace has hired that entire team, but the traditional MySQL codebase needs to adapt itself to the cloud.

Q: Between the NoSQL stores and some of the non-relational cloud databases like the version of BigTable available in App Engine, isn’t the value of SQL itself questionable at this point?
A: You hear that a lot, and certainly there are some use cases where a non-relational store is simply a better fit than a traditional relational database. Key-value stores like Redis or Voldemort, for example, will be a fundamentally better option for a certain class of persistence needs than MySQL or any other relational database. There’s a reason people use memcached and MySQL side by side, remember.

That said, reports of the death of SQL are, to borrow from Twain, greatly exaggerated. There’s a reason that Hive is among the most popular interfaces for the Hadoop project: as it turns out, a lot more people know how to query a dataset using SQL than how to write a MapReduce job. It’s far from perfect, but SQL is effective, well known and understood, and time tested. It’s not going anywhere, in other words, NoSQL or no NoSQL.
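To make the Hive point concrete, here is a minimal sketch of the same aggregation written two ways: once as a SQL query and once as hand-written map, shuffle, and reduce steps. Everything in it is an assumption made for illustration: the `page_views` table and sample rows are hypothetical, Python's built-in `sqlite3` module stands in for Hive, and the map/reduce half is plain Python rather than an actual Hadoop job.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (visitor, page) page-view records.
PAGE_VIEWS = [
    ("alice", "/home"),
    ("bob", "/home"),
    ("alice", "/pricing"),
    ("carol", "/home"),
    ("bob", "/pricing"),
    ("carol", "/docs"),
]


def views_per_page_sql(rows):
    """Count views per page with a single declarative SQL query."""
    conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for Hive
    try:
        conn.execute("CREATE TABLE page_views (visitor TEXT, page TEXT)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT page, COUNT(*) FROM page_views GROUP BY page ORDER BY page"
        )
        return cursor.fetchall()
    finally:
        conn.close()


def views_per_page_mapreduce(rows):
    """Compute the same counts as explicit map, shuffle, and reduce phases."""
    # Map: emit a (page, 1) pair for every record.
    mapped = [(page, 1) for _visitor, page in rows]
    # Shuffle: group the emitted values by key.
    grouped = defaultdict(list)
    for key, value in mapped:
        grouped[key].append(value)
    # Reduce: sum the values collected for each key.
    return sorted((key, sum(values)) for key, values in grouped.items())


if __name__ == "__main__":
    print(views_per_page_sql(PAGE_VIEWS))
    print(views_per_page_mapreduce(PAGE_VIEWS))
```

Both functions return the same per-page counts. The difference is in what the author has to know: the SQL version states the result wanted, while the map/reduce version spells out how to compute it.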

Q: What do you think MySQL and its ecosystem should be doing about all of the above?
A: A few things.

  1. Oracle needs to begin setting expectations. Bad news, we like to say at RedMonk, is always better than no news. While it’s understandable that Oracle’s being cautious with its messaging given that many of MySQL’s worldwide employees are not yet integrated as Oracle employees, some questions can and should be answered now. Oracle’s silence is causing problems and generating uncertainty. Even if some of the decisions being made are likely to be bad news, being definitive about them can only help.
  2. The individual members of the MySQL ecosystem need to do a better job of articulating their particular focus areas. Granted, they’re all MySQL derivatives so there’s going to be a fair amount of overlap, but they tend to have differentiated customers. Being clear and articulate about the kind of customers each is targeting would be of substantial benefit to newcomers to the community, many of whom have only recently become aware there are such things as MySQL “distributions.”
  3. Related to the above, MySQL and its derivatives need some clarity on the types of workloads they intend to service. The project has had success, particularly in the revenue area, straying from its base customer – the web properties – but future success in either area is not guaranteed. What is MySQL today, and what does it want to be in future? It was the database of the web, but that’s increasingly a title that it is ceding not to a single competitor, but to dozens. It is probably not, in Oracle’s hands, going to be the database of the enterprise. What does that leave? What’s the market it intends to own?
  4. MySQL should be attempting, wherever possible, to interoperate and integrate with NoSQL stores. Rather than trying to fight the tide, as many in the RDBMS space are, MySQL should figure out – as it clearly has with memcached – how to play nicely with the other children. The database that is friendliest towards these data stores is going to be a popular database indeed.

Q: Has MySQL peaked?
A: On a relative popularity basis, probably. It is not likely to ever eclipse its current market share, if only because there is so much more competition than there used to be. Time was that MySQL had a few competitors in the open source RDBMS space like PostgreSQL, a few competitors in the embedded space, and obvious competition for enterprise workloads, with the web a virtual greenfield. These days there are quite literally new datastores coming out of the woodwork weekly. Most of these will not survive, but in such a crowded marketplace, it will be difficult if not impossible for any competitor to achieve again the share that MySQL enjoys today. Including MySQL itself.

But on an actual relevance basis, it is far from clear that MySQL has peaked. MySQL’s open source nature and ability to consume other data stores as storage engines makes it flexible enough to adapt to a fast changing data persistence landscape. When I spoke with Eric Day of Rackspace and the Drizzle team, we discussed the possibility of combining Cassandra and Drizzle in some interesting and creative ways. These are the conversations that the MySQL ecosystem should be having, and it’s a real positive to hear that that is taking place.

Either way, MySQL remains the most popular database in the world. If it stopped being developed today, it’s still not likely that that fact would change in the next five years. So whatever Oracle decides to do, it remains one of the most important open source projects in the world. And if MySQL can look beyond the short term Oracle concerns to the bigger picture of a changing data persistence landscape, it will be one for a long time to come.

Update: Somehow autosave got mixed up and forgot to include a thank you to the Drizzle project’s Brian Aker, both for organizing the MySQL Ecosystem Summit at the conference and extending to me an invitation. I also need to thank Tim O’Reilly and his team for providing space and logistical support, not to mention power strips.