tecosystems

Amazon DynamoDB: First Look


This paper described Dynamo, a highly available and scalable data store, used for storing state of a number of core services of Amazon.com’s e-commerce platform. Dynamo has provided the desired levels of availability and performance and has been successful in handling server failures, data center failures and network partitions. Dynamo is incrementally scalable and allows service owners to scale up and down based on their current request load.

Dynamo allows service owners to customize their storage system to meet their desired performance, durability and consistency SLAs by allowing them to tune the parameters N, R, and W.
– “Dynamo: Amazon’s Highly Available Key-value Store [PDF],” Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels
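For readers who haven't read the paper, N is the number of replicas kept for each key, W the number of replicas that must acknowledge a write, and R the number that must respond to a read. A minimal sketch of that arithmetic (mine, not the paper's) illustrates the trade-off service owners are tuning:

```python
# Illustrative sketch of Dynamo-style quorum tuning; not Amazon's implementation.
# N = replicas per key, W = write acknowledgements required, R = read responses required.

def quorums_overlap(n: int, r: int, w: int) -> bool:
    """When R + W > N, every read quorum overlaps every write quorum,
    so a read cannot entirely miss the latest acknowledged write."""
    return r + w > n

# Typical trade-offs a service owner might choose:
print(quorums_overlap(3, 2, 2))  # True  -> stronger consistency, higher latency
print(quorums_overlap(3, 1, 1))  # False -> lowest latency, eventual consistency
```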


In October 2007, Amazon published a paper describing an internal data store called Dynamo. Incorporating ideas from both the database and key-value store worlds, the paper served as the inspiration for a number of open source projects, Cassandra and Riak being perhaps the most visible of the implementations. Until yesterday, these and other derivative projects were the only Dynamo implementations available to the public, because Amazon did not expose the internally developed database as an external service. With Wednesday’s launch of Amazon DynamoDB, however, that is no longer true. Customers are now able to add Amazon to their list of potential NoSQL suppliers, although to be fair Amazon has technically been in the market previously with SimpleDB.

The following are some points of consideration regarding the release, its impact on the market and likely customer questions.

AWS versus Hosted

The most obvious advantage of DynamoDB versus its current market competition is the fact that it’s already in the cloud, managed and offering consolidated billing for AWS customers. Because it requires minimal setup and configuration compared with native tooling, a subset of the addressable market is likely to be of a similar mindset to this commenter on the DataStax blog:

“Cassandra’s tech is superior, as far as I can tell. But we’ll probably be using DynamoDB until there is an equivalent managed host service for Cassandra. Moving to Cassandra is simply too expensive right now.

All those are clearly better served by a service like DynamoDB than trying to run their own Cassandra clusters unless they happen to be very proficient in Cassandra administration and want to dedicate precious human resources to administration. That takes a lot of the benefits of “cloud” away from small and mid-sized companies where cost and management are the limiting factors.”

For many, outsourcing the installation, configuration and ongoing management of a data infrastructure is a major attraction, one that easily offsets a reduced featureset. Like Platform-as-a-Service (PaaS) offerings, DynamoDB offers time to market and theoretical cost advantages when required capital expense and resource loading are factored in.

Like the initial wave of PaaS platforms, however, DynamoDB is available only through a single provider. Unlike Amazon’s RDS, which is essentially compatible with MySQL, DynamoDB has no drop-in external equivalent, so users will be unable to migrate off of the service seamlessly. The featureset can be approximated using externally available code – via those projects originally inspired by the Dynamo paper, for example – but you cannot at this time download, install and run DynamoDB locally.

It’s true that the practical implications of this lack of availability are uncertain. Netflix’s Adrian Cockcroft, for example, asserts that migration between NoSQL stores is less problematic than between equivalent relational alternatives because of the lower complexity of the storage, saying “it doesn’t take a year to move between NoSQL, takes a week or so.” It remains true, however, that there are customers who postpone upgrades to newer versions of the same database because of the complexity involved. And that’s without considering the skills question. Given the uncertainty involved, then, it seems fair to conclude that the proprietary nature of DynamoDB and the potential switching costs will be – at least in some contexts – a barrier to entry.

The question for users is then similar to that facing would-be adopters of first-generation PaaS solutions: is the featureset compelling enough to justify jeopardizing later substitutability? Amazon clearly believes that it is, its competitors less so. EMC’s Mark Chmarny, additionally, notes that Amazon may be advantaging adoption at the expense of migration in its pricing model.

Competition

DynamoDB clearly has the attention of competitive projects. Basho – the primary authors of Riak – welcomed DynamoDB in this post while pointing out the primary limitation, and DataStax wasted little time spinning up a favorable comparison table. One interesting aside: the Hacker News discussion of the launch mentioned Riak 23 times to Cassandra’s three.

Basho and DataStax are right to be concerned, because the combination of Amazon’s increasingly powerful branding and the managed nature of the product makes it formidable competition indeed. The question facing both Amazon and its competitors is to what extent substitutability matters within the database space. Proprietary databases have had a role in throttling the adoption of PaaS services like Force.com and Google App Engine in the past, but we have very few market examples of standalone, proprietary Database-as-a-Service (DaaS) offerings from which to forecast. Will DaaS – or, more properly, NoSQL-as-a-Service – be amenable to single-vendor products, or will the category advantage, as PaaS has, standardized platforms that permit vendor choice?

The answer to that is unclear at present, but in the meantime expect Amazon to highlight the ease of adoption and vendors like Basho and DataStax to emphasize the potential difficulties in exiting, while aggressively exploring deeper cloud partnerships.

NoSQL Significance

It’s being argued in some quarters that DynamoDB is the final, necessary validation of the NoSQL market. I do not subscribe to this viewpoint. By our metrics, the relevance of distinctly non-relational datastores has been apparent for some years now. Hadoop’s recent commercial surge alone should have been sufficient to convince even the most skeptical adherents of relational orthodoxy that traditional databases will be complemented – or, in limited circumstances, replaced – by non-relational alternatives in a growing number of enterprises.

Throughput Reservation

Perhaps the most compelling new feature of Amazon’s new offering isn’t, technically speaking, a feature. Functionally, the product is (yet) another implementation of the ideas in the Dynamo paper; Alex Popescu has comprehensive notes on the feature list. What is receiving the most attention isn’t technical capabilities like range queries, but rather the concept of provisioned throughput, whose levels can be dynamically adjusted up or down.

This type of atomic service level provisioning is both differentiating and compelling for certain customer types. Promising single-digit millisecond latency at a selected throughput level, with zero customer effort required, is likely to be attractive to customers that require – or think they require – a particular service level. And by requiring customers to determine their provisioning level manually, Amazon stands to benefit from customer overprovisioning; customers will feel pain if they’re under-provisioned and react, but conversely may fail to notice that they’re over-provisioned. Much like mobile carriers, Amazon wins in both scenarios.
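To make the mechanics concrete, here is a rough sketch of how a throughput reservation is declared and later adjusted. The table name and capacity figures are hypothetical, and the calls reflect today’s AWS SDK for Python (boto3) rather than the tooling available at launch:

```python
import boto3

# Hypothetical table and capacity figures, purely for illustration.
dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Reserve read/write throughput up front, when the table is created.
dynamodb.create_table(
    TableName="sessions",
    AttributeDefinitions=[{"AttributeName": "session_id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "session_id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 50, "WriteCapacityUnits": 10},
)

# Later, dial the reservation up (or down) without taking the table offline.
dynamodb.update_table(
    TableName="sessions",
    ProvisionedThroughput={"ReadCapacityUnits": 200, "WriteCapacityUnits": 40},
)
```

The customer, not Amazon, chooses those numbers; hence the overprovisioning incentive described above.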

Timing

With Dynamo having been extant internally in some form since at least 2007, one logical question is: why now? Amazon did not detail its intent with respect to timing when it prebriefed us last week, but its track record demonstrates a willingness to be first to market, balanced with an understanding of timing.

In 2006, Amazon launched EC2 and S3, effectively creating the cloud market. This entrance, however, was built in part from the success of the Software-as-a-Service (SaaS) market that preceded it; Salesforce, remember, went public in 2004. With enterprises now acclimated to renting software via the network, the market could be considered primed for similar consumption models oriented around hardware and storage.

Three years after the debut of EC2 and S3, and one year after MySQL had achieved ubiquity sufficient to realize a billion-dollar valuation from Sun [coverage], Amazon launched the first cloud-based MySQL-as-a-Service offering [coverage]. That same year, the first year that Hadoop was mainstream enough to justify its own HadoopWorld conference, Amazon launched Elastic MapReduce.

The pattern is clear: Amazon is unafraid to create a market, but attempts to temper the introductions with market readiness. Logic suggests that the same tactic is at work here.

NoSQL has, as a category, crossed the chasm from interesting science project to alternative data persistence mechanism. But while NoSQL tools like Cassandra and Riak are available in managed form via providers like Joyent and Heroku, DynamoDB is, in Popescu’s words: “the first managed NoSQL databases that auto-shards.”

It is also possible that SSD pricing contributed directly to the launch timing, with pricing for the drive type down to levels where the economics of a low cost shared service finally make sense.

SSDs

One underdiscussed aspect of the DynamoDB launch is the underlying physical infrastructure, which consists solely of SSDs. This is likely one of the major contributing factors to the performance of the system, and in some cases it will be another incentive to use Amazon’s platform, as many traditional datacenters will not have equivalent SSD hardware available to them.

The Net

While discussion of the DynamoDB offering will necessarily focus on functional differentiation between it and competitive projects, it is likely that initial adoption and uptake will be primarily a function of attitudes regarding lock-in. For customers that want to run the same NoSQL store on premise and in the cloud, DynamoDB will be a poor fit. Those who are optimizing for convenience and cost predictability, however, may well prefer Amazon’s offering.

Amazon would clearly prefer the latter outcome, but both are likely acceptable. Amazon’s history is built on releasing products early and often, adjusting both offerings and pricing based on adoption and usage.

In any event, this is a notable launch and one that will continue to drive competition on and off the cloud in the months ahead.

Disclosure: Basho is a RedMonk client, while Amazon and DataStax are not.

7 comments


  4. Good points

You can create up to 256 tables, each provisioned for 10,000 reads and 10,000 writes per second
    http://aws.typepad.com/aws/2012/01/amazon-dynamodb-internet-scale-data-storage-the-nosql-way.html
    1/ DynamoDB is, in Popescu’s words: “the first managed NoSQL databases that auto-shards.”

    2/ Dynamo has provided the desired levels of availability and performance and has been successful in handling server failures, data center failures and network partitions.
    – Dynamo is incrementally scalable and allows service owners to scale up and down based on their current request load.

    3/Throughput Reservation: Receiving the most attention aren’t technical capabilities like range queries but rather the concept of provisioned throughput, levels which can be dynamically adjusted up or down.
    4/ SSD: One underdiscussed aspect to the Dynamo launch is the underlying physical infrastructure, which consists solely of SSDs. This is likely one of the major contributing factors to the performance of the system,
    It is also possible that SSD pricing contributed directly to the launch timing, with pricing for the drive type down to levels where the economics of a low cost shared service finally make sense.

