RedMonk

Amazon DynamoDB: First Look

This paper described Dynamo, a highly available and scalable data store, used for storing state of a number of core services of Amazon.com’s e-commerce platform. Dynamo has provided the desired levels of availability and performance and has been successful in handling server failures, data center failures and network partitions. Dynamo is incrementally scalable and allows service owners to scale up and down based on their current request load.

Dynamo allows service owners to customize their storage system to meet their desired performance, durability and consistency SLAs by allowing them to tune the parameters N, R, and W.
- “Dynamo: Amazon’s Highly Available Key-value Store [PDF],” Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels
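
For readers unfamiliar with the tuning knobs the paper mentions, here is a minimal Python sketch of how N, R and W interact. It assumes the reading given in the Dynamo paper: N is the number of replicas, R and W are the number of replicas that must respond to a read or a write, and R + W > N yields a quorum-like overlap between reads and writes. The function names are hypothetical and are not part of any Amazon API.

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """Return True when any R replicas read must include one of the W written."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    # Pigeonhole: R + W > N forces the read set and write set to share a replica.
    return r + w > n


def describe(n: int, r: int, w: int) -> str:
    """Summarize the trade-off implied by one (N, R, W) setting (illustrative only)."""
    if quorum_overlaps(n, r, w):
        return f"N={n}, R={r}, W={w}: reads overlap the latest write quorum"
    return f"N={n}, R={r}, W={w}: lower latency, but reads may miss recent writes"


if __name__ == "__main__":
    # (3, 2, 2) is the configuration the Dynamo paper reports as common.
    for setting in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
        print(describe(*setting))
```

Lowering R or W trades consistency for latency and availability, which is the kind of per-service tuning the quoted passage describes.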


In October 2007, Amazon published a paper describing an internal data store called Dynamo. Incorporating ideas from both the database and key-value store worlds, the paper served as the inspiration for a number of open source projects, Cassandra and Riak being perhaps the most visible of the implementations. Until yesterday, these and other derivative projects were the only Dynamo implementations available to the public, because Amazon did not expose the internally developed database as an external service. With Wednesday’s launch of Amazon DynamoDB, however, that is no longer true. Customers now are able to add Amazon to their potential list of NoSQL suppliers, although to be fair Amazon has technically been in market with SimpleDB previously.

The following are some points of consideration regarding the release, its impact on the market and likely customer questions.

AWS versus Hosted

The most obvious advantage of DynamoDB versus its current market competition is the fact that it’s already in the cloud, managed and offering consolidated billing for AWS customers. Because it requires minimal setup and configuration versus native tooling, a subset of the addressable market is likely to be of a similar mindset to this commenter on the DataStax blog:

“Cassandra’s tech is superior, as far as I can tell. But we’ll probably be using DynamoDB until there is an equivalent managed host service for Cassandra. Moving to Cassandra is simply too expensive right now.

All those are clearly better served by a service like DynamoDB than trying to run their own Cassandra clusters unless they happen to be very proficient in Cassandra administration and want to dedicate precious human resources to administration. That takes a lot of the benefits of “cloud” away from small and mid-sized companies where cost and management are the limiting factors.”

For many, outsourcing the installation, configuration and ongoing management of a data infrastructure is a major attraction, one that easily offsets a reduced featureset. Like Platform-as-a-Service (PaaS) offerings, DynamoDB offers time to market and theoretical cost advantages when required capital expense and resource loading are factored in.

Like the initial wave of PaaS platforms, however, DynamoDB is available only through a single provider. Unlike users of Amazon’s RDS, which is essentially compatible with MySQL, DynamoDB users will be unable to migrate off the service seamlessly. The featureset can be replicated using externally available code – via those projects that were originally inspired by the Dynamo paper, for example – but you cannot at this time download, install and run DynamoDB locally.

It’s true that the practical implications of this lack of availability are uncertain. Netflix’s Adrian Cockcroft, for example, asserts that migration between NoSQL stores is less problematic than between equivalent relational alternatives, because of the lower complexity of the storage, saying “it doesn’t take a year to move between NoSQL, takes a week or so.” It remains true, however, that there are customers that postpone upgrades to newer versions of the same database because of the complexity involved. And that’s without considering the skills question. Given the uncertainty involved, then, it seems fair to conclude that the proprietary nature of DynamoDB and the potential switching costs will be – at least in some contexts – a barrier to entry.

The question for users is then similar to that facing would-be adopters of first generation PaaS solutions: is the featureset compelling enough to justify jeopardizing later substitutability? Amazon clearly believes that it is, its competitors less so. EMC’s Mark Chmarny, additionally, notes that Amazon may be advantaging adoption at the expense of migration in its pricing model.

Competition

DynamoDB clearly has the attention of competitive projects. Basho – the primary authors of Riak – welcomed DynamoDB in this post while pointing out the primary limitation, and DataStax wasted little time spinning up a favorable comparison table. One interesting aside: the Hacker News discussion of the launch mentioned Riak 23 times to Cassandra’s three.

Basho and DataStax are right to be concerned, because the combination of Amazon’s increasingly powerful branding and the managed nature of the product makes it formidable competition indeed. The question facing both Amazon and competitors is to what extent substitutability matters within the database space. Proprietary databases have had a role in throttling the adoption of PaaS services like Force.com and Google App Engine in the past, but we have very few market examples of standalone, proprietary Database-as-a-Service (DaaS) offerings from which to forecast. Will DaaS or more properly NoSQL-as-a-Service be amenable to single vendor products or will it advantage, as the PaaS space has, standardized platforms that permit vendor choice?

The answer to that is unclear at present, but in the meantime expect Amazon to highlight the ease of adoption and vendors like Basho and DataStax to emphasize the potential difficulties in exiting, while aggressively exploring deeper cloud partnerships.

NoSQL Significance

It’s being argued in some quarters that DynamoDB is the final, necessary validation of the NoSQL market. I do not subscribe to this viewpoint. By our metrics, the relevance of distinctly non-relational datastores has been apparent for some years now. Hadoop’s recent commercial surge alone should have been sufficient to convince even the most skeptical relational orthodoxies that traditional databases will be complemented or in limited circumstances replaced by non-relational alternatives in a growing number of enterprises.

Throughput Reservation

Perhaps the most compelling new feature of Amazon’s new offering isn’t, technically speaking, a feature. Functionally, the product is (yet) another implementation of the ideas in the Dynamo paper; Alex Popescu has comprehensive notes on the feature list. Receiving the most attention aren’t technical capabilities like range queries but rather the concept of provisioned throughput, levels which can be dynamically adjusted up or down.

This type of atomic service level provisioning is both differentiating and compelling for certain customer types. Promising single-digit millisecond latency at a selected throughput level with zero customer effort required is likely to be attractive for customers that require – or think they require – a particular service level. And by requiring customers to manually determine their required provisioning level, Amazon stands to benefit from customer overprovisioning; customers will feel pain if they’re under-provisioned and react, but conversely may fail to observe that they’re over-provisioned. Much like mobile carriers, Amazon wins in both scenarios.
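
The asymmetry described above can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not a model of Amazon's actual billing or throttling: the demand figures are invented, capacity is expressed in abstract requests per interval, and requests above the provisioned level are assumed to be rejected or delayed.

```python
from typing import Dict, Sequence


def provisioning_outcome(provisioned: int, demand: Sequence[int]) -> Dict[str, int]:
    """Compare a fixed provisioned level against per-interval demand.

    'throttled' counts requests above the provisioned level (the pain a
    customer notices); 'unused' counts paid-for capacity that sat idle
    (the cost a customer may never notice). Both are hypothetical measures.
    """
    throttled = sum(max(d - provisioned, 0) for d in demand)
    unused = sum(max(provisioned - d, 0) for d in demand)
    return {"throttled": throttled, "unused": unused}


if __name__ == "__main__":
    demand = [40, 60, 50, 80]  # invented requests per interval
    for level in (50, 100):
        print(level, provisioning_outcome(level, demand))
```

At the lower level some requests are throttled, which prompts the customer to raise capacity; at the higher level nothing fails, so the idle capacity goes unremarked while still being paid for.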

Timing

With DynamoDB having been extant in some form since at least 2007, one logical question is: why now? Amazon did not detail their intent with respect to timing when they prebriefed us last week, but their track record demonstrates a willingness to be first to market balanced with an understanding of timing.

In 2006, Amazon launched EC2 and S3, effectively creating the cloud market. This entrance, however, was built in part from the success of the Software-as-a-Service (SaaS) market that preceded it; Salesforce, remember, went public in 2004. With enterprises now acclimated to renting software via the network, the market could be considered primed for similar consumption models oriented around hardware and storage.

Three years after the debut of EC2 and S3, and one year after MySQL had achieved ubiquity sufficient to realize a billion dollar valuation from Sun [coverage], Amazon launched the first cloud based MySQL-as-a-Service offering [coverage]. That same year, the first year that Hadoop was mainstream enough to justify its own HadoopWorld conference, Amazon launched Elastic MapReduce.

The pattern is clear: Amazon is unafraid to create a market, but attempts to temper the introductions with market readiness. Logic suggests that the same tactic is at work here.

NoSQL has, as a category, crossed the chasm from interesting science project to alternative data persistence mechanism. But while NoSQL tools like Cassandra and Riak are available in managed form via providers like Joyent and Heroku, DynamoDB is, in Popescu’s words: “the first managed NoSQL databases that auto-shards.”

It is also possible that SSD pricing contributed directly to the launch timing, with pricing for the drive type down to levels where the economics of a low cost shared service finally make sense.

SSDs

One underdiscussed aspect of the DynamoDB launch is the underlying physical infrastructure, which consists solely of SSDs. This is likely one of the major contributing factors to the performance of the system, and in some cases will be another incentive to use Amazon’s platform as many traditional datacenters will not have equivalent SSD hardware available to them.

The Net

While discussion of the DynamoDB offering will necessarily focus on functional differentiation between it and competitive projects, it is likely that initial adoption and uptake will be primarily a function of attitudes regarding lock-in. For customers that want to run the same NoSQL store on premise and in the cloud, DynamoDB will be a poor fit. Those who are optimizing for convenience and cost predictability, however, may well prefer Amazon’s offering.

Amazon would clearly prefer the latter outcome, but both are likely acceptable. Amazon’s history is built on releasing products early and often, adjusting both offerings and pricing based on adoption and usage.

In any event, this is a notable launch and one that will continue to drive competition on and off the cloud in the months ahead.

Disclosure: Basho is a RedMonk client, while Amazon and DataStax are not.

by-nc-sa

Categories: AltDB, Cloud.

What’s in Store for 2012: A Few Predictions

The cost of delaying my 2012 predictions is that one has already come to pass. Nginx – the web server now powering all of the redmonk.com properties – passed IIS according to a January 4 Netcraft release. Because the quantitative data available to us has indicated surging interest in the alternative web server – the logical result of which was a commercial response – we’ve been expecting something like this. But of course we can’t count this as a prediction any longer because it’s January 13th.

Here instead are a few things that have not yet come to pass, but will, I believe, in the year ahead. These predictions are informed by historical context and built off my research, quantitative data that’s available to me externally or via RedMonk Analytics, and the conversations I’ve had over the past twelve months, both digital and otherwise. They cover a wide range of subjects because we at RedMonk do.

For context, my 2010 predictions graded out as 66% accurate while 2011’s were 82% correct.

With that, the 2012 predictions.

Data & The Last Mile

It is not technically correct to assert that large scale data infrastructure is a solved problem. Decades of innovation remain, as the Cambrian explosion of projects demonstrates. It is nevertheless true that relative to the user interface, data storage and manipulation are solved problems. Since the original creation of Hadoop in 2006, for example, we have seen multiple user interfaces applied: connectors (e.g. R), standard MapReduce, scripting (e.g. Jaql/Pig), SQL (e.g. Hive), spreadsheets (e.g. BigSheets), client tooling (e.g. Karmasphere). Each has its strengths, but none bridges the last mile: putting the power of Big Data in the hands of ordinary users.

Which is perhaps unsurprising; even the mature relational database world uses abstractions of varying levels of complexity to interface with business users. But with data driven decision making on the rise, premiums are being placed on tooling which can expose data in sensible fashion to those without degrees in computer science. Hence, the elevated visibility of startups such as Metamarkets, which excites data scientists with tools like Druid but whose valuation may ultimately depend on its last mile expertise.

At this point in time, whatever my preferred model for data storage and whatever the type, there will be more than one credible option for a data engine. The same cannot be said for presentation. Which would be less problematic if the market for Big Data talent were not so desperate; outsourcing to shops like Mu Sigma will be an option in some quarters, but comes with its own inefficiencies and risks, not to mention per inquiry premiums.

This, then, will be an area of focus in 2012, for both innovation (look for assisted anomaly and correlation identification, a la Google Correlate) and M&A.

Desktop Importance Declines

The most interesting characteristic of the forthcoming Windows 8 release isn’t the technology, which is curious because it’s revolutionary from a Microsoft standpoint. From the support for ARM to the addition of the Windows Store to the ability to author in JavaScript and HTML5, there is much to digest. Instead, the single most defining characteristic of the pending launch is apathy.

Overall inquiries and discussion of the platform demonstrate curiosity but limited interest; the visibility of the once dominant Windows platform is secondary to mobile platforms like Android and iOS.

While this is not a function of any specific or general design failures on the part of Microsoft – indeed, the platform is incorporating important changes while making itself more developer accessible – it is symptomatic of a broader and more difficult to attack problem: the declining role of the desktop.

The desktop is simply not as important as it once was. Mobile usage is eroding the central role PCs once played; while they are still the dominant form of computing, the trendline is declining and there is no reason to expect it to invert. It’s been suggested that mobile computing in general is additive; that it’s being used to extend the usage of computing to areas where PCs were not employed, and is thus non-competitive. But our data as well as Asymco’s indicate that, at least in part, mobile usage is coming at the expense of traditional platforms. General search volume data, as we’ve seen, validates this assertion.

There are two implications here. Most obviously, Microsoft’s ability to generate interest in and thus leverage for its flagship operating system is jeopardized. Worldwide developer populations are not necessarily zero sum as skills overlap, but they tend to be rivalrous; an Android or iOS developer is often a lost potential Windows developer – experiments like BlueStacks aside. We can therefore expect Microsoft to have to expend more effort to attract fewer developers to its platform, a negative cycle which reinforces itself. Second, as the desktop’s primacy abates, we can expect to see greater competition in the marketplace. As enterprises become by necessity more heterogeneous, incorporating Android and iOS devices, the cost of supporting a second operating system drifts towards marginal, which means that forecasts of greater Apple penetration become more probable.

Developer Shortages

It’s become axiomatic that industry hiring is all demand and short supply, and none of our clients expect any relief in the year ahead. Nor will they receive it. Shortages for in demand skillsets will continue over the next twelve months, advantaging entities that are either geographically positioned to leverage markets less competitive than the Valley or with the logistical ability to incorporate remote hires.

That said, we will in 2012 see the first steps towards a more rational market, through a combination of cultural shift and educational model innovation that will increase supply. Regarding the former, it’s no secret that technology has had a profound impact on the erosion of middle class jobs. In Race Against the Machine, MIT Professors Andrew McAfee and Erik Brynjolfsson document the role that rapid innovation has had on jobs:

Digital technologies change rapidly, but organizations and skills aren’t keeping pace. As a result, millions of people are being left behind. Their income and jobs are being destroyed, leaving them worse off in absolute purchasing power than before the digital revolution.

Even skilled industries are not immune. From John Markoff’s New York Times piece, “Armies of Expensive Lawyers, Replaced by Cheaper Software”:

“From a legal staffing viewpoint, it means that a lot of people who used to be allocated to conduct document review are no longer able to be billed out,” said Bill Herr, who as a lawyer at a major chemical company used to muster auditoriums of lawyers to read documents for weeks on end. “People get bored, people get headaches. Computers don’t.”

While Brynjolfsson and McAfee are ultimately optimistic about the prospects of technical progress as they relate to employment, the outcome is far from certain.

What is becoming clear, however, is that unemployment rates that have been north of 8% in the US since February of 2009 are driving people into industries that are desperate for help. For some, this means oil & gas employment in traditionally underpopulated environments like North Dakota. For others, however, technology – long an enemy – is becoming a refuge.

We’re seeing a spike in inquiries about transitioning to technology careers. Lawyers, management consultants, teachers and others are seeking – and often finding – homes for themselves within the technology sector. Some are self-taught or trained on the job, others merely apply existing skills in new contexts, but both represent a potential cultural shift. Which raises the question: could technology be the next major middle class employment sector?

For that to happen, the education system needs to improve, because even an industry which has been one of the few economic bright spots of the last decade can only absorb so many unskilled workers without slowing. This is the real significance of applications like Codecademy or programs like Harvard’s free CSCI E-52, MITx or Stanford Engineering Everywhere: they are one potential solution to the perpetual shortage of talent. For all of the limitations of distance learning, the scale means that some subset of motivated students will become productive developers, and by extension, contributors to the larger economy.

This is a long term process, so obvious progress within 2012 will be minimal, and talent shortages will continue. But we will in the next twelve months begin to see distance trained students hired at scale, and this will be one of the first steps towards lower talent costs as well as, possibly, the restoration of middle class employment opportunities.

Monitoring as a Service

We are not oriented around category definitions at RedMonk; we prefer market driven names to those conceived and marketed by the analyst industry. That said, it seems clear that the time of Monitoring-as-a-Service (MaaS) is at hand. New Relic’s growth led to a $15M round in November, Boundary took $4M a year ago this month, Monktoberfest speaker Theo Schlossnagle’s Circonus has been in market for over a year, and virtually every vendor that we speak with today is adding monitoring and management facilities, from 10gen’s MMS to Cloudera’s Cloudera Manager.

The proliferation of these services is a direct response to the increasingly heterogeneous nature of application architecture and the reality that the substrate is frequently network based, rather than local. Given accelerating rather than declining consumption of network resources, we predict a strong increase in interest and adoption of MaaS tools. Much as I don’t care for the term itself.

Intelligent usage of generated telemetry – which we’ll come back to – will further cement adoption, delivering previously unseen value.

Open Source and the Paradox of Choice

Gartner in March of last year asserted that open source had hit a tipping point, saying:

“Mainstream adopters of IT solutions across a widening array of market segments are rapidly gaining confidence in the use of open source software.”

We concur, although we would argue that the tipping point actually occurred ten years or more prior. The Apache web server and MySQL were originally written in 1995. In 1999, we saw the public offering of Red Hat and the creation by IBM – as mainstream a technology brand as there is in the enterprise – of the Linux Technology Center. Firefox was first released in 2003. None of these reached their relative levels of popularity in the past twelve months; they have instead been the de facto infrastructure for the better part of the last decade.

Regardless of when one asserts that open source crossed the chasm, however, it remains a model whose popularity is increasing over time. As understanding of the benefits increases and concerns about the risks abate, more organizations are not only consuming open source but contributing to it. Evidence suggests, in fact, that perceptions of the value of software are in decline – we’ll come back to that too – and that the end result of this is that more proprietary code is being released as open source software.

Widely perceived as a net benefit, however, the influx of new projects does present problems for would be adopters. Specifically, the paradox of choice implies that developers will increasingly be forced to select from a growing sea of projects which may or may not be suitable for their needs. And while the nature of open source guarantees developers the ability to apply this code to their projects without restriction or commercial engagement, this is a process with a limited ability to scale. Consider the NoSQL space, as an example. Presuming for the sake of argument that the developers in question understand the different categories of database – key value stores, document databases, columnar databases, MapReduce engines, graph databases and so on – well enough to understand their high level needs, there are at least two and sometimes as many as half a dozen credible options to consider.

This paradox of choice, or too much of a good thing, will become more problematic over time rather than less as contributions will continue to rise. The net impact is likely to be increased commercial opportunities around selection, and therefore attention to vendors like Black Duck, Open Logic, Palamida and Sonatype.

PaaS: The New Standard

It has been evident for some time that runtime fragmentation – an aggressive diversification of programming languages and frameworks, specifically – will change the development landscape. The market failure of the first generation PaaS providers, in fact, was primarily a function of their over-prescriptive natures. The benefits to outsourcing management and scale were obsoleted by the constraints; Java shops were never likely to rewrite their application stack in Python or Ruby strictly to benefit from a platform. Which is why virtually every relevant PaaS provider today offers a choice of runtimes, so as to maximize their addressable market.

But in a fragmented world, what might emerge as a standard? From a developer’s perspective, the standard is most often the framework they’re deploying to, whether that’s Django, Node.js, Lift, Play, Rails, Spring, the Zend Framework or another. From a vendor perspective, however, the new standard is likely to be one level of abstraction up from individual language frameworks: the platform itself. Certainly this is VMware’s opinion, as they are in Maritz’s words trying to construct “the 21st-century equivalent of Linux” – i.e. the substrate that everything else is built on top of.

In 2012, this will become more apparent. PaaS platforms will emerge as the new standard from a runtime and deployment perspective, the middleware target for a new generation of application architectures.

Service Proliferation

With the inevitable adoption of multiple third party services – varying cloud resources, multiple, possibly overlapping, management and monitoring services and so on – will come challenges in making sense of the whole. Overall, instrumentation and visibility on a per service level is improved, but aggregating these views into a cohesive picture of overall architectural health and performance is likely to be highly problematic. Not least because the services themselves may present conflicting information and data. Google Analytics and New Relic, for example, are frequently at odds over load times and other delivery related performance metrics. Introduce into that mix services like Boundary or CloudWatch and the picture becomes that much more complex. Connecting their data back to underlying log management and monitoring solutions such as 10gen’s MMS or Splunk is more complicated still.

The challenges of service integration will create commercial opportunities for aggregating services which consume individual performance streams, normalize them and present customers with a single consolidated picture of their network performance. Commercial solutions will not fully deliver on this vision in 2012, but we will see progress and announcements in this direction.

Telemetry Usage

Five years ago, we began publicly discussing revenue models based around what we termed telemetry, or product generated datastreams. The context was providing open source commercial vendors with a viable economic model that better aligned customer and vendor needs, but the approach is by no means limited to that category: Software-as-a-Service vendors, as an example, are well positioned to leverage the data because they maintain the infrastructure. In 2011, we finally began seeing vendors besides Spiceworks take the first steps towards incorporating data based revenue models. For products like Sonatype Insight [coverage], data is not a byproduct, but the product.

In 2012, this trend will accelerate as necessary monitoring capabilities are added to product portfolios and industry understanding and acceptance of the model overcomes conservative privacy concerns. Many more vendors will begin to realize that like New Relic, which observed a decline in commercial application server usage, their accumulated data is full of insights on customer behaviors and wider market trends both.

Value of Software Will Continue to Decline

Capital markets have not, traditionally, been overly fond of software firms, perhaps because comparatively few of them eclipse annual revenue marks of a billion dollars – fewer than twenty, by Forbes’ count. Microsoft’s share price has languished for over a decade in spite of the company having not one but two licenses to print money. The mean age of PwC’s Top 20 software firms by revenue is 47 years; a fact which cannot be encouraging to startups.

Higher valuations instead are being awarded to entities that employ software to some end, rather than attempting to realize revenue from it directly. Startups today realize this, and the value of software in their models has commensurately been adjusted downward. Tom Preston-Werner, for example, describes the GitHub philosophy as “open source (almost) everything.” Facebook, LinkedIn, Rackspace, Twitter and others exhibit a similar lack of protectiveness regarding their software assets, all having open sourced core components of their software infrastructure that even five years ago would have been fiercely guarded.

This is becoming the expectation rather than the exception because it is nothing more or less than an intelligent business strategy. Businesses can and will keep private assets they believe represent competitive differentiation, but it will be increasingly apparent that less and less software is actually differentiating. As a result, 2012 will see even less emphasis on the value of software and more on what the software can be used to achieve.

Bonus: Facebook’s Most Important Feature

In 2012 will be Timeline. Mark it down.

Disclosure: Black Duck, Cloudera, GitHub, IBM, Microsoft, Sonatype and VMware are RedMonk customers, while 10gen, Boundary, Circonus, Facebook, Open Logic, Palamida, and New Relic are not.

by-nc-sa

Categories: AltDB, Analytics, Big Data, Cloud, Desktop, Open Source.

Revisiting the 2011 Predictions, Part 2

This is the concluding half of the exercise in which I review my predictions for the calendar year just ended. If you’re looking for the original 2011 predictions, those are here. Part 1, meanwhile, can be found here. With that, on to the predictions.

Hardware

Workstations Will Make a Comeback

This prediction is not supported by data nor even anecdotal evidence. Even in the small sample of my contacts, migrations away from tower-style hardware profiles towards laptops accelerated. Wider market developments seem to confirm this; Apple may be retiring their existing workstation platform, the Mac Pro.

This is a (big) miss.

ARM Will Emerge as a Server Player

Whether it will ultimately emerge as a credible mainstream alternative remains to be seen, but ARM is indeed emerging as a server player. Though virtually all of the major systems players discuss it privately, HP (via Calxeda) this year became the first to publicly detail plans for ARM servers – perhaps banking on the fact that the upcoming A15 processor is more server friendly.

Intel is predictably skeptical of ARM’s viability in its core markets, with CEO Paul Otellini bluntly dismissive: “It ain’t gonna work.” And while it certainly hasn’t proven to work thus far, and there are real architectural and software issues to address, the power profile continues to pique the interest of server manufacturers and customers alike. Even marginal power savings mean real dollars at scale.

I count this as a hit.

Tablets are a Real Market

From the New York Times, late January, 2011:

“The iPad, introduced in April — is on track to deliver $15 billion to $20 billion in revenue in its first full year of sales, estimates A. M. Sacconaghi, an analyst at Sanford C. Bernstein. At that size, if the iPad were a stand-alone company, it would rank within the top third of the Fortune 500.”
- Steve Lohr, “The Power of the Platform at Apple”

Any questions? It’s true that to date the tablet market is more accurately characterized as an iPad market, but irrespective of the particular vendor dynamics in the space, the hardware form factor appears here to stay.

I count this as a hit.

Mobile

Challenges of Native Development Will Drive Interest in HTML5 and Hybrid Approaches

In August, we did a quick pass at some developer metrics and confirmed what our qualitative research had already indicated: that interest in PhoneGap was booming. Here’s a chart of Stack Overflow traction, for example.

[Chart: Stack Overflow mentions (w/o jQuery)]

Interest was booming enough, in fact, that Adobe acquired the talent behind PhoneGap, the code of which was submitted to Apache. This interest was unsurprising in light of the frustrations experienced by enterprises and developers alike, who are collectively slowed by the process of building an application on one platform and then porting to a second. Developers are frustrated enough, in fact, that they are in certain cases actively stalling development. While opinions differ on individual platform trajectories, most would agree that the status quo is unlikely to remain static. Meaning that at least some native development effort is likely to be wasted.

Couple that with improving mobile browser capabilities, and interest in HTML5 and hybrid approaches is likely to remain strong, in spite of inherent advantages to native development like discovery.

I count this as a hit.

NoSQL

The NoSQL Marketplace Will Experience Consolidation

The merger of CouchOne and Membase into CouchBase in February provided some evidence that the long anticipated wave of consolidation in this space was beginning, but the balance of the year provided little evidence to support this aside from the acceleration of a few individual players such as MongoDB [coverage]. I remain convinced that the marketplace will be unable to sustain the current volume of would-be commercial entities, but from our conversations with both those in a position to potentially impact consolidation and those interested in partnering with various NoSQL players, it is clear that consolidation will depend on clearer winners and losers to proceed. This should occur in 2012.

I’ll count this as a push in light of the CouchBase merger which subtracted one player but otherwise saw very few exits.

NoSQL Will Look More Like Pro-SQL

The implicit rejection of the Structured Query Language in the NoSQL term is ironic in light of the fact that a variety of projects are now adding similar features. Continuing in the proud tradition of Hive and Pig, which provide query language interfaces to Hadoop, DataStax announced CQL in June while CouchBase and SQLite announced UnQL in July [coverage].

Whether we’ll see a unified interface or a variety of engine-specific implementations as Alex Popescu would prefer remains to be seen, but query languages will be coming to the majority of NoSQL stores one way or another.

I count this as a hit.

Open Source

Open Source of Non-Strategic Infrastructure Assets Will Increase

From Twitter open sourcing the Storm assets it acquired via the BackType transaction to the New York Stock Exchange’s donation of OpenMAMA to the Linux Foundation, it is increasingly clear even to traditional parties that the release of non-strategic code as open source has multiple benefits. GitHub’s Tom Preston-Werner’s list of same is difficult to improve upon:

  • “Open sourcing code is great advertising for you and your company.”
  • “If your code is popular enough…you will have created a force multiplier that helps you get more work done faster and cheaper.”
  • “When you open source useful code, you attract talent.”
  • “If you’re hiring, the best technical interview possible is the one you don’t have to do because the candidate is already kicking ass on one of your open source projects.”
  • “Dedication to open source code is an amazingly effective way to retain that talent.”
  • “[Assuming code will be open sourced] leads to effortless modularization.”
  • “By getting code out in the public we can drastically reduce duplication of effort.”
  • “It’s the right thing to do.”

It may or may not be beneficial to open source core strategic assets, as VMware did with Cloud Foundry, but it is increasingly hard to justify protecting those that are purely tactical in nature. The benefits in many if not most cases will outweigh the costs, which is why we’re seeing an increase in contributions to open source projects.

Eclipse Survey, Percentage in Change of Open Source Contributing Organizations

The data from the annual Eclipse surveys is one example of this. If we examine the percentage of organizations that contribute back to open source versus those that do not from 2007 to 2011, it is clear that comfort levels with open source generally are rising.

I count this as a hit.

Forking: How Development Gets Done

Commits by Forge, Sorted by Age

The benefits of distributed version control and the forking model it encourages have been sufficient to convince even large projects such as Eclipse: a large share (44.2%) of its projects have migrated from CVS or Subversion to Git. The popularity of this model is also on display in the graph above, which depicts the relative commit volume of four major forges. GitHub is the youngest of the forges and yet commands a significant majority of the overall commits observed by Black Duck.

While it is difficult to separate the success of Git from GitHub, it is not necessary for this exercise, because both by design encourage forking as a developmental best practice. Forking is increasingly how development is conducted.

I count this as a hit.

Ubuntu is the New SUSE

In March of last year, I explored a set of metrics evaluating the relative performance of SUSE with developers, from jobs data to community traction. None of the metrics favored SUSE at the time, and the observed trends that favored Ubuntu persist. Consider, for example, the job trends:

SUSE, Ubuntu, Red Hat Jobs 2/2011

As pointed out at the time, despite SUSE’s outperformance relative to Ubuntu, the respective trajectories were problematic for the former. Revisiting the data today, we can see that the trendlines produced a logical outcome.

SUSE, Ubuntu, Red Hat Jobs 1/2012

Commercially, Ubuntu has continued its emergence as a server player. While initial claims that it was the primary operating system behind HP’s public cloud may have been overstated (http://arstechnica.com/business/news/2011/10/ubuntu-will-power-hps-new-cloud-service.ars), HP did confirm, in its words, “that they are the first one in our current private beta.” This is an announcement that would not have been possible prior to 2011.

I count this as a hit, though it will be interesting to see if Mint can become the new Ubuntu, in turn.

Programming Languages

JavaScript is resurgent

In April, we were fortunate to be able to analyze data from Black Duck regarding open source project commits by programming language. When we compared a 2011 snapshot to the volume of all time commits, the rise of JavaScript was apparent.

Percentage of Change in Language Usage

While dynamic languages were ascendant across the board, JavaScript outperformed even high growth languages like Python and Ruby.

As “The Rise and Rise of JavaScript” notes, the prospect of using the same language on browser and server is compelling, but it doesn’t stop there:

JavaScript’s serialization form, JSON, is becoming ubiquitous as a lighter-weight alternative to XML for streaming structured data, and NoSQL databases like mongo are happily using JSON and JavaScript in the database as a query language. This means, for the first time, you can have the same JavaScript function in the browser, on the server and in the database.

JavaScript may only be the most popular language within forward communities like GitHub, but its future in the wider world is bright, Dash notwithstanding.

I count this as a hit.

Bonus Prediction

Dropbox will become an attractive acquisition target

On the one hand, I got this wrong, because Steve Jobs apparently first attempted to acquire Dropbox in 2009.

“Jobs presciently saw this sapling as a strategic asset for Apple. Houston cut Jobs’ pitch short: He was determined to build a big company, he said, and wasn’t selling, no matter the status of the bidder (Houston considered Jobs his hero) or the prospects of a nine-digit price (he and Ferdowsi drove to the meeting in a Zipcar Prius).

Jobs smiled warmly as he told them he was going after their market.”

It was in 2011, however, that Dropbox – a firm that had previously raised a mere $7.2M – justified that decision, with a $4B valuation.

So I’ll go ahead and call this one a push.

The Final Tally

Scoring the 2011 predictions, then, we get 14 of 17 correct against two pushes and one miss. Generally, an 82% success rate in forecasting means that the game is rigged; 51% is enough to make a substantial profit in public markets. And in reviewing the predictions, it’s certainly true that a few – the continuing developer shortage, for instance – were likely obvious enough to not merit the “prediction” label.

But we do believe, as our website suggests, in William Gibson’s claim that the future is already here, it’s just unevenly distributed. Conclusions that are obvious to us are not to many traditional enterprise technology buyers, so in that sense this game is, in fact, rigged. We’ve predicted the rise of dynamic languages, Node.js, NoSQL, REST, Software-as-a-Service and such, then, not because we’re Nostradamus, but because we know who to listen to.

If you want to hear what that audience is telling us about 2012, then, stick around. Our 2012 predictions are due next week.

by-nc-sa

Categories: AltDB, Hardware, Mobile, Open Source, Programming Languages.

Revisiting the 2011 Predictions, Part 1

Predicting is an easier business than it once was. True, technology is hysterically accelerating rates of change and disruption, but that’s only relevant if the substance of your predictions matters. Which all too often, these days, it doesn’t. Analysts and pundits are able to prognosticate with relative impunity; who has the time to go back and check their accuracy? Pageview driven models, in fact, reward wilder predictions because the error cost is, generally, approaching zero. Unless you predicted, say, that Linux would be killed off by Windows NT, nobody will remember later.

I find value in reviewing my annual predictions, however. If they prove correct, that’s useful. If they were not, understanding the reasons why is important to adjusting our models moving forward.

Because I made the mistake of making better than a dozen predictions last year, this year’s review will be delivered in two parts. Part 1, below, will cover my predictions for browsers, the cloud, data, developers and programming language frameworks. Part 2, covering predictions within hardware, mobile, NoSQL, open source and programming languages, will hit tomorrow.

If you’d prefer to read last year’s predictions first, they can be found here.

Browsers

Firefox Will Cede First Place to Chrome, But Not Without a Fight

Browser Usage

According to RedMonk Analytics, whose data reflects our developer-heavy audience, Firefox was able to hold off Chrome for two quarters. On the first of June, Firefox held a 32.58% share of our audience to Chrome’s 32.08%. By the second, Firefox was in second place and would remain there for the balance of the year, with the gap widening in the process. At present, our browser metrics peg Chrome at 36.38% with Firefox a distant second at 25.48%.

I feel safe counting this one as a hit.

Cloud

PaaS Adoption Will Begin to Show Traction, With Little Impact on IaaS Traction

The first Platform-as-a-Service providers essentially asked developers to trade choice for development speed. Like Ruby on Rails – itself the basis for multiple first generation PaaS platforms – PaaS was built for those that would embrace constraints. But PaaS platforms never saw the type of growth that Rails experienced, in part because of the further loss of control that the cloud represents. It’s one thing to have a web framework like Rails dictate the way that you build web applications; having PaaS platforms also choose the operating system, database, version control systems and more was too much.

Which is why second and third generation PaaS providers have furiously removed barriers to entry, adding additional runtimes, open sourcing the underlying platform and allowing you to pick your provider. Which, in turn, is why adoption of PaaS is accelerating. VMware CEO Paul Maritz calls PaaS “the 21st-century equivalent of Linux,” which explains not only why they feel compelled to compete in the space, but also why Red Hat might.

Virtually every vendor in this space is reporting growth similar to the Hacker News trajectories for Cloud Foundry and OpenShift (below).

Cloud Foundry / OpenShift

In spite of the growth of PaaS, however, none of the metrics we track reflect any decline in usage of general infrastructure platforms. Quite the contrary, in fact.

I count this as a hit.

Data

Firms Will Increasingly Seek to Leverage the Data They Generate

Turning data into revenue has been one of the core themes of the past year, as well as the focus of my talk at the Open Source Business Conference in May. We’ve long held SpiceWorks up as a model of monetizing data, and as customers adjust to the reality that they’re already sharing data and vendors cease to regard it as a third rail issue, we’re seeing more businesses embrace data based revenue streams, as with Sonatype Insight. From 10gen to Black Duck, vendors are increasingly positioning themselves to be purveyors of data as much as software. Data is no longer the byproduct, but a product itself.

I count this as a hit.

Hadoop Will Become the MySQL of Big Data

EMC, HP, IBM, NetApp and even Oracle all have Hadoop – or in EMC’s case, MapReduce – plays in market. Microsoft actually deprecated its own Dryad initiative in favor of the Apache project. Players from AsterData to CouchBase to EnterpriseDB to MarkLogic to Tableau to Vertica have purpose-built Hadoop connectors. The commercial distribution space, once essentially owned solely by Cloudera, has expanded to multiple third parties with varying points of differentiation.

Hadoop interest elsewhere, meanwhile, has not slowed.

Hadoop

Need I say more about the growing ubiquity of Hadoop? I count this as a hit.

Developers

Talent Shortages Will Continue

Granted, predicting a shortage of qualified development talent will be seen in some quarters as about as controversial as predicting that the sun will rise in the east. But part of this is context: certainly in January, the economic direction was less than certain. And in spite of an unemployment rate that has hovered just south of 10% for the better part of the last calendar year, hiring continues to be an issue for the majority of our clients. To the extent that several are spinning up offices solely for purposes of recruitment. This is not surprising, given the historical growth in employee headcounts (see above) that has, to date, been relatively resistant to the global economic crises.

Demand varies by skillset, as might be predicted, but 2011 remained – by our metrics – a tight market. Other market watchers support this assertion.

“Hiring talent in Silicon Valley is the toughest since the last bubble and investors are starting to openly wonder how this one will end.”
Steve Blank

“There is a war for talent, particularly developer talent, going on. Not just in Silicon Valley but also in NYC and many other places around the country.

Companies, small and large, are resorting to all sorts of creative ideas to recruit. Free lunches, free yoga, pushing code day one, cool schwag, options, RSUs, pretty much whatever it takes.”
Fred Wilson

While we don’t have good data then on market specific hiring (Bureau of Labor data is not fine grained enough), the evidence available to us seems to support the contention that shortages of tech talent remain.

I count this as a hit.

Frameworks

Node.js Will Continue its Growth Trajectory

October was a rough month for Node.js, with posts like Node.js is Cancer and node.js Is VB6 – Does node.js Suck? following the tradition of March reddit discussions like Is NodeJS Wrong? The Trough of Disillusionment, it seemed, had arrived well ahead of schedule.

Except that interest metrics showed no commensurate decline. Node took – again – three of the Top 5 spots in inbound search queries within RedMonk Analytics. Which is unsurprising against the backdrop of Google’s Insights for Search numbers.

Over on GitHub, meanwhile, which itself has achieved dramatic growth, Node.js is the second most watched repository, ahead of Rails, jQuery, HTML5-Boilerplate, and Homebrew. Microsoft clearly perceives this growth, because it has worked with Joyent to create a stable build of Node for Windows, which in turn led to an SDK for Azure.

All of which means nothing except that Node’s growth trajectory continues.

I count this as a hit.


Part 2, tomorrow.


Categories: browsers, Cloud, Data.