
The RedMonk Programming Language Rankings: January 2015

Update: These rankings have been updated. The third quarter snapshot is available here.

With two quarters having passed since our last snapshot, it’s time to update our programming language rankings. Since Drew Conway and John Myles White originally performed this analysis late in 2010, we have been regularly comparing the relative performance of programming languages on GitHub and Stack Overflow. The idea is not to offer a statistically valid representation of current usage, but rather to correlate language discussion (Stack Overflow) and usage (GitHub) in an effort to extract insights into potential future adoption trends.

In general, the process has changed little over the years. With the exception of GitHub’s decision to no longer provide language rankings on its Explore page – they are now calculated from the GitHub archive – the rankings are performed in the same manner, meaning that we can compare rankings from run to run, and year to year, with confidence.

This is brought up because one result in particular, described below, is very unusual. But in the meantime, it’s worth noting that the steady decline in correlation between rankings on GitHub and Stack Overflow observed over the last several iterations of this exercise has been arrested, at least for one quarter. After dropping from its historical .78 – .8 correlation to .74 during the Q314 rankings, the correlation between the two properties is back up to .76. It will be interesting to observe whether this is a temporary reprieve, or if the lack of correlation itself was the anomaly.
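For readers who want to compute a comparable figure on their own data, here is a minimal sketch of a rank correlation between two sites. It assumes Spearman's coefficient and tie-free rankings; the post does not say which coefficient is actually used, and the ranks below are hypothetical rather than RedMonk's data.

```python
def spearman(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    n = len(rank_a)
    # Sum of squared differences between each item's two ranks.
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Hypothetical ranks for five languages on each site (illustrative only).
github_rank = [1, 2, 3, 4, 5]
stack_overflow_rank = [2, 1, 3, 5, 4]

rho = spearman(github_rank, stack_overflow_rank)
print(f"rank correlation: {rho:.2f}")
```

A value near 1 means the two communities order the languages almost identically; a falling value means discussion and usage are drifting apart. Real rankings contain ties, which this simplified formula does not handle.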

For the time being, however, the focus will remain on the current rankings. Before we continue, please keep in mind the usual caveats.

  • To be included in this analysis, a language must be observable within both GitHub and Stack Overflow.
  • No claims are made here that these rankings are representative of general usage more broadly. They are nothing more or less than an examination of the correlation between two populations we believe to be predictive of future use, hence their value.
  • There are many potential communities that could be surveyed for this analysis. GitHub and Stack Overflow are used here first because of their size and second because of their public exposure of the data necessary for the analysis. We encourage, however, interested parties to perform their own analyses using other sources.
  • All numerical rankings should be taken with a grain of salt. We rank by numbers here strictly for the sake of interest. In general, the numerical ranking is substantially less relevant than the language’s tier or grouping. In many cases, one spot on the list is not distinguishable from the next. The separation between language tiers on the plot, however, is generally representative of substantial differences in relative popularity.
  • GitHub language rankings are based on raw lines of code, which means that repositories written in a given language that include a greater amount of code in a second language (e.g. JavaScript) will be read as the latter rather than the former.
  • In addition, the further down the rankings one goes, the less data available to rank languages by. Beyond the top tiers of languages, depending on the snapshot, the amount of data to assess is minute, and the actual placement of languages becomes less reliable the further down the list one proceeds.

(click to embiggen the chart)

Besides the above plot, which can be difficult to parse even at full size, we offer the following numerical rankings. As will be observed, this run produced several ties which are reflected below (they are listed out here alphabetically rather than consolidated as ties because the latter approach led to misunderstandings).

1 JavaScript
2 Java
3 PHP
4 Python
5 C#
5 C++
5 Ruby
8 CSS
9 C
10 Objective-C
11 Perl
11 Shell
13 R
14 Scala
15 Haskell
16 Matlab
17 Go
17 Visual Basic
19 Clojure
19 Groovy

By the narrowest of margins, JavaScript edged Java for the top spot in the rankings, but as always, the difference between the two is so marginal as to be insignificant. The most important takeaway is that the language frequently written off for dead and the language sometimes touted as the future have shown sustained growth and traction and remain, according to this measure, the most popular offerings.

Outside of that change, the Top 10 was effectively static. C++ and Ruby each jumped one spot to split fifth place with C#, but that minimal distinction reflects the lack of movement of the rest of the “Tier 1,” or top grouping of languages. PHP has not shown the ability to unseat either Java or JavaScript, but it has remained unassailable for its part in the third position. After a brief drop in Q1 of 2014, Python has been stable in the fourth spot, and the rest of the Top 10 looks much as it has for several quarters.

Further down in the rankings, however, there are several trends worth noting – one in particular.

  • R: Advocates of the language have been pleased by four consecutive gains in these rankings, but this quarter’s snapshot showed R instead holding steady at 13. This was predictable, however, given that the languages remaining ahead of it – from Java and JavaScript at the top of the rankings to Shell and Perl just ahead – are more general purpose and thus likely to be more widely used. Even if R’s growth does stall at 13, however, it will remain the most popular statistical language by this measure, and this in spite of substantial competition from general purpose alternatives like Python.

  • Go: In our last rankings, it was predicted based on its trajectory that Go would become a Top 20 language within six to twelve months. Six months following that, Go can consider that mission accomplished. In this iteration of the rankings, Go leapfrogs Visual Basic, Clojure and Groovy – and displaces Coffeescript entirely – to take number 17 on the list. Again, we caution against placing too much weight on the actual numerical position, because the differences between one spot and another can be slight, but there’s no arguing with the trendline behind Go. While the language has its critics, its growth prospects appear secure. And should the Android support in 1.4 mature, Go’s path to becoming a Top 10 if not Top 5 language would be clear.

  • Julia/Rust: Long two of the notable languages to watch, Julia and Rust have typically grown in lockstep, though not for any particular functional reason. This time around, however, Rust outpaced Julia, jumping eight spots to 50 against Julia’s more steady progression from 57 to 56. It’s not clear what’s responsible for the differential growth, or more specifically if it’s problems with Julia, progress from Rust (with a DTrace probe, even), or both. But while both remain languages of interest, this ranking suggests that Rust might be poised to outpace its counterpart.

  • Coffeescript: As mentioned above, Coffeescript dropped out of the Top 20 languages for the first time in almost two years, and may have peaked. From its high ranking of 17 in Q3 of 2013, in the three runs since, it has clocked in at 18, 18 and now 21. The “little language that compiles into JavaScript” positioned itself as a compromise between JavaScript’s ubiquity and syntactical eccentricities, but support for it appears to be slowly eroding. How it performs in the third quarter rankings should provide more insight into whether this is a temporary dip or more permanent decline.

  • Swift: Last, there is the curious case of Swift. During our last rankings, Swift was listed as the language to watch – an obvious choice given its status as the Apple-anointed successor to the #10 language on our list, Objective-C. Being officially sanctioned as the future standard for iOS applications everywhere was obviously going to lead to growth. As was said during the Q3 rankings which marked its debut, “Swift is a language that is going to be a lot more popular, and very soon.” Even so, the growth that Swift experienced is essentially unprecedented in the history of these rankings. When we see dramatic growth from a language it typically has jumped somewhere between 5 and 10 spots, and the closer the language gets to the Top 20 or within it, the more difficult growth is to come by. And yet Swift has gone from our 68th ranked language during Q3 to number 22 this quarter, a jump of 46 spots. From its position far down on the board, Swift now finds itself one spot behind Coffeescript and just ahead of Lua. As the plot suggests, Swift’s growth is more obvious on Stack Overflow than GitHub, where the most active Swift repositories are either educational or infrastructure in nature, but even so the growth has been remarkable. Given this dramatic ascension, it seems reasonable to expect that the Q3 rankings this year will see Swift as a Top 20 language.

The Net

Swift’s meteoric growth notwithstanding, the high level takeaway from these rankings is stability. The inertia of the Top 10 remains substantial, and what change there is in the back half of the Top 20 or just outside of it – from Go to Swift – is both predictable and expected. The picture these rankings paint is of an environment thoroughly driven by developers; rather than seeing a heavy concentration around one or two languages as has been an aspiration in the past, we’re seeing a heavy distribution amongst a larger number of top tier languages followed by a long tail of more specialized usage. With the exceptions mentioned above, then, there is little reason to expect dramatic change moving forward.

Update: The above language plot chart was based on an incorrect Stack Overflow tag for Common Lisp and thereby failed to incorporate existing activity on that site. This has been corrected.

Categories: Programming Languages.

DVCS and Git Usage in 2014

To many in the technology industry, the dominance of Decentralized Version Control Systems (DVCS) generally and Git specifically is taken as a given. Whether it’s consumed as a product (e.g. GitHub Enterprise/Stash), service (Bitbucket, GitHub) or base project, Git is the de facto winner in the DVCS category, a category which has taken considerable share from its centralized alternatives over the past few years. With macro trends fueling further adoption, it’s natural to expect that the ascent of Git would continue unimpeded.

One datapoint which has proven useful for assessing the relative performance of version control systems is Open Hub (formerly Ohloh)’s repository data. Built to index public repositories, it gives us insight into the respective usage at least within its broad dataset. In 2010 when we first examined its data, Open Hub was crawling some 238,000 projects, and Git managed just 11% of them. For this year’s snapshot, that number has swelled to over 674,000 – or close to 3X as many. And Git’s playing a much more significant role today than it did then.

Before we get into the findings, more details on the source and issues.


The data in this chart was taken from snapshots of the Open Hub data exposed here.

Objections & Responses

  • “Open Hub data cannot be considered representative of the wider distribution of version control systems”: This is true, and no claims are made here otherwise. While it necessarily omits enterprise adoption, however, it is believed here that Open Hub’s dataset is more likely to be predictive moving forward than a wider sample.
  • “Many of the projects Open Hub surveys are dormant”: This is probably true. But even granting a sizable number of dormant projects, it’s expected that these will be offset by a sizable influx of new projects.
  • “Open Hub’s sampling has evolved over the years, and now includes repositories and forges it did not previously”: Also true. It also, by definition, includes new projects over time. When we first examined the data, Open Hub surveyed fewer than 300,000 projects. Today it’s over 600,000. This is a natural evolution of the survey population, one that’s inclusive of evolving developer behaviors.

With those caveats in mind, let’s start with the big picture. The following chart depicts the total share of repositories attributable to centralized (CVS/Subversion) and distributed (Bazaar/Git/Mercurial) systems.

Even over a brief three year period (we lack data for 2011, and have thus omitted 2010 for continuity’s sake) it’s clear that DVCS systems have made substantial inroads. DVCS may not be quite as dominant as is commonly assumed, but it’s close to managing one in two projects in the world. When considering the inertial effects operating against DVCS, this traction is impressive. In spite of the fact that it can be difficult even for excellent developers to shift their mental model from centralized to decentralized, that version control systems are not typically given the priority of other infrastructure elements, and that the risks associated with moving from one system to another are non-trivial, DVCS has clearly established itself as a popular, mainstream option. Close observation of the above chart, however, reveals a slight hiccup in adoption numbers which we’ll explore in more detail shortly.

In the meantime, let’s isolate the specific changes per project between our 2014 snapshot and the 2010 equivalent. How has their relative share changed?

As might be predicted, comparing 2010 to 2014, Git is the clear winner. The project with the idiosyncratic syntax made substantial gains (25.92%), partially at the expense of Subversion (-12.02%) but more at the expense of CVS (-16.64%). Just as clearly, Git is the flag bearer for DVCS more broadly, as other decentralized version control systems in Bazaar and Mercurial showed only modest improvement over that span – 1.33% and 1.41% respectively. The takeaways, then, from this span are first that DVCS is a legitimate first class citizen and second that Git is the most popular option in that category.
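As a quick sanity check on those figures, the sketch below assumes they are percentage-point changes in each system's share of indexed repositories, in which case the gains and losses across the five systems should cancel out. The variable names are illustrative only.

```python
# Share changes from 2010 to 2014 as quoted in the text (assumed percentage points).
share_change = {
    "Git": 25.92,
    "Bazaar": 1.33,
    "Mercurial": 1.41,
    "Subversion": -12.02,
    "CVS": -16.64,
}

distributed = ("Git", "Bazaar", "Mercurial")
centralized = ("Subversion", "CVS")

# Round to two decimals to absorb floating-point noise.
dvcs_gain = round(sum(share_change[s] for s in distributed), 2)
cvcs_loss = round(sum(share_change[s] for s in centralized), 2)
net = round(dvcs_gain + cvcs_loss, 2)

print(f"DVCS: {dvcs_gain:+.2f}, centralized: {cvcs_loss:+.2f}, net: {net:.2f}")
```

The distributed systems' combined gain matches the centralized systems' combined loss, with Git accounting for the large majority of it.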

What about the past year, however? Has Git continued on its growth trajectory?

The short answer is no. With this chart, it’s very important to note the scale of the Y axis: the changes reflected here are comparatively minimal, which is to be expected over the brief span of one year. That being said, it’s interesting to observe that Subversion shows a minor bounce (1.28%), while Git (-1.17%) took a correspondingly minor step back. Bazaar and CVS were down negligible amounts over the same span, while Mercurial was ever so slightly up.

Neither quantitative nor qualitative evidence supports the idea that Git adoption is stalled, nor that Subversion is poised for a major comeback. Wider market product trends, if anything, contradict the above, and suggest that the most likely explanation for the delta in Open Hub’s numbers is the addition of major new centrally managed codebases to Open Hub’s index.

It does serve as a reminder, however, that as much as the industry takes it for granted that Git is the de facto standard for version control systems, a sizable volume of projects have yet to migrate to a decentralized system of any kind. The implications of this are many. For service providers who are Git-centric, it may be worth considering creating bridges for users on other systems or even offering assistance in VCS migrations. For DVCS providers, the above may be superficially discouraging, but in reality indicates that the market opportunity is even wider than commonly assumed. And for users, it means that those still on centralized systems should consider migrating to decentralized alternatives, but by no means are condemned to the laggard category.

While it is thus assumed here, however, that the step back for Git is an artifact, it will be interesting to watch the growth of the platform over the next year. One year’s lack of growth is easily dismissed as an anomaly; a second year would be more indicative of a pattern. It will be interesting to see what the 2015 snapshot tells us.

Disclosure: Black Duck, the parent company of Open Hub, has been a RedMonk customer but is not currently.

Categories: Version Control.

The Scale Imperative


Once upon a time, the larger the workload, the larger the machine you would use to service it. Companies from IBM to Sun supplied enormous hardware packages to customers with similarly outsized workloads. IBM, in fact, still generates substantial revenue from its mainframe hardware business. One of the under-appreciated aspects of Sun’s demise, on the other hand, was that it had nothing to do with a failure of its open source strategy; the company’s fate was sealed instead by the collapse in sales of its E10K line, due in part to the financial crisis. For vendors and customers alike, mainframe-class hardware was the epitome of computational power.

With the rise of the internet, however, this model proved less than scalable. Companies founded in the late 1990s like Google, whose mission was to index the entire internet, looked at the numbers and correctly concluded that the economics of that mission on a scale-up model were untenable. With scale-up an effective dead end, the remaining option was to scale out. Instead of big machines, scale-out players would build software that turned lots of small machines into bigger machines, e pluribus unum writ in hardware. By harnessing the collective power of large numbers of low cost, comparatively low power commodity boxes the scale-out pioneers could scale to workloads of previously unimagined size.

This model was so successful, in fact, that over time it came to displace scale-up as the default. Today, the overwhelming majority of companies scaling their compute requirements are following in Amazon, Facebook and Google’s footsteps and choosing to scale out. Whether they’re assembling their own low cost commodity infrastructure or out-sourcing that task to public cloud suppliers, infrastructure today is distributed by default.

For all of the benefits of this approach, however, the power afforded by scale-out did not come without a cost. The power of distributed systems mandates fundamental changes in the way that infrastructure is designed, built and leveraged.

Sharing the Collective Burden of Software

The most basic illustration of the cost of scale-out is the software designed to run on it. As Joe Gregorio articulated seven years ago:

The problem with current data storage systems, with rare exception, is that they are all “one box native” applications, i.e. from a world where N = 1. From Berkeley DB to MySQL, they were all designed initially to sit on one box. Even after several years of dealing with MegaData you still see painful stories like what the YouTube guys went through as they scaled up. All of this stems from an N = 1 mentality.

Anything designed prior to the distributed system default, then, had to be retrofitted – if possible – to not just run across multiple machines instead of a single node, but to run well and take advantage of their collective resources. In many cases, it proved simpler to start from scratch. The Google Filesystem and MapReduce papers that resulted in Hadoop are one example of this; at its core, the first iterations of the project were designed to deconstruct a given task into multiple component tasks to be more easily executed by an array of machines.
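To make the idea of deconstructing a task into component tasks concrete, here is a minimal single-process sketch in the map/reduce style. It is not Hadoop's API; the function names and the word-count task are assumptions chosen for illustration, and in a real cluster each chunk would be handed to a different machine rather than processed in a loop.

```python
from collections import Counter
from functools import reduce

def split_into_chunks(lines, n_chunks):
    """Deal the input lines out into n_chunks component tasks."""
    return [lines[i::n_chunks] for i in range(n_chunks)]

def map_chunk(chunk):
    """The work one machine would do: count words in its own chunk only."""
    return Counter(word for line in chunk for word in line.lower().split())

def reduce_counts(partials):
    """Merge the per-machine partial counts into a single result."""
    return reduce(lambda a, b: a + b, partials, Counter())

lines = ["the cat sat", "the dog sat", "the end"]

# Each chunk is independent, so the map step could run on separate machines.
partials = [map_chunk(chunk) for chunk in split_into_chunks(lines, 3)]
totals = reduce_counts(partials)
print(totals.most_common(2))
```

The important property is that no chunk needs to know about any other until the final merge, which is what lets the work spread across an array of machines.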

From the macro-perspective, besides the inherent computer science challenges of (re)writing software for distributed, scale-out systems – which is exceptionally difficult – the economics were problematic. With so many businesses moving to this model in a relatively short span of time, a great deal of software needed to get written quickly.

Because no single player could bear the entire financial burden, it became necessary to amortize the costs across an industry. Most of the infrastructure we take for granted today, then, was developed as open source. Linux became an increasingly popular operating system choice as both host and guest; the project, according to Ohloh, is the product of over 5500 person-years in development. To put that number into context, if you could somehow find and hire 1,000 high quality kernel engineers, and they worked 40 hours a week with two weeks vacation, it would take you five and a half years to match that effort. Even Hadoop, a project that hasn’t had its 10 year anniversary yet, has seen 430 person-years committed. The even younger OpenStack, a very precocious four years old, has seen an industry conglomerate collectively contribute 594 years of effort to get the project to where it is today.

Any one of these projects could be singularly created by a given entity; indeed, this is common. Just in the database space, whether it’s Amazon with DynamoDB, Facebook with Cassandra or Google with BigQuery, each scale-out player has the ability to generate its own software. But this is only possible because they are able to build upon the available and growing foundation of open source projects, where the collective burden of software is shared. Without these pooled investments and resources, each player would have to either build or purchase at a premium everything from the bare metal up.

Scale-out, in other words, requires open source to survive.

Relentless Economies of Scale

In stark contrast to the difficulty of writing software for distributed systems, microeconomic principles love them. The economies of scale that larger players can bring to bear on the markets they target are, quite frankly, daunting. Their variable costs decrease due to their ability to purchase in larger quantities; their fixed costs are amortized over a higher volume customer base; their relative efficiency can increase as scale drives automation and improved processes; their ability to attract and retain talent increases in proportion to the difficulty of the technical challenges imposed; and so on.

It’s difficult to quantify these advantages in precise terms, but we can at least attempt to measure the scale at which various parties are investing. Specifically, we can examine their reported plant, property and equipment investments.

If one accepts the hypothesis that economies of scale will play a significant role in determining who is competitive and who is not, this chart suggests that the number of competitive players in the cloud market will not be large. Consider that Facebook, for all of its heft and resources, is a distant fourth in terms of its infrastructure investments. This remains true, importantly, even if their spend was adjusted upwards to offset the reported savings from their Open Compute program.

Much as in the consumer electronics world, then, where Apple and Samsung are able to leverage substantial economies of scale in their mobile device production – an enormous factor in Apple’s ability to extract outsized and unmatched margins – so too is the market for scale-out likely to be dominated by the players that can realize the benefits of their scale most efficiently.

The Return of Vertical Integration

Pre-internet, the economics of designing your own hardware were less than compelling. In the absence of a worldwide network, and with less connected populations, even the largest companies were content to outsource the majority of their technology business, and particularly hardware, to specialized suppliers. Scale, however, challenges those economics on a fundamental level, and has forced those at the bleeding edge to rethink traditional infrastructure design, questioning all prior assumptions.

It’s long been known, for example, that Google eschewed purchasing hardware from traditional suppliers like Dell, HP or IBM in favor of its own designs manufactured by original design manufacturers (ODMs); Stephen Shankland had an in-depth look at one of their internal designs in 2009. Even then, the implications of scale are apparent; it seems odd, for example, to embed batteries in the server design, but at scale, the design is “much cheaper than huge centralized UPS,” according to Ben Jai. But servers were only the beginning.

As it turns out, networking at scale is an even greater challenge than compute. On November 14th, Facebook provided details on its next generation data center network. According to the company:

The amount of traffic from Facebook to Internet – we call it “machine to user” traffic – is large and ever increasing, as more people connect and as we create new products and services. However, this type of traffic is only the tip of the iceberg. What happens inside the Facebook data centers – “machine to machine” traffic – is several orders of magnitude larger than what goes out to the Internet…

We are constantly optimizing internal application efficiency, but nonetheless the rate of our machine-to-machine traffic growth remains exponential, and the volume has been doubling at an interval of less than a year.

As of October 2013, Facebook was reporting 1.19B active monthly users. Since that time, then, machine to machine east/west networking traffic has more than doubled. Which makes it easy to understand how the company might feel compelled to reconsider traditional networking approaches, even if it means starting effectively from scratch.
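The reasoning behind "more than doubled" can be sketched as simple exponential growth. This assumes traffic doubles at a fixed interval; the 12-month figure is the slowest case consistent with "less than a year", and the 10-month figure is purely hypothetical.

```python
def growth_factor(elapsed_months, doubling_months):
    """Multiple by which a quantity grows if it doubles every doubling_months."""
    return 2 ** (elapsed_months / doubling_months)

# October 2013 to November 2014 is 13 months.
elapsed = 13

# A 12-month doubling interval is the slowest case the quote allows.
lower_bound = growth_factor(elapsed, 12)

# A 10-month interval is a hypothetical faster case, for comparison only.
faster_case = growth_factor(elapsed, 10)

print(f"at least {lower_bound:.2f}x; {faster_case:.2f}x if doubling every 10 months")
```

Even at the slowest rate consistent with the quote, 13 months of growth leaves internal traffic at a little over twice its earlier level.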

Earlier that week at its re:Invent conference, meanwhile, Amazon went even further, offering an unprecedented peek behind the curtain. According to James Hamilton, Amazon’s Chief Architect, there are very few remaining aspects to AWS which are not designed internally. The company has obviously dramatically grown the software capabilities of its platform over time: on top of basic storage and compute, Amazon has integrated an enormous variety of previously distinct services: relational databases, a Map Reduce engine, data warehousing and analytical capabilities, DNS and routing, CDN, a key value store, a streaming platform – and most recently ALM tooling, a container service and a real-time service platform.

But the tendency of software platforms to absorb popular features is not atypical. What is much less common is the depth to which Amazon has embraced hardware design.

  • Amazon now builds their own networking gear running their own protocol. The company claims their gear is lower cost, faster and that the cycle time for bugs is reduced from months to weeks.
  • Amazon’s server and storage designs are custom to the vendor; the storage servers, for example, are optimized for density and pack in 864 disks at a weight of almost 2400 pounds.
  • Intel is now working directly with Amazon to produce custom chip designs, capable of bursting to much higher clock speeds temporarily.
  • To ensure adequate power for its datacenters, Amazon has progressed beyond simple negotiated agreements with power suppliers to building out custom substations, driven by custom switchgear the company itself designed.

Compute, networking, storage, power: where does this internal innovation path end? In Hamilton’s words, there is no category of hardware that is off-limits for the company. But the relentless in-sourcing is not driven by religious objections – such considerations are strictly functions of cost.

In economic terms, of course, this is an approximation of backward vertical integration. Amazon may not own the manufacturers themselves as in traditional vertical integration, but manufacturing is an afterthought next to the original design. By creating their own infrastructure from scratch, they avoid paying an innovation tax to third party manufacturers, can build strictly to their specifications and need only account for their own needs – not the requirements of every other potential vendor customer. The result is hardware that is, in theory at least, more performant, better suited to AWS requirements and lower cost.

While Amazon and Facebook have provided us with the most specifics, then, it’s safe to assume that vertical integration is a pattern that is already widespread amongst larger players and will only become more so.

The Net

For those without hardware or platform ambitions, the current technical direction is promising. With economies of scale growing ever larger and gradual reduction of third party suppliers continuing, cloud platform providers would appear to have margin yet to trim. And at least to date, competition on cloud platforms (IaaS, at least) has been sufficient to keep vendors from pocketing the difference, with industry pricing still on a downward trajectory. Cloud’s pricing advantage historically was the ability to pay less upfront and more over the longer term, but with base prices down close to 100% over a two year period, the longer term premium attached to cloud may gradually decline to the point of irrelevance.

On the software front, an enormous portfolio of high quality, highly valuable software that would have been financially out of the reach of small and even mid-sized firms even a few years ago is available today at no cost. Virtually any category of infrastructure software today – from the virtualization layer to the OS to the runtime to the database to the cloud middleware equivalents – has high quality, open source options available. And for those willing to pay a premium to outsource the operational responsibilities of building, deploying and maintaining this open source infrastructure, any number of third party platform providers would be more than happy to take those dollars.

For startups and other non-platform players, then, the combination of hardware costs amortized by scale and software costs distributed across a multitude of third parties means that effort can be directed towards business problems rather than basic, operational infrastructure.

The cloud platform players, meanwhile, symbiotically benefit from these transactions, in that each startup, government or business that chooses their platform means both additional revenue and a gain in scale that directly, if incrementally, drives down their costs (economies of scale) and indirectly increases their incentive and ability to reduce their own costs via vertical integration. The virtuous cycle of more customers leading to more scale leading to lower costs leading to lower prices leading to more customers is difficult to disrupt. This is in part why companies like Amazon or Salesforce are more than willing to trade profits for growth; scale may not be a zero sum game, but growth today will be easier to purchase than growth tomorrow – yet another reason to fear Amazon.

The most troubling implications of scale, meanwhile, are for traditional hardware suppliers (compute/storage/networking) and would-be cloud platform service providers. The former, obviously, are substantially challenged by the ongoing insourcing of hardware design. Compute may have been first, with Dell being forced to go private, HP struggling with its x86 business and IBM being forced to exit the commodity server business entirely. But it certainly won’t be the last. Networking and storage players alike are or should be preparing for the same disruption server manufacturers have experienced. The problem is not that cloud providers will absorb all or even the majority of the networking and storage addressable markets; the problem is that they will absorb enough to negatively impact the scale traditional suppliers can operate at.

Those that would compete with Amazon, Google, Microsoft et al, meanwhile, or even HP or IBM’s offerings in the space, will find themselves faced with increasingly higher costs relative to larger competition, whether it’s from premiums paid to various hardware suppliers, lower relative purchasing power or both. Which implies several things. First, that such businesses must differentiate themselves quickly and clearly, offering something larger, more cost-competitive players are either unable or unwilling to. Second, that their addressable market as a result of this specialization will be a fraction of the overall opportunity. And third, that the pool of competitors for base level cloud platform services will be relatively small.

What the long term future holds should these predictions hold up and the market come to be dominated by a few larger players is less clear, because as ever in this industry, their disruptors are probably already making plans in a garage somewhere.

Disclosure: Amazon, Dell, HP, IBM and Microsoft are RedMonk clients. Facebook and Google are not.

Categories: Cloud.

The Implications of IaaS Pricing Patterns and Trends

With Amazon’s re:Invent conference a week behind us and any potential price cuts or responses presumably implemented by this point, it’s time to revisit the question of infrastructure as a service pricing. Given what’s at stake in the cloud market, competition amongst providers continues to be fierce, driving costs for customers ever lower in what some observers have negatively characterized as a race to the bottom.

While the downward pricing pressure is welcome, however, it can be difficult to properly assess how competitive individual providers are with one another, all the more so because their non-standardized packaging makes it effectively impossible to compare service to service on an equal footing.

To this end we offer the following deconstruction of IaaS cloud pricing models. As a reminder, this analysis is intended not as a literal expression of cost per service; this is not, in other words, an attempt to estimate the actual component costs for compute, disk, and memory per provider. Such numbers would be speculative and unreliable, relying as they would on non-public information, but also of limited utility for users. Instead, this analysis compares base hourly instance costs against the individual service offerings. What this attempts to highlight is how providers may be differentiating from each other – deliberately or otherwise – by offering more memory per dollar spent, as one example. In other words, it’s an attempt to answer the question: for a given hourly cost, who’s offering the most compute, disk or memory?
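The question posed above reduces to a simple ratio: units of a resource divided by the hourly price. Here is a minimal sketch in Python, assuming a hypothetical `Instance` record and invented instance figures (none of these names or numbers come from the dataset):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    """One standard instance type; all values below are hypothetical."""
    provider: str
    name: str
    price_per_hour: float  # USD, on-demand list price
    cores: int
    memory_gb: float
    disk_gb: float


def per_dollar(instance: Instance, resource: str) -> float:
    """Units of `resource` obtained per dollar of hourly spend."""
    return getattr(instance, resource) / instance.price_per_hour


def rank_by(instances: list[Instance], resource: str) -> list[tuple[str, float]]:
    """Rank instances from most to least `resource` per dollar-hour."""
    scored = [(f"{i.provider} {i.name}", per_dollar(i, resource)) for i in instances]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative placeholders only, not real provider pricing.
SAMPLE = [
    Instance("ProviderA", "medium", 0.10, 2, 4.0, 40.0),
    Instance("ProviderB", "medium", 0.12, 2, 8.0, 20.0),
    Instance("ProviderC", "medium", 0.08, 1, 2.0, 60.0),
]

if __name__ == "__main__":
    for label, value in rank_by(SAMPLE, "memory_gb"):
        print(f"{label}: {value:.1f} GB per dollar-hour")
```

Swapping `"memory_gb"` for `"cores"` or `"disk_gb"` ranks the same instances on a different resource, which is why a provider can lead one comparison and trail another.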

As with previous iterations, a link to the aggregated dataset is provided below, both for fact checking and to enable others to perform their own analyses, expand the scope of surveyed providers or both.

Before we continue, a few notes.


  • No special pricing programs (beta, etc)
  • Linux operating system, no OS premium
  • Charts are based on price per hour costs (i.e. no reserved instances)
  • Standard packages only considered (i.e. no high memory, etc)
  • Where not otherwise specified, the number of virtual cores is assumed to be equal to the available compute units

Objections & Responses

  • “This isn’t an apples to apples comparison”: This is true. The providers do not make that possible.
  • “These are list prices – many customers don’t pay list prices”: This is also true. Many customers do, however. But in general, take this for what it’s worth as an evaluation of posted list prices.
  • “This does not take bandwidth and other costs into account”: Correct, this analysis is server only – no bandwidth or storage costs are included. Those will be examined in a future update.
  • “This survey doesn’t include [provider X]”: The link to the dataset is below. You are encouraged to fork it.

Other Notes

  • HP’s 4XL (60 cores) and 8XL (103 cores) instances were omitted from this survey intentionally for being twice as large and better than three times as large, respectively, as the next largest instances. While we can’t compare apples to apples, those instances were considered outliers in this sample. Feel free to add them back and re-run using the dataset below.
  • While we’ve had numerous requests to add providers, and will undoubtedly add some in future, the original dataset – with the above exception – has been maintained for the sake of comparison.

How to Read the Charts

  • There was some confusion last time concerning the charts and how they should be read. The simplest explanation is that the steeper the slope, the better the pricing from a user perspective. The more quickly cores, disk and memory are added relative to cost, the less a user has to pay for a given asset.
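One way to put a number on "steeper is better" is to fit a least-squares line through a provider's (price per hour, resource) points and compare the slopes. The sketch below assumes price sits on the horizontal axis, as the description above implies, and uses invented points for two hypothetical providers:

```python
def slope(points):
    """Ordinary least-squares slope of y on x for (x, y) pairs.

    Here x is price per hour (USD) and y is the amount of a resource
    (e.g. GB of memory), so the slope reads as "extra resource per
    extra dollar of hourly spend".
    """
    if len(points) < 2:
        raise ValueError("need at least two points to fit a line")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all prices are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Hypothetical (price per hour, memory in GB) points; not from the dataset.
provider_a = [(0.05, 2), (0.10, 4), (0.20, 8), (0.40, 16)]
provider_b = [(0.05, 1), (0.10, 2), (0.20, 4), (0.40, 8)]

if __name__ == "__main__":
    for name, pts in (("ProviderA", provider_a), ("ProviderB", provider_b)):
        print(f"{name}: {slope(pts):.1f} GB per additional dollar-hour")
```

In this made-up case ProviderA's line is twice as steep as ProviderB's, meaning each additional dollar per hour buys twice as much memory.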

With that, here is the chart depicting the cost of disk space relative to the price per hour.

This chart is notable primarily for two trends: first, the aggressive top line Amazon result and second, the Joyent outperformance. The latter is an understandable pricing decision: given Joyent’s recent market focus on data related workloads and tooling, e.g. the recently open sourced Manta, Joyent’s discounting of storage costs is logical. Amazon’s divergent pattern here can be understood as two separate product lines. The upper points represent traditional disk based storage (m1), which Amazon prices aggressively relative to the market, while the bottom line represents its m3 or SSD based product line, which is more costly – although still less pricy than alternative packages from IBM and Microsoft. Google does not list storage in its base pricing and is thus omitted here.

The above notwithstanding, a look at the storage costs on a per provider basis would indicate that for many if not most providers, storage is not a primary focus, at least from a differentiation standpoint.

As has historically been the case, the correlation between providers in the context of memory per dollar is high. Google and Digital Ocean are most aggressive with their memory pricing, offering slightly more memory per dollar spent than Amazon. Joyent follows closely after Amazon, and then come Microsoft, HP and IBM in varying order.

Interestingly, when asked at the Google Cloud Platform Live event whether the company had deliberately turned the dial in favor of cheaper memory pricing for their offerings as a means of differentiation and developer recruitment, the answer was no. According to Google, any specific or distinct improvements on a per category basis – memory, compute, etc – are arbitrary, as the company seeks to lower the overall cost of their offering based on improved efficiencies, economies of scale and so on rather than deliberately targeting areas developers might prioritize in their own application development process.

Whatever their origin, however, developers looking to maximize their memory footprint per dollar spent may be interested in the above as a guide towards separating services from one another.

In terms of computing units per dollar, Google has made progress since the last iteration of this analysis, where it was a bottom third performer. Today, the company enjoys a narrow lead over Amazon, followed closely by HP and Digital Ocean. IBM, Joyent and Microsoft, meanwhile, round out the offerings here.

It is interesting to note the wider distribution within computing units versus memory, as one example. Where there is comparatively minimal separation between providers with regard to memory per dollar, there are relatively substantive deltas between providers in terms of computing power per package. It isn’t clear that this has any material impact on selection or buying preferences at present, but for compute intensive workloads in particular it is at least worth investigating.

IaaS Price History and Implications

Besides taking apart the base infrastructure pricing on a component basis, one common area of inquiry is how provider prices have changed over time. It is enormously difficult to capture changes across services on a comparative basis over time, for many of the reasons mentioned above.

That being said, as many have inquired on the subject, below is a rough depiction of the pricing trends on a provider by provider basis. In addition to the caveats at the top of this piece, it is necessary to note that the below chart attempts to track only services that have been offered from the initial snapshot moving forward so as to be as consistent as possible. Larger instances recently introduced are not included, therefore, and other recent additions such as Amazon’s m3 SSD-backed package are likewise omitted.

Just as importantly, services cannot be reasonably compared to one another here because their available packages and the attached pricing vary widely; some services included more performant, higher cost offerings initially, and others did not. Comparing the average prices of one to another, therefore, is a futile exercise.

The point of the following chart is instead to try and understand price changes on a per provider basis over time. Nothing more, and nothing less.

Unsurprisingly, the overall trajectory for nearly all providers is down. And the exception – Microsoft – appears to spike only because its base offerings today are far more robust than their historical equivalents. The average price drop for the base level services included in this survey from the initial 2012 snapshot to today was 95%: what might have cost $0.35 – $0.70 an hour in 2012 is more likely to cost $0.10 – $0.30 today. Which raises many questions, the most common of which is to what degree the above general trend is sustainable: is this a race to the bottom, or are we nearing a pricing floor?
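For anyone re-running the numbers from the dataset, an average drop of this kind can be computed as the mean of the per-service percentage decreases between two snapshots. This is a minimal sketch, assuming hypothetical service names and prices, and it is not necessarily how the figure above was derived:

```python
def percent_drop(old_price: float, new_price: float) -> float:
    """Percentage decrease from old_price to new_price (negative if it rose)."""
    if old_price <= 0:
        raise ValueError("old_price must be positive")
    return (old_price - new_price) / old_price * 100.0


def average_drop(snapshots: dict) -> float:
    """Mean of per-service percentage drops; each value is (old, new)."""
    if not snapshots:
        raise ValueError("need at least one service")
    drops = [percent_drop(old, new) for old, new in snapshots.values()]
    return sum(drops) / len(drops)


# Hypothetical hourly prices (USD) for services present in both snapshots.
PRICES = {
    "small": (0.10, 0.05),
    "medium": (0.20, 0.12),
    "large": (0.40, 0.30),
}

if __name__ == "__main__":
    for service, (old, new) in PRICES.items():
        print(f"{service}: {percent_drop(old, new):.1f}% lower")
    print(f"average: {average_drop(PRICES):.1f}% lower")
```

Restricting the input to services that exist in both snapshots mirrors the consistency caveat above: newly introduced instance types have no earlier price to compare against.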

While we are far from having a definitive answer on the subject, early signs point to the latter. In the week preceding Amazon’s re:Invent, Google announced across the board price cuts to various services, on top of an October 10% price cut. A week later, the fact that Amazon did not feel compelled to respond was the subject of much conversation.

One interpretation of this lack of urgency is that it’s simply a function of Amazon’s dominant role in the market. And to be sure, Amazon is in its own class from an adoption standpoint. The company’s frantic pace of releases, however – 280 in 2013, on pace for 500 this year – suggests a longer term play. The above charts describe pricing trends in one of the most basic elements of cloud infrastructure: compute. They suggest that at present, Amazon is content to be competitive – but is not intent on being the lowest cost supplier.

By keeping pricing low enough to prevent it from being a real impediment to adoption, while growing its service portfolio at a rapid pace, Amazon is able to get customers in the door with minimal friction and upsell them on services that are both much less price sensitive than base infrastructure and stickier. In other words, instead of a race to the bottom, the points of price differentiation articulated by the above charts may be less relevant over time, as costs approach true commodity levels – a de facto floor – and customer attention begins to turn to time savings (higher end services) over capital savings (low prices) as a means of cost reduction.

If this hypothesis is correct, Amazon’s price per category should fall back towards the middle ground over time. If Amazon keeps pace, however, it may very well be a race to the bottom. Either way, it should show up in the charts here.

Disclosure: Amazon, HP, IBM, Microsoft and Rackspace are RedMonk customers. Digital Ocean, Google and Joyent are not.

Link: Here is a link to the dataset used in the above analysis.

Categories: Cloud.

What are the Most Popular Open Source Licenses Today?

For a variety of reasons, not least of which is that fewer people seem to care anymore, it’s been some time since we looked at the popularity of open source licenses. Once one of the more common inquiries we fielded, questions about the relative merits or distribution of licenses have faded as we see both consolidation around choices and increased understanding of the practical implications of various licensing styles. Given the recent affinity for permissive licensing, however, amongst major open source projects such as Cloud Foundry, Docker, Hadoop, Node.js or OpenStack, it’s worth revisiting the question of license choices.

Before we get into the question of how licensing choices have changed, it’s necessary to establish a baseline number for distribution today. While it cannot be considered definitive, Black Duck’s visibility into a wide variety of open source repositories and forges serves as a useful sample size. Based on the Black Duck data, then, the following chart depicts the distribution of usage amongst the ten most popular open source licenses.

Moving left to right, from less popular licenses to the most popular, it is easy to determine the overall winner. As has historically been the case, the free software, copyleft GPLv2 is the most popular license choice according to Black Duck. Besides high profile projects such as Linux or MySQL, the GPL has been the overwhelmingly most selected license for years. The last time we examined the Black Duck data in 2012, in fact, the GPL was more popular than the MIT, Artistic, BSD, Apache, MPL and EPL put together.

Popular as the GPL remains, however, it no longer enjoys that kind of advantage. If we group both versions (2 and 3) of the GPL together, the GPL is in use within 37% of the Black Duck surveyed projects. The three primary permissive license choices (Apache/BSD/MIT), on the other hand, collectively are employed by 42%. They represent, in fact, three of the five most popular licenses in use today.

License selection has clearly changed, then, but by how much? For comparison’s sake, here’s a chart of the percent change in license usage from this month’s snapshot of Black Duck’s data versus one from 2009.

As we can see, the biggest loser in terms of share was the GPLv2 and, to a lesser extent, the LGPLv2.1. The decline in usage of the GPLv2 can to some degree be attributed to copyleft license fans choosing instead the updated GPLv3; that license, released in 2007, gained about 6% share from 2009 to 2014. But with usage of version 2 down by about 24%, the update is clearly not the only reason for decreased usage of the GPL.

Instead, the largest single contributing factor to the decline of the GPL’s dominance – it’s worth reiterating, however, that it remains the most popular surveyed license – is the rise of permissive licenses. The two biggest gainers on the above chart, the Apache and MIT licenses, were collectively up 27%. With the BSD license up 1%, the three most popular permissive licenses are collectively up nearly 30% in the aggregate.
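The family-level arithmetic can be checked in a few lines. This sketch uses only the approximate changes quoted above and assumes they are points of share that can simply be added, as the text itself does when combining the permissive licenses. The dictionary and function names are illustrative.

```python
# Approximate changes in share between the 2009 and 2014 snapshots, as quoted
# in the text. Apache and MIT are only given as a combined figure, so they
# share a single key here.
CHANGE = {
    "GPLv2": -24,
    "GPLv3": +6,
    "Apache+MIT": +27,
    "BSD": +1,
}

# Grouping of licenses into the two families discussed in the text.
FAMILIES = {
    "GPL (v2 + v3)": ["GPLv2", "GPLv3"],
    "Permissive (Apache/BSD/MIT)": ["Apache+MIT", "BSD"],
}


def net_change(family: str) -> int:
    """Sum the quoted changes for every license in the given family."""
    return sum(CHANGE[name] for name in FAMILIES[family])


if __name__ == "__main__":
    for family in FAMILIES:
        print(f"{family}: {net_change(family):+d}")
```

On these figures the GPL family nets out to roughly -18 and the permissive group to +28, the "nearly 30%" cited above.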

This shift will surprise some. It suggests that, much like the high profile of projects like Linux and MySQL led to wider adoption of reciprocal or copyleft-style licenses, Hadoop and others are leaving a sea of permissively licensed projects in their wake.

But the truth is that a correction of some sort was likely inevitable. The heavily skewed distribution towards copyleft licenses was always somewhat unnatural, and therefore less than sustainable over time. What will be interesting to observe moving forward is whether these trends continue, or whether further corrections are in store. Currently, license preferences seem to be accumulating at either end of the licensing spectrum (reciprocal or permissive); the middle ground of file-based licenses such as the LGPL/MPL remains a relatively distant third category in popularity. Will MPL-licensed projects like the recently opened Manta or SmartDataCenter change that, or are they outliers?

Whatever the outcome, it’s clear we should expect greater diversity amongst licensing choices than we’ve seen in the past. The days of having a single dominant license are, for all practical purposes, over.

Disclosure: Black Duck, the source of this data, has been a RedMonk client but is not currently.

Categories: licensing, Open Source.

Model vs Execution

One of the things that we forget today about SaaS is that we tried it before, and it failed. Coined sometime in 1999 if Google is to be believed, the term “Application Service Provider” (ASP) was applied to companies that delivered software and services over a network connection – what we today commonly call SaaS. By and large this market failed to gain significant traction. Accounts differ as to how and when a) SaaS was coined (IT Toolbox claims it was coined in 2005 by John Koenig) and b) replaced ASP as the term of choice but the fact that ASP could be replaced at all is an indication of its lack of success. While various web based businesses from that period are not only still with us, but in Amazon and Google among the largest in the world, those attempting to sell software via the web rather than deploying it on premise generally did not survive.

A decade plus later, however, and not only has the SaaS model itself survived, but it is increasingly the default approach. The point here isn’t to examine the mechanics of the SaaS business, however; we’ve done that previously (see here or here, for example). The point of bringing up SaaS here, rather, is to serve as a reminder that there’s a difference between model and execution.

Too often in this industry, we look upon a market failure as a permanent indictment of potential. If it didn’t work once, it will never work.

The list of technologies that have been dismissed because they initially failed or seemed unimpressive is long: virtualization was widely regarded as a toy; it’s now an enterprise standard. Smart people once looked at containers and said “neato, why would you want to do that?” Two plus years after Amazon’s creation of the cloud market, then Microsoft CTO Ray Ozzie admitted that cloud “isn’t being taken seriously right now by anybody except Amazon.” In the wake of the anemic adoption – particularly relative to Amazon’s IaaS alternative – of the first iterations of PaaS market pioneers and Google App Engine, many venture capitalists decided that PaaS was a model with no future. DVCS tools like Git were initially scorned and reviled by developers because they were different on a fundamental level.

In each case, it’s important to separate the model from the execution. Too often, failures of the latter are perceived as a fatal flaw in the former. In the case of PaaS, for example, it’s become obvious that the lack of developer adoption was driven by the initial constraints of the first platforms; not having to worry about scaling was an attractive feature, but not worth the sacrifice of having to develop an application in a proprietary language against a proprietary backend that ensured the application could never be easily ported. Half a dozen years later, PaaS platforms are now not only commonly multi-runtime but open source, and growth is strong.

SaaS, meanwhile, would prove to be an excellent model over time, but initially had to contend with headwinds consisting of inconsistent and asymmetrically available broadband, far more functionally limited browser technologies and a user population both averse to risk and brought up on the literal opposite model. In retrospect, it’s no surprise that the ASP market failed; indeed, it’s almost more surprising that the SaaS market followed so quickly on its heels.

In both cases, the initial failures were not attributable to the models. There is in fact demand for PaaS and SaaS; it was simply that the vendors did not (PaaS) or could not (SaaS) execute properly at the time.

Given the rate and pace of change in the technology industry, it is both necessary and inevitable that new technology and approaches are viewed skeptically. As with most innovation, in the technology world or outside of it, failure is the norm. But critical views notwithstanding, it’s important to try and understand the wider context when evaluating the relative merits of competing models. It may well be that the model itself is simply unworkable. But in an industry where what is old is new again, daily, it is far more likely that a current lack of success is due to a failure of execution, or to an inability to execute because of market factors.

In which case, you may want to give that “failed” market a second look. Opportunity may lie within.

Categories: Business Models.

A Few Suggestions for Briefing Analysts

One of the things that happens when you’re a developer focused analyst firm these days is that you talk to a lot of companies. The conversations analysts have with commercial vendors or developers about their projects are called briefings.

Whether the company or project is large or small, old or new, there are always ways to use our collective time – meaning the analyst’s and the company/developer’s – more efficiently and effectively. Having been doing this analyst thing for a little while now, I have a few ideas on what some of those ways might be and thought I’d share them. For anyone briefing an analyst then, I offer the following hopefully helpful suggestions. Best case they’ll make better use of your time, worst case you make the analyst’s life marginally easier, which probably can’t hurt.

  1. Determine how much time you have up front
    This will tend to vary by analyst firm, and sometimes by analyst. At RedMonk, for example, we limit briefings with non-clients to a half hour, a) because we have to talk to a lot of people and b) because very few people have a problem getting us up to speed in that time. It’s important, however, to be aware of this up front. If you think you have an hour, but only have half that, you might present the materials differently.
  2. Unless you’re solving a unique problem, don’t spend your time covering the problem
    If the analyst you’re speaking with is capable, they already understand it well, so time describing it is effectively wasted time. If there’s some aspect of a given market that you perceive differently and break with the conventional wisdom, by all means explain your unique vision of the world (and expect pushback). But a lot of presentations, possibly because they originated as material for non-analysts, spend time describing a market that everyone on the call likely already understands. Jumping right to how you are different, then, is more productive.
  3. If you’re just delivering slides and they’re not confidential (see #4), do not use web meeting software
    If you need to demo an application, web meeting software is acceptable. If you’re just going over slides that aren’t confidential, skip it. Inevitably the meeting software won’t work for someone; they don’t have the right plugin, a dependency is missing, their connection is poor, etc. The downtime while everyone else is waiting for the one meeting participant to download yet another version of web meeting software they probably don’t want is time that everyone else loses and can never get back. Also, it’s nice for analysts to have slides to refer to later.
  4. Don’t have confidential slides
    If you’re actively engaging with an analyst in something material, a potential acquisition for example, confidential slides are pretty much unavoidable. But if you’re doing a simple product briefing, lose the confidential slides. It makes it more difficult to recall later – particularly if a copy of the slides is not made available – what precisely is confidential, and what is not. Which means that analysts may be reticent to discuss news or information you’d like them to, due to the cognitive overhead of having to remember which 5 slides out of 40 were confidential. When it comes time to present confidential material, just note that and walk through it verbally.
  5. If you spend the entire time talking, you may miss out on the opportunity for questions later
    It’s natural to want to talk about your product, and the best briefings are conducted by people with good energy and enthusiasm for what they do. That being said, making sure you leave time for questions can gain you valuable insights into what part of your presentation isn’t clear, and – depending on the analyst/firm – may lead to a two way conversation where you can get some of your own questions answered.
  6. Don’t use the phrase “market leader,” let the market tell us that
    This is perhaps just a pet peeve of mine, but my eyebrows definitely go up when vendors claim to be the “market leader.” This is for a few reasons. First, because genuine market leaders should not have to remind you of that. Second, what is the metric? Analysts may not agree with your particular yardstick. Third, because your rankings may not reflect an analyst’s view of the market, and while disagreement is normal it can sidetrack more productive conversations.
  7. Analysts aren’t press, so treating them that way is a mistake
    While frequently categorized together, analysts and press are in reality very different. Attitudes towards and incentives regarding embargoes, for example, are entirely distinct. Likewise, many vendors and their PR teams send out “story ideas” to analysts, which is pointless because analysts don’t produce “stories” and are rarely on deadline in the way that the press is. What we tell clients all the time is that our job is not to break news or produce “scoops,” it’s to understand the market. If you treat analysts as press that is trying to extract information from you for that purpose, you may miss the opportunity to have a deeper, occasionally confidential, dialogue with an analyst.
  8. Make sure the analyst covers your space; if you don’t know, just ask
    Every analyst, whether generalist or specialist, will have some area of focus. Before you spend your time and theirs describing your product or service, it’s important to determine whether or not they cover your space at all. Every so often, for example, vendor outreach professionals will see that we cover “developers” and try to schedule a briefing for their bodyshop offering developmental services. Given that we don’t generally cover professional services, this isn’t a good use of anybody’s time. The simplest way of determining whether they cover your category, of course – assuming you can’t determine this from their website, Twitter bio, prior coverage, etc – is to just ask.
  9. Asking for feedback “after the call”
    In general, it seems like a harmless request to make at the end of a productive call: “If you think of any other feedback for us after the call, feel free to send this along.” And in most cases, it is relatively innocuous. Another way of interpreting this request, however, is: “Feel free to spend cycles thinking about us and send along free feedback after we’re done.” So you might consider using this request sparingly.
  10. Don’t ask if we want to receive information: that’s just another email thread
    There are very few people today who don’t already receive more email than they want or can handle. To make everyone’s lives simpler, then, it’s best to skip emails that take the form “Hi – We have an important announcement. Would you like to receive our press release concerning this announcement? If so send us an email indicating that you’ll respect the embargo.” As most analysts will respect embargoes – because we’re not press (see #7) – asking an analyst to reply to an email to get yet another email in return is a waste of an email thread. Your best bet is to maintain a list of trusted contacts, and simply distribute the material to them directly.

Those are just a few that occur off the top of my head based on our day to day work. Do these make sense? Are there other questions, or suggestions from folks in the industry?

Categories: Industry Analysis.

The 2014 Monktoberfest

Last Thursday at ten in the morning, this auditorium was full because I made a joke four years ago.

Describing the Monktoberfest to someone who has never been is difficult. Should we focus on the content, where we prioritize talks about social and tech that don’t have a home at other shows but make you think? Or the logistics, where we try to build a conference that loses the things we don’t enjoy from other conferences? Or maybe the most important thing is the hallway track, which is another way of saying the people?

Whatever else it may be, the Monktoberfest is different. It’s different talks, in a different city, given and discussed by different people. Some of those people are developers with a year or two of experience. Others are founders and CEOs. People helping to decide the future of the internet. Those in business school to help build the businesses that will be run on top of it. Startups meeting with incumbents, cats and dogs living together.

Which is, hopefully, what makes it as fun as it is professionally useful. It doesn’t hurt, of course, that the conference’s “second track” – my thanks to Alex King for the analogy – is craft beer.

During the day our attendees are asked to wrap their minds around complicated, nuanced and occasionally controversial issues. What are the social implications and ethics of running services at scale? When you cut through the hype, what does IoT mean for our lives and the way we play? Perhaps most importantly, how is our industry actually performing with respect to gender issues and diversity? And what can we, or what must we, do to improve that?

To assist with these deliberations, and to simultaneously expand horizons on what craft beer means, we turn loose two of the best beer people in the world, Leigh and Ryan Travers, who run Stillwater Artisanal Ales’ flagship gastropub, Of Love and Regret, down in Baltimore. Whether we’re serving them the Double IPA that Beergraphs ranks as the best beer in the world, canned fresh three days before, or a 2010 Italian sour that was one of 60 bottles ever produced, we’re trying to deliver a fundamentally different and unique experience.

As always, we are not the ones to judge whether we succeeded in that endeavor, but the reactions were both humbling and gratifying.

Out of all of those reactions, however, it is ones like this that really get to us.

The fact that many of you will spend your vacation time and your own money to be with us for the Monktoberfest is, quite frankly, incredible. But it just speaks to the commitment that attendees have to make the event what it is. How many conference organizers, for example, are inundated with offers of help – even if it’s moving boxes – ahead of the show? How many are complimented by the catering staff, every year, that our group is one of the nicest and most friendly they have ever dealt with? How many have attendees that moved other scheduled events specifically so that they could attend the Monktoberfest?

This is our reality.

And as we say over and over, it is what makes all the blood, sweat and tears – and as any event organizer knows, there are always a lot of all three – worth it.

The Credit

Those of you who were at dinner will have heard me say this already, but the majority of the credit for the Monktoberfest belongs elsewhere. My sincere thanks and appreciation to the following parties.

  • Our Sponsors: Without them, there is no Monktoberfest
    • IBM: Once again, IBM stepped up to be the lead sponsor for the Monktoberfest. While it has been over a hundred years since the company was a startup, it has seen the value of what we have collectively created in the Monktoberfest and provided the financial support necessary to make the show happen.
    • Red Hat: As the world’s largest pure play open source company, there are few who appreciate the power of the developer better than Red Hat. Their support as an Abbot Sponsor – the fourth year in a row they’ve sponsored the conference, if I’m not mistaken – helps us make the show possible.
    • Metacloud: Though it is now part of Cisco, Metacloud stood alongside of Red Hat to be an Abbot sponsor and gave us the ability to pull out all the stops – as we are wont to do.
    • EMC: When we post the session videos online in a few weeks, it is EMC that you will have to thank.
    • Mandrill: Did you enjoy the Damariscotta river oysters, the sushi from Miyake, the falafel and sliders bar, or the mac and cheese station? Take a minute to thank the good folks from Mandrill.
    • Atlassian: Whenever you’re enjoying your shiny new Hydro-Flask 40 oz growler – whether it’s filled with a cold beverage or hot cocoa – give a nod to Atlassian, who helped make them possible. Outside certainly approves of the choice.
    • Apprenda / HP: From the burrito spread to the Oxbow-infused black bean soup, Apprenda and HP are responsible for your lunch.
    • WePay: Like your fine new Teku stemmed tulip glassware? Thank WePay.
    • AWS/BigPanda/CohesiveFT/HP: Maybe you liked the ginger cider, maybe it was the exceedingly rare Italian sour, or maybe still it was the Swiss stout? These are the people that brought it to you.
    • Cashstar: Liked the Union Bagels on Thursday or the breakfast burritos? That was Cashstar’s doing.
    • O’Reilly: Lastly, we’d like to thank the good folks from O’Reilly for being our media partner yet again and bringing you free books.
  • Our Speakers: Every year I have run the Monktoberfest I have been blown away by the quality of our speakers, a reflection of their abilities and the effort they put into crafting their talks. At some point you’d think I’d learn to expect it, but in the meantime I cannot thank them enough. Next to the people, the talks are the single most defining characteristic of the conference, and the quality of the people who are willing to travel to this show and speak for us is humbling.
  • Ryan and Leigh: Those of you who have been to the Monktoberfest previously have likely come to know Ryan and Leigh, but for everyone else they really are one of the best craft beer teams not just in this country, but the world. And they’re even better people, having spent the better part of the last few months sourcing exceptionally hard to find beers for us. It is an honor to have them at the event, and we appreciate that they take time off from running the fantastic Of Love & Regret to be with us.
  • Lurie Palino: Lurie and her catering crew have done an amazing job for us every year, but this year was the most challenging yet due to some late breaking changes in the weeks before the event. As she does every year, however, she was able to roll with the punches and deliver on an amazing event yet again. With no small assist from her husband, who caught the lobsters, and her incredibly hard working crew at Seacoast Catering.
  • Kate (AKA My Wife): Besides spending virtually all of her non-existent free time over the past few months coordinating caterers, venues and overseeing all of the conference logistics, Kate was responsible for all of the good ideas you’ve enjoyed, whether it was the masseuses two years ago, the cruise last year or the inspired choice of venue this. And she gave an amazing talk on the facts and data behind sexual harassment. I cannot thank her enough.
  • The Staff: Juliane did yeoman’s work organizing many aspects of the conference, including the cruise, and with James secured and managed our sponsors. Marcia handled all of the back end logistics as she does so well – and put up with the enormous growler boxes living at her house for a week. Kim not only worked both days of the show, but traveled down to Baltimore and back by car simply to get things that we couldn’t get anywhere else. Celeste, Cameron, Rachel, Gretchen, Sheila and the rest of the team handled the chaos that is the event itself with ease. We’ve got an incredible team that worked exceptionally hard.
  • Our Brewers: We picked a tough week for brewer appearances this year, as we overlapped with no fewer than three major beer festivals, but The Alchemist was fantastic as always about making sure that our attendees got some of the sweet nectar that is Heady Topper, and Mike Guarracino of Allagash was a huge hit attending both our opening cruise and Thursday dinner. Oxbow Brewing, meanwhile, not only connected us with a few hard to get selections, but loaned us some of the equipment we needed to have everything on tap. Thanks to all involved.
  • Erik Dasque: As anyone who attended dinner is aware, Erik was our drone pilot for the evening. He was gracious enough to get his Phantom up into the air to capture aerial shots of the Audubon facility as well as video of our arriving attendees. Wait till you see his video. In the meantime, here’s a picture.

With that, this year’s Monktoberfest is a wrap. On behalf of myself, everyone who worked on the event, and RedMonk, I thank you for being a part of what we hope is a unique event on your schedule. We’ll get the video up as quickly as we can so you can share your favorite talks elsewhere.

For everyone who was with us, I owe you my sincere thanks. You are why we do this, and you are the Monktoberfest. Stay tuned for details about next year, as we’ve got some special things planned for our 5th anniversary, and in the meantime you might be interested in Thingmonk or the Monki Gras, RedMonk’s other two conferences, as well as the upcoming IoT at Scale conference we’re running with SAP in a few weeks.

Categories: Conferences & Shows.

A Swing of the Pendulum: Are Fragmentation’s Days Numbered?

Foucault's Pendulum

One of the lessons that has stayed with me all these years removed from my History major is the pendulum theory. In short, it asserts that history typically moves within a pendulum’s arc: first swinging in one direction, then returning towards the other. I’ve been thinking about this quite a bit in recent months as the predictable result of widespread developer empowerment becomes more and more visible in virtually all of the metrics we track. Unsurprisingly, when you have two populations making decisions, the larger one leads to a wider array of outcomes. CIOs, as an example, were long content to consolidate on a limited number of runtimes – Java, .NET and a few others. All of the data we see, however, suggests that as the New Kingmakers have begun to rise up and act on their own initiative, the distribution of runtimes employed has exploded. The pendulum, quite obviously, had swung from centralized to fragmented, driven by a fundamental shift in the way that technologies were selected.
The question I’ve been pondering is simple: when does it begin to swing back in the other direction?

If there is any reversal here, it will come from developers. Even the large, CIO-centric incumbents are aware today that developers are in charge, so there’s no evidence to suggest that CIOs have a plausible strategy for putting developers back under thumb. But while over the last few years newly empowered developers have shown an insatiable appetite for new technologies, it hasn’t been clear that this trajectory was sustainable longer term.

Which is why I’ve been paying attention, looking for evidence that the pendulum swing might be slowing – even reversing. The data is inconclusive. As Donnie has noted, there have only been five languages that really mattered on a volume basis on GitHub: JavaScript, Ruby, Java, PHP, and Python. And yet our rankings indicate that while they do indeed represent the fat part of the tail, there is substantial, ongoing volume usage of maybe twenty to thirty on top of that.

What the data won’t say, however, developers themselves will. Witness this piece from Tim Bray:

There is a real cost to this continuous widening of the base of knowledge a developer has to have to remain relevant. One of today’s buzzwords is “full-stack developer”. Which sounds good, but there’s a little guy in the back of my mind screaming “You mean I have to know Gradle internals and ListView failure modes and NSManagedObject quirks and Ember containers and the Actor model and what interface{} means in Go and Docker support variation in Cloud providers? Color me suspicious.

Which links to this piece by Ed Finkler:

My tolerance for learning curves grows smaller every day. New technologies, once exciting for the sake of newness, now seem like hassles. I’m less and less tolerant of hokey marketing filled with superlatives. I value stability and clarity.

Which elicited this response from Marco Arment:

I feel the same way, and it’s one of the reasons I’ve lost almost all interest in being a web developer. The client-side app world is much more stable, favoring deep knowledge of infrequent changes over the constant barrage of new, not necessarily better but at least different technologies, libraries, frameworks, techniques, and methodologies that burden professional web development.

Which in turn prompted a response from Matt Gemmell entitled “Confessions of an Ex-Developer”:

I’m glad there are no compilers (visible) in my life. I’m also glad that I can view the WWDC keynote as a tourist, without any approaching tension headache as I think about what I’ll need to add, or change, or remove. I can drift languidly along on the slow-moving current of the everyday web, indulging an old habit when a rainy evening comes by.

It’s a profoundly relaxing thing to be able to observe the technology industry without being invested in it. I’m glad I’m not making software anymore.

To be clear, these are merely four developers. Four experienced developers, more importantly. It may very well be that their experiences are nothing more than a natural and understandable change in priorities that comes with age.

But their experience seems to mirror a logical reaction to a very rapid set of transformations in this industry. Given the hypothesis that the furious rate of change and creation in technology will at some point hit a point of diminishing returns, then become actively counterproductive, it follows that these could merely be the bleeding edge of a more active backlash against complexity. Developers have historically had an insatiable appetite for new technology, but it could be that we’re approaching the too-much-of-a-good-thing stage. In which case, the logical outcome will be a gradual slowing of fragmentation followed by gradual consolidation. Market outcomes would be dependent on individual differences between rates of change, the negative impacts of fragmentation and so on.

It may be difficult to conceive of a return to a more simple environment, but remember that the Cambrian explosion the current rate of innovation is often compared to was itself very brief – in geologic terms, at least. Unnatural rates of change are by definition unnatural, and therefore difficult to sustain over time. It is doubtful that we’ll ever see a return to the radically more simple environment created by the early software giants, but it’s likely that we’ll see dramatically fewer popular options per category.

Whether we’re reaching the apex of the swing towards fragmentation is debatable; less so is the fact that the pendulum will swing the other way eventually. It’s not a matter of if, but when.

Categories: Programming Languages.

What is the Atomic Unit of Computing?

defining the unit of atomic weight

According to published reports, Docker (née dotCloud) is in the process of securing $40M in financing. Update: Originally misstated the amount of financing, but the substance of the post stands.

If popularity is a guiding metric, this infusion will come as no surprise. Docker is one of the fastest growing projects we have ever seen at RedMonk, and virtually no one we speak with is surprised to hear that. In a little over a year, Docker has exploded into a technology that is seeing near universal uptake, from traditional enterprise IT suppliers (e.g. Red Hat) to emerging infrastructure players (e.g. Google).

There are many questions currently being asked about Docker. Most obviously, why now? The idea of containers is not new, and conceptually can be dated back to the mainframe, with more recent implementations ranging from FreeBSD Jails to Solaris Zones. What is it about Docker that has allowed it to capture mainstream interest where previous container technologies were unable to?

Rather than one explanation, it is likely a combination of factors. Most obviously, there is the popularity of the underlying platform. Linux is exponentially more popular today than any of the other platforms offering containers have been. Containers are an important, perhaps transformative feature. But they historically haven’t been enough to compel a switch from one operating system to another.

Perhaps more importantly, however, there are two larger industry shifts at work which ease the adoption of container technologies. First, there is the near ubiquity of virtualization within the enterprise. When Solaris Zones dropped in 2004, for example, VMware was six years old, five months from being bought by EMC (in a move that baffled the industry) and three years away from an IPO. Ten years later, and virtualization is, quite literally, everywhere. At OSCON, for example, one database expert noted that somewhere between 30% and 50% of his very large database workloads were running virtualized. The last workload to be virtualized, in other words, is now virtualized almost half the time. Just as the ASP market failure paved the way for the later SAAS market entrants, the long fight for virtualization acceptance has likely eased the adoption of container technologies like Docker.

More specific to containers, however, is the steady erosion in the importance of the operating system. To be sure, packaged applications and many infrastructure components are still heavily dependent on operating system-specific certifications and support packages. But it’s difficult to make the case that the operating system is as all powerful as it was, given the complete reversal of attitudes towards Ubuntu in the cloud era. Prior to the ascension of Amazon and other public cloud suppliers, large scale enterprise support for Ubuntu on a general basis was near zero. Today, besides being far and away the most popular distribution on Amazon, Ubuntu is supported by those same enterprise stalwarts from HP to IBM. Nor has IAAS been the only factor in the ongoing disintermediation of the operating system; as discussed previously, PAAS is the new middleware, and middleware’s explicit mission has historically been to abstract the application from the operating system underneath it.

These developments imply that there is a shift at work in the overall market importance of the operating system (a shift that we have been expecting since 2010), which in turn helps explain how containers have become so popular so quickly. Unlike virtual machines, which replicate an entire operating system, containers act like a diff of two different images. Operating system components common to the two images are shared, leaving the container to house just the difference: little more than the application and any specific dependent libraries, etc. Which means that containers are substantially lighter weight than full VMs. If applications are heavily operating system dependent and you run a mix of operating systems, containers will be problematic. If the operating system is a less important question, however, containers are a means of achieving much higher application density on a given instance versus virtual machines fully emulating an operating system.
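To make the density argument concrete, here is a rough back-of-the-envelope sketch. Every number in it is a hypothetical placeholder rather than a measurement, and the function names are invented for illustration; it simply contrasts replicating the operating system per application with sharing one copy and storing only each application’s diff.

```python
# Hypothetical sizes in MB, chosen only for illustration (not measurements).
BASE_OS_MB = 800
APP_MB = {"web": 40, "api": 25, "worker": 15}


def vm_footprint(apps, base_os_mb):
    """Each VM replicates the whole operating system alongside its application."""
    return sum(base_os_mb + size for size in apps.values())


def container_footprint(apps, base_os_mb):
    """Containers share one copy of the OS; each adds only its own diff."""
    return base_os_mb + sum(apps.values())


vm_total = vm_footprint(APP_MB, BASE_OS_MB)
container_total = container_footprint(APP_MB, BASE_OS_MB)
print(f"VMs: {vm_total} MB, containers: {container_total} MB")
```

Under these assumed numbers the gap grows with every application added, since the shared base is paid for once rather than once per application. The sketch ignores memory, CPU and hypervisor overhead entirely.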

Taken in the aggregate, this is at least a partial explanation for the question of “why now?” As is typical with dramatic movements, Docker’s success is as much about context as the quality of the underlying technology – intending no disrespect to the Docker engineers, of course. Engineering is critical, it’s just that timing is usually more critical.

The most important question about Docker, however, isn’t “why now?” It is rather the one being asked more rarely today, by those struggling to understand where the often overlapping puzzle pieces fit. The explosion of Docker’s popularity begs a more fundamental question: what is the atomic unit of infrastructure moving forward? At one point in time, this was a server: applications were conceived of, and deployed to, a given physical machine. More recently, the base element of an infrastructure was a virtual recreation of that physical machine. Whether you defined that as Amazon did or VMware might was less important than the idea that an image resembling a server, from virtualized hardware and networking interfaces to a full instance of an operating system, was the base unit from which everything else was composed.

Containers generally and Docker specifically challenge that notion, treating the operating system and everything beneath as a shared substrate, a universal foundation that’s not much more interesting than the raised floor of a datacenter. For containers, the base unit of construction is the application. That’s the only truly unique element.

What this means is as yet undetermined. Users are for the most part years away from understanding this division, let alone digesting its implications. But vendors and projects alike should be, and in some cases are, beginning to critically evaluate the lens through which they view the world. Infrastructure players like VMware and the OpenStack ecosystem, for example, need to project forward the potential opportunities and threats presented by an application-centric, as opposed to VM-centric, worldview, while Docker and others in similar orbits (e.g. Cloud Foundry) conversely need to consider how to traverse the comprehension gap between what users expect and what they get.

Google App Engine, and others, remember, tried to sublimate the underlying infrastructure in the first generation of PAAS offerings and the result was a market dwarfed by IAAS – which not coincidentally looked a lot more like the physical infrastructure customers were used to. But as the Turkey Fallacy states, “it hasn’t happened so it won’t happen” is not the most sustainable defense imaginable. Just because PAAS struggled to get customers beyond thinking in terms of physical hardware doesn’t mean that Docker will as well.

In any event, expect to see players on both sides of the VM / app divide aggressively jockeying for position, as no one wants to be the one left without a chair when the music stops.

Categories: Containers, Open Source, Virtualization.