
What is the Atomic Unit of Computing?

defining the unit of atomic weight

According to published reports, Docker (née dotCloud) is in the process of securing $40M in financing. Update: This post originally misstated the amount of financing, but the substance of the post stands.

If popularity is a guiding metric, this infusion will come as no surprise. Docker is one of the fastest growing projects we have ever seen at RedMonk, and virtually no one we speak with is surprised to hear that. In a little over a year, Docker has exploded into a technology that is seeing near universal uptake, from traditional enterprise IT suppliers (e.g. Red Hat) to emerging infrastructure players (e.g. Google).

There are many questions currently being asked about Docker. Most obviously, why now? The idea of containers is not new, and conceptually can be dated back to the mainframe, with more recent implementations ranging from FreeBSD Jails to Solaris Zones. What is it about Docker that has captured mainstream interest where previous container technologies were unable to?

Rather than one explanation, it is likely a combination of factors. Most obviously, there is the popularity of the underlying platform. Linux is exponentially more popular today than any of the other platforms offering containers have been. Containers are an important, perhaps transformative feature. But they historically haven’t been enough to compel a switch from one operating system to another.

Perhaps more importantly, however, there are two larger industry shifts at work which ease the adoption of container technologies. First, there is the near ubiquity of virtualization within the enterprise. When Solaris Zones dropped in 2004, for example, VMware was six years old, five months from being bought by EMC (in a move that baffled the industry) and three years away from an IPO. Ten years later, and virtualization is, quite literally, everywhere. At OSCON, for example, one database expert noted that somewhere between 30% and 50% of his very large database workloads were running virtualized. The last workload to be virtualized, in other words, is now virtualized almost half the time. Just as the ASP market failure paved the way for the later SAAS market entrants, the long fight for virtualization acceptance has likely eased the adoption of container technologies like Docker.

More specific to containers, however, is the steady erosion in the importance of the operating system. To be sure, packaged applications and many infrastructure components are still heavily dependent on operating system-specific certifications and support packages. But it’s difficult to make the case that the operating system is as all powerful as it was, given the complete reversal of attitudes towards Ubuntu in the cloud era. Prior to the ascension of Amazon and other public cloud suppliers, large scale enterprise support for the distribution was, on a general basis, near zero. Today, besides being by far and away the most popular distribution on Amazon, Ubuntu is supported by those same enterprise stalwarts from HP to IBM. Nor has IAAS been the only factor in the ongoing disintermediation of the operating system; as discussed previously, PAAS is the new middleware, and middleware’s explicit mission has historically been to abstract the application from the operating system underneath it.

These developments imply that there is a shift at work in the overall market importance of the operating system (a shift that we have been expecting since 2010), which in turn helps explain how containers have become so popular so quickly. Unlike virtual machines, which replicate an entire operating system, containers act like a diff of two different images. Operating system components common to the two images are shared, leaving the container to house just the difference: little more than the application and any specific dependent libraries, etc. Which means that containers are substantially lighter weight than full VMs. If applications are heavily operating system dependent and you run a mix of operating systems, containers will be problematic. If the operating system is a less important question, however, containers are a means of achieving much higher application density on a given instance versus virtual machines fully emulating an operating system.
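To make the density argument concrete, here is a rough sketch of the arithmetic. The sizes are hypothetical placeholders rather than measurements of any real image, and the model only counts storage footprint; it assumes every workload can share a single base operating system.

```python
# Hypothetical sizes in MB; illustrative only, not measurements.
BASE_OS_MB = 800  # operating system components common to every image

# Each entry is the "diff": the application plus its dependent libraries.
apps = {"web": 40, "api": 55, "worker": 25}


def vm_footprint(apps, base_mb):
    """Each VM carries its own full copy of the operating system."""
    return sum(base_mb + app_mb for app_mb in apps.values())


def container_footprint(apps, base_mb):
    """Containers share one copy of the base and store only the difference."""
    return base_mb + sum(apps.values())


vm_total = vm_footprint(apps, BASE_OS_MB)
container_total = container_footprint(apps, BASE_OS_MB)
print(f"VMs: {vm_total} MB, containers: {container_total} MB")
```

The gap widens with every additional application, since the VM model pays for the operating system again each time while the container model pays only for the diff.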

Taken in the aggregate, this is at least a partial explanation for the question of “why now?” As is typical with dramatic movements, Docker’s success is as much about context as the quality of the underlying technology – intending no disrespect to the Docker engineers, of course. Engineering is critical, it’s just that timing is usually more critical.

The most important question about Docker, however, isn’t “why now?” It is rather the one being asked more rarely today, by those struggling to understand where the often overlapping puzzle pieces fit. The explosion of Docker’s popularity begs a more fundamental question: what is the atomic unit of infrastructure moving forward? At one point in time, this was a server: applications were conceived of, and deployed to, a given physical machine. More recently, the base element of an infrastructure was a virtual recreation of that physical machine. Whether you defined that as Amazon did or VMware might was less important than the idea that an image resembling a server, from virtualized hardware and networking interfaces to a full instance of an operating system, was the base unit from which everything else was composed.

Containers generally and Docker specifically challenge that notion, treating the operating system and everything beneath as a shared substrate, a universal foundation that’s not much more interesting than the raised floor of a datacenter. For containers, the base unit of construction is the application. That’s the only real unique element.

What this means is as yet undetermined. Users are for the most part years away from understanding this division, let alone digesting its implications. But vendors and projects alike should be, and in some cases are, beginning to critically evaluate the lens through which they view the world. Infrastructure players like VMware and the OpenStack ecosystem, for example, need to project forward the potential opportunities and threats presented by an application-centric as opposed to VM-centric worldview, while Docker and others in similar orbits (e.g. Cloud Foundry) conversely need to consider how to traverse the comprehension gap between what users expect and what they get.

Google App Engine, and others, remember, tried to sublimate the underlying infrastructure in the first generation of PAAS offerings and the result was a market dwarfed by IAAS – which not coincidentally looked a lot more like the physical infrastructure customers were used to. But as the Turkey Fallacy states, “it hasn’t happened so it won’t happen” is not the most sustainable defense imaginable. Just because PAAS struggled to get customers beyond thinking in terms of physical hardware doesn’t mean that Docker will as well.

In any event, expect to see players on both sides of the VM / app divide aggressively jockeying for position, as no one wants to be the one left without a chair when the music stops.

Categories: Containers, Open Source, Virtualization.

SaaS vs The Perpetual License Model

Founded in 1998, VMware went public on August 14, 2007. Salesforce, founded a year after VMware in 1999, held its initial public offering three years earlier on June 23, 2004. From a temporal standpoint, then, the two companies are peers. Which is one reason that it’s interesting to examine how they have performed to date, and how they might perform moving forward.

Another is the fact that they represent, at least for now, radically different approaches to the market. VMware, of course, is the dominant player in virtualization and markets adjacent to that space. Where Microsoft built its enormous value in part by selling licenses to the operating system upon which workloads of all shapes and sizes were run, VMware’s revenue has been driven primarily by the sales of software that makes operating systems like Windows virtual. Salesforce, on the other hand, has been since its inception the canonical example for what eventually came to be known as Software-as-a-Service, the sale of software centrally hosted and consumed via a browser. Not only is Salesforce’s software – or service, as they might prefer it be referred to – consumed very differently than software from VMware, the licensing and revenue model of services-based offerings is quite distinct from the traditional perpetual license business.

It has been argued in this space many times previously (see here, for example) that the perpetual license model that has dominated the industry for decades is increasingly under pressure from a number of challengers ranging from open source competition to software sold, as in the case of Salesforce, as a service. In spite of the economic challenge inherent to running a business where your revenue is realized over a long period of time that competes with those that get all of their money up front, the bet here is that the latter model will increasingly give way to the former.

Which is why it’s interesting, and perhaps informative, to examine the relative performance, and market valuation, of two peers that have taken very different approaches. What does the market believe about the two models, and what does their performance tell us if anything?

It is necessary to acknowledge before continuing that in addition to the difference in terms of model, Salesforce and VMware are not peers from a categorical standpoint. The former is primarily an applications vendor, its investments in platforms such as Heroku notwithstanding, while the latter is an infrastructure player, in spite of its dalliances with application plays such as Zimbra. While the comparison is hardly apples to apples, therefore, it is nevertheless instructive in that both vendors compete for a percentage of organizational IT spend. But keep that distinction in mind regardless.

As for the most basic metric, market capitalization, VMware is currently worth more than Salesforce: $42 billion to $35 billion as I write this. Even the most cursory examination of some basic financial metrics will explain why. Since it began reporting in 2003, Salesforce has on a net basis lost just over $350 million. VMware, for its part, has generated a net profit of almost $4 billion.

If we examine the quarterly gross profit margins of the two entities over time, these numbers make sense.

Apart from some initially comparable margins for Salesforce, VMware has consistently commanded a higher margin than its SaaS-based counterpart. While this chart likely won’t surprise anyone who’s tracked the markets broadly or the companies specifically, however, it’s interesting to note the fork in the 2010-2011 timeframe. Through 2010, the margins appeared to be on a path to converge; after, they diverged aggressively. Nor is this the only metric in which this pattern is visible.

We see the same trajectories in a plot of the quarterly net income, for example, but this is to be expected since this is in part a function of the profit generated.

The question is what changed in the 2010 timeframe, and one answer appears to be investments in scale. The following chart depicts the gross property, plant and equipment charges for both firms over the same timeframe.

Notice in particular the sharp spike beginning in 2010-2011 for Salesforce. After six years as a public entity, the company began investing in earnest, which undoubtedly had consequences partially reflected in the charts above. VMware’s expenditures here are interesting for their part, because the conventional wisdom is that it is the services firms like Salesforce, Amazon, Google or increasingly Microsoft whose PP&E will reflect their necessary investments in the infrastructure to deliver services at scale. Granted, VMware’s vCloud Hybrid Service is intended to serve as a public infrastructure offering, but this was intended to be an “asset-light model” which presumably would not have commanded the same infrastructure investments. Nevertheless, VMware outpaced Salesforce until 2014.

The question, whether it’s from the perspective of an analyst or an investor, is what the returns have been on this dramatic increase in spending. Obviously in Salesforce’s case, its capital investments have dragged down its income and margins, while VMware’s dominant market position has allowed it to not only sustain its pricing but grow its profitability. But what about revenue growth?

One of the strongest arguments in favor of SaaS products is convenience; it’s far less complicated to sign up to a service hosted externally than it is to build and host your own. If convenience is indeed a driver for adoption, greater revenue growth is one potential outcome: if it’s easier to buy, it will be bought more. This is, in fact, a necessity for Salesforce: if you’re going to trade losses now for growth, you need the growth. To some extent, this is what we see when we compare the two companies.

The initial decline from 70+ percent growth for both companies is likely the inevitable product of simple math: the more you make, the harder it is to grow. While we can discount the first half of this chart, the second half is intriguing in that it is a reversal of the pattern we have seen above. While VMware solidly outperformed Salesforce at a growing rate in profit and income, Salesforce, beginning about a year after its PP&E investments picked up, has grown its revenue at a higher rate than has VMware. Early in this period you could argue the rate differential was a function of revenue disparities, but the delta between the revenue numbers last quarter was less than 10%.

In general, none of these results should be surprising. VMware has successfully capitalized on a dominant position in a valuable market and the financial results demonstrate that. Salesforce, as investors appear to have recognized, is clearly trading short term losses against the longer term return. While there is some evidence to suggest that Salesforce’s strategy is beginning to see results, and VMware is probably paying closer attention to its overall ability to grow revenue, it’s still very early days. It’s equally possible that one or both are poor representatives for their respective approach. It will be interesting to monitor these numbers over time, however, to try and test how the two models continue to perform versus one another.

Categories: Business Models, Software-as-a-Service.

The RedMonk Programming Language Rankings: June 2014

As we settle into a roughly bi-annual schedule for our programming language rankings, it is now time for the second drop of the year. This being the second run since GitHub retired its own rankings forcing us to replicate them by querying the GitHub archive, we are continuing to monitor the rankings for material differences between current and past rankings. While we’ve had slightly more movement than is typical, however, by and large the results have remained fairly consistent.

One important trend worth tracking, however, is the correlation between the GitHub and Stack Overflow rankings. This is the second consecutive period in which the relationship between how popular a language is on GitHub versus Stack Overflow has weakened; this run’s .74 is in fact the lowest observed correlation to date. Historically, the number has been closer to .80. With only two datapoints indicating a weakening – and given the fact that at nearly .75, the correlation remains strong – it is premature to speculate as to cause. But it will be interesting to monitor this relationship over time; should GitHub and Stack Overflow continue to drift apart in terms of programming language traction, it would be news.
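For readers who want to reproduce this kind of comparison on their own data, a rank correlation is one reasonable way to do it. The sketch below uses Spearman's formula for tie-free rankings; that choice, along with the placeholder language names and ranks, is an assumption for illustration and not necessarily the method behind the .74 figure.

```python
def spearman(rank_a, rank_b):
    """Spearman rank correlation for two tie-free rankings of the same items."""
    if rank_a.keys() != rank_b.keys():
        raise ValueError("both rankings must cover the same items")
    n = len(rank_a)
    # Sum of squared differences between each item's two ranks.
    d_squared = sum((rank_a[item] - rank_b[item]) ** 2 for item in rank_a)
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))


# Hypothetical ranks for five placeholder languages, not real data.
github = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5}
stack_overflow = {"A": 2, "B": 1, "C": 3, "D": 5, "E": 4}

rho = spearman(github, stack_overflow)
print(round(rho, 2))
```

A value of 1 would mean the two communities rank every language identically; the further the value falls below that, the more the two populations disagree.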

For the time being, however, the focus will remain on the current rankings. Before we continue, please keep in mind the usual caveats.

  • To be included in this analysis, a language must be observable within both GitHub and Stack Overflow.
  • No claims are made here that these rankings are representative of general usage more broadly. They are nothing more or less than an examination of the correlation between two populations we believe to be predictive of future use, hence their value.
  • There are many potential communities that could be surveyed for this analysis. GitHub and Stack Overflow are used here first because of their size and second because of their public exposure of the data necessary for the analysis. We encourage, however, interested parties to perform their own analyses using other sources.
  • All numerical rankings should be taken with a grain of salt. We rank by numbers here strictly for the sake of interest. In general, the numerical ranking is substantially less relevant than the language’s tier or grouping. In many cases, one spot on the list is not distinguishable from the next. The separation between language tiers on the plot, however, is generally representative of substantial differences in relative popularity.
  • GitHub language rankings are based on raw lines of code, which means that repositories written in a given language that include a greater amount of code in a second language (e.g. JavaScript) will be read as the latter rather than the former.
  • In addition, the further down the rankings one goes, the less data available to rank languages by. Beyond the top tiers of languages, depending on the snapshot, the amount of data to assess is minute, and the actual placement of languages becomes less reliable the further down the list one proceeds.
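The lines-of-code caveat above is easy to see in a toy example. The function name and the line counts below are hypothetical, and this is a sketch of the general idea rather than GitHub's actual classification logic.

```python
def dominant_language(lines_by_language):
    """Return the language with the most raw lines of code in a repository."""
    return max(lines_by_language, key=lines_by_language.get)


# A hypothetical Ruby application that bundles a large JavaScript front end.
repo = {"Ruby": 12_000, "JavaScript": 30_000, "CSS": 4_000}

label = dominant_language(repo)
print(label)
```

Under this rule the project counts toward JavaScript, even though its maintainers would likely describe it as a Ruby application.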



Besides the above plot, which can be difficult to parse even at full size, we offer the following numerical rankings. As will be observed, this run produced several ties which are reflected below.

1 Java / JavaScript
3 PHP
4 Python
5 C#
6 C++ / Ruby
8 CSS
9 C
10 Objective-C
11 Shell
12 Perl
13 R
14 Scala
15 Haskell
16 Matlab
17 Visual Basic
18 CoffeeScript
19 Clojure / Groovy

Most notable for advocates of either Java or JavaScript is the tie atop these rankings. This finding is not surprising in light of the fact that one or the other – most commonly JavaScript – has been atop our rankings as long as we have had them, with the loser invariably finishing in second place. For this run, however, the two languages find themselves in a statistical tie. While the actual placement is, as mentioned above, not particularly significant from an overall share perspective, the continued, sustained popularity of these two runtimes is notable.

Aside from that tie, the rest of the Top 10 is relatively stable. Python retook fourth place from C#, and CSS pushed back C and Objective-C, but these changes notwithstanding the elite performers in this ranking remain elite performers. PHP, as one example, remains rock steady in third behind the Java/JavaScript tandem, and aside from a slight decline from Ruby (5 in 2013, 7 today) little else has changed. Which means that the majority of the interesting activity occurred further down the spectrum. A few notes below on notable movements from selected languages.

  • R: Advocates of R will be pleased by the language’s fourth consecutive gain in the rankings. From 18 in January of 2013 to 13 in this run, the R language continues to rise. Astute observers might note by comparing plots that this is in part due to growth on GitHub; while R has always performed well on Stack Overflow due to the volume of questions and answers, it has tended to be under-represented on GitHub. This appears to be slowly changing, however, in spite of competition from Python, issues with the runtime itself and so on.
  • Go: Like R, Go is sustaining its upward trajectory in the rankings. It didn’t match its six place jump from our last run, but the language moved up another spot and sits just outside the Top 20 at 21. While we caution against reading much into the actual placement on these rankings, where differences between spots can over-represent only marginal differences in performance, we do track changes in trajectory closely. While its 21st spot, therefore, may not distinguish it materially from the languages directly above or behind it, its trendline within these rankings does. Given the movement to date, as well as the qualitative evidence we see in terms of projects moving to Go from other alternatives, it is not unreasonable to expect Go to be a Top 20 language within the next six to twelve months.
  • Perl: Perl, on the other hand, is trending in the opposite direction. Its decline has been slow, to be fair, dropping from 10 only down to 12 in our latest rankings, but it’s one of the few Tier 1 languages that has experienced a decline with no offsetting growth since we have been conducting these rankings. While Perl was the glue that pulled together the early web, many believe the Perl 5 versus Perl 6 divide has fractured that userbase, and at the very least has throttled adoption. While the causative factors are debatable, however, the evidence – both quantitative and qualitative – points to a runtime that is less competitive and significant than it once was.
  • Julia/Rust: Two of the first quarter’s three languages to watch – Elixir didn’t demonstrate the same improvement – continued their rise. Each jumped 5 spots from 62/63 to 57/58. This leaves them still well outside the second tier of languages, but they continue to climb in our rankings. For differing reasons, these two languages are proving to be popular sources of investigation and experimentation, and it’s certainly possible that one or both could follow in Go’s footsteps and make their way up the rankings into the second tier of languages at a minimum.
  • Dart: Dart, Google’s potential replacement for JavaScript, is a language we receive periodic inquiries about, although not as high a volume of them as might be expected. It experienced no movement since our last ranking, placing 39 in both of our last two runs. And while solidly in the second tier at that score, it hasn’t demonstrated to date the same potential for rapid uptake that Go has – in all likelihood because its intended target, JavaScript, has sustained its overwhelming popularity.
  • Swift: Making its debut on our rankings in the wake of its announcement at WWDC is Swift, which checks in at 68 on our board. Depending on your perspective, this is either low for a language this significant or impressive for a language that is a few weeks old. Either way, it seems clear that – whatever its technical issues and limitations – Swift is a language that is going to be a lot more popular, and very soon. It might be cheating, but Swift is our language to watch this quarter.

Big picture, the takeaway from the rankings is that the language diversity explored most recently by my colleague remains the norm. While the Top 20 continues to be relatively static, we do see longer term trends adding new players (e.g. Go) to this mix. Whatever the resulting mix, however, it will ultimately be a reflection of developers’ desires to use the best tool for the job.

Categories: Programming Languages.

What Everyone is Missing About as-a-Service Businesses

Server room with grass!

Following the departure of Steve Ballmer, one of the outgoing executive’s defenders pointed to Microsoft’s profit over his tenure relative to a number of other competitors. One of those was Salesforce, which compared negatively for the simple reason that it has not generated a net profit. This swipe was in keeping with the common industry criticism of other services based firms from Amazon to Workday. As far as balance sheets are concerned, services plays – be they infrastructure, platform, software or a combination of all three – are poor investments. Which in turn explains why the upward trajectory common to the share prices of firms of this type has generated talk of a bubble.

Recently, however, Andreessen Horowitz’ Preethi Kasireddy and Scott Kupor questioned in print and podcast form the mechanics of how SaaS firms in particular are being evaluated. The source will be an issue for some, undoubtedly, as venture capitalists have a long history of creatively interpreting financial metrics and macro-industry trends for their own benefit. Kasireddy and Kupor’s explanations, however, are simple, digestible and rooted in actual metrics as opposed to the “eyeballs” that fueled the industry’s last recreation of tulipmania.

The most obvious issue they highlight with services entities versus perpetual software models is revenue recognition. Traditional licenses are paid up front, which means that vendors can apply the entire sale to the quarter it was received which a) provides a revenue jolt and b) helps offset the incurred expenses. Services firms, however, have typically incurred all of the costs of development and customer acquisition up front but are only able to recognize the revenue as it is delivered. As they put it,

The customer often only pays for the service one month or year at a time — but the software business has to pay its full expenses immediately…

The key takeaway here is that in a young SaaS business, growth exacerbates cash flow — the faster it grows, the more up-front sales expense it incurs without the corresponding incoming cash from customer subscriptions fees.
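A small sketch makes the cash flow point easier to see. The figures are hypothetical: a customer who costs 6,000 to acquire, all of it paid up front, and who pays 500 a month thereafter.

```python
def cumulative_cash(cac, monthly_fee, months):
    """Cumulative cash per customer, with acquisition cost paid up front."""
    balance = -cac
    history = []
    for _ in range(months):
        balance += monthly_fee
        history.append(balance)
    return history


# Hypothetical figures, chosen only to illustrate the shape of the curve.
history = cumulative_cash(cac=6_000, monthly_fee=500, months=24)

# First month (1-indexed) in which the customer has paid back the up-front cost.
breakeven = next(m for m, cash in enumerate(history, start=1) if cash >= 0)
print(breakeven, history[0], history[-1])
```

Every new customer starts in the same hole, so a business that signs customers faster is carrying more of these negative balances at once, which is the sense in which growth makes cash flow worse before it makes it better.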

The logical question, then, is this: if services are such a poor business model, why would anyone invest in companies built on them? According to Kasireddy and Kupor, the answer is essentially the ratio of customer lifetime value (LTV) to customer acquisition costs (CAC). Their argument ultimately can be reduced to LTV, which they calculate by some basic math involving the annual recurring revenue, gross margin, churn rate and discount rate, and then measure against CAC to produce a picture of a business’s productivity. The higher the multiple of a given customer’s lifetime value relative to the costs of acquiring same, obviously, the better the business.
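As a rough illustration of that math, the sketch below uses one common simplification: margin-adjusted annual recurring revenue divided by the sum of the churn and discount rates. Treat the exact formula as an assumption, since the precise calculation is not reproduced here, and the input figures are hypothetical.

```python
def lifetime_value(arr, gross_margin, churn_rate, discount_rate):
    """Simplified LTV: margin-adjusted recurring revenue over churn plus discount."""
    return (arr * gross_margin) / (churn_rate + discount_rate)


def ltv_to_cac(ltv, cac):
    """Multiple of lifetime value relative to the cost of acquiring the customer."""
    return ltv / cac


# Hypothetical inputs: 10,000 in annual recurring revenue per customer,
# 75% gross margin, 15% annual churn, 10% discount rate, 10,000 CAC.
ltv = lifetime_value(arr=10_000, gross_margin=0.75,
                     churn_rate=0.15, discount_rate=0.10)
ratio = ltv_to_cac(ltv, cac=10_000)
print(round(ltv), round(ratio, 1))
```

Note how sensitive the result is to churn: with these inputs, halving the churn rate raises the lifetime value by more than 40 percent without a single additional sale.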

Clearly businesses betting on services are doing so – at least in part, competitive pressures are another key driver – because they believe that while the benefits of upfront perpetual licensing are tempting, the more sustainable, higher margin approach over the longer term is subscriptions. Businesses like Adobe who have made this transition would not willingly leave the capital windfall on the table otherwise. Which means that while we need more market data to properly evaluate Kasireddy and Kupor’s valuation model over time, in particular their contention that services plays will be winner-take-all by default, it is difficult to argue the point that amortizing license fees over time allows vendors to extract a premium that is difficult to replicate with up front, windfall-style licensing. Even if this alternative model cannot justify current valuations, ultimately, it remains a compelling argument for why services based companies should not be evaluated in the same manner as their on premise counterparts.

But there is one other advantage to services based businesses that Kasireddy and Kupor did not cover, or even mention. If you listen to the podcast or read the linked piece, the word “data” is mentioned zero times. Which is an interesting omission, because from this vantage point data is one of the most crucial structural advantages to services based businesses. There are others, certainly: the two mention the R&D savings, for example, that are realized by supporting a single version of an application versus multiple versions over multiple platforms. But potentially far more important from my perspective is the inherent advantage IaaS, PaaS and SaaS vendors have in visibility. Services platforms operating at scale have the opportunity to monitor, to whatever level of detail they prefer, customer behaviors ranging from transactional velocity, collaboration rates, technology preferences, deployment patterns, seasonal consumption trends – literally anything. They can tell an organization how this data is trending over time, they can compare a customer against baselines of all other customers, customers in their industry, or direct competitors.

Traditional vendors of on premise technologies sold under perpetual licenses need to ask permission to audit the infrastructure in any form, and to date customers have been reluctant to grant this permission widely, in part due to horrifically negative experiences with vendor licensing audit teams. By hosting the infrastructure and customers centrally, however, services based firms are granted virtually by default information that sellers of on premise solutions would only be able to extract from a subset of customers. There is a level of permission inherent in working off of remotely hosted applications. It is necessary to accept the reality that literally every action, every behavior, every choice can (and should) be tracked for later usage.

How this data is ultimately used depends, of course, on the business. Companies like Google might leverage the data stored and tracked to serve you more heavily optimized ads or to decide whether or not to roll out more widely a selectively deployed new feature. Amazon might help guide technology choices by providing some transparency into which of a given operating system, database and so on was more widely used, and in what context. The Salesforces and Workdays of the world, meanwhile, can observe in detail business practices, compare them with other customers and then present those findings back to their customers. For a fee, of course.

Which is ultimately why data is an odd asset to ignore when discussing the valuation of services firms. Should they execute properly, vendors whose products are consumed as a service are strongly differentiated from software-only players. They effectively begin selling not just software, but an amalgam of software and the data-based insights gleaned from every other user of the product – a combination that would be difficult, if not impossible, for on premise software vendors to replicate. Given this ability to differentiate, it seems likely that services firms over time will command a premium, or at the very least introduce premium services, on top of the base software experience they’re delivering. This will become increasingly important over time. And over time, the data becomes a more and more significant barrier to entry, as Apple learned quite painfully.

This idea of leveraging software to generate data isn’t new, of course. We at RedMonk have been writing about it since at least 2007. For reasons that are not apparent, however, very few seem to be factoring it into their public valuations of services based businesses. Pieces like the above notwithstanding, we do not expect this to continue.

Categories: Cloud, Data, Economics.

Consciously Decoupling: Microservices, Swarm and the Unix Philosophy

No ifdefs

Even though the UNIX system introduces a number of innovative programs and techniques, no single program or idea makes it work well. Instead, what makes it effective is the approach to programming, a philosophy of using the computer. Although that philosophy can’t be written down in a single sentence, at its heart is the idea that the power of a system comes more from the relationships among programs than from the programs themselves. Many UNIX programs do quite trivial things in isolation, but, combined with other programs, become general and useful tools.
- The UNIX Programming Environment, Brian Kernighan and Rob Pike

“Is the ‘all in one’ story compelling or should we separate out a [redacted] LOB?” is a question we fielded from a RedMonk client this week, and it’s an increasingly common inquiry topic. For years, functional accretion – in which feature after feature is layered upon a basic application foundation – has been the norm. The advantage to this “all in one” approach is that buyers need only to make one decision, and refer to one seller for support. This simple choice, of course, has a cost: the jack of all trades is the master of none.

At RedMonk, we have been arguing for many years that the developer has evolved from virtual serf to de facto kingmaker. Accepting that, at least for the sake of argument, it is worth asking whether one of the unintended consequences of this transition may be a return to Unix-style philosophies.

The most obvious example of this in the enterprise market today is so-called microservices. Much like Unix programs, many services by themselves are trivial in isolation, but leveraged in concert can be tremendously powerful tools. This demands, of course, an Amazon-level commitment to services, such that every facet of a given infrastructure may be consumed independently and on demand – and this level of commitment is rare. But with even large incumbents increasingly focused on making their existing software portfolios available as services, the trend towards services broadly is real and clearly sustainable.

The trend towards microservices, which are much more granular in nature, is more recent and thus more difficult to project longer term (particularly given some of the costs), but it is certainly exploding in popularity. Sessions like Adrian Cockcroft’s “Migrating to Microservices” are regularly standing room only with lines wrapping down two halls. The parallels between the Unix philosophy and microservices are obvious, in that both essentially are devoted to the idea of composable applications built from programs that do one thing well.
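To make the composability idea concrete, here is a minimal Python sketch of Unix-style composition. The stage names (`grep`, `cut`, `uniq_sorted`), the `pipeline` helper and the sample log lines are all hypothetical, invented purely for illustration; they are loose analogues of the Unix tools, not reimplementations of them.

```python
from functools import reduce
from typing import Callable, Iterable

# A stage takes a stream of lines and returns a stream of lines,
# the same contract Unix programs share via stdin and stdout.
Stage = Callable[[Iterable[str]], Iterable[str]]


def grep(needle: str) -> Stage:
    """Keep only the lines that contain `needle`."""
    return lambda lines: (line for line in lines if needle in line)


def cut(field: int, sep: str = " ") -> Stage:
    """Keep only one separator-delimited field from each line."""
    return lambda lines: (line.split(sep)[field] for line in lines)


def uniq_sorted() -> Stage:
    """Drop duplicate lines and sort what remains."""
    return lambda lines: sorted(set(lines))


def pipeline(*stages: Stage) -> Stage:
    """Chain stages left to right, like `a | b | c` in a shell."""
    return lambda lines: reduce(lambda acc, stage: stage(acc), stages, lines)


# Hypothetical request log: method, path, user.
log = [
    "GET /photos alice",
    "GET /files bob",
    "POST /photos carol",
    "GET /photos alice",
]

# Each stage is trivial in isolation; the combination answers a real question:
# which distinct users touched the photos endpoint?
photo_users = pipeline(grep("/photos"), cut(2), uniq_sorted())
print(list(photo_users(log)))
```

None of the three stages knows anything about the others, which is the point: the value lies in the relationships among the pieces rather than in any one of them.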

These types of services are difficult to sell to traditional IT buyers, who might not understand them well enough, would prefer to make a single decision, or both. But developers understand the idea perfectly, and would prefer to choose a service that does what they need it to over one that may or may not do what they need but does ten things they don’t. It’s easy, then, to see microservices as the latest manifestation of the developer kingmaker.

It’s not as easy, however, to understand a similar trend in the consumer application space. In recent months, rather than continue trying to build a single application that serviced both file sharing and photo sharing needs, Dropbox split its application into the traditional Dropbox (Files) and the newly launched Carousel (Photos). Foursquare today released an application called Swarm, which essentially forks its business into two divisions: Foursquare, a Yelp competitor, and Swarm, a geo-based social network. Facebook, meanwhile, ripped out the messaging component of its core application in April because, as Mark Zuckerberg described it:

The reason why we’re doing that is we found that having it as a second-class thing inside the Facebook app makes it so there’s more friction to replying to messages, so we would rather have people be using a more focused experience for that.

Like enterprise tech, consumer technology has been trending towards all-in-one for some time, as pieces like the “Surprisingly Long List of Everything Smartphones Replaced” highlight. But if Dropbox, Facebook and Foursquare are any indication, it may be that a Unix philosophy renaissance is underway in the consumer space as well, even if the causative factors aren’t as obvious.

All of which means that our answer to the opening question should come as no surprise: we advised our client to separate out a new line of business. Particularly when developers are involved, it remains more effective to offer products that do one thing well, however trivial. As long as the Unix-ization of tech continues, you might consider doing the same.

Categories: Microservices.

Software and EMC

Among the predictions in this space for the year 2014 was the idea that disruption was coming to storage. Having looked at the numbers, this prediction may have been off: disruption had apparently already arrived. By my math, these are EMC’s revenue growth rates for the last four years for its Information Infrastructure business: 18.43% (2010), 17.92% (2011), 2.05% (2012), 3.48% (2013). While the Information Infrastructure includes a few different businesses, Information Storage – what EMC is best known for – is responsible for around 91% of the revenue for the Information Infrastructure reporting category. And Information Infrastructure, in turn, generates 77% of EMC’s total consolidated revenue – the rest is mostly VMware (22%).

All of this tells us two things. One, that EMC has seen a multi-year downward trajectory in its ability to grow its storage business, and two, that storage is responsible for the majority of the company’s revenue. Put one and two together and it’s clear that the company has a problem.
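The arithmetic behind those two observations can be sketched in a few lines of Python. The percentages are the ones quoted above; the variable names, the `average` helper and the split into "earlier" and "later" years are illustrative choices, not anything EMC reports.

```python
# Figures quoted in the post, expressed as fractions of revenue.
storage_share_of_infrastructure = 0.91  # Information Storage / Information Infrastructure
infrastructure_share_of_total = 0.77    # Information Infrastructure / total consolidated revenue

# Storage as a share of total consolidated revenue.
storage_share_of_total = storage_share_of_infrastructure * infrastructure_share_of_total
print(f"Storage share of total revenue: {storage_share_of_total:.1%}")

# Information Infrastructure year-over-year growth rates, in percent.
growth_rates = {2010: 18.43, 2011: 17.92, 2012: 2.05, 2013: 3.48}


def average(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Compare the first two years against the last two (an illustrative split).
earlier = average(growth_rates[year] for year in (2010, 2011))
later = average(growth_rates[year] for year in (2012, 2013))
print(f"Average growth 2010-2011: {earlier:.2f}%")
print(f"Average growth 2012-2013: {later:.2f}%")
```

On these numbers, storage works out to roughly 70% of consolidated revenue, and average growth falls from about 18% in the earlier pair of years to under 3% in the later pair.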

How the company has reacted to these developments, meanwhile, can help observers gain a better understanding of what EMC believes are the causes of this underperformance. Based on the announcements at EMC World, it’s easy to sum up the company’s strategic response in one word: software. From ScaleIO to ViPR to the acquisition of DSSD and its team of ex-Solaris engineers, a lot of the really interesting news at EMC World was about software, which is an interesting shift for a hardware company. EMC is committed enough to its software strategy, in fact, that it’s willing to directly compete with its subsidiaries.

If it’s true that EMC is betting heavily on software to restore its hardware growth, the next logical question is whether this is the appropriate response. Based on what happened to the major commodity compute players – Dell has gone private, HP is charging for firmware and IBM left the market entirely – it’s difficult to argue for a different course of action. It seems unlikely that the optimal approach moving forward for EMC – or any other storage provider, for that matter – is going to be heavy hardware engineering. There are customers, particularly loyal EMC customers, that are hungry for hardware innovation and will continue to pay outsized margins for that gear moving forward. There are many more customers, however, willing to explore software abstractions layered on top of commodity hardware, otherwise known as software-defined storage. There’s a reason that EMC’s primary points of comparison were vendors like Amazon and Google rather than its traditional competitors.

Like its counterparts in the networking space who are coping with the implications of software-defined offerings in their space, EMC essentially had two choices: bury its head in the sand and pretend that the business is fine, or begin to tactically incorporate disruptive elements as part of a longer term strategy for adapting its business. Which is another way of saying that the company only really had one realistic choice, which to its credit was made: EMC is clearly adapting. Software-defined storage was a common topic of discussion at the company’s event this week, and while there are still areas where the embrace is awkward, the company clearly understands the challenge ahead and is taking steps to adjust its product lines and the models behind them. The transition to what it calls the “third platform” – EMC’s terminology for the cloud – will pose monumental challenges to the business longer term, but by betting on software EMC is investing in the area most likely to deliver differentiated value over time.

The biggest problem with the transition to the “third platform,” however, isn’t going to be the company’s engineering response. As the company likes to point out, it is investing heavily in both M&A and traditional R&D, and with names like Bechtolsheim, Bonwick, Shapiro et al. coming on board it’ll have the requisite brainpower available. But the problem with its current strategy is that it does little to prioritize convenience. As we’ve seen in the cloud compute segment, customers are increasingly willing to trade performance and features for speed and ready availability. And like most systems vendors, EMC is not currently built to service this type of demand directly; it will instead have to settle for an arms supplier-type role. Even in software, which is intrinsically simpler to make available than the hardware EMC has traditionally sold, the company keeps assets like ViPR locked up behind registration walls. In a market in which technology decisions are being made based more on what’s available than what’s good, that’s an issue.

The gist of all this, then, in the wake of EMC World is that the company is inarguably adapting to a market that’s rapidly changing around it, but has tough problems to solve in availability and convenience. The loyalty of EMC accounts is absolutely an asset, one that the company will need to rely on as customers make the “third platform” transition moving forward. But the company also needs to remember that technology decision making at those loyal EMC accounts has changed materially, and is increasingly advantaging players like Amazon at the expense of incumbents.

The focus on software engineering, therefore, is appropriate and welcome, but insufficient by itself to address the coming transition. Only a focus on reducing the friction of adoption, and improving developer engagement, can fix that.

Disclosure: EMC is a client, as are Amazon and VMware.

Categories: Business Models, Cloud, Storage.

Don’t Call it a Comeback, or SOA, But Services Are on the Rise

While the term SOA was lost to marketers years ago, the underlying concept may be in the process of making a comeback. Though the term itself has become a bad word outside of the most conservative enterprises and suppliers today, constructing applications from services has clear and obvious benefits. In his instant classic post about his time at Amazon, Google’s Steve Yegge described Amazon’s journey towards an architecture composed of services this way:

So one day Jeff Bezos issued a mandate…His Big Mandate went something along these lines:

1) All teams will henceforth expose their data and functionality through service interfaces.

2) Teams must communicate with each other through these interfaces.

3) There will be no other form of interprocess communication allowed: no direct linking, no direct reads of another team’s data store, no shared-memory model, no back-doors whatsoever. The only communication allowed is via service interface calls over the network.

4) It doesn’t matter what technology they use. HTTP, Corba, Pubsub, custom protocols — doesn’t matter. Bezos doesn’t care.

5) All service interfaces, without exception, must be designed from the ground up to be externalizable. That is to say, the team must plan and design to be able to expose the interface to developers in the outside world. No exceptions.

6) Anyone who doesn’t do this will be fired.

Like Cortez’s soldiers, the Amazon employees got to work if for no other reason than they had no choice. The result, in part, is the Amazon you see today, the same one that effectively owns the market for public cloud services at present. Much as enterprises have historically written off Adrian Cockcroft’s Netflix lessons with statements like “it only works for ‘Unicorns’ like Netflix,” most have convinced themselves that the level of service-orientation that Amazon achieved is effectively impossible for them to replicate. Which is, to be fair, likely true absent the Damoclean incentive Bezos put in place at Amazon. What’s interesting, however, is that many of those same enterprises are likely headed towards increased levels of abstraction and service-orientation, whether they realize it or not.

The most obvious example of this trend at work is the unfortunately named (Mobile) Back-end-as-a-Service category of providers. From Firebase to Kinvey to the dozen other providers in the space, one of the core value propositions is shortening the application development lifecycle by composing applications from a collection of services. Rather than building identity, location, and similar common services into the application from scratch, BaaS providers supply the necessary libraries to access externally hosted services. Which means that the application output of these providers is intrinsically service-oriented by design.

Elsewhere, in the adjacent Platform-as-a-Service space, providers are essentially advancing the same concept. In building an application on Engine Yard or Heroku, for example, developers are not required to implement their own datastores or caching infrastructure, but rather may leverage them as services – whether that’s Hadoop, MongoDB, memcached, MySQL, PostgreSQL, Redis, or Riak. Even IBM is planning to make the bulk of its software catalog consumable as a service by the end of the year. Which is logical, because the differentiation for PaaS providers is likely to be above the platform itself, as it is in the open source operating system market.

Consider on top of all of the above the existing traction for traditional SaaS offerings, and the reality is that it’s getting harder to build applications that are not dependent in some way upon services. And for those applications that are not yet, vendors are likely to make it increasingly difficult to maintain that independence as they move into services as a hedge against macro-issues with the sales of stand alone software.

There’s a reason, in other words, that micro-services are all the rage at the moment: services are how applications are being built today.

Categories: Services, Software-as-a-Service.

What is a Software Company Today?

As announced yesterday at their inaugural analyst conference, Cloudera – the first commercial backer of the Hadoop project – has secured $160 million in new financing, bringing their total raised capital to $300 million. Because it remains, at least for now, a private company, precise details on their finances remain unavailable. What has been disclosed, however, is that approximately seventy percent of their revenue derives from what the company refers to as “software.” The question that no one seems to be asking today is what the word “software” actually means in this context.

Twenty years ago, the logistics of software businesses were straightforward. Vendors employed developers, though they were systemically undervalued at the time, to collectively author a product that was explicitly designed to be sold to a specific buyer. By the late 1980s in the case of the enterprise, this was typically the Chief Information Officer. While software was virtual by nature it was typically distributed in physical form, whether that was a floppy disk, a CD-ROM or, later, a DVD. Software businesses, in other words, looked a great deal like traditional manufacturing businesses: they constructed a product, shipped it to buyers who put it to work in their own environments. Often with assistance from the vendor or certified third parties, true, but the moving pieces of the industry owed a great deal to traditional manufacturing.

Today, things have changed. Many enterprise software vendors are still operating as they have for the past few decades, but pressure from competing models is mounting.

The distribution of software, of course, has changed dramatically. Physical media has for years been an anachronism, as it became much more efficient for buyer and seller to leverage digital distribution. But things hardly stopped there: from Salesforce.com’s 2004 IPO forward, mainstream customers began to assess more critically the costs and benefits of installing software on premises versus merely consuming it as a service. The company – whose product resembled a traditional packaged application (CRM) with the exception of its delivery model – drew a bright line between its business and that of traditional vendors. Salesforce famously campaigned around a message of “No Software.” Its toll-free number, in fact, remains 1-800-NO-SOFTWARE. For some, this idea is comically inaccurate: without software, there is no Salesforce.com. It is still software, it’s just delivered via a different medium and managed by the vendor rather than the customer. For Salesforce, and presumably some subset of its customer base, however, the belief is that the distinction between SaaS and the traditional packaged software shipped to, installed on and managed from a customer’s premises is significant enough to render it a completely new product. A product, therefore, that should not be referred to as “software.”

Whatever one makes of the semantics of this argument, changes in the nature of software availability, delivery and procurement are clearly impacting markets and the incumbents who previously dominated them.

While Oracle may attribute its adjusted earnings miss this quarter to currency fluctuations, or its shortfall two quarters ago to a “lack of urgency” in its sales force, the reality is that the trend line for its sales of new licenses has been problematic for well over a decade. Notably, this declining ability to sell new licenses of its software overlaps with rising adoption of open source software and SaaS packages, among other competitive models. This is no coincidence. While the underlying business and revenue models for open source and SaaS differ, they share one common advantage over traditional software distribution models such as Oracle’s: they are far easier to acquire.

Microsoft, another firm built largely on traditional software distribution and acquisition models, has similarly struggled to compete with more available alternatives. The company has telegraphed its level of concern with the viability of its traditional models moving forward with its massive investments in more available infrastructure (Azure), and this is appropriate. On the consumer front, Microsoft has seen Apple’s operating system distribution and pricing model shift from physical media priced at $200 to a free download. Within the server market, meanwhile, Microsoft’s biggest challenge of late has been competing in the rapidly growing (and inherently convenient) public cloud market where operating system licensing fees are near zero in most cases.

Microsoft and Oracle collectively generated billions of dollars of wealth according to the simple model described above: they manufactured software, shipped it to customers who were ultimately responsible for its installation, implementation, maintenance and usage. As barriers to hardware and software both have broken down, due to technical approaches such as open source, the public cloud or SaaS, the traditional model became increasingly subject to disruption. Adaptations from both have included the incorporation of the very models they were disrupted by: open source, public cloud and SaaS.

Absent substantial pressure, it’s unlikely that either of these businesses would have strayed from the courses that made them and their shareholders very wealthy. But selling software in the traditional manner became, and is becoming, more difficult to do.

Which brings us to the definitional question of what it means to be a “software” business. If companies as successful and well capitalized as Microsoft and Oracle are struggling with disruptions to traditional software distribution models, it would seem important for every software vendor to consider carefully what it means when it defines itself as a “software” vendor.

Given the market’s clear and accelerating preference for software models built on convenience – be that cloud, open source, SaaS or otherwise – it is useful for vendors and buyers alike to consider where a given product falls on a spectrum of customer effort.

  • At one end we have traditional software players, for whom procurement begins with on-site sales visits, nothing is free and access to the software is jealously guarded. From a customer perspective, this is a high effort model: the procurement process is time intensive, the costs of the software are high, and the customer typically bears the risk of implementation because they are paying for the software on an up front basis.

  • At the other end of the spectrum lies SaaS, which is accessible to anyone with a browser and payable via a credit card. The customer effort metric for SaaS is low: the software is comparatively easily acquired, so procurement is less involved, and implementation risk and effort is largely shifted to the vendor – for a longer term premium, of course.

  • Somewhere in between these models, but closer to the SaaS end of the spectrum, lie both open source software and public cloud services. Both are substantially advantaged versus traditional software and infrastructure in acquisition. Procurement is so low effort for customers, in fact, that both open source and public cloud services are frequently leveraged within organizations without technical leadership being aware of that fact. Because they do not represent a finished product, however, more implementation effort is required versus SaaS. Customer evaluation for public cloud and open source relative to SaaS – or PaaS, for that matter – is typically an evaluation of the tradeoffs between convenience and control.

Directionally, while there are obvious exceptions, from a macro perspective the market is actively shifting away from the traditional model towards the latter two examples. Which implies that vendor strategies must adapt to this changing reality; vendors whose model allows only for traditional models of software distribution and consumption will be at a significant disadvantage moving forward. Many, in fact, are already alarmingly behind. If a given software vendor isn’t at least considering strategic shifts in consumption and delivery of software in its market, it has a serious problem.

In a world in which the only option for customers is to purchase software up front and assemble and leverage it at their own risk, traditional software sales would remain robust, because software is a basic necessity. In today’s market context, however, where customers have a wealth of options, from purchasing, installing and running their own software stack to fully outsourcing same to constructing hybrid applications composed of micro-services (i.e. managed software exposed as an API), software vendors must cater to customers’ desires for convenience and effort minimization.

Even smaller software vendors must actively plan for a future in which they are not merely handing off a software product to customers and hoping for the best, but actively delivering it over a network, managing and monitoring it on behalf of customers in a public cloud, integrating data into the product and so on. This poses immense operational challenges, of course, as most pure software vendors are not appropriately resourced to deliver their software in a network context, for example. But if they don’t, they can be sure competitors will.

Vendors, whether that’s Cloudera or Pivotal as featured in the quote above, will continue to point to “software” as their primary revenue source. But the reality is that when successful companies say “software” they will actually mean software plus some combination of public cloud infrastructure, hardware/appliance, automated management/monitoring capabilities, hosted micro-services, and data enabled analytics. The majority of which is software, of course. Just not strictly software as we have been conditioned to think of it.

Which is why in a growing number of cases, the term “software company” may become as obsolete as the media they once distributed the product on.

Disclosure: Cloudera, Pivotal and Salesforce are current RedMonk customers. Microsoft has been a customer but is not currently, and neither Apple nor Oracle is a RedMonk customer.

Categories: Business Models, Cloud, Software-as-a-Service.

What Does the WhatsApp Acquisition Mean?

Initial reactions to Facebook’s acquisition of WhatsApp primarily centered on price. Which is understandable, given the valuation. Here is an incomplete list of 25 technology companies the market values less than Facebook values WhatsApp.

  1. Sandisk
  2. Broadcom
  3. Sony
  4. Workday
  5. Seagate
  6. Kyocera
  7. Analog Devices
  8. Computer Associates
  9. Dassault Systemes
  10. Symantec
  11. Activision Blizzard
  12. Netapp
  13. Yandex
  14. Autodesk
  15. Citrix
  16. Red Hat
  17. Akamai
  18. Nvidia
  19. Splunk
  20. Equinix
  21. Electronic Arts
  22. Level 3
  23. F5 Networks
  24. Teradata
  25. Pandora

On the one hand, the service boasts over 450 million users – 200 million or so more than Twitter. And the deal math essentially values each user at approximately $42 which is, after adjusting for inflation, less than what AOL paid per ICQ user ($49) in 1998. It’s also considerably less than Twitter’s per user valuation, which is around $127, according to Friday’s market prices.

On the other hand, unlike Twitter, WhatsApp has been distinctly reluctant to monetize users via advertising. Messaging is also a highly competitive market, one in which users have a wide variety of credible alternatives – which we’ll come back to. And while WhatsApp is technically a paid application, it’s free for the first year and the maximum revenue per user under the current model is $0.99 annually. As a side note on that subject, it’s worth noting that based on growth charts it would appear that 200+ million of WhatsApp’s users have signed up in the last year, meaning that they have not yet been forced to make a choice whether to purchase the app or turn to free alternatives. All of which suggests that the company was not acquired because of its revenue generation potential.
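For readers who want to check the deal math, here is a rough back-of-the-envelope sketch in Python. The roughly $19 billion price is the figure referenced in this post; the Twitter user count is an assumption derived from the "200 million or so" gap mentioned above, and the revenue ceiling assumes, unrealistically, that every current user pays the full $0.99 each year.

```python
# Figures taken from the post; treat all of them as approximate.
deal_price = 19_000_000_000        # dollars, the roughly $19B acquisition price
whatsapp_users = 450_000_000       # "over 450 million users"
annual_fee = 0.99                  # dollars per user per year after the free first year
twitter_value_per_user = 127       # dollars, per the market prices cited above

# Assumption: Twitter has about 200 million fewer users than WhatsApp.
twitter_users = whatsapp_users - 200_000_000

# What Facebook paid per WhatsApp user.
value_per_user = deal_price / whatsapp_users
print(f"Price per WhatsApp user: ${value_per_user:.2f}")

# Revenue ceiling under the current model if every user paid every year.
max_annual_revenue = whatsapp_users * annual_fee
print(f"Maximum annual revenue: ${max_annual_revenue / 1e6:.1f}M")

# Years of ceiling-level revenue needed to equal the purchase price,
# ignoring costs, growth, churn and the time value of money.
years_to_match_price = deal_price / max_annual_revenue
print(f"Years of maximum revenue to match the price: {years_to_match_price:.1f}")

# Twitter's implied market value at the cited per-user figure.
twitter_implied_value = twitter_users * twitter_value_per_user
print(f"Implied Twitter valuation: ${twitter_implied_value / 1e9:.2f}B")
```

Even at the theoretical ceiling, subscription revenue would take more than four decades to equal the purchase price, which is consistent with the conclusion that revenue potential was not the motivation.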

Instead, as most of the subsequent analyses have acknowledged, WhatsApp was presumably acquired for strategic reasons. One rationale sees WhatsApp as a hedge against perceived defections from and stagnation within Facebook’s core platform – primarily amongst younger users. Another argues WhatsApp represents a strategic bid to assist Facebook in the zero sum battle for a user’s overall attention. A third is to tap one of the largest sources of messaging traffic in the world for data mining and analysis in order to deliver more effective advertising results for one or both platforms. There are 19 billion reasons to believe, however, that the incentive here was a combination of all of the above, as well as a wide number of other factors, meaning that the impetus for the deal is not likely to be found in a spreadsheet. Which in turn is why the deal seems perfectly defensible to some and absolutely baffling to others.

If we set aside the numbers, however entertaining debating them might be, it’s reasonable to acknowledge that messaging is, by itself, a fundamentally important channel. It has become, almost by accident, the default communication mechanism for individuals all over the world. Regional dynamics – specifically lower cost SMS services and fewer network boundaries in the US – have led to asymmetrical adoption of SMS-competitors such as WhatsApp from country to country, but the overall numbers are inarguable: messaging is an enormous, and therefore strategic, market. Which helps explain why the likes of Facebook and Google have been courting players like Snapchat and WhatsApp.

Strategic though the market may be, however, the dynamics of messaging make it an odd market to parse. Consider the following characteristics:


Messaging networks are, by and large, private. Snapchat’s visibility, in fact, is in part based on the self-destructing nature of its messages. Recent trends suggest growth not only in private communications but in anonymous ones such as Secret or Whisper. By contrast, Facebook and Twitter, for all that they have been brought into discussions of messaging in the wake of the WhatsApp acquisition, are fundamentally different channels. They are typically public by default; Facebook has periodically irked users by publicly exposing content they wished to keep private. This distinction is particularly important for advertising business models. Layering advertisements into conversations one has with the public is disruptive but generally accepted. Injecting them into private conversations, many of which are one to one, is highly problematic.


While Facebook, Twitter and other social media sites are not exactly the Library of Congress in terms of their longevity, relative to WhatsApp and other messaging services they may as well be cuneiform tablets. Twitter has recently made available a user’s entire history via an archive, and Facebook’s Timeline feature is an attempt to give the service relevance over longer periods of time. Messaging services, on the other hand, are typically for ephemeral, and thus largely disposable, content. No one would argue that these temporal limitations make messaging irrelevant; but it is difficult to compare throwaway messages to friends and family with the more persistent timelines of other social media services. Distinguishing between degrees of transience may seem trivial, but it is in all likelihood the reason there are multiple competitive messaging services against comparatively fewer social networking alternatives. It’s simple for users to manage multiple messaging services for different groups of friends, where similarly fragmented usage would destroy the utility of a social network.

Network Effects

Questions of network effect are likewise important. Much has been made of the importance of the “address book,” the idea being that unlike desktop or web based applications, smartphone apps are much more likely to have access to up-to-date contact information, and therefore the most important network to a given individual. But while this advantages messaging apps over desktop and web rivals, it also means that smartphone-based alternatives have a near equal playing field. None of WhatsApp, LINE, Snapchat and so on have preferential access to my contacts, meaning that the switching costs between one service and another – while not non-existent – are theoretically marginal. With even a moderate group consensus, it’s possible to immediately switch from one messaging app to another – as indeed is occurring already in some quarters. When WhatsApp had three hours of downtime this weekend, the free alternative Telegram saw 5 million new registrants.


The choice of WhatsApp for Facebook is a bid for user volume. But what about the technology? Much has been made of WhatsApp’s impressively low employee count (55) given both the valuation and message volume handled, and understandably so. It’s interesting to note, however, that WhatsApp lacks some of the platform features common to competitors like LINE and WeChat, which has impacted WhatsApp’s popularity in multiple Asian geographies. Even compared to lesser known alternatives such as Telegram, it lacks basic features. Telegram, for example, is accessible from multiple devices including a desktop; WhatsApp, by comparison, is tied to a single device – and architected for same. The coming addition of voice calls, meanwhile, is a potentially interesting differentiator but the appeal to its messaging-centric audience is uncertain. There’s a reason, after all, that voice usage has consistently declined globally in favor of messaging systems, and it’s not strictly price.


The most curious thing about analyses of the WhatsApp deal, however, is that they almost universally fail to mention iMessage as a competitor. If you search this piece by Ben Evans or this piece by Ben Thompson for example – both of which are recommended – neither includes iMessage as potential competition for WhatsApp. Which is technically understandable: iMessage is (at present, at least) limited to iOS, and is only really optimal in group messaging if the entire group is using iOS devices. But in terms of service potential, iMessage has a few key advantages over competition such as WhatsApp.

  1. Rather than make a clean break from SMS systems, it embraces (and extinguishes?) them.
  2. It is embedded in the operating system: no downloads are necessary.
  3. It is accessible using desktop clients.
  4. Every registered Apple user is a de facto user.

The network effects – and potential lock-in – of iMessage are something many non-iOS users have probably experienced. Whether through limitations of the service (for group messaging, most obviously) or by deliberate choice (limited SMS packages, for example), people carrying competitive devices may actively be excluded from conversations amongst iOS users.

The Gist

In sum, then, we have a market that is widely accepted as strategically important, but with a higher degree of efficiency than other markets such as social media. Users can and will engage and leverage multiple messaging services, with few able to manage the kind of lock-in characteristic of sites like Facebook or Twitter. For this reason, large investments in the space, while of theoretically high upside because of their effortless acquisition of new users, are likely to be comparatively high risk.

The fact that Facebook has publicly committed to running WhatsApp as a separate business may be a function of this risk. One advantage older, incumbent technology providers have had over younger rivals in recent years was their understanding that even mergers that make sense on paper are difficult to execute properly, and more likely than not to fail. This, more than the costs involved, is why large scale acquisitions in the technology industry are comparatively rare. It is intrinsically difficult to merge businesses. One wonders if WhatsApp being maintained as a stand alone business, or Google’s recent decision to maintain Nest as a separate entity following the $3.2B acquisition, is a sign that the technology upstarts are learning from the likes of EMC and VMware.

Speaking of Google, meanwhile, what will be most interesting to observe in the wake of WhatsApp’s acquisition is how the search giant responds. While not commonly credited with such, Apple already possesses a competitor in iMessage. Google, curiously enough, does not: while its Android Hangouts app handles both its instant messaging and traditional SMS, the two channels are kept entirely separate.

This is problematic for Google moving forward. For Apple, success with iMessage (Apple does not disclose subscriber numbers) is welcome, but non-essential. At its core, Apple’s value proposition to users is the devices it sells (a lesson Blackberry failed to learn, BBM or no). Google, on the other hand, depends as much on a given user’s attention as does Facebook. The more time users spend with the latter – or wholly owned subsidiaries like WhatsApp – the less time available for interaction with Google properties. Which explains Google’s persistent interest in messaging services. But not why they feel compelled to pay the kinds of outsized premiums for inorganic growth. One imagines that if Google, which is held up as Apple’s superior when it comes to services, could deliver something similar to iMessage in Android, it would be in a position to deliver at least comparable numbers to WhatsApp competitors based on the number of Android devices activated per day. And if it made this service available cross-platform on iOS in particular, as it has with its other Google services (which admittedly may not be technically feasible), it would represent a very formidable competitor indeed. Google’s lack of direct attention to the messaging opportunity has been, frankly, perplexing.

$19 billion is a hell of a shot across the bow, though.

Categories: Mobile.

Quarterly Language Performance on GitHub 2011 – 2013

Because the RedMonk Programming Language Rankings have tended to be fairly stable over time, one of the more common questions we get following a release concerns community volatility. More specifically, many are curious if the individual data sources themselves – GitHub and Stack Overflow – tend to be less constant over time than the correlation of same. To explore this question, we examined the GitHub Archive using a simple query fetching the number of repositories created per programming language per quarter beginning in 2011. Per the GitHub Archive, their data only goes back as far as February 12, 2011, so Q1 of that year here is short data for a little over a month’s worth of activity. As with the Programming Language Rankings, this excludes forks. And in an effort to make the data more accessible, this analysis focuses on a subset of the list, the Top 10 programming languages by our rankings. The findings are interesting, and seem to raise as many questions as they answer.
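To make the counting step concrete, here is a minimal Python sketch of the aggregation described above: repositories created per language per quarter, with forks excluded. It is not the query we ran against the GitHub Archive. The record fields (`language`, `created_at`, `is_fork`) and the sample events are hypothetical stand-ins chosen for illustration, and the quarter labels follow the Q312 style used in this post.

```python
from collections import Counter
from datetime import datetime


def quarter_label(ts: datetime) -> str:
    """Return a label such as 'Q312' for the third quarter of 2012."""
    quarter = (ts.month - 1) // 3 + 1
    return f"Q{quarter}{ts.year % 100:02d}"


def repos_created_per_quarter(events):
    """Count non-fork repository creations per (language, quarter).

    `events` is an iterable of dicts with the hypothetical keys
    'language', 'created_at' (a datetime) and 'is_fork' (a bool).
    """
    counts = Counter()
    for event in events:
        if event["is_fork"]:
            continue  # forks are excluded, as in the rankings
        key = (event["language"], quarter_label(event["created_at"]))
        counts[key] += 1
    return counts


# Illustrative records only; these are not real GitHub Archive data.
sample_events = [
    {"language": "Ruby", "created_at": datetime(2012, 7, 4), "is_fork": False},
    {"language": "Ruby", "created_at": datetime(2012, 8, 15), "is_fork": False},
    {"language": "Ruby", "created_at": datetime(2012, 9, 1), "is_fork": True},
    {"language": "Java", "created_at": datetime(2013, 1, 20), "is_fork": False},
]

sample_counts = repos_created_per_quarter(sample_events)
```

Keying the counter on a (language, quarter) pair keeps the result flat, which makes it straightforward to pivot into one series per language for charting.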

First, consider the following chart of repositories created on GitHub per quarter for each of our top ten programming languages.


The dramatic growth in late 2011 and early 2012 is not particularly surprising. Less predictable, however, were the three consecutive quarters of decline (Q312 to Q113) for the surveyed languages. To be clear, this is a decline in newly created repositories per quarter within our ten language sample only: the chart does not suggest a decline in overall activity on the platform. Still, given the significance of these ten languages to GitHub and the seemingly corrective nature of the dip, the company’s $100M round early in the third quarter of 2012 appears well timed.

In terms of specific findings per language, this chart on the one hand validates commonly held assumptions about GitHub’s language traction: that Java, JavaScript and Ruby – and to a lesser extent PHP and Python – are the most popular languages as measured by repository. On the other, the spike in C in Q2 of 2012 is notable, as is the sudden emergence of CSS in the second half of last year that has already been documented in detail by my colleague.

The major question, however, remains the aforementioned surge. Further mining of the dataset is needed to try to ascertain a cause, but the good news for GitHub is that even absent a surge, growth as measured by repository creation appears to be healthy. Unless valuations, then, were built on an assumption of Q312-Q113 growth rates, the impact of this anomalous spike should be minimal. It will also be interesting to assess growth rates outside of the subset here; was the decline in this sample offset by volume growth in other, less popular languages on the platform?

To provide a clearer picture of how these languages have performed in repository creation relative to one another, the following motion chart is made available. The data pictured is the ranking of each language in terms of repository creation, measured quarterly from 2011 through 2013. The motion charts provide three different lenses by which this data can be viewed, not to mention subsetted, over time. Click the play button in the bottom left hand corner to advance the dataset over time, and the majority of the visualization is interactive and clickable.
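For readers who want to reproduce the ranking step on their own counts, the following Python sketch turns one quarter's repository counts into ranks, with 1 going to the language with the most repositories created. The function name, the alphabetical tie-break and the sample numbers are assumptions made for illustration; they are not taken from the dataset behind the motion chart.

```python
def rank_languages(counts_by_language):
    """Rank languages by repositories created in one quarter.

    Rank 1 goes to the highest count. Ties are broken alphabetically,
    which is an arbitrary choice made for this sketch.
    """
    ordered = sorted(
        counts_by_language.items(),
        key=lambda item: (-item[1], item[0]),
    )
    return {language: rank for rank, (language, _) in enumerate(ordered, start=1)}


# Made-up counts for a single quarter, for illustration only.
example_quarter = {"Java": 120, "PHP": 110, "Ruby": 95, "JavaScript": 95}
example_ranks = rank_languages(example_quarter)
```

Running the function once per quarter and collecting the results gives the kind of rank-over-time series that a motion chart can animate.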

An exploration of this data will yield some interesting findings: for the first quarter surveyed, Java and PHP saw more repositories created than Ruby and JavaScript. Motivated parties will doubtless find other curious rankings over the 12 quarter sample.

In the near future we’ll explore the history of the other axis of our rankings, Stack Overflow, for this same sample set of languages to assess the relative differences in trajectories between the two communities.

Categories: Programming Languages.