As long as we have been doing our programming language rankings here at RedMonk, dating back to the original publication by Drew Conway and John Myles White, we have been trying to find the correct timing. Should it be monthly? Quarterly? Annually? While the appetite for up-to-date numbers is strong, the truth is that changes from snapshot to snapshot have historically been minimal. This is in part the justification for the shift from quarterly to bi-annual rankings. Although we snapshot the data approximately monthly, there is little perceived benefit to cranking out essentially the same numbers month after month. There are more volatile ranking systems that reflect more ephemeral, day-to-day metrics, but how much more or less popular can a programming language realistically become in a month, or even two? The aspect of these rankings that most interests us is the trajectories they may record: which languages are trending up? Which are in decline? Given that, and given the adoption curve for languages in general, the most reliable approach would seem to be one that measures performance over multi-month periods at a minimum.
Previously, GitHub’s Explore page ranked their top programming languages – theoretically by repository – and we simply leveraged those rankings in our plot. For reasons that are not clear, GitHub has retired this ranking, and it is thus no longer available to us. Instead, this plot attempts to duplicate those rankings by querying the GitHub Archive on Google’s BigQuery. We select and count repository languages, excluding forks, for the Top 100 languages on GitHub. Without knowing precisely how GitHub produced their own rankings, however, we can’t be sure we’re duplicating their methods exactly. And there is some evidence to suggest that the new method is an imperfect replica. Previous iterations have produced correlations between GitHub’s rankings and Stack Overflow’s as high as .82 but never one lower than .78. This quarter’s iteration is the lowest yet at .75. It’s possible, of course, that this reflects nothing more than a natural divergence between the two communities. But it’s equally possible that our new method differs slightly from GitHub’s and is therefore producing slightly different results than in previous iterations. Until and unless GitHub decides to resume publishing their own rankings, however, this is the best method available to us. This must be kept in mind when comparing these results against previous iterations.
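The selection described above can be sketched roughly as follows. This is a minimal illustration in Python, not the BigQuery query actually used: the record fields (`name`, `language`, `fork`), the function name `rank_languages` and the step that counts each repository only once are all assumptions, and the sample records are invented.

```python
from collections import Counter


def rank_languages(repos, top_n=100):
    """Rank languages by number of distinct non-fork repositories.

    `repos` is an iterable of dicts with the hypothetical keys
    'name', 'language' and 'fork'. The real archive schema may differ.
    """
    seen = set()
    counts = Counter()
    for repo in repos:
        # Assumption: the same repository can show up in several records,
        # so each repository name is counted only once.
        if repo["name"] in seen:
            continue
        seen.add(repo["name"])
        # Forks are excluded, as are repositories with no detected language.
        if repo["fork"] or not repo["language"]:
            continue
        counts[repo["language"]] += 1
    # most_common returns (language, count) pairs, highest count first.
    return counts.most_common(top_n)


# Invented sample records, for illustration only.
sample = [
    {"name": "a/x", "language": "JavaScript", "fork": False},
    {"name": "a/x", "language": "JavaScript", "fork": False},  # repeated record
    {"name": "b/y", "language": "JavaScript", "fork": True},   # fork, excluded
    {"name": "c/z", "language": "Java", "fork": False},
    {"name": "d/w", "language": "JavaScript", "fork": False},
    {"name": "e/v", "language": None, "fork": False},          # no language
]
ranking = rank_languages(sample)
print(ranking)
```

Small choices in a sketch like this, such as whether repeated records are collapsed or how repositories without a language are treated, can shift the counts, which is one way two methods that sound identical could still produce slightly different rankings.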
Besides that notable caveat, there are a few others to reiterate here before we get to the plot and rankings.
- To be included in this analysis, a language must be observable within both GitHub and Stack Overflow.
- No claims are made here that these rankings are representative of general usage more broadly. They are nothing more or less than an examination of the correlation between two populations we believe to be predictive of future use, hence their value.
- There are many potential communities that could be surveyed for this analysis. GitHub and Stack Overflow are used here first because of their size and second because of their public exposure of the data necessary for the analysis. We encourage, however, interested parties to perform their own analyses using other sources.
- All numerical rankings should be taken with a grain of salt. We rank by numbers here strictly for the sake of interest. In general, the numerical ranking is substantially less relevant than the language’s tier or grouping. In many cases, one spot on the list is not distinguishable from the next. The separation between language tiers on the plot, however, is generally representative of substantial differences in relative popularity.
- In addition, the further down the rankings one goes, the less data there is to rank languages by. Beyond the top 20 to 30 languages, depending on the snapshot, the amount of data to assess is minute, and the actual placement of languages becomes less reliable the further down the list one proceeds.
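To make the first two caveats concrete, here is a minimal Python sketch of restricting the comparison to languages observable on both sites and then correlating the two rankings. The source does not say which correlation measure is used; Spearman’s rank correlation is an assumption here, as are the function name `spearman_on_shared` and the placeholder rankings, which are not real data.

```python
def spearman_on_shared(github_rank, stackoverflow_rank):
    """Spearman rank correlation over languages present in both rankings.

    Each argument maps language name -> rank (1 = most popular).
    Assumes ranks within each input are distinct; with tied ranks the
    simple formula below is only an approximation.
    """
    # Only languages observable in both communities are compared.
    shared = sorted(set(github_rank) & set(stackoverflow_rank))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared languages")

    def rerank(ranking):
        # Re-rank 1..n within the shared set so gaps left by
        # dropped languages do not distort the differences.
        ordered = sorted(shared, key=lambda lang: ranking[lang])
        return {lang: pos for pos, lang in enumerate(ordered, start=1)}

    gh = rerank(github_rank)
    so = rerank(stackoverflow_rank)
    d_squared = sum((gh[lang] - so[lang]) ** 2 for lang in shared)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Placeholder rankings, for illustration only.
github = {"lang_a": 1, "lang_b": 2, "lang_c": 3, "lang_d": 4, "github_only": 5}
stack_overflow = {"lang_b": 1, "lang_a": 2, "lang_c": 3, "lang_d": 4, "so_only": 5}

rho = spearman_on_shared(github, stack_overflow)
print(round(rho, 2))  # 0.8
```

A value near 1 means the two communities order the shared languages almost identically; lower values indicate that the orderings diverge.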
With that, here is the first quarter plot for 2014.
Because the plot doesn’t lend itself well to understanding precisely how languages are performing relative to one another, we also produce the following list of the Top 20 languages by combined ranking. The change in rank from our last snapshot is in parentheses.
- Java (-1)
- C# (+2)
- Python (-1)
- C++ (+1)
- Ruby (-2)
- CSS (new)
- Shell (-2)
- Scala (-1)
- R (+1)
- Matlab (+3)
- Clojure (+5)
- CoffeeScript (-1)
- Visual Basic (+1)
- Groovy (-2)
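As a rough illustration of how a combined list with change-in-rank annotations like the one above might be produced, consider the following Python sketch. The exact combination method is not stated in this post; summing the two ranks and breaking ties alphabetically is an assumption made purely for illustration, and the function names and placeholder rankings are hypothetical.

```python
def combined_ranking(github_rank, stackoverflow_rank):
    """Return {language: combined position} for languages in both inputs.

    Assumption: the combined score is the sum of the two ranks (lower is
    better), with ties broken alphabetically by language name.
    """
    shared = set(github_rank) & set(stackoverflow_rank)
    ordered = sorted(
        shared,
        key=lambda lang: (github_rank[lang] + stackoverflow_rank[lang], lang),
    )
    return {lang: pos for pos, lang in enumerate(ordered, start=1)}


def format_change(lang, current, previous):
    """Format one list entry with its change since the previous snapshot."""
    if lang not in previous:
        return f"{lang} (new)"
    # Moving up the list means a smaller position, hence a positive change.
    delta = previous[lang] - current[lang]
    if delta == 0:
        return lang
    return f"{lang} ({delta:+d})"


# Placeholder rankings, for illustration only.
github = {"lang_a": 1, "lang_b": 3, "lang_c": 2}
stack_overflow = {"lang_a": 1, "lang_b": 2, "lang_c": 4}
previous = {"lang_a": 1, "lang_b": 3}

current = combined_ranking(github, stack_overflow)
lines = [
    format_change(lang, current, previous)
    for lang in sorted(current, key=current.get)
]
print("\n".join(lines))
```

Under this convention an entry with no annotation held its position, a positive number is a gain, and a negative number is a loss.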
A few observations of larger trends:
- Gains for C++/C# / Losses for Python/Ruby: It’s tough to say which result was the odder of the two: the slight gains from the compiled languages or the slight declines from the interpreted alternatives. To be clear, it’s dangerous to read much into the wider popularity of any of these languages based on these results. Ohloh, for one, does not concur with the trajectories implied.
But they do represent a change at least within this result set – which has been relatively static. There are some who are – anecdotally, at least – arguing that a C++ renaissance is underway. Until we see more hard data, it’s probably safest to chalk the small change in fortunes here up to statistical noise, but we’ll be watching compiled language trends closely and looking to test the hypothesis wherever possible.
- Clojure Makes the Top 20: For the first time since we began surveying, Clojure joins its JVM-based counterpart Scala as a Top 20 language. It is the continuing success not only of Java the language but JVM-based alternatives that makes the regular “Java is dead” arguments so baffling.
- Statistical Language Popularity: Both R and Matlab experienced gains this quarter, and this was the third consecutive quarter of growth for R in particular. While, as the plot indicates, these languages tend to outperform on Stack Overflow relative to GitHub, they are indicative of a continued rise in popularity for statistical analysis languages more broadly.
- The Rise of Go: Go, which we termed a notable performer in last year’s Q1 ranking, continued its rise. It checked in just outside the Top 20 at 22 this quarter, a gain of six spots from last quarter.
- Languages to Watch: In the initial run of the data for this quarter, Julia, Rust and Elixir finished in three consecutive spots. After making a correction to the GitHub Archive query and re-running the data, they finished in the order Julia, Rust and then Elixir, with Elixir one spot removed from Rust. Regardless, while these are not going to challenge for Top 20 rankings in the near future (Julia performs best at 62), they are each languages to watch, with notable followers and contributors. We’ll keep an eye on each as we move along.
Big picture, the takeaway from the rankings is that language diversity is the new norm. The Top 20 continues to show strong diversity in domain, and even non-general purpose languages like Matlab and R are borderline mainstream from a visibility perspective. Expect this to continue, with specialized tools being heavily leveraged alongside general purpose alternatives rather than being eliminated by them.