Because the RedMonk Programming Language Rankings have tended to be fairly stable over time, one of the more common questions we get following a release concerns community volatility. More specifically, many are curious whether the individual data sources themselves – GitHub and Stack Overflow – are more volatile over time than the correlation between them. To explore this question, we examined the GitHub Archive using a simple query that fetched the number of repositories created per programming language per quarter, beginning in 2011. Because the GitHub Archive's data only goes back to February 12, 2011, the Q1 2011 figures here cover only about seven weeks of activity. As with the Programming Language Rankings, this excludes forks. And to make the data more accessible, this analysis focuses on a subset of the list: the Top 10 programming languages by our rankings. The findings are interesting, and seem to raise as many questions as they answer.
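The original query is not reproduced here, and the GitHub Archive's actual schema differs from what follows. As a rough sketch of the same aggregation, the Python below assumes each repository-creation record is a dict with hypothetical keys `created_at` (a `datetime.date`), `language` and `fork`. It drops forks, keeps only the sampled languages, and counts creations per (language, quarter), using quarter labels in the post's `Q312` style.

```python
from collections import Counter
from datetime import date


def quarter_label(d: date) -> str:
    """Format a date as a quarter label such as 'Q312' (Q3 2012)."""
    q = (d.month - 1) // 3 + 1
    return f"Q{q}{d.year % 100:02d}"


def repos_per_language_quarter(records, languages):
    """Count non-fork repositories created per (language, quarter).

    `records` is an iterable of dicts with hypothetical keys
    'created_at' (datetime.date), 'language' (str) and 'fork' (bool);
    this is an assumed shape, not the GitHub Archive schema.
    """
    counts = Counter()
    for r in records:
        if r["fork"]:
            continue  # forks are excluded, as in the rankings
        if r["language"] not in languages:
            continue  # keep only the sampled languages
        counts[(r["language"], quarter_label(r["created_at"]))] += 1
    return counts


# Usage with a handful of made-up records:
sample = [
    {"created_at": date(2011, 2, 12), "language": "Ruby", "fork": False},
    {"created_at": date(2011, 3, 1), "language": "Ruby", "fork": True},
    {"created_at": date(2011, 7, 4), "language": "Ruby", "fork": False},
    {"created_at": date(2011, 7, 5), "language": "Haskell", "fork": False},
]
print(repos_per_language_quarter(sample, {"Ruby", "Python"}))
```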
First, consider the following chart of repositories created on GitHub per quarter for each of our top ten programming languages.
The dramatic growth in late 2011 and early 2012 is not particularly surprising. Less predictable, however, were the three consecutive quarters of decline (Q312 to Q113) for the surveyed languages. To be clear, this is a decline in newly created repositories per quarter within our ten-language sample only: the chart does not suggest a decline in overall activity on the platform. Still, given the significance of these ten languages to GitHub and the seemingly corrective nature of the dip, the company's $100M round early in the third quarter of 2012 appears well timed.
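One way to flag a stretch like Q312 to Q113 mechanically is to sum the sample's repository counts per quarter and look for runs in which each quarter falls below the one before it. The helper below is a hypothetical sketch of that check, and the totals in the usage example are invented purely for illustration; they are not the figures behind the chart.

```python
def decline_runs(labels, values):
    """Return (first, last) quarter labels for each run of consecutive
    quarter-over-quarter declines in `values`.

    `labels` and `values` are parallel lists in chronological order,
    e.g. summed repository creations for the sampled languages.
    """
    runs = []
    start = None  # index of the first declining quarter in the current run
    for i in range(1, len(values)):
        if values[i] < values[i - 1]:
            if start is None:
                start = i
        elif start is not None:
            runs.append((labels[start], labels[i - 1]))
            start = None
    if start is not None:  # a run that continues to the last quarter
        runs.append((labels[start], labels[-1]))
    return runs


# Invented totals, shaped only to mirror a three-quarter dip:
quarters = ["Q212", "Q312", "Q412", "Q113", "Q213"]
totals = [100, 90, 80, 70, 85]
print(decline_runs(quarters, totals))
```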
The major question, however, remains the aforementioned surge. Further mining of the dataset is needed to determine its cause, but the good news for GitHub is that even absent a surge, growth as measured by repository creation appears healthy. Unless valuations, then, assumed that surge-era growth rates would persist through Q312-Q113, the impact of this anomalous spike should be minimal. It will also be interesting to assess growth rates outside the subset examined here: was the decline in this sample offset by volume growth in other, less popular languages on the platform?
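Answering that question would require repository counts for every language, not just the ten sampled here. Assuming such per-quarter totals were available, a minimal sketch of the comparison might split each quarter into "sample" and "everything else" and check whether growth in the rest made up for any drop in the sample. The function name and inputs are hypothetical, and the numbers in the usage example are made up.

```python
def sample_vs_rest(sample_totals, all_totals):
    """Compare quarter-over-quarter changes in the sample with the rest.

    `sample_totals` and `all_totals` are parallel, chronologically ordered
    lists of repositories created per quarter (hypothetical inputs).
    Returns one (sample_change, rest_change, offset) tuple per quarter
    after the first, where `offset` is True if total creation did not fall.
    """
    rest = [a - s for s, a in zip(sample_totals, all_totals)]
    changes = []
    for i in range(1, len(sample_totals)):
        d_sample = sample_totals[i] - sample_totals[i - 1]
        d_rest = rest[i] - rest[i - 1]
        changes.append((d_sample, d_rest, d_sample + d_rest >= 0))
    return changes


# Made-up example: the sample drops by 10 while other languages grow by 30.
print(sample_vs_rest([100, 90], [150, 170]))
```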
To show more clearly how these languages have performed in repository creation relative to one another, the following motion chart plots each language's rank by repositories created, measured quarterly from 2011 through 2013. The motion chart offers three different views of this data over time, each of which can also be filtered to a subset of languages. Click the play button in the bottom left-hand corner to step through the quarters; most of the visualization is interactive.
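The chart's underlying ranks can be derived from the quarterly counts by sorting the languages within each quarter. The sketch below assumes a hypothetical nested dict of quarter to language to count. The post does not say how ties were handled, so this version assumes standard competition ranking, where tied languages share a rank and the next rank is skipped.

```python
def rank_by_quarter(counts):
    """Rank languages within each quarter by repositories created.

    `counts` maps quarter label -> {language: repositories created}
    (a hypothetical input shape). Returns quarter -> {language: rank},
    with 1 for the most repositories; ties share a rank and the next
    rank is skipped (competition ranking, an assumption here).
    """
    ranks = {}
    for quarter, by_lang in counts.items():
        ordered = sorted(by_lang.items(), key=lambda kv: kv[1], reverse=True)
        quarter_ranks = {}
        prev_count = None
        prev_rank = 0
        for position, (lang, n) in enumerate(ordered, start=1):
            if n != prev_count:  # a new count starts a new rank
                prev_rank = position
                prev_count = n
            quarter_ranks[lang] = prev_rank
        ranks[quarter] = quarter_ranks
    return ranks


# Usage with illustrative counts for a single quarter:
example = {"Q111": {"JavaScript": 50, "Ruby": 40, "Python": 40, "Java": 10}}
print(rank_by_quarter(example))
```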
In the near future we’ll explore the history of the other axis of our rankings, Stack Overflow, for this same sample set of languages to assess the relative differences in trajectories between the two communities.