When we founded RedMonk in 2002, we made a conscious decision to focus on qualitative analysis at the expense of quantitative research for the simple reason that we didn’t believe there was representative data available about our core constituency, developers. Traditionally, analyst firms had worked backwards from observable metrics such as server shipments and license revenue estimates. While these numbers were effective for measuring the performance of commercial suppliers, they were entirely unable to assess the performance of non-commercial alternatives. The growth of free software, for example, was largely opaque to quantitative analysis, visible only in the corrosive effects it had on commercial software revenue.
Over the past few years, however, we’ve begun to gradually introduce quantitative analysis into our portfolio – culminating in the fall launch of our RedMonk Analytics product. We’ve begun incorporating numbers because we believe that, for the first time, we have access to quality data from which we can reasonably infer developer behaviors. Some of that data is generated in house: this is the initial basis for RedMonk Analytics, although our system is rapidly incorporating third party data.
But there are many sources of relevant developer-related data today. One such source is the Hacker News dataset collected by Ronnie Roller, creator of iHackerNews.com. Consisting of 1.7M entries from the site, the dataset is an interesting snapshot of developer commentary and interests.
Our first pass through the data in November looked at programming language popularity. Since then, we have been continuing to crawl the dataset regarding other topics. This dataset is interesting not because it is representative of developers as a whole, but rather because it’s a community of technologists who are collectively ahead of the curve.
DVCS
Consider the following data we derived back in November from Ohloh regarding usage of version control systems, for example.
Subversion dominates, clearly. As do centralized repositories, generally.
On Hacker News, however, the data reflects a different distribution. Even given the caveat that this data reflects mentions rather than observed instantiations, we find the trends illuminating. Here, for example, is a chart of DVCS options:
Note the reversal of the observed trend; Git dominates Subversion, rather than vice versa. Similarly, the observed preference on Ohloh for centralized repositories over decentralized alternatives inverts.
Again, the Hacker News data reflects the discussion of technologies rather than actual implementations. But given that each of the technologies is freely available, it would be a mistake to conclude that the distribution of mentions has no relationship to actual adoption.
Vendors
One of the other interesting queries was for vendor names. Because they may appear in a variety of contexts, this graph is more for curiosity’s sake than actual analysis.
Like Oracle’s, Microsoft’s performance in the above is likely something of an artifact of its often controversial standing with developers, but its showing is nonetheless impressive. Other surprises were the underperformance of VMware relative to its peers and the better than average visibility of Cisco.
Frameworks
One of the requests on Hacker News was for a look at the distribution of framework mentions. Here’s the data:
The dominance of Rails is unsurprising, as is Node.js’s strong showing – we’d expect nothing less from Node given our own internal metrics. I was mildly surprised by Grails’ poor numbers; Zend Framework’s result is likely a byproduct of its two-word name, since commenters who write only “Zend” would not be counted as mentioning the framework.
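To make the naming caveat concrete, here is a minimal sketch of how mentions might be counted so that a multi-word name is matched as a phrase. It is an illustration only, not the method used to produce the charts in this post: the function name `count_mentions`, the sample comments, and the choice to count each comment at most once per term are all assumptions.

```python
import re
from collections import Counter


def count_mentions(comments, terms):
    """Count how many comments mention each term at least once (case-insensitive).

    Hypothetical helper for illustration; not the query behind the charts.
    """
    patterns = {
        # Match the words of a multi-word name separated by any whitespace,
        # and require that the match is not embedded in a longer word.
        term: re.compile(
            r"(?<!\w)" + r"\s+".join(re.escape(w) for w in term.split()) + r"(?!\w)",
            re.IGNORECASE,
        )
        for term in terms
    }
    counts = Counter({term: 0 for term in terms})
    for comment in comments:
        for term, pattern in patterns.items():
            if pattern.search(comment):
                counts[term] += 1
    return counts


# Made-up sample comments, not drawn from the Hacker News dataset.
sample_comments = [
    "We moved from Rails to Node.js last year.",
    "Zend Framework is fine, but I prefer Rails.",
    "I used Zend for years.",
]
result = count_mentions(sample_comments, ["Rails", "Node.js", "Zend Framework"])
```

In this toy sample the third comment refers to the framework only as “Zend”, so a phrase match for the full name misses it. Adding shorter aliases would catch such comments, at the cost of also matching references to the company rather than the framework.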
Operating Systems
Operating systems, meanwhile, were another mixed bag. Windows, as ever, dominated, but Ubuntu’s outperformance was a mild surprise, if only because the perception exists that Hacker News can be CentOS-centric. That may be true, but the data certainly doesn’t reflect it. In case you’re curious, SUSE’s position relative to Red Hat was not influenced by discussion of the Attachmate acquisition [coverage]: the dataset predates that.
NoSQL
As with distributed version control, NoSQL is a subject that typically finds a welcome audience within the Hacker News community. While conservative enterprises may express little appetite for non-relational tools, developers have been far more pragmatic. Crawling their comments on the subject, we find the following distribution of mentions by datastore.
No real surprises. Mongo is slightly more popular than I would have expected, Hadoop slightly less, but the balance of the data is consistent with our experiences in the marketplace.
Cloud Providers, or: Just How Popular is Amazon?
Very. Even heavily discounting the number of mentions of Amazon as references to its retail businesses rather than its cloud computing stack, Amazon is dominant. Also notable is Heroku’s performance: as above, this dataset predates the acquisition by Salesforce, so the frequency here is unrelated to that event.
Thus concludes this round of Hacker News analytics. If you have questions you’d like to see answered in future, leave a comment or drop me a note. If you’re a RedMonk client, your available hours can be used for custom, on demand crawls of this data. Contact us for details.
Update: By request, we have added Gentoo to the list of operating systems surveyed, Neo4J and FlockDB to the non-relational stores graph, and Joyent to the cloud providers surveyed.
Disclosure: Adobe, Apache (Cassandra, Hadoop, etc.), Basho (Riak), Canonical (Ubuntu), Cisco, IBM, Membase, Microsoft (Azure, Windows, etc.), Red Hat (Fedora, Makara, RHEL, etc.), Salesforce.com (Force.com) and Zend (Zend Framework) are RedMonk clients, while Amazon (AWS), Engine Yard, Google (GAE), HP, Oracle and VMware are not currently.