What’s Popular on Hacker News: From the Cloud to NoSQL

When we founded RedMonk in 2002, we made a conscious decision to focus on qualitative analysis at the expense of quantitative research for the simple reason that we didn’t believe there was representative data available about our core constituency, developers. Traditionally, analyst firms had worked backwards from observable metrics such as server shipments and license revenue estimates. While these numbers were effective for measuring the performance of commercial suppliers, they were entirely unable to assess the performance of non-commercial alternatives. The growth of free software, for example, was largely opaque to quantitative analysis, visible only in the corrosive effects it had on commercial software revenue.

Over the past few years, however, we’ve begun to gradually introduce quantitative analysis into our portfolio – culminating in the fall launch of our RedMonk Analytics product. We’ve begun incorporating numbers because we believe that, for the first time, we have access to quality data from which we can reasonably infer developer behaviors. Some of that data is generated in house: this is the initial basis for RedMonk Analytics, although our system is rapidly incorporating third party data.

But there are many sources of relevant developer-related data today. One such source is the Hacker News dataset collected by Ronnie Roller, creator of iHackerNews.com. Consisting of 1.7M entries from the site, the dataset is an interesting snapshot of developer commentary and interests.

Our first pass through the data in November looked at programming language popularity. Since then, we have continued to crawl the dataset for other topics. This dataset is interesting not because it is representative of developers as a whole, but rather because it’s a community of technologists who are collectively ahead of the curve.

DVCS

Consider the following data we derived back in November from Ohloh regarding usage of version control systems, for example.

Repository Share

Subversion dominates, clearly. As do centralized repositories, generally.

Repository Type Share

On Hacker News, however, the data reflects a different distribution. Even given the caveat that this data reflects mentions rather than observed instantiations, we find the trends illuminating. Here, for example, is a chart of DVCS options:

DVCS Mentions on Hacker News

Note the reversal of the observed trend; Git dominates Subversion, rather than vice versa. Similarly, the observed preference on Ohloh for centralized repositories over decentralized alternatives inverts.

Repository Type Mentions on Hacker News

Again, the Hacker News data reflects the discussion of technologies rather than actual implementations. But given that each of the technologies is freely available, it would be a mistake to conclude that the distribution of mentions has no relationship to actual adoption.
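The post does not describe how the mentions were tallied, so what follows is only a minimal sketch of one plausible approach. It assumes the dataset is available as a list of comment strings and counts how many comments mention each term at least once, using a case-insensitive, whole-word match. The function names and the sample comments are hypothetical and are not taken from the actual dataset.

```python
import re
from collections import Counter


def count_mentions(comments, terms):
    """Count the comments that mention each term at least once.

    Matching is case-insensitive and whole-word, so "Git" does not
    match inside "digital".
    """
    patterns = {
        term: re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        for term in terms
    }
    counts = Counter({term: 0 for term in terms})
    for text in comments:
        for term, pattern in patterns.items():
            if pattern.search(text):
                counts[term] += 1
    return counts


def shares(counts):
    """Convert raw mention counts into fractions of the total."""
    total = sum(counts.values())
    return {term: (n / total if total else 0.0) for term, n in counts.items()}


# Hypothetical sample comments, for illustration only.
sample_comments = [
    "We moved from Subversion to Git last year.",
    "git bisect saved me hours.",
    "Mercurial and Git are both fine choices.",
    "The digital edition is out.",
]

counts = count_mentions(sample_comments, ["Git", "Subversion", "Mercurial"])
mention_shares = shares(counts)
```

A count of this kind measures how often a name comes up in discussion, not how often the tool is used, which is the same caveat that applies to the charts above.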

Vendors

One of the other interesting queries was for vendor names. Because they may appear in a variety of contexts, this graph is more for curiosity’s sake than actual analysis.

Vendor Mentions on Hacker News

Like Oracle’s, Microsoft’s performance in the above is likely something of an artifact of its often controversial standing with developers, but its showing is nonetheless impressive. Other surprises were the underperformance of VMware relative to its peers and the better-than-average visibility of Cisco.

Frameworks

One of the requests on Hacker News was for a look at the distribution of framework mentions. Here’s the data:

Language Framework Mentions on Hacker News

The dominance of Rails is unsurprising, as is Node.js’s strong showing; we’d expect nothing less from Node given our own internal metrics. I was mildly surprised by Grails’ poor numbers; Zend Framework’s result is likely a byproduct of its two-word name, which is harder to match reliably in free text.

Operating Systems

Operating System Mentions on Hacker News

Operating systems, meanwhile, were another mixed bag. Windows, as ever, dominated, but Ubuntu’s outperformance was a mild surprise, if only because the perception exists that Hacker News can be CentOS-centric. That may be true, but the data certainly doesn’t reflect it. In case you’re curious, SUSE’s position relative to Red Hat was not influenced by discussion of the Attachmate acquisition [coverage]: the dataset predates that.

NoSQL

As with distributed version control, NoSQL is a subject that typically finds a welcome audience within the Hacker News community. While conservative enterprises may express little appetite for non-relational tools, developers have been far more pragmatic. Crawling their comments on the subject, we find the following distribution of mentions by datastore.

Non-Relational Store Mentions on Hacker News

No real surprises. Mongo is slightly more popular than I would have expected, Hadoop slightly less, but the balance of the data is consistent with our experiences in the marketplace.

Cloud Providers, or: Just How Popular is Amazon?

Very. Even heavily discounting the number of mentions of Amazon as references to its retail businesses rather than its cloud computing stack, Amazon is dominant. Also notable is Heroku’s performance: as above, this dataset predates the acquisition by Salesforce, so the frequency here is unrelated to that event.

Cloud Provider Mentions on Hacker News
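To make the "heavily discounting" argument concrete, here is a rough sketch of how one could check whether a leader survives a discount on its raw mention count. Every number below is a made-up placeholder rather than a figure from the dataset, and the provider names other than Amazon are hypothetical.

```python
def discounted_share(counts, term, discount):
    """Share of `term` after dropping a fraction `discount` of its mentions."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    adjusted = dict(counts)
    adjusted[term] = counts[term] * (1.0 - discount)
    total = sum(adjusted.values())
    return adjusted[term] / total if total else 0.0


def break_even_discount(counts, term):
    """Fraction of `term` mentions that must be discarded before it ties the runner-up."""
    runner_up = max(n for name, n in counts.items() if name != term)
    if counts[term] <= runner_up:
        return 0.0  # the term is not leading to begin with
    return 1.0 - runner_up / counts[term]


# Placeholder counts for illustration only; NOT values from the Hacker News data.
hypothetical_counts = {"Amazon": 1000, "ProviderB": 400, "ProviderC": 100}

threshold = break_even_discount(hypothetical_counts, "Amazon")
share_at_half = discounted_share(hypothetical_counts, "Amazon", 0.5)
```

With these placeholder counts, more than 60% of the Amazon mentions would have to be about retail before the lead disappeared. Applying the same check to the real counts would show how much discounting the conclusion can absorb.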

Thus concludes this round of Hacker News analytics. If you have questions you’d like to see answered in future, leave a comment or drop me a note. If you’re a RedMonk client, your available hours can be used for custom, on demand crawls of this data. Contact us for details.

Update: By request, we have added Gentoo to the list of operating systems surveyed, Neo4J and FlockDB to the non-relational stores graph, and Joyent to the cloud providers surveyed.

Disclosure: Adobe, Apache (Cassandra, Hadoop, etc), Basho (Riak), Canonical (Ubuntu), Cisco, IBM, Membase, Microsoft (Azure, Windows, etc), Red Hat (Fedora, Makara, RHEL, etc), Salesforce.com (Force.com) and Zend (Zend Framework) are RedMonk clients, while Amazon (AWS), Engine Yard, Google (GAE), HP, Oracle and VMware are not currently.

Categories: AltDB, Cloud, Databases, Operating Systems, Version Control.

  • http://dberkholz.wordpress.com/ Donnie Berkholz

    I bet the rates of change of VCS adoption on Ohloh would make an interesting graph. Or go a step further and look at accelerating or decelerating adoption… the 1st and 2nd derivatives should be easy enough to calculate if you’ve got time-based data. You might also have the data to actually look at lead times of discussion on Hacker News to actual use on Ohloh.

    I’m curious about other Linux distros — obviously Gentoo, but I’ve also been hearing a lot about Mint lately.

    On a minor note, I have a couple of suggestions to improve how the data are visualized: (1) you might want to consider using bar graphs instead of pie graphs because relative ratios are really hard to compare on pies; (2) I would probably sort by size instead of alphabetically, but this depends on whether people look for “their” language/framework or care more about the overall most popular ones.

  • http://redmonk.com/sogrady sogrady

    @Donnie Berkholz: concur. one of the things i’ll be looking at over time is historical trends, both in these datasets and others. snapshots are interesting, but the temporal element gives it another dimension entirely.

    i’ll try and add Gentoo and Mint tomorrow.

    as for the graphs, i’m torn. i concur that basic histograms are more effective, but i decided to employ the pie charts beyond the DVCS data (where i think that’s useful) simply for variety’s sake. a bunch of graphs that look the same, i thought, might be a bit boring.

    but perhaps you’re right. either way, appreciate the feedback.

  • http://www.reala.net/ Robin

    How does OSX fare with OS mentions? Seems like an obvious candidate to include.

  • http://dberkholz.wordpress.com/ Donnie Berkholz

    By the way, could you please find some way to re-enable per-post comment subscription? This is a killer feature and the lack of it makes it really hard to have a real discussion on here.

  • http://ktschmidt.blogspot.com Kevin Schmidt

    Was Java EE/J2EE not significant among the framework mentions or was it not included?

  • http://redmonk.com/sogrady sogrady

    @Donnie Berkholz: i’ll see what i can do.

  • http://redmonk.com/sogrady sogrady

    @Kevin Schmidt: J2EE is ~387 mentions, or roughly comparable with Grails in other words.

    it’s not practical to search for EE, meanwhile, b/c it’s a conventional term with other meanings (e.g. Enterprise Edition).

  • http://redmonk.com/sogrady sogrady

    @Robin: the searches here were focused on technologies also usable in server contexts, which is the reason OS X wasn’t included.

  • http://www.whatsthebeef.org whatsthebeef

    Great post, excellent angle on data analysis.

    Surprised by the absence of Bigtable in NoSQL, although App Engine’s popularity is reflected in the Cloud provider graph. Also no Perforce in VCSs.

    • Jeff Thompson

      Was Perforce really not discussed? It’s pretty dominant in the games industry, though I can’t speak to elsewhere.

  • http://blog.zawodny.com/ Jeremy Zawodny

    Very interesting! Thanks for posting.

  • http://redmonk.com/sogrady sogrady

    @whatsthebeef / @Jeff Thompson: Perforce is at 251 mentions, FYI, which leaves it in last place.

  • http://dberkholz.wordpress.com/ Donnie Berkholz

    I see Gentoo showed up, thanks! Interesting that it’s equal to CentOS. Perhaps that means the hype’s gone away, and we’ve finally got real users instead of trend followers (currently with Ubuntu?).
