{"id":4009,"date":"2010-11-23T18:08:41","date_gmt":"2010-11-23T22:08:41","guid":{"rendered":"http:\/\/redmonk.com\/sogrady\/?p=4009"},"modified":"2010-11-23T18:08:41","modified_gmt":"2010-11-23T22:08:41","slug":"the-languages-of-hacker-news","status":"publish","type":"post","link":"https:\/\/redmonk.com\/sogrady\/2010\/11\/23\/the-languages-of-hacker-news\/","title":{"rendered":"The Languages of Hacker News"},"content":{"rendered":"<p>Two weeks ago, <a href=\"http:\/\/ihackernews.com\/\">iHackerNews.com<\/a> (a creation of <a href=\"http:\/\/ronnieroller.com\/\">Ronnie Roller<\/a>) made available a Hacker News related dataset via bittorrent. For those of you unfamiliar with Hacker News, it&#8217;s a portal for developers to discuss relevant items of interest run by Paul Graham&#8217;s Y Combinator. Hacker News may not be considered representative of developers broadly, but it is generally well trafficked by alpha geek types, and thus conclusions drawn from it have predictive value. <\/p>\n<p>The dataset that was made available included a little better than 1.7M items from Hacker News in a basic XML structure. Believing that this represented collective wisdom of a sort, I collected the set shortly after it was made available, which proved to be shortly before it was taken down. <\/p>\n<p>Having examined the dataset only briefly, it&#8217;s impossible to say as yet what might be reasonably extracted from it. That said, even the superficial metrics &#8211; bearing the requisite caveats in mind &#8211; are proving to be of interest.<\/p>\n<p>As an example, the histogram below represents the distribution of select programming language mentions on Hacker News. It records nothing more than mentions; it&#8217;s blind to multiple occurrences in a single sentence, for example, let alone the nuance of sentiment. 
But given the scale of the dataset, the distribution remains interesting.<br \/>\n<br \/>\n<a href=\"http:\/\/www.flickr.com\/photos\/sog\/5203695804\/\" title=\"Programming Language Mentions on Hacker News by sogrady, on Flickr\"><img loading=\"lazy\" decoding=\"async\" src=\"http:\/\/farm5.static.flickr.com\/4084\/5203695804_20d2b3d397.jpg\" width=\"468\" height=\"500\" alt=\"Programming Language Mentions on Hacker News\" \/><\/a><\/p>\n<p>This rough metric is just a datapoint, and obviously cannot by itself contradict broader claims such as Forrester analyst Mike Gualtieri&#8217;s &#8220;<a href=\"http:\/\/blogs.forrester.com\/mike_gualtieri\/10-11-23-java_is_a_dead_end_for_enterprise_app_development\">Java is a Dead-End For Enterprise App Development<\/a>.&#8221; But in our view, such claims should include <i>in situ<\/i> assessments of developer behaviors in addition to traditional analyst firm survey work. <\/p>\n<p>Hence our interest in datasets like Hacker News, and the reason we built <a href=\"http:\/\/redmonk.com\/analytics\">RedMonk Analytics<\/a>, which will be our primary mechanism for sharing similar data directly with our customers. We&#8217;ll keep you apprised of what we learn from the dataset, and if you have questions you&#8217;d like us to ask of it we&#8217;re open to suggestions. <\/p>\n<p>As a technical note for those interested, the frequency counts were done with Cloudera&#8217;s <a href=\"http:\/\/www.cloudera.com\/downloads\/\">Hadoop distribution<\/a> and their sample examples.jar application. This provides for case-insensitive substring searching, so that the metrics above do reflect, as an example, both &#8220;Java&#8221; and &#8220;java.&#8221; <\/p>\n<p><b>Update<\/b>: That was fast. 
Thomas Winningham, via <a href=\"http:\/\/twitter.com\/#!\/th0ma5\/status\/7196069024759808\">Twitter<\/a>, passes along his look at the dataset with historical Python vs Ruby numbers through October, which is available <a href=\"http:\/\/chart.apis.google.com\/chart?chtt=Python+vs.+Ruby+2010+Through+October+in+Hacker+News+Comments+by+@th0ma5&amp;cht=lc&amp;chs=750x400&amp;chxt=y&amp;chxr=0,0,1680&amp;chd=t:35.8,40.9,37.8,41.8,46.8,48.9,60.7,55.5,66.2,95.2|37.2,43.9,30.1,33.6,44.3,43.3,54.0,42.2,53.6,91.1&amp;chco=990000,336699&amp;chls=1,1,0|1,1,0|1,1,0&amp;chdl=ruby%20or%20rails|python\">here<\/a>.<\/p>\n<p><b>Update 2<\/b>: By <a href=\"http:\/\/news.ycombinator.com\/item?id=1935687\">request<\/a>, I&#8217;ve updated the graphic to reflect count data for Erlang and Lisp. <\/p>\n<p><b>Update 3<\/b>: By further <a href=\"http:\/\/news.ycombinator.com\/item?id=1935670\">request<\/a>, I&#8217;ve updated the graphic to reflect count data for C, C#, Haskell and Perl. <\/p>\n<p><b>Disclosure<\/b>: Cloudera is a RedMonk customer, and I am a Hacker News <a href=\"http:\/\/news.ycombinator.com\/user?id=sogrady\">member<\/a>. <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Two weeks ago, iHackerNews.com (a creation of Ronnie Roller) made available a Hacker News related dataset via bittorrent. For those of you unfamiliar with Hacker News, it&#8217;s a portal for developers to discuss relevant items of interest run by Paul Graham&#8217;s Y Combinator. 
Hacker News may not be considered representative of developers broadly, but it<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"spay_email":"","footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false},"categories":[6,72],"tags":[162,166,268,269,306,308,409,425,448,462],"class_list":["post-4009","post","type-post","status-publish","format-standard","hentry","category-application-development","category-programming-languages","tag-clojure","tag-cloudera","tag-hackernews","tag-hadoop","tag-java","tag-javascript","tag-php","tag-python","tag-ruby","tag-scala"],"jetpack_featured_media_url":"","jetpack_publicize_connections":[],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/posts\/4009","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/comments?post=4009"}],"version-history":[{"count":0,"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/posts\/4009\/revisions"}],"wp:attachment":[{"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/media?parent=4009"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/categories?post=4009"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/redmonk.com\/sogrady\/wp-json\/wp\/v2\/tags?post=4009"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}