What languages win for data mining and analysis?

Donnie Berkholz's Story of Data

KDnuggets, a website that’s been around since 1997 and probably not redesigned since then, is a pretty popular site for data-mining folks. The site just posted the results of a poll of its visitors on the most popular languages for data mining & analysis, and (more interestingly) contrasted them with last year’s results. KDnuggets does an ample job of showing the absolute results, but I wondered what more I could learn by focusing on the growth and loss of individual languages since last year:

In the chart, the languages are listed in order of popularity from left to right, and I omitted ones that weren’t in the 2011 poll or, like Lisp/Clojure vs Lisp alone, were not fair comparisons. Given that there were 579 total voters and 39 for even the least-popular option shown (Hadoop-based), there’s a pretty reasonable sample size, although I have no expectation that KDnuggets visitors are broadly representative of people doing data mining and analysis. That said, it’s an interesting data set.
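To put a rough number on "pretty reasonable sample size," here is a minimal Python sketch that estimates the sampling noise on the least-popular option, using the 39 votes out of 579 voters quoted above. It assumes voters behave like a simple random sample and uses the normal approximation for a proportion. Both are assumptions of mine, and the function name `share_with_margin` is made up for illustration. It says nothing about whether KDnuggets visitors are representative, which is the bigger caveat.

```python
import math


def share_with_margin(votes, total_voters, z=1.96):
    """Return (share, margin) for one poll option.

    Uses the normal approximation to a binomial proportion; z=1.96
    corresponds to a roughly 95% interval. Assumes a simple random sample.
    """
    share = votes / total_voters
    std_err = math.sqrt(share * (1 - share) / total_voters)
    return share, z * std_err


# Figures quoted in the post: 39 votes for the Hadoop-based option, 579 voters.
share, margin = share_with_margin(39, 579)
print(f"{share:.1%} +/- {margin:.1%}")
```

Under those assumptions the Hadoop-based share comes out at roughly 6.7%, give or take about 2 percentage points, so year-over-year shifts of a point or two for the smaller options could easily be noise.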

You can see the general trend of newer, open-source languages growing at varying speeds (Python followed by R and Hadoop-based options like Hive/Pig), while older languages including Java, SAS, and Matlab are bleeding users. One of the most unusual changes, in my view, was the growth in people using old-school Unix shell tools to get the job done. My best guess is that when people leave languages that come along with IDEs, they tend to use the best tool for the job rather than staying inside a single window. I’m not really surprised at the increase in C/C++, which makes a pretty natural pairing with the higher-level languages when you need to speed up critical calculations.

