tecosystems

Why I’m Taking Statistics


“I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s? The ability to take data, to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it: that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids. Because now we really do have essentially free and ubiquitous data. So the complementary scarce factor is the ability to understand that data and extract value from it.”
Hal Varian, Professor of Information Sciences, Business, and Economics at the University of California at Berkeley / Chief Economist, Google

This interview was not why I started taking an Introduction to Statistics course down at the Harvard Extension School; it was published the month I started class. But this statement is precisely the justification for my weekly trips down to Boston from January through May.

Like Varian, I am of the opinion that the ability to extract value from data and effectively communicate that value will be at a significant premium going forward. It has been for some time, clearly, hence the historical margins for vendors like Business Objects, Cognos, SAS and so on. But just wait.

IBM’s head of Software Group Steve Mills said late last year that we are “moving into an era of information led transformation.” I look at businesses like Flightcaster, and I can’t help but agree.

We have more and better data than we’ve ever had before. The tools to process that data – even at scale – are aggressively being democratized. And the infrastructure required to run the tools against the data is, as Bradford said, economically accessible even to startups.

But technology is, sadly, only part of the equation. And in some respects, a small part.

It doesn’t take much education, for example, to spin up a dozen Hadoop instances. Knowing how to write a MapReduce job, on the other hand, requires a more “sizable time investment.” As does understanding confidence intervals, the null hypothesis and the central limit theorem. Hence, my class.

Life, according to economics, is about incentives. My incentive to learn such things is simple: the ability to understand more completely what data is trying to tell us will have value. Value more than sufficient to offset my investment. Or so I hope.

That, in a nutshell, is why I’m taking statistics.

Not that statistics is the end; there’s a lot I have left to learn. The older I get, the more I realize how much I don’t know.

Next on the agenda after this semester is trying to absorb the rudiments of statistical languages, machine learning and MapReduce; I’m just praying I don’t have to learn Java first. Cloudera’s Training VM has been a big help with Hadoop training, and Robert Kabacoff’s Quick-R has been a big help for me in learning R, as have Flowing Data’s tutorials.

I’ve got a long way to go, in other words. But we all have to start somewhere, and the payoff for me, as for Varian, is apparent.