While at Velocity 2009 this year, I had the chance to sit down and talk with Cloudera's Jeff Hammerbacher. We talked all about Hadoop, how it's used, and how his company, Cloudera, fits into the Hadoop community.
We start out talking about the reception of Jeff's Hadoop talk at Velocity and then very quickly get into a discussion of what Hadoop is and the types of applications that use it. Remembering Jeff's (brief) Wall Street background, I ask him how he went from Wall Street to Facebook, which starts a discussion of the differences between financial analysis and web company data analysis.
I ask Jeff if Hadoop is just an open source implementation of existing closed source ideas and products. His answer is no, that it's something completely new. And though they're not sure what to call the category it fits into, they've been watching users of Hadoop swap it into traditional workflows and use it for new ones. This leads me to ask what the data inputs and outputs for Hadoop typically are: are they mostly databases, logs, or what? And how do users consume the data workloads that get sent through Hadoop? Jeff goes over some interesting examples.
We talk about Jeff's sense for how many people are using Hadoop. He says "on the order of hundreds" are using it in production, with an order of magnitude more playing around with Hadoop. Looking forward, Jeff goes over some application types that we might see using Hadoop in the near future.
Next, we get into a discussion of what the company Cloudera's relation to Hadoop is and what the corporate goals are. As Jeff starts to say, most Hadoop use is in the web world at the moment, so they're hoping not only to bring Hadoop to more enterprise installs, but to be an advocate in the Hadoop community for those uses.
I then ask Jeff what the emerging cycle of Hadoop adoption is for organizations. As with much open source, he begins, it starts with developers electing to use it largely on their own. As these applications move from "desktop computing" (some spare boxes) to production, operations typically gets involved to make sure the whole Hadoop-based stack stays healthy.
Many of the questions Cloudera gets about Hadoop are around the hardware specs for Hadoop clusters and applications. So, I ask Jeff to go over the typical, if not best, server configurations and specs.
Looking forward, I ask Jeff to tell us about the product roadmap for Cloudera: what are they looking to do in the near future? Much of their effort is in the area of making Hadoop more usable and, with things like monitoring, making Hadoop easier for IT departments to consume. At a higher level, Cloudera is figuring out how to deliver "warehouse-scale computing" beyond the walls of Google, which recently coined the term.
Closing out, Jeff briefly discusses the high probability of commercial competitors to Hadoop and Cloudera emerging, and the virtues of Hadoop being open source vs. closed source.
Disclosure: Cloudera is a client and sponsored this video.