RedMonk

Why isn’t everyone racing to get Hadoop into Linux distros?

The biggest opportunity, and the biggest threat, I see for the commercial Hadoop distributions is the question of who becomes the default provider of the Hadoop packaged by the popular Linux distributions. Today, none of those Linux distributions package Hadoop in their primary package repositories, but imagine a world where `yum install hadoop`, or `apt-get install hadoop`, or `emerge hadoop` worked.

Today, Cloudera essentially owns the definition of Hadoop by virtue of its large market share (see slides 17–18). But this ownership of the Hadoop “standard” could change dramatically if another Hadoop distribution got itself into the main package repos of Ubuntu, Debian, Fedora, and friends, since these are the default operating systems used for Hadoop, be it on-prem or in the cloud.

What would this require? In the case of Debian and Fedora, at least, packages must be truly open source, with nothing proprietary. This could provide a significant advantage to Hortonworks and Intel Hadoop: Hortonworks has been pushing a fully OSS solution, while Intel truly only cares about shipping hardware, so its Hadoop software/services revenue would likely be a rounding error.

Free code, easy installation, and easy use are tough to beat.

Disclosure: Among Hadoop distributions/providers, Amazon, Cloudera, IBM, and MapR are clients. Hortonworks, Intel, and Pivotal are not.

by-sa

Categories: big-data, cloud, community, data-science, linux, open-source.