
The Rise and Rise of Apache Kafka

One of the key technologies in the new data stack is Apache Kafka, and over the last eighteen months we have been tracking a huge uptick in developer interest in, chatter around, and usage of Kafka. If you have not heard of Kafka, it is a highly scalable distributed publish-subscribe messaging system that is very well suited to use cases such as streaming event data.

With new workloads in areas such as IoT, mobile and gaming generating massive, ever-increasing streams of data, developers have been looking for a mechanism to easily consume that data in a consistent and coherent manner, which is exactly where Kafka fits in. This has led to a number of commercial offerings and product combinations appearing over the last year from vendors such as Confluent, IBM and Cloudera, among others.

As my colleague Stephen O’Grady stated recently:

“It’s [Kafka] becoming more visible because it’s a high-quality open-source project, but also because its ability to handle high-velocity streams of information is increasingly in demand for usage in servicing workloads like IoT, among others.”

As I mentioned earlier, we have seen a big uptick in developer interest over the last eighteen months. Looking at multiple data sources, we can see some clear trends behind this uptick.

On Stack Overflow we saw a marked increase in questions beginning in mid-2014:


This trend on Stack Overflow matches the growth in stars for the Kafka project on GitHub.


Google Trends shows the same growth, with an uptick starting in mid-2014.


The most interesting data point, however, has been the consistent growth in active users on the Kafka users mailing list, which have increased by just over 260% since July 2014.



We define an active user as someone who has sent an e-mail to the Kafka users list. The majority of the questions on the list tend to come from new users of Kafka. While the volume of mail remains relatively constant, this continuous growth in active, rather than passive, users is indicative of a strong and vibrant community.
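For readers who want to apply the same definition to a list archive of their own, here is a minimal sketch of how the metric could be computed. It is not the tooling behind the figures in this post: the function names, the (sender, date) input shape and the sample messages are all assumptions made up for illustration, and the growth helper assumes the percentage is read as an increase over the starting month.

```python
from collections import defaultdict
from datetime import date


def active_users_per_month(messages):
    """Count distinct senders per (year, month).

    `messages` is an iterable of (sender_address, sent_on) pairs, where
    sent_on is a datetime.date. This input shape is an assumption; a real
    archive would first need to be parsed into this form.
    """
    senders = defaultdict(set)
    for sender, sent_on in messages:
        # Normalise the address so the same person is not counted twice.
        senders[(sent_on.year, sent_on.month)].add(sender.strip().lower())
    return {month: len(who) for month, who in sorted(senders.items())}


def percent_growth(start, end):
    """Percentage increase from start to end, e.g. 50 -> 180 is 260%."""
    return (end - start) * 100.0 / start


# Hypothetical sample data, invented purely for illustration.
sample = [
    ("alice@example.com", date(2014, 7, 3)),
    ("bob@example.com", date(2014, 7, 9)),
    ("Alice@example.com", date(2014, 7, 21)),  # same sender, counted once
    ("alice@example.com", date(2014, 8, 2)),
    ("carol@example.com", date(2014, 8, 5)),
    ("dave@example.com", date(2014, 8, 30)),
]

counts = active_users_per_month(sample)
first, last = counts[(2014, 7)], counts[(2014, 8)]
print(counts, percent_growth(first, last))
```

Note that someone who sends ten e-mails in a month still counts as one active user, which is why the number of active users can keep growing while overall mail volume stays flat.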

Disclosure: IBM and Cloudera are current RedMonk clients.

Categories: Apache, Data, Internet of Things.
