TL;DR – Usage of Kafka continues to grow at an extremely fast pace across multiple industry segments. Kafka is becoming a core part of data pipelines at scale.
When we first looked at data around Kafka early last year, we commented on how it was fast becoming one of the key technologies in the new data stack. Its use in areas such as data pipelines and streaming continues to grow, something borne out both in our own conversations and in the recent Kafka survey completed by Confluent (we comment on several aspects of Confluent's survey below and, as always, the caveats on implicit survey bias and sample size should be considered).
We also note the level of usage across core business applications that rely on legacy systems, which contrasts with the usage patterns we see with many other next-generation tools in the cloud native space. This is in no small part due to the ongoing focus on the connector and stream APIs. Large organisations cannot simply abandon or quickly re-architect legacy systems when a new paradigm emerges; such change takes time. The Cloud Foundry and Kubernetes communities are working on the open service broker project for the same reason. Confluent note that 59% of the users surveyed are connecting Kafka to a database of some description, with only 36%, a declining number, connecting to Hadoop.
It is interesting to see that over fifty percent of the respondents to the Confluent survey cited their usage of Kafka in conjunction with microservices. This tallies with the increasing levels of interest that we see in microservices-based approaches to software development.
From a momentum perspective, this validates the work done to ensure Kafka integration in other projects, such as that done by Pivotal on Spring Cloud Stream. We expect this trend to continue, and anticipate an increasing number of easy-to-use integration points emerging in frameworks that are heavily used for, and in conjunction with, microservices.
A Question of Scale & Cloud
In Confluent's survey they noted that over 15% of their respondents are now processing over a billion messages a day. While not on the scale of those in the four-comma club, this is still very, very impressive. No matter what the industry, and whichever way you slice and dice this number, any company that is processing over a billion messages a day is doing something significant. To put this number on a human scale, it is over eleven thousand messages a second.
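The per-second figure is simple arithmetic, and a short sketch makes it easy to check. The function name below is ours, purely for illustration, and the calculation assumes messages arrive evenly over a 24-hour day, which real workloads rarely do.

```python
# Convert a daily message volume into an average per-second rate.
# Assumes an even spread across the day; real traffic will have peaks.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(messages_per_day: float) -> float:
    """Average messages per second for a given daily volume."""
    return messages_per_day / SECONDS_PER_DAY


billion_per_day = per_second(1_000_000_000)
print(f"{billion_per_day:,.0f} messages per second")  # prints "11,574 messages per second"
```

As this is an average, peak rates for any given company will be higher.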
The other key trend to note from the Confluent survey is the level of Kafka usage in the cloud. This is part of a wider macro shift in how companies are working on data that my colleague James Governor has touched on recently. How these deployment options combine and overlap requires some further clarification, but over 60% of respondents stated they are using AWS as their public cloud, and 48% of the respondents have some form of on-premises deployment.
When we looked at Kafka last year we took note of the number of distinct users contributing to the Kafka Users mailing list. This has continued its steep and steady rise.
We define an active user as someone who has sent an e-mail to the Kafka users list. The majority of questions on the list still tend to come from new users of Kafka. While the volume of mail remains relatively constant, we still view this continuous growth in active, rather than passive, users as indicative of a strong and vibrant community.
We also continue to see increased interest around Kafka on both Stack Overflow and GitHub, with increasing volumes for both metrics. On Stack Overflow we note a doubling of questions from last year.
Threats on the Horizon?
The enterprise momentum is very much with Kafka for now, and on that front Confluent are the primary provider of a commercial offering. The most significant threat is from the cloud providers' own services, namely Google Cloud Dataflow, Amazon Kinesis and Azure Event Hubs from Microsoft, and the various integration points which they offer.
This leads us to the question of the overall direction that technology consumers may take: the "all in" or "over the top" model that Cohesive Networks talk about. Where it is the over the top model, Confluent are well placed. However, for companies that are choosing to go all in with a provider, Kafka will be quickly dropped for one of the cloud provider alternatives.
Disclaimers: Confluent, Amazon, Google, Pivotal, Microsoft, Cohesive Networks and The Cloud Foundry Foundation are current RedMonk clients.