Charting Stacks

Kafka Summit: The Four Comma Club


We had the opportunity to attend the Kafka Summit in San Francisco in late April. As we have noted previously, usage of, and interest in, Apache Kafka has been growing at a very impressive pace. Both the enthusiasm and energy levels at the summit were high, and as we would expect from a community at this stage of its evolution, the level of marketing speak was refreshingly low. This was truly a technology-focused event with great content.

Kafka solves a difficult problem – that of building a highly scalable, distributed publish-subscribe messaging system, used for streaming event data and, in many cases, as an enterprise service bus. This is a problem that many companies have to address, but few want to, or can, invest the level of engineering time required.
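
For a flavour of what this looks like in practice, here is a minimal sketch of publishing a single event using Kafka's standard Java producer client – the broker address, topic name, and payload are assumptions for illustration.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // Publish one event to the "events" topic; any number of consumer
        // groups can then read it independently, each at its own pace.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("events", "user-42", "page_view"));
        }
    }
}
```

The publish-subscribe decoupling is the point: the producer neither knows nor cares how many downstream systems consume the stream.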

My colleague James Governor has been highlighting the current shift from the cloud era to the data era, and Kafka is fast becoming a key technology for facilitating it.

A Question of Scale and The Four Comma Club

The general question that comes up in discussions about Kafka is “why” – and in particular, why choose it over the other messaging systems out there. The first answer is scale, and in particular massive scale.

[Slide: the Four Comma Club]

During the summit, Confluent CTO Neha Narkhede highlighted the “Four Comma Club”: companies that are processing over a trillion messages a day using Kafka. No matter which way you slice and dice the numbers, 1,000,000,000,000 messages a day – averaged out, more than eleven million messages every second – is a really impressive figure.

But not everyone is a Netflix, or is going to approach that scale any time soon – something I return to at the end of this post.

Apache Flink & Apache Beam

Two emerging technologies that are often used in conjunction with Kafka, and which we at RedMonk are watching closely, are Apache Flink, a streaming dataflow engine, and Apache Beam, a programming model for creating data processing pipelines. Key among the features of both projects is the ability to deal with out-of-order streams of data.

Explaining the out-of-order problem has been challenging up to now, but Stephan Ewen, CTO of Data Artisans, has solved this once and for all with this lovely example.
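
To make the problem concrete in code, here is a minimal sketch using Flink's DataStream API (the modern watermark API, rather than the one current at the time of the summit); the event type, timestamps, and window size are illustrative assumptions. Because the windows are defined over event time, with watermarks tolerating ten seconds of disorder, the late-arriving element is still counted in the window it logically belongs to.

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class OutOfOrderCounts {

    /** A click event that carries its own event-time timestamp. */
    public static class Click {
        public String user;
        public long timestampMillis;
        public Click() {}
        public Click(String user, long timestampMillis) {
            this.user = user;
            this.timestampMillis = timestampMillis;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                new Click("alice", 1_000L),
                new Click("alice", 65_000L),
                new Click("alice", 3_000L)) // out of order: older than the element before it
            // Watermarks allow events to arrive up to ten seconds out of order.
            .assignTimestampsAndWatermarks(
                WatermarkStrategy.<Click>forBoundedOutOfOrderness(Duration.ofSeconds(10))
                    .withTimestampAssigner((click, ts) -> click.timestampMillis))
            .map(click -> Tuple2.of(click.user, 1L))
            .returns(Types.TUPLE(Types.STRING, Types.LONG))
            .keyBy(t -> t.f0)
            // One-minute windows assigned by event time, not arrival order, so the
            // late element is still counted in the first window.
            .window(TumblingEventTimeWindows.of(Time.minutes(1)))
            .sum(1)
            .print();

        env.execute("out-of-order-counts");
    }
}
```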

Apache Beam has grown out of Google Dataflow and, more specifically, is based on a lovely academic paper, “The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing”. As an aside, we have mentioned The Morning Paper in the past, when we wrote about “The Welcome Return of Research Papers to Software Craft”, and Adrian Colyer provided a very nice write-up of the Dataflow paper last year.

Apache Beam committers, and Google engineers, Frances Perry and Tyler Akidau gave a lovely talk about using Apache Beam, highlighting the simplicity of the approach they are taking without delving too deeply into the highly complex machinery that lies below.
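
For a flavour of the model – a sketch of the general approach, not the code from their talk – here is a small Beam pipeline; the element values and timestamps are assumptions for illustration. Elements carry event-time timestamps, and the windowing is defined over event time, so the out-of-order element still falls into its proper window.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.apache.beam.sdk.values.TimestampedValue;
import org.joda.time.Duration;
import org.joda.time.Instant;

public class WindowedCounts {
    public static void main(String[] args) {
        Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

        p.apply(Create.timestamped(
                TimestampedValue.of("click", new Instant(1_000L)),
                TimestampedValue.of("click", new Instant(65_000L)),
                // Out of order: an earlier timestamp arriving after a later one.
                TimestampedValue.of("click", new Instant(3_000L))))
         // Fixed one-minute windows assigned by event time, not arrival order.
         .apply(Window.<String>into(FixedWindows.of(Duration.standardMinutes(1))))
         // Count occurrences of each element per window.
         .apply(Count.perElement());

        p.run().waitUntilFinish();
    }
}
```

Note how little of the underlying complexity surfaces in the pipeline itself – which is precisely the simplicity the talk highlighted.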

We must note that Apache Beam is still an incubating project, but given its usefulness and the significant engineering resources behind it, one would expect to see it become a fully-fledged Apache project over time.

If you have the time, I’d highly recommend watching the videos of both sessions (Flink talk, Beam talk). It will be an hour and a half well spent understanding the next few years in the evolution of data.

The Commercial Problem

The list of companies using Kafka that was highlighted at the summit represented a truly impressive roster, with some immense engineering talent.

[Slide: companies using Kafka]

However, therein lies both the problem and the opportunity for Kafka at the moment – the Netflix problem, one might say. The companies whose scale really demands Kafka today generally have engineering teams, and more specifically SRE teams, that can and do support Kafka by themselves.

This will change in the medium term as more companies gain an understanding of exactly what they can accomplish with Kafka and, more importantly, continue the shift back towards having teams of strategic technologists in house. However, for now it does lead to some interesting commercial questions. For those companies that want to use Kafka but don’t want the administrative overhead, or don’t want to dedicate the engineering resources, there are a number of approaches they can adopt.

At a cloud-only level you could use offerings such as IBM’s Message Hub. But for most companies the real value of using something like Kafka will come in modernizing their approach to, and understanding of, data, and that will mean running Kafka on premises. Currently this leads you to the product offerings from Confluent and IBM.

As for the four comma club? We look forward to seeing more companies joining, and expect quite a few to reach the three comma club in the near future.

Disclaimers: IBM are a RedMonk client. Confluent provided my ticket to the Kafka Summit.

3 comments

  1. Thank you, Fintan. Great insight!

    A small, but important, request: the project’s name is actually “Apache Kafka” – any chance you would be able to update it at the initial mention, please?

    Thanks in advance for this.

    1. All updated :).
