{"id":114,"date":"2016-05-23T09:25:50","date_gmt":"2016-05-23T09:25:50","guid":{"rendered":"http:\/\/redmonk.com\/fryan\/?p=114"},"modified":"2016-06-06T15:27:10","modified_gmt":"2016-06-06T15:27:10","slug":"kafka-summit-the-four-comma-club","status":"publish","type":"post","link":"https:\/\/redmonk.com\/fryan\/2016\/05\/23\/kafka-summit-the-four-comma-club\/","title":{"rendered":"Kafka Summit: The Four Comma Club"},"content":{"rendered":"<p>We had the opportunity to attend the <a href=\"http:\/\/kafka-summit.org\/\">Kafka Summit<\/a> in San Francisco in late April. As <a href=\"http:\/\/redmonk.com\/fryan\/2016\/02\/04\/the-rise-and-rise-of-apache-kafka\/\">we have noted previously<\/a>, usage of, and interest in, Apache Kafka has been growing at a very impressive pace. Both the enthusiasm and energy levels at the summit were high, and as we would expect from a community at this stage of its evolution, the level of marketing speak was refreshingly low. This was truly a technology-focused event with great content.<\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">@merv its a very impressive program, commented earlier its unusual to get a 3 track conference where you can&#39;t choose between sessions<\/p>\n<p>&mdash; Fintan Ryan (@fintanr) <a href=\"https:\/\/twitter.com\/fintanr\/status\/724996130669473794?ref_src=twsrc%5Etfw\">April 26, 2016<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>Kafka solves a difficult problem \u2013 that of a highly scalable distributed publish-subscribe messaging system, used for streaming event data and, in many cases, as an enterprise service bus. 
This is a problem that many companies have to address, but few want to, or can, invest the level of engineering time required.<\/p>\n<p>My colleague, <a href=\"http:\/\/twitter.com\/monkchips\">James Governor<\/a>, has been highlighting that the current shift is from the cloud era to the data era, and Kafka is fast becoming a key technology for facilitating this.<\/p>\n<h2>A Question of Scale and The Four Comma Club<\/h2>\n<p>The general question that comes up in discussions about Kafka is \u201cwhy\u201d, and in particular why in comparison to other messaging systems out there. The first answer is scale, and in particular massive scale.<\/p>\n<p><a href=\"http:\/\/redmonk.com\/fryan\/files\/2016\/05\/four-comma.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-116\" src=\"http:\/\/redmonk.com\/fryan\/files\/2016\/05\/four-comma.png\" alt=\"four-comma\" width=\"500\" height=\"275\" srcset=\"https:\/\/redmonk.com\/fryan\/files\/2016\/05\/four-comma.png 500w, https:\/\/redmonk.com\/fryan\/files\/2016\/05\/four-comma-300x165.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>During the summit, <a href=\"http:\/\/confluent.io\">Confluent<\/a> CTO <a href=\"https:\/\/twitter.com\/nehanarkhede\">Neha Narkhede<\/a> highlighted the \u201cFour Comma Club\u201d, companies that are processing over a trillion messages a day using Kafka. 
No matter what way you slice and dice the numbers, 1,000,000,000,000 messages is a really impressive figure.<\/p>\n<p>But not everyone is a <a href=\"http:\/\/netflix.com\">Netflix<\/a>, or is going to approach that scale at any time in the near future, something I return to at the end of this post.<\/p>\n<h2>Apache Flink &amp; Apache Beam<\/h2>\n<p>Two emerging technologies that are often used in conjunction with Kafka, and which we at RedMonk are watching closely, are <a href=\"https:\/\/flink.apache.org\/\">Apache Flink<\/a>, a streaming data flow engine, and <a href=\"http:\/\/beam.incubator.apache.org\/\">Apache Beam<\/a>, a programming model for creating data processing pipelines. Among the key features of both projects is the ability to deal with out-of-order streams of data.<\/p>\n<p>Explaining the out-of-order problem has been challenging up to now, but <a href=\"https:\/\/twitter.com\/stephanewen\">Stephan Ewen<\/a>, CTO of <a href=\"http:\/\/data-artisans.com\">Data Artisans<\/a>, has solved this once and for all with this lovely example.<\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">Brilliant explanation of event time vs processing time by <a href=\"https:\/\/twitter.com\/StephanEwen?ref_src=twsrc%5Etfw\">@StephanEwen<\/a> <a href=\"https:\/\/twitter.com\/hashtag\/kafkasummit?src=hash&amp;ref_src=twsrc%5Etfw\">#kafkasummit<\/a> <a href=\"https:\/\/t.co\/IM2BJtQOH5\">pic.twitter.com\/IM2BJtQOH5<\/a><\/p>\n<p>&mdash; Fintan Ryan (@fintanr) <a href=\"https:\/\/twitter.com\/fintanr\/status\/725030836479873024?ref_src=twsrc%5Etfw\">April 26, 2016<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>Apache Beam has grown out of <a href=\"https:\/\/cloud.google.com\/dataflow\/\">Google Dataflow<\/a>, and more explicitly is based on a lovely academic paper \u201c<a href=\"http:\/\/bit.ly\/1XV3oZt\">The Dataflow Model: A 
Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing<\/a>\u201d. As an aside we have mentioned The Morning Paper in the past when we wrote about \u201c<a href=\"http:\/\/redmonk.com\/fryan\/2016\/03\/10\/the-welcome-return-of-research-papers-to-software-craft\/\">The Welcome Return of Research Papers to Software Craft<\/a>&#8221;\u00a0and <a href=\"https:\/\/twitter.com\/adriancolyer\">Adrian Colyer<\/a> provided <a href=\"https:\/\/blog.acolyer.org\/2015\/08\/18\/the-dataflow-model-a-practical-approach-to-balancing-correctness-latency-and-cost-in-massive-scale-unbounded-out-of-order-data-processing\/\">a very nice write up<\/a>\u00a0on the Dataflow paper last year.<\/p>\n<p>Apache Beam committers, and Google Engineers, <a href=\"https:\/\/twitter.com\/francesjperry\">Frances Perry<\/a> and <a href=\"https:\/\/twitter.com\/takidau\">Tyler Akidau<\/a> gave a lovely talk about using Apache Beam, highlighting the simplicity of the approach they are taking, without delving too much into the highly complex solution that lies below.<\/p>\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\">\n<p lang=\"en\" dir=\"ltr\">Simplicity &amp; amazing power with <a href=\"https:\/\/twitter.com\/ApacheBeam?ref_src=twsrc%5Etfw\">@ApacheBeam<\/a> <a href=\"https:\/\/twitter.com\/takidau?ref_src=twsrc%5Etfw\">@takidau<\/a> @francesjperry <a href=\"https:\/\/twitter.com\/hashtag\/kafkasummit?src=hash&amp;ref_src=twsrc%5Etfw\">#kafkasummit<\/a> <a href=\"https:\/\/t.co\/H6MIeatyXi\">pic.twitter.com\/H6MIeatyXi<\/a><\/p>\n<p>&mdash; Fintan Ryan (@fintanr) <a href=\"https:\/\/twitter.com\/fintanr\/status\/725021308048793600?ref_src=twsrc%5Etfw\">April 26, 2016<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<p>Now we must note that Apache Beam is still an incubating project, but given its usefulness and the 
significant engineering resources behind it, one would expect to see it becoming a fully-fledged Apache project over time.<\/p>\n<p>If you have the time, I\u2019d highly recommend watching the videos of both sessions (<a href=\"http:\/\/www.confluent.io\/kafka-summit-2016-systems-advanced-streaming-analytics-with-apache-flink-and-apache-kafka\">Flink talk<\/a>, <a href=\"http:\/\/www.confluent.io\/kafka-summit-2016-systems-fundamentals-of-stream-processing-with-apache-beam\">Beam talk<\/a>). It will be an hour and a half well spent understanding the next few years in the evolution of data.<\/p>\n<h2>The Commercial Problem<\/h2>\n<p>The list of companies using Kafka that was highlighted at the summit represented a truly impressive roster, with some immense engineering talent.<\/p>\n<p><a href=\"http:\/\/redmonk.com\/fryan\/files\/2016\/05\/kafka-companies.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-117\" src=\"http:\/\/redmonk.com\/fryan\/files\/2016\/05\/kafka-companies.png\" alt=\"kafka-companies\" width=\"500\" height=\"282\" srcset=\"https:\/\/redmonk.com\/fryan\/files\/2016\/05\/kafka-companies.png 500w, https:\/\/redmonk.com\/fryan\/files\/2016\/05\/kafka-companies-300x169.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>However, therein lies both the problem and opportunity for Kafka currently \u2013 the Netflix problem one might say. The companies that have the scale that really demands Kafka at the moment generally have engineering, and more specifically SRE teams, that can and do support Kafka by themselves.<\/p>\n<p>This will change in the medium term as more companies gain an understanding of exactly what they can accomplish with Kafka, and more importantly continue the shift back towards having teams of strategic technologists in house. However, for now it does lead to some interesting commercial questions. 
For those companies that want to use Kafka, but don\u2019t want either the administrative overhead or to dedicate the engineering resources, there are a number of approaches that they can adopt.<\/p>\n<p>On a cloud only level you could use offerings such as <a href=\"https:\/\/console.ng.bluemix.net\/catalog\/services\/message-hub\">IBM\u2019s Message Hub<\/a>. But for most companies the real value of using something like Kafka will come in modernizing their approach to, and understanding of, data, and that will mean using Kafka on premise. Currently this leads you to look at the product offerings from Confluent and IBM.<\/p>\n<p>As for the four comma club? We look forward to seeing more companies joining. And expect quite a few in the three comma club in the near future.<\/p>\n<p><strong>Disclaimers<\/strong>: IBM are a RedMonk client. Confluent provided my ticket to the Kafka Summit.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We had the opportunity to attend the Kafka Summit in San Francisco in late April. As we have noted previously, usage of, and interest in, Apache Kafka has been growing at a very impressive pace. 
Both the enthusiasm and energy levels at the summit were high, and as we would expect from a community at<\/p>\n","protected":false},"author":40,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,23,4,8,7,15],"tags":[],"class_list":["post-114","post","type-post","status-publish","format-standard","hentry","category-apache","category-api","category-business","category-conferences","category-data","category-developers"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/redmonk.com\/fryan\/wp-json\/wp\/v2\/posts\/114","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/redmonk.com\/fryan\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/redmonk.com\/fryan\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/redmonk.com\/fryan\/wp-json\/wp\/v2\/users\/40"}],"replies":[{"embeddable":true,"href":"https:\/\/redmonk.com\/fryan\/wp-json\/wp\/v2\/comments?post=114"}],"version-history":[{"count":0,"href":"https:\/\/redmonk.com\/fryan\/wp-json\/wp\/v2\/posts\/114\/revisions"}],"wp:attachment":[{"href":"https:\/\/redmonk.com\/fryan\/wp-json\/wp\/v2\/media?parent=114"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/redmonk.com\/fryan\/wp-json\/wp\/v2\/categories?post=114"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/redmonk.com\/fryan\/wp-json\/wp\/v2\/tags?post=114"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}