
Kafka Summit 2020

Reading time: 10 min

I attended Kafka Summit 2020, a virtual conference organized by Confluent. It was a two-day virtual event with many great speakers from the Kafka community. I joined the event from Denmark, which meant staying up almost all night with coffee to chat and follow along with all the exciting talks.

Summary of Day 1

I started the event by watching the Day 1 Morning Keynote Program, which officially kicked things off. Gwen Shapira from Confluent talked about new Kafka Improvement Proposals, focusing on KIP-405 (Kafka Tiered Storage) and KIP-500 (Replacing ZooKeeper with a Metadata Quorum). Both represent massive improvements to Kafka.

Kafka Tiered Storage helps improve the elasticity of Kafka clusters by introducing a cold storage layer to offload data from Kafka brokers. This feature makes it easier to scale and add new Kafka brokers, since less data needs to be copied to the new brokers.
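As a rough illustration of the idea, KIP-405 proposes enabling tiered storage through broker- and topic-level configuration along these lines (property names are taken from the KIP draft and may change before release):

```properties
# Broker-level: turn on the remote (tiered) storage subsystem
remote.log.storage.system.enable=true

# Topic-level: opt the topic into tiered storage and keep only
# recent segments on the broker's local disk
remote.storage.enable=true
local.retention.ms=86400000   # ~1 day kept locally
retention.ms=604800000        # ~7 days total; older segments live in cold storage
```

With older segments offloaded, a new broker only needs to replicate the small local tail of each partition, which is what makes scaling out so much faster.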

Removing ZooKeeper simplifies the operational burden of running a Kafka cluster, as it removes one additional software component to manage. By keeping metadata state in memory, Kafka opens up new scaling possibilities—potentially supporting up to 10,000,000 partitions in the future.
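To sketch what a ZooKeeper-free deployment looks like under KIP-500: each node declares its roles and the Raft metadata quorum directly in `server.properties` (again, names come from the proposal and are subject to change):

```properties
# This node acts as both broker and quorum controller (KRaft mode)
process.roles=broker,controller
node.id=1

# The Raft-based metadata quorum replaces ZooKeeper entirely
controller.quorum.voters=1@host1:9093,2@host2:9093,3@host3:9093
listeners=PLAINTEXT://host1:9092,CONTROLLER://host1:9093
```

There is no separate ZooKeeper ensemble to provision, secure, or monitor; cluster metadata is replicated through Kafka's own quorum.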

After that, I moved on to a talk about trade-offs in distributed systems design: “Is Kafka the Best?” I highly recommend watching this talk. The speakers discussed many infrastructure design trade-offs, including benchmark comparisons such as data throughput comparisons between Kafka, Pulsar, and RabbitMQ. They also covered messaging model basics, contiguous streams vs. fragmented streams, and lots of other great material.

Viktor Gamov’s talk on testing stream processing applications also contained excellent content—and he is a very fun and energetic speaker. I highly recommend checking out his livestream videos on Viktor’s YouTube channel and Confluent’s YouTube channel, especially if you want to learn more about testing and Kafka Streams applications.

The talk “Can Kafka Handle a Lyft Ride?” featured a great demo and walkthrough of state machines, pub/sub architecture, message delivery latency, and Kafka.

Summary of Day 2

I started Day 2 by watching Kai Waehner’s talk: “Apache Kafka, Tiered Storage, and TensorFlow for Streaming Machine Learning without a Data Lake.” It was a great talk with explanations and demos of a predictive maintenance use case, showing a complete data pipeline for machine learning—covering model training with streaming data and real-time predictions using ksqlDB.
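To give a hedged flavor of what real-time scoring in ksqlDB can look like in a setup like this (this is my own sketch, not the talk's code: `predict_failure` stands in for a hypothetical user-defined function wrapping the trained TensorFlow model, and the stream and columns are invented for illustration):

```sql
-- Score each sensor event as it arrives; predict_failure is a
-- hypothetical UDF that wraps the trained TensorFlow model
SELECT machine_id,
       predict_failure(temperature, vibration, pressure) AS failure_score
FROM machine_telemetry
EMIT CHANGES;
```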

Next, I watched Robin Moffatt’s talk: “Apache Kafka and ksqlDB in Action: Let’s Build a Streaming Data Pipeline!” It was a great demo of a customer-data pipeline built with Kafka, ksqlDB, and Kafka Connect, using Change Data Capture from a relational database and feeding Elasticsearch and Kibana dashboards at the other end.
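To give a flavor of the approach (an illustrative sketch of my own, not the talk’s actual code; the topic and column names are made up), the pipeline amounts to a couple of ksqlDB statements over the CDC topic:

```sql
-- Expose the CDC topic from the database as a ksqlDB stream
CREATE STREAM customers_raw WITH (
  KAFKA_TOPIC = 'dbserver.public.customers',
  VALUE_FORMAT = 'AVRO'
);

-- Continuously transform it into a cleaned stream that a sink
-- connector can push on to Elasticsearch for Kibana dashboards
CREATE STREAM customers_enriched WITH (KAFKA_TOPIC = 'customers_enriched') AS
  SELECT id,
         UCASE(country) AS country,
         email
  FROM customers_raw
  EMIT CHANGES;
```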

Day 2 really kicked into high gear with the Morning Keynote Program featuring Jay Kreps, co-creator of Kafka and CEO of Confluent, and Sam Newman, author of *Building Microservices*. These were the two best talks of the entire event.

I wrapped up the event by watching “A Tale of Two Data Centers: Kafka Streams Resiliency” by Anna McDonald. A fun and engaging talk about resiliency, replication, and stretch clusters.

A huge thank you to all the speakers, sponsors, and participants who made this event possible!

Do you have a question about this article?

If you are considering technical consulting or need help improving your system, feel free to send a short message describing your solution and your needs, and you will receive a reply with clear next steps.

Phone: +45 22 39 34 91 or email: tb@tbcoding.dk.

Typical response time: same business day.