The Best Books to Learn Apache Kafka

A practitioner’s journey to enlightenment

John L. Watson


Apache Kafka is the proverbial runaway freight train of event-driven and Big Data-centric architectures, as well as this microservices thing that everyone seems to be rattling on about. Its tremendous power and flexibility have given it unstoppable momentum in both the industry and the open-source community; so much so that Kafka has become the default scan term of recruiters, along with other buzzwords such as ‘full-stack’, ‘micro frontends’ and ‘polyglot’. By some estimates, as much as 40% of Fortune 500 companies are using Kafka. Some might argue that these estimates are conservative.

This post is a reflection on my experience learning Apache Kafka, which has had its ups and downs. I sincerely hope it will benefit those that have yet to embark on their educational journey. In this article, I’m going to focus on the resources that have helped me get there, predominantly the books I have read and my experience with (and without) them.


I will begin by saying that I do not profess to be an expert in Kafka. (And I wish more people would be honest with themselves and others on such matters.) I have been using Kafka for many years now and have become moderately proficient in it. I understand the ecosystem and the tooling, and I have dealt with Kafka from both developer and operational standpoints. I have been a casual speaker at some low-key local meetups and I’ve recently been admitted to the top-ten writers in the Apache Kafka topic on Quora.

That’s me, coasting lazily in number 5, rubbing shoulders with some of the true Kafka legends: Emil Koutanov, Stéphane Maarek, Hans Jespersen, Jay Kreps, Kai Wähner and Gwen Shapira. 🥳 🥳 🥳 Okay, I’m not quite in their league, but it still gives me a sense of pride.

Effective Kafka: A Hands-on Guide to Building Robust and Scalable Event-Driven Applications

I started using Kafka in 2017 when the technology was already mature, the documentation was thorough and blogs were plentiful. I took the path that most did at the time: read the official documentation and “Stack Overflow” the rest. And our team were building production apps that worked fine… until they didn’t. Sometimes we would lose a few messages. Sometimes consumers would stall for no apparent reason, eventually causing a rebalance. And sometimes we saw messages appear out-of-order. But all in all, the developer experience was adequate, not overly different from any other open-source project. I thought I had a pretty good handle on things and consigned the occasional production incident to either a hypothetical bug in Kafka or the actions of our DevOps team, who were looking after Kafka, Spark, Hadoop, Cassandra, and a bunch of other open-source platforms we relied on at the time.

My experience changed when I spotted a rather unremarkable-looking $10 book that popped up on Leanpub in early 2020 as part of their Autumn promotion. (I’m writing from New Zealand, in case this may have caused confusion.) The book claimed to have an architectural focus, touching on things like distributed computing, consistency models, security… So it wasn’t just another Kafka manual. And at $10 it’s a joke: two cups of coffee.

I say! All things aside, the author has a way with words. The use of language, sentence construction and the overall structure of the book is a literary masterpiece. The common pulp that you get these days from Manning, Packt and Apress couldn’t hold a candle to this. Why is that important? Because the author obviously takes pride in how he communicates complex ideas. For a great author, language is an expression of one’s thoughts and ultimately one’s self; it is not just an instrument for making their fame and fortune. I consider myself a mildly cultured person when it comes to literature, and I can certainly appreciate a fine writing style, for lack of having one of my own. Then again, this is a subjective opinion and isn’t likely to convince you if you are simply after a Kafka ‘quick fix’.

Effective Kafka: A Hands-on Guide to Building Robust and Scalable Event-Driven Applications by Emil Koutanov is an astonishingly good book on every level. The first few chapters were relatively uneventful, as I was already quite familiar with the platform; this wasn’t the first Kafka book I had purchased. (We’ll come to that later.) Chapter 3 was where it started to get interesting. I began questioning my knowledge of the platform, but not necessarily just taking the author’s word for it either. Being a new book with no reviews at the time, I couldn’t afford to give it any serious credibility, so I started looking over the official docs and drilling into Kafka’s source code. You see, I spotted the reason why we were losing messages. It was to do with consuming record batches in a thread pool while continuously polling the consumer to keep the pool saturated. The consumer client was committing offsets for records that were still in flight. Yet it was there in the book, in black and white. And we had all missed it. At this point, I had recouped my $10 investment many-fold.
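The failure mode described above can be illustrated without a broker. What follows is a minimal sketch, not the book's code and not part of the Kafka client API: `SafeOffsetTracker` is a hypothetical helper I made up for illustration. It assumes records from one partition are handed to a thread pool and may finish out of order, and it only ever exposes the offset that follows the contiguous run of completed records, so an in-flight record is never covered by a commit.

```java
import java.util.TreeSet;

/**
 * Hypothetical helper (not part of the Kafka client API): tracks which
 * offsets of a single partition have finished processing, so that only
 * the contiguous completed prefix is ever committed.
 */
class SafeOffsetTracker {
    private final TreeSet<Long> completed = new TreeSet<>();
    private long nextToCommit; // lowest offset that has NOT finished processing

    SafeOffsetTracker(long firstOffset) {
        this.nextToCommit = firstOffset;
    }

    /** Called by a worker thread once a record has been fully processed. */
    synchronized void markCompleted(long offset) {
        if (offset < nextToCommit) {
            return; // already covered by the committable prefix
        }
        completed.add(offset);
        // Advance over the contiguous run of completed offsets.
        while (completed.remove(nextToCommit)) {
            nextToCommit++;
        }
    }

    /** Offset that is safe to commit (Kafka convention: last processed + 1). */
    synchronized long committableOffset() {
        return nextToCommit;
    }

    public static void main(String[] args) {
        SafeOffsetTracker tracker = new SafeOffsetTracker(0);
        tracker.markCompleted(2);
        tracker.markCompleted(1);
        // Offset 0 is still in flight, so nothing may be committed yet.
        System.out.println(tracker.committableOffset()); // prints 0
        tracker.markCompleted(0);
        System.out.println(tracker.committableOffset()); // prints 3
    }
}
```

In a real consumer, one such tracker per assigned partition would feed a manual commit with auto-commit disabled; that wiring is omitted here because it depends on the application.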

Me, before reading Effective Kafka

Chapter 10 produced the next big revelation. It explained why we were seeing messages published out of order, despite queuing them in the correct order on the producer client. It came down to the default setting of the max.in.flight.requests.per.connection property, which allowed the producer to reorder the records under certain conditions. Again, cross-referencing the official documentation — it was there, hidden among about a thousand other properties. Like looking for a needle in a haystack. To me, this is one of the biggest challenges in Kafka — its flexibility means tons of configuration parameters that you need to understand to get on top of Kafka and to use it correctly. Of those parameters, most are esoteric, but the ones that aren't can really wreak havoc on your system. Without an expert to hold your hand, you need to set aside several days to go over all the client configuration properties and decide for yourself which ones matter to your specific use case.
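For readers who want to see what this looks like in practice, here is a small sketch of producer properties that keep per-partition order intact across retries. The article does not say which setting was eventually chosen, so treat both options as my assumption, based on the documented producer configuration keys. `OrderedProducerConfig` and the broker address are hypothetical; only `java.util.Properties` is used, so no Kafka dependency is needed to run it.

```java
import java.util.Properties;

class OrderedProducerConfig {
    /** Builds producer properties intended to preserve per-partition ordering across retries. */
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Option A: idempotent producer. The broker sequences batches per partition,
        // which preserves order with up to 5 in-flight requests per connection.
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "all"); // idempotence requires acks=all
        props.setProperty("max.in.flight.requests.per.connection", "5");
        // Option B (without idempotence): set max.in.flight.requests.per.connection
        // to 1 instead, trading throughput for ordering.
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder address.
        build("localhost:9092").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

The resulting `Properties` object would be passed to a producer constructor; which option suits you depends on your throughput needs and broker version.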

The rest of my experience was in the same vein. By chapter 15, I had sorted the last of our production issues. All of a sudden, I became the go-to guy on Kafka. My stock trended up.

The book contains tons of examples that are written in Java as well as little configuration snippets here and there. The examples are backed by a GitHub repository, so you don’t need to write out the code to make it work. All examples work: verified! If you don’t come from a Java background, I think the examples can probably be adapted to other languages, but being a Java developer, this is not something I’ve tried.

The book doesn’t just target application developers; the operations team will get a lot out of it. Configuration and cluster management is covered in great detail, along with containerisation and security. Actually, the bits on security are probably the best I’ve read in any book.

Effective Kafka is hands-down the best book on Apache Kafka money can buy, period. It is suitable for everyone — from absolute green-horn beginners to experts who “think” they know everything about Kafka. (Trust me, you have no idea.) I think the best thing about Effective Kafka isn’t the technical detail, but the way it manages to convey the author’s insights. Emil is obviously a knowledgeable person in several fields, and it feels like the book was more of a conveyor of knowledge rather than information. I hugely enjoyed the section on liveness and safety, even though at the time, I couldn’t see how it related to Kafka. It did, of course. This is the point I’m making: sometimes the knowledge we seek isn’t the knowledge we need. Typing “free ebook on kafka pdf” into Google will probably get you what you want, but it will land you precisely nowhere.

I am glad I’ve taken the time to read it and I’m hugely indebted to Emil for his contribution. Partway through the book, I realised that he is also the maintainer of Kafdrop — a tool that we have been using at work for several years. I was so mesmerised by the book that I wanted to fly over from Auckland to meet the author in Sydney for a spot of lunch, but unfortunately, the pandemic had other plans. I’m hoping I can at least do an online interview with Emil.

Kafka: The Definitive Guide

The Definitive Guide, which is how I’ve come to call it now, was the first book I read on Kafka. At about the time when we transitioned from ActiveMQ to Kafka, it was quite hectic, and the company bought a few copies of The Definitive Guide to help move things along. (But more likely to quiesce that characteristic developer moaning that comes with every new tech decision made by the enterprise architecture big wigs.) I’m glad we picked this book over the others at the time.

At the time when most developers were exposed to some sort of JMS-based MQ broker, the very notion of an event streaming platform seemed alien. What’s more, it seemed needlessly convoluted. Partitions, offsets, consumer groups, total order, at-least-once delivery… Why all this nonsense, when topics and queues have carried enterprise systems for decades? The distinction between events and commands was not really clear to anyone at the time and while event-driven architecture wasn’t new, it certainly wasn’t mainstream. Most microservices were REST-based then, while the cool kids dabbled with gRPC, Kafka, Kubernetes and Monorepos.

The thing is, Kafka isn’t just a platform, it is a concrete manifestation of the event-driven ideology. And this is where I think the documentation fails to do it justice. It explains the technical detail with clinical precision, but leaves you wondering why things are the way they are. This is my opinion, but I believe most regular devs who read Kafka docs are just looking for a simple way to get Kafka to work like, say, ActiveMQ. Which is absolutely the wrong way to go about it, but I think this happens more often than the maintainers of Kafka realise. I think this is also why Kafka polarised the developer world when it came out: its proponents really understood why you needed total order and at-least-once delivery, while its opponents were looking for a better version of RabbitMQ. The world was split into those who got it and those who didn’t. I must admit, I wasn’t always on the side that I am on now.

Kafka: The Definitive Guide by Neha Narkhede, Gwen Shapira and Todd Palino addressed this vital shortfall. The authors take seemingly complex concepts and break them down into a very digestible form, making them accessible to most people. More importantly, they explain the “why”, not just the “how”. The book really puts things into context, clearly separating the use cases behind event streaming platforms from those that are best suited to message queues.

I still recall my first time picking up the book, thinking that it would have a “salesy” feel to it, considering that two of its three authors are card-carrying Confluent employees. I can assure you that this is not the case. There is mention of some Confluent open-source projects, which is quite understandable, and in the same vein as the extensive use of Kafdrop in Effective Kafka. Overall, I think it is a very objective and unbiased book that explains the core topics nicely, without any prior expectations or assumptions about the reader.

Owing to my newfound reliance on Effective Kafka I hadn’t touched The Definitive Guide for some time. I was a tad concerned that this would lead to a biased review. So I read it again, cover-to-cover. It was a great experience now, as it was then! It is, by all measures, an excellent book for its size and price. And for its time too; lest we forget, The Definitive Guide is the older of the two.

The first few chapters of The Definitive Guide really focus on the core concepts, rather than diving head-first into the fray with a handful of random examples. This is very much the same approach taken by Effective Kafka — both sets of authors really take their time to make sure you understand the nature of the beast. Remember, you are buying knowledge, not an instruction manual.

Contrasting The Definitive Guide with Effective Kafka

  • The Definitive Guide is laser-focused on the technology platform, while Effective Kafka also educates you on architecture and best-practices and even goes into some of the concepts of distributed computing.
  • Effective Kafka covers the core open-source platform components — Kafka and ZooKeeper. The Definitive Guide also touches on some of Confluent’s contributions, such as Replicator and Schema Registry.
  • The Definitive Guide is a lot smaller — roughly 280 pages of content versus 460 pages of Effective Kafka. I hadn’t realised how short it was in comparison until I read it a second time. You’ll get through The Definitive Guide faster, if that matters to you.
  • I would argue that The Definitive Guide limits its audience from beginner to intermediate level. Effective Kafka can appeal to a broader range — from beginner to advanced, and beyond.
  • Effective Kafka discusses deployment topologies and networking in a lot more detail, even touching on technologies such as Docker and Kubernetes. The Definitive Guide is an older publication, preceding Effective Kafka by about three years. Perhaps for that reason, it omits any mention of Docker and Kubernetes.
  • Both books have their fair share of code examples written in Java. Effective Kafka has a broader API coverage of producer and consumer clients and is backed by a GitHub repository.
  • Effective Kafka has a broader coverage of tooling, not only the CLI tools that come with Kafka, but also third-party web-based UIs, such as Kafdrop. (Unsurprising, seeing that the author is a Kafdrop maintainer.)

The reduced page count of The Definitive Guide definitely comes at a cost in the content department. The book either omits or skims over some of the more advanced topics covered in Effective Kafka, such as:

  • Static group membership — mostly used in Kubernetes-style deployments, where pods can come and go at short notice.
  • Rebalance listeners — they do get a mention, but only in passing. Depending on your application, you may need complete knowledge of rebalance listeners to implement proper consumer exclusivity.
  • Transactions — the book was written before Kafka transactions were a thing. Effective Kafka dedicates a whole chapter to this topic.
  • Exactly-once delivery — gets a mention in the context of idempotent writes, but is incomplete without a discussion on transactions.
  • Resilience patterns, such as retries, dead-letters, timeouts, and so forth, are not given due attention. By comparison, Effective Kafka dedicates an entire chapter to dealing with failures. From experience, this is also one of the most overlooked areas in real-life systems built on Kafka.
  • Advertised listeners are omitted altogether. This is somewhat disappointing, being an intermediate concept that all self-respecting operators and developers should be well aware of. I think this is, in part, due to how Kafka used to be commonly deployed in 2017 — on virtual machines or bare metal. Perhaps having multiple listeners configured wasn’t as crucial back then, but it definitely is now. These days most developers run Kafka and ZooKeeper locally with Docker Compose.
  • Quotas are only mentioned from a monitoring standpoint. When operating shared Kafka infrastructure in a medium-large organisation, having a quota-free setup just doesn’t cut it. Effective Kafka dedicates an entire chapter to quotas and rightly so.
  • Security is covered in all of three paragraphs in The Definitive Guide. There is hardly any mention of SASL, which is the dominant broker authentication protocol. There is no mention of certificates and Access Control Lists (ACLs). Effective Kafka spends… get ready for this… 80 pages on the blasted topic. This is bonkers! It’s like having a book inside a book! And just about all of it is useful.
  • Partition assignment strategy is briefly touched on in The Definitive Guide, showing the differences between RoundRobinAssignor and RangeAssignor. Effective Kafka covers all four built-in assignors in detail and provides nice examples comparing their behaviour.
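To make the last point in the list above a little more concrete, here is a simplified simulation of how range and round-robin assignment differ. This is my own sketch, not code from either book and not Kafka's implementation: `AssignorSketch` is hypothetical, and it assumes that every consumer subscribes to every topic, that all topics have the same number of partitions, and that the consumer and topic lists are already sorted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AssignorSketch {
    /** Range: each topic's partitions are split into contiguous ranges, topic by topic. */
    static Map<String, List<String>> range(List<String> consumers, List<String> topics, int partitionsPerTopic) {
        Map<String, List<String>> result = emptyAssignment(consumers);
        for (String topic : topics) {
            int base = partitionsPerTopic / consumers.size();
            int extra = partitionsPerTopic % consumers.size();
            int next = 0;
            for (int i = 0; i < consumers.size(); i++) {
                // The first 'extra' consumers take one additional partition.
                int count = base + (i < extra ? 1 : 0);
                for (int p = 0; p < count; p++) {
                    result.get(consumers.get(i)).add(topic + "-" + next++);
                }
            }
        }
        return result;
    }

    /** Round-robin: all topic-partitions are laid out in order and dealt out one at a time. */
    static Map<String, List<String>> roundRobin(List<String> consumers, List<String> topics, int partitionsPerTopic) {
        Map<String, List<String>> result = emptyAssignment(consumers);
        int i = 0;
        for (String topic : topics) {
            for (int p = 0; p < partitionsPerTopic; p++) {
                result.get(consumers.get(i++ % consumers.size())).add(topic + "-" + p);
            }
        }
        return result;
    }

    private static Map<String, List<String>> emptyAssignment(List<String> consumers) {
        Map<String, List<String>> result = new TreeMap<>();
        for (String c : consumers) {
            result.put(c, new ArrayList<>());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> consumers = List.of("C0", "C1");
        List<String> topics = List.of("t0", "t1");
        // prints: range:      {C0=[t0-0, t0-1, t1-0, t1-1], C1=[t0-2, t1-2]}
        System.out.println("range:      " + range(consumers, topics, 3));
        // prints: roundRobin: {C0=[t0-0, t0-2, t1-1], C1=[t0-1, t1-0, t1-2]}
        System.out.println("roundRobin: " + roundRobin(consumers, topics, 3));
    }
}
```

With two topics of three partitions each, the range strategy gives the first consumer four partitions and the second only two, while round-robin splits them three and three. That skew is one reason the choice of assignor is worth understanding.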

There are things that The Definitive Guide has that are not in Effective Kafka:

  • Garbage collection: surprisingly The Definitive Guide touches on GC performance tuning, which Effective Kafka omits. Admittedly, G1 has pretty much been superseded now by ZGC on large heaps, but some of the insights are still interesting and useful.
  • JMX monitoring is omitted from Effective Kafka. The Definitive Guide has a couple of paragraphs on it.

There are (very) minor differences in terminology. Consumers without an associated consumer group are referred to as standalone consumers in The Definitive Guide and free consumers in Effective Kafka. The main loop that polls records on the consumer and processes the resulting batch is called the poll-process loop in Effective Kafka and simply poll loop in The Definitive Guide.

Overall, Effective Kafka is a complete offering that is almost completely standalone, in that you don’t need to refer to external documentation, blogs, KIPs or other books. You may still use the documentation as a reference guide, but you don’t really need to as much. By comparison, The Definitive Guide offers surface-level coverage of most intermediate and some advanced topics — deferring to the official documentation for the missing detail. If there is one piece of negative feedback, the one thing I’m really missing in Effective Kafka is an index.

Comparing the two books really makes you appreciate the effort that must have gone into Effective Kafka. The Definitive Guide is still excellent, even if it is starting to show its age a bit. I think The Definitive Guide would benefit greatly from a second edition. There is a lot more detailed content in Effective Kafka that is, in my opinion, absolutely essential. It isn’t just broader than The Definitive Guide, it also eclipses the official documentation in the level of detail. According to the author, a fair amount of this was obtained by analysing Kafka’s source code, the commit history and digging through the Kafka Improvement Proposals (KIPs). I can’t even begin to fathom how long it would have taken to write. The architectural insight in Effective Kafka is timeless, being agnostic of the technology. I see Effective Kafka becoming a classic.

Had I written this article in 2019, I would have rated The Definitive Guide as the go-to book for Kafka. But Effective Kafka, along with one or two others, has completely redefined for me what a book should be. As a source of wisdom and knowledge, it’s in a league of its own. In saying that, The Definitive Guide is an amazingly good read and is immortalised on my bookshelf. I bought a hard-copy for myself even though we had several copies floating around at work. Without a doubt, I owe a debt of gratitude to the authors — Neha, Gwen and Todd — for guiding me through my journey with this amazing platform that they have poured so much energy into creating. I also have a hard-copy of Effective Kafka and I’m hoping to get both books signed one day.

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Brace yourself, for the third book in the list isn’t about Kafka at all. My decision to include a non-Kafka book is, in part, due to having covered two excellent books already — a third Kafka book just felt superfluous and almost cliché. Does there really need to be three of everything? The other reason goes back to what I said about Effective Kafka: sometimes the knowledge we seek is not what we need.

Why do we need to design our applications around event-driven architecture? What other patterns and principles must we consider? How should we deal with consistency in a distributed system? How should we organise and store data? What sorts of failures should we code for? Kafka is just one element in a distributed architecture. Becoming an expert in Kafka is not enough to navigate the rough terrain of a modern software landscape.

Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems by Martin Kleppmann is another literary masterpiece that will grace many a bookshelf until the next ice age. The book addresses the three great pillars of application design: reliability, scalability and maintainability.

Based on its title, you might be forgiven for thinking that this is a “me too” Big Data book. It isn’t. Designing Data-Intensive Applications is fundamentally a book on distributed systems architecture. If there is any negative criticism, it’s that its name does not do it justice. I would have gone with Designing Reliable and Scalable Distributed Applications… In saying that, the use of the word ‘data’ in the book’s title isn’t completely lost on me; indeed, distributed systems have little value without the data element.

A minor note: although the book is not focused on Kafka, it does mention it at various points. The author is no Kafka noob — Martin was a Senior Software Engineer at LinkedIn (the original incubators of Kafka) and is one of the maintainers of Apache Samza — a MapReduce style framework for distributed processing of high-volume event streams, which is inherently intertwined with Kafka, the latter acting as Samza’s message transport.

So why this left-field entry? Sometimes you are living your life and a book comes along that splits your life into a distinct before and an after. It gives you a level of clarity that unsettles your past beliefs and crystallises new ones. I had this experience with Designing Data-Intensive Applications in 2018. And I experienced this again in 2020 with Effective Kafka. Incidentally, the authors share a similar writing style, down to grammatical choices and sentence construction. The authors make liberal use of notes, callouts and asides to bring in related content that is interesting and occasionally amusing, making the reading experience that much better. Their choice of analogies is exquisite. I chuckled at Emil’s witty comparison of 18th-century criminology and the presumption of innocence to the safety and liveness properties of distributed systems. Martin’s comparison of database schema migrations to track gauge conversion of 19th-century railways is equally amusing. I also appreciated Martin’s nostalgic references to the mainframes of the 60s and 70s, portraying bluntly just how much history repeats itself. Both texts are amazingly well articulated; no half-measures and no hints whatsoever of rushing through any of the chapters to make the publisher’s deadline. Almost subconsciously, I have also “borrowed” heavily from their writing style. And with this calibre of authors, this is nothing to be ashamed of.

Designing Data-Intensive Applications is a compendium of state-of-the-art research that Martin has very conveniently condensed into a book, albeit a fairly meaty one. The list of topics covered includes:

  • Data models (relational, key-value, graph, etc.), data structures and indexes.
  • Persistence models (B-Trees, LSMs).
  • Data encoding formats (JSON, XML, Avro, Thrift, Protobuf, etc.). It’s probably worth mentioning: Martin is one of the maintainers of Avro.
  • Replication: leader-follower architectures, lag, multi-leader replication, write conflicts and quorum replication.
  • Transactions — isolation levels, read and write skew, concurrency control mechanisms (serial, pessimistic and optimistic).
  • Faults in distributed systems — timings and clock synchronization, Byzantine failures, timeouts and asynchrony.
  • Distributed algorithms and areas such as consistency models, ordering, distributed transactions and consensus.
  • Batch processing, MapReduce and stream processing.

Looking over the list above, some of the content feels quite academic and may appear daunting at first. However, it is very hard to overstate how eloquently the author breaks complex concepts down and explains them in simpler terms.

To make things clear, this book isn’t going to make you better at Kafka — this will be taken care of by Effective Kafka. And it’s not the book for the time-poor either. You do need to take your time, as this is not the sort of book that you can skim. However, it will make you a better developer because it will change the way you think about distributed systems. In retrospect, there were so many things I took for granted before reading Martin’s book. Here are some of the less shameful admissions I will hesitantly make:

  • I thought that ACID was a relational database thing, even though I understood it well enough to pick (mostly) correct isolation levels for my needs. ACID actually applies to any system that manipulates data and is hugely important in a distributed context.
  • I only heard of consistency models in the context of eventual consistency, thinking that there is some strong form and a weak form and that’s that. Little did I know that consistency takes many forms and it’s not even an ordered spectrum — some consistency models allow different phenomena (or side effects, if you like) without necessarily being stronger or weaker.
  • I always associated distributed systems with microservices and saw them as an evolutionary step from monoliths. But I never appreciated the difficulties of dealing with failures where only parts of a system may fail or become unreachable. Just let it time out and retry… What can possibly go wrong?

The knowledge conveyed by Designing Data-Intensive Applications is timeless; it is easily one of the best technical books I have read and is destined to become a classic.

Other books on Apache Kafka?

I have read a few more books that were related to Kafka than what’s listed here, and you may be wondering whether there are others that I would recommend. The answer, unfortunately, is an emphatic no.

I don’t want to speak ill of authors because writing a book, irrespective of whether it turns out the way you or others expect, is such a massive undertaking and a considerable personal sacrifice. I’ve read my fair share of books on mainstream open-source technologies, mostly from Safari, Manning and Packt, and what I will say is this: there are two kinds of authors. First, the kind of person who is highly knowledgeable and passionate about his/her subject matter and wishes to share their knowledge. They are already at the top of their game and don’t need to write a book to prove it. The second kind is what I have called a “career author”. They may be somewhat knowledgeable on the topic still, but the real reason for their publication is to leave a calling card and to claim author status in their resume.

Now, I have nothing against career progression, as long as you aren’t swindling your audience out of money. Unfortunately, there are a couple of authors on Amazon right now that are doing just that. I won’t name them, but you can spot them by their appalling reviews. To cite one:

This book is basically a bunch of free blogs stitched and bound together. It offers no value beyond what you can get in the free docs on the Apache Kafka project page. Absolute waste of space. Would not recommend even if it were a free e-book.

Damn right! This is my sentiment exactly. The content of some of those books is absolute comedy gold. This is a real passage taken verbatim from one of the books:

You will generally need the big data technologies, such as Hadoop and Storm, to run your Hadoop and Storm clusters.

You need Hadoop to run Hadoop? Who would have thought! Anyway, before buying any book, you should check the author’s expertise and contribution to the field. Beyond the books listed above, I could not find one that was worthy of my time. But if you do come across one, be sure to drop me a line!

To summarise, you can definitely get started with Kafka by working through the official documentation and reading popular blogs. But consider this: Kafka is a comprehensive platform with a multitude of use cases and thousands of configuration options, and can be genuinely difficult to understand and use correctly — to the point that even seasoned Kafka practitioners still get it wrong. I know; I was there. Blogs are generally written by people who know a bit more than the average Joe. They are not always right. Again, I should know! The official documentation is useful as a reference guide, but hardly as a source of wisdom.

If you want to get somewhere, you need to be prepared to invest in your education. A smart person learns from their mistakes, but a truly wise person learns from the mistakes of others. I recommend you consider the books I’ve listed. They will absolutely make you a better professional.


John L. Watson

I’m an experienced software engineer specialising in all manner of event-driven and microservices applications, polyglot stacks and performance tuning.