Apache Kafka pitfalls. Kafka vs RabbitMQ

In this post we’ll outline Apache Kafka pitfalls and compare Kafka with RabbitMQ, two popular message brokers for event-driven architecture.
We’ll also outline the key points to take into account when deciding whether you really need to introduce Apache Kafka on your project, points you will hardly find in an introductory training or a how-to. So let’s go.

Common drivers for introducing Apache Kafka

Before digging into the pitfalls, let’s glance at the key drivers for introducing Apache Kafka on a project. Later we’ll reconsider whether Kafka is really a good choice.
Many companies (architects, CTOs, etc.) introduce Kafka on their projects (platforms) primarily because of:
1. high throughput
2. outstanding scaling capabilities
3. frequently, a move to event-driven architecture.
Apache Kafka was initially developed as an internal tool at LinkedIn for tracking users’ activity on its website. LinkedIn does have millions of users, so that is a real challenge, and one that Apache Kafka was built to address.
Now ask yourself: do you (your company) really need, or foresee, that sort of throughput in the medium term?
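A quick back-of-the-envelope calculation can help answer that question. The sketch below is only an illustration: the two ceilings are the rough figures quoted in the comparison table further down in this post (not benchmarks), and the function names and the `peak_factor` burstiness multiplier are assumptions made for this example.

```python
# Back-of-the-envelope throughput estimate (illustrative only).
# The ceilings are the rough figures quoted in this post, not benchmarks.
# peak_factor is an assumed multiplier for how bursty the traffic is.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

RABBITMQ_CEILING = 10_000    # ~msg/sec, rough figure from this post
KAFKA_CEILING = 1_000_000    # ~msg/sec, rough figure from this post


def peak_rate(messages_per_day: int, peak_factor: float = 10.0) -> float:
    """Average msg/sec over a day, multiplied by an assumed peak factor."""
    return messages_per_day / SECONDS_PER_DAY * peak_factor


def suggest_broker(messages_per_day: int, peak_factor: float = 10.0) -> str:
    """Compare the estimated peak rate with the rough ceilings above."""
    rate = peak_rate(messages_per_day, peak_factor)
    if rate <= RABBITMQ_CEILING:
        return "rabbitmq-is-enough"
    if rate <= KAFKA_CEILING:
        return "kafka-territory"
    return "beyond-the-quoted-figures"


if __name__ == "__main__":
    for per_day in (1_000_000, 50_000_000, 2_000_000_000):
        print(per_day, round(peak_rate(per_day)), suggest_broker(per_day))
```

Even 50 million messages a day, with a generous tenfold peak, stays below the rough RabbitMQ figure in this model, which is why it is worth doing the arithmetic before picking a broker.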
Another good practice is not to overcomplicate things. Imagine we decided to use event-driven architecture and sagas for building a simple application like this blog. It would look like this:

Reaction to Event-Driven Architecture for simple application, like a WordPress blog 🙂

Apache Kafka Use-Cases:

  1. stream data processing with huge throughput
  2. platform (application) audit
  3. producing and further analysing platform (application) system events
  4. event sourcing or event-driven architecture.
    However, RabbitMQ or another message middleware could be a good alternative here.

Apache Kafka vs RabbitMQ (a very popular open-source message broker) comparison

So let’s compare the key features of Apache Kafka and RabbitMQ and highlight some numbers.

| Characteristic / Feature | RabbitMQ | Apache Kafka |
|---|---|---|
| Throughput | Up to ~10,000 msg per sec | Up to ~1 million msg per sec |
| Payload size | No restrictions | Up to 1 MB (default limit) |
| Topology | Open-source message broker that uses a queue topology in FIFO manner | Durable message broker where messages stay in the topic until the retention time expires |
| Scaling | Depends on whether you use your own on-premise infrastructure or a cloud provider | As with RabbitMQ, depends on the deployment model |

Apache Kafka vs RabbitMQ comparison
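The “Topology” row is the least intuitive one, so here is a toy model of the difference. It is a deliberately simplified sketch in plain Python, not how either broker is implemented: `ToyQueue` and `ToyLog` are hypothetical classes invented for this illustration, and details such as acknowledgements, partitions and consumer groups are left out.

```python
from collections import deque


class ToyQueue:
    """Queue semantics: a message is gone once a consumer has taken it."""

    def __init__(self):
        self._items = deque()

    def publish(self, msg):
        self._items.append(msg)

    def consume(self):
        # FIFO: hand out the oldest message and remove it from the queue.
        return self._items.popleft() if self._items else None


class ToyLog:
    """Log semantics: messages stay until retention expires,
    and every consumer tracks its own read position (offset)."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._records = []      # list of (timestamp, message)
        self._base_offset = 0   # offset of the oldest retained record
        self._offsets = {}      # consumer name -> next offset to read

    def publish(self, msg, now):
        self._records.append((now, msg))

    def poll(self, consumer, now):
        self._expire(now)
        # A consumer whose position points at expired data skips ahead.
        offset = max(self._offsets.get(consumer, 0), self._base_offset)
        index = offset - self._base_offset
        if index >= len(self._records):
            return None
        self._offsets[consumer] = offset + 1
        return self._records[index][1]

    def _expire(self, now):
        # Drop records older than the retention time, oldest first.
        while self._records and now - self._records[0][0] > self.retention_seconds:
            self._records.pop(0)
            self._base_offset += 1
```

In this model two independent consumers of `ToyLog` can both read the same message, while a message taken from `ToyQueue` is no longer available to anyone else. A consumer that shows up after the retention time simply finds nothing.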

As for scaling, you have much less to worry about if you deploy your services in a cloud, since cloud providers generally offer scaling, often automatic, for their managed services.

Apache Kafka pitfalls

Now, if you’re still confident you need Apache Kafka in your company, or if it’s already there, let’s outline some of Kafka’s pitfalls:

  • Use Kafka for what it is good at: as a messaging system for processing real-time data in a high-load or event-driven architecture.
  • Most probably, you will need to think about a managed Kafka service provider, e.g. Confluent or Aiven (because of clustering, replication, applying patches and security updates). As of 2023, Confluent and Aiven were the main competitors.
    Both Confluent and Aiven are fairly expensive SaaS providers. By the way, you can take the highly recognized Confluent certification for Apache Kafka absolutely free.
    Alternatively, if you already use a cloud, you can consider Kafka as a managed service offered by your cloud provider.
  • To make integration between teams and components work smoothly, you will ideally need to introduce compatibility checks for Kafka schemas.
  • Kafka Streams is a technology that does a lot of things under the hood. Things that happen automatically under the hood are most probably out of your control and, even worse, frequently beyond the development teams’ understanding.
  • Try to avoid using KSQL for event transformations, as they can result in implicit topic creation (unless you’re a Kafka expert and already know the potential implications).
    Some Kafka Streams operations, e.g. “join” or “selectKey”, are stateful or change the message key, and result in internal topics being created automatically under the hood.
    The names of such automatically generated topics usually contain sequential numbers, determined by the sequential numbering of the “atomic” Kafka Streams operations. If the sequence of operations changes (e.g. the topology gets modified because of new business requirements), the numbering of the internal topics changes too.
    As a consequence, the application needs to create new internal topics with updated numbers. While this may be acceptable in some situations (if it’s fine to reset the state of stateful operations), it may be very painful in others (imagine you decide to disallow automatic creation of Kafka topics at some point in your Kafka usage).
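The renumbering effect described in the last bullet can be shown with a toy model. This is a simplified sketch in plain Python, not Kafka Streams itself: the function `internal_topic_names`, the application id `orders-app` and the name format are assumptions made for illustration, and the real library numbers its operations in a more involved way.

```python
def internal_topic_names(app_id, operations):
    """Toy model: every operation in the topology consumes one sequential
    index; operations that need an internal topic get a name derived
    from that index."""
    names = []
    for index, (operation, creates_topic) in enumerate(operations):
        if creates_topic:
            names.append(f"{app_id}-{operation.upper()}-{index:010d}")
    return names


# Version 1 of a hypothetical topology: (operation, needs internal topic?)
topology_v1 = [("source", False), ("filter", False), ("join", True)]

# Version 2: a stateless step is added before the join.
topology_v2 = [("source", False), ("filter", False),
               ("mapValues", False), ("join", True)]

if __name__ == "__main__":
    print(internal_topic_names("orders-app", topology_v1))
    print(internal_topic_names("orders-app", topology_v2))
```

In this model the join itself did not change, yet its internal topic moves from index 2 to index 3, so the application would look for a topic that does not exist yet. Kafka Streams lets you give explicit names to such operations (for example via `Materialized.as(...)` or `Grouped.as(...)`), which is one way to keep internal topic names stable when the topology evolves.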

Summary

So if you need an event-driven architectural style or event sourcing, consider RabbitMQ as a Kafka alternative because of its lower pricing and lower complexity.
If you foresee huge throughput (tens or hundreds of thousands of messages per second), Kafka is the right choice.

Hope the post proves useful: it brings some essential points to the surface that should help you decide whether you really need Kafka.