Kafka is an event streaming platform used to broadcast events between systems. It provides at-least-once semantics in most cases.

It is not a queue: Kafka is FIFO with a single producer and a single consumer group, but it lacks the features of a dedicated queue.

It is not a transport for RPC calls: Kafka fires and forgets; a producer should not care about the results of the consumption of its events.
Topics

Producers and Consumers exchange information on a named topic. Topics are always multi-producer and multi-consumer: you can subscribe to the same topic and consume the same events in multiple different applications.
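A minimal sketch of producing to and consuming from a named topic, assuming the confluent-kafka Python client, a broker at localhost:9092, and a topic called payments (all illustrative, not from the note):

```python
# Minimal sketch: one producer and one consumer sharing a named topic.
# Assumes the confluent-kafka Python client and a broker on localhost:9092;
# the topic name "payments" is illustrative.
from confluent_kafka import Producer, Consumer

producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("payments", key=b"account-42", value=b'{"amount": 10}')
producer.flush()  # block until the message is acknowledged by the broker

# A second application could create its own consumer with a different group.id
# and receive exactly the same events.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "reporting-app",     # each application uses its own group
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["payments"])

msg = consumer.poll(timeout=5.0)     # returns None if nothing arrives in time
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```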
Partitions

Each topic is divided into Partitions. A partition is a separate bucket of events within the topic. Each partition can be located on a different broker node, which makes partitioning important for the performance of the Kafka cluster.

Each message is produced to a single partition. Consumers must consume all partitions; this is handled automatically via a ConsumerGroup.

The order of events is guaranteed within a partition, not within a topic. Choose a partition key, such as the account_id, so that message ordering is preserved within an account.
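A sketch of keying by account_id, again assuming the confluent-kafka client and illustrative topic/broker names; with the client's default partitioner, messages that share a key are routed to the same partition, so events for one account stay ordered:

```python
# Sketch: key every event with the account_id so all events for one account
# land in the same partition and therefore stay ordered.
# Assumes the confluent-kafka Python client; topic/broker names are illustrative.
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

events = [
    ("account-1", b'{"type": "deposit", "amount": 100}'),
    ("account-2", b'{"type": "deposit", "amount": 50}'),
    ("account-1", b'{"type": "withdrawal", "amount": 30}'),  # must follow account-1's deposit
]

for account_id, payload in events:
    # The partitioner hashes the key, so the same account_id always maps to the
    # same partition (as long as the partition count does not change).
    producer.produce("account-events", key=account_id, value=payload)

producer.flush()
```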
Producer => Brokers => All Partitions (to ConsumerGroup)

Producers connect to Kafka Brokers to publish their messages. Each message consists of a Key and a Value field. The Value field is the payload of the event; the Key is used to allocate the event to a particular partition.

The partitions of the topic are spread across the broker nodes, allowing the load of a single topic to be spread across multiple brokers.
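One way to see this spread, sketched with the confluent-kafka AdminClient (the topic name and broker address are assumptions), is to read the cluster metadata and print which broker leads each partition:

```python
# Sketch: inspect which broker node leads each partition of a topic.
# Assumes the confluent-kafka Python client; names are illustrative.
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "localhost:9092"})
metadata = admin.list_topics(topic="account-events", timeout=10)

topic_metadata = metadata.topics["account-events"]
for partition_id, partition in sorted(topic_metadata.partitions.items()):
    # Each partition has a leader broker and a set of replica brokers.
    print(f"partition {partition_id}: leader={partition.leader} replicas={partition.replicas}")
```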
Consumption is typically done via a ConsumerGroup. The group can contain one or more consumers, and these can run on different hosts, allowing for parallel processing. The brokers ensure that each partition is assigned to a single consumer in the group. The maximum parallelism is therefore limited by the number of partitions in the topic.
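A sketch of one consumer-group member, assuming the confluent-kafka client; run several copies of this process with the same group.id and the brokers divide the topic's partitions among them (the on_assign callback prints each instance's share):

```python
# Sketch: one member of a consumer group. Run several copies of this script
# with the same group.id and the partitions are divided among them.
# Assumes the confluent-kafka Python client; names are illustrative.
from confluent_kafka import Consumer

def print_assignment(consumer, partitions):
    # Called by the client whenever the broker (re)assigns partitions to us.
    print("assigned:", [p.partition for p in partitions])

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "account-processor",   # all members register with this id
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["account-events"], on_assign=print_assignment)

try:
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is None or msg.error():
            continue
        print(f"partition={msg.partition()} offset={msg.offset()} key={msg.key()}")
finally:
    consumer.close()
```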
A consumer group has a unique id; all members of that group register with the same id, and the partitions of a topic are divided among them.

In order to ensure all events for a particular account are produced to a single partition, and thus retain ordering, one needs a deterministic way to go from some_id → partition #.
All producers follow the same convention: the Key is set to some_id, and

Partition = murmur3(some_id) % numPartitions

murmur3 is a common non-cryptographic hash function available in virtually any language, which makes it a good choice for such a convention.
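A sketch of this convention in Python, using the mmh3 package for murmur3 and the confluent-kafka producer; masking the hash to a non-negative value and passing the partition explicitly (instead of relying on the client's built-in partitioner) are implementation choices, not part of the note:

```python
# Sketch: deterministically map some_id -> partition using murmur3,
# then produce to that partition explicitly.
# Assumes the mmh3 and confluent-kafka packages; names are illustrative.
import mmh3
from confluent_kafka import Producer

NUM_PARTITIONS = 12  # must match the topic's actual partition count

def partition_for(some_id: str) -> int:
    # mmh3.hash returns a signed 32-bit int; mask it to keep the result non-negative.
    return (mmh3.hash(some_id) & 0x7FFFFFFF) % NUM_PARTITIONS

producer = Producer({"bootstrap.servers": "localhost:9092"})
account_id = "account-42"
producer.produce(
    "account-events",
    key=account_id,
    value=b'{"type": "deposit", "amount": 100}',
    partition=partition_for(account_id),  # bypass the client's default partitioner
)
producer.flush()
```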
When producing to Kafka, one way to reduce the possibility of duplicates is to enable idempotence on the producer. The producer then does extra bookkeeping, such as including a sequence number, to ensure events are not unintentionally published twice.
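Enabling this is a single config flag; a sketch with the confluent-kafka producer (the flag name comes from the underlying librdkafka client, the rest of the config is illustrative):

```python
# Sketch: an idempotent producer. With enable.idempotence set, retried batches
# carry a producer id and sequence number, so the broker can discard duplicates
# instead of appending them again.
# Assumes the confluent-kafka Python client; names are illustrative.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "enable.idempotence": True,   # implies acks=all and bounded in-flight requests
})

producer.produce("account-events", key="account-42", value=b'{"amount": 100}')
producer.flush()
```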
Kafka Topics have two important settings which, when misconfigured, can make a topic unusable. The first is the replication factor. The second is the minimum number of in-sync replicas (min.insync.replicas). If the replication factor is less than min.insync.replicas, the minimum can never be satisfied and the cluster will no longer accept publishing to that topic (for producers requiring acks from all in-sync replicas). The solution is to increase the replication factor or decrease min.insync.replicas.
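A sketch of creating a topic where the two settings are consistent (replication factor 3, min.insync.replicas 2), using the confluent-kafka AdminClient; the topic name, partition count, and broker address are assumptions:

```python
# Sketch: create a topic whose replication factor (3) stays above
# min.insync.replicas (2), so the broker can always satisfy the minimum.
# Assumes the confluent-kafka Python client; names are illustrative.
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})

topic = NewTopic(
    "account-events",
    num_partitions=12,
    replication_factor=3,
    config={"min.insync.replicas": "2"},
)

futures = admin.create_topics([topic])
for name, future in futures.items():
    future.result()  # raises if creation failed
    print(f"created topic {name}")
```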