Runbook

Kafka Broker Low ISR (In-Sync Replica) Count Incident

Overview

In a Kafka cluster, each topic partition is replicated across multiple brokers for availability and fault tolerance. The in-sync replica (ISR) set of a partition contains the leader plus every follower that has kept up with the leader's log within the configured lag window (replica.lag.time.max.ms). A low ISR count means that one or more followers have fallen behind or are offline, so the partition has fewer up-to-date copies than its replication factor calls for. This reduces fault tolerance: if the ISR shrinks below min.insync.replicas, producers using acks=all have their writes rejected, and if the remaining in-sync brokers then fail, acknowledged data can be lost (for example through an unclean leader election). Monitoring and maintaining a healthy ISR count is therefore crucial for a stable Kafka cluster.

Debug

Check the status of the Kafka broker
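
A first, coarse status check is whether the broker's listener accepts TCP connections at all. The sketch below assumes the broker listens on port 9092 (a common default for a plaintext listener, but your listener configuration may differ); the function name `broker_reachable` is made up for this illustration. A successful connection only shows that the port is open, not that the broker is healthy or in sync.

```python
import socket


def broker_reachable(host: str, port: int = 9092, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    The default port 9092 is an assumption; use the port from your
    broker's listener configuration.
    """
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the context manager closes it again immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

For example, `broker_reachable("kafka-1.example.internal")` (a placeholder hostname) returns False when the broker process is down or the port is firewalled.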

Check the log file of the Kafka broker for errors
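
One way to scan the broker log is to filter lines by log level. This is a minimal sketch that assumes the log lines start with `[timestamp] LEVEL`, as they do with Kafka's stock log4j layout; if your logging configuration was customized, the pattern needs adjusting. The path `/var/log/kafka/server.log` in the usage note is a placeholder, and `find_problem_lines` is a name invented for this example.

```python
import re

# Assumed line shape: "[<timestamp>] <LEVEL> <message>".
LEVEL_RE = re.compile(r"^\[[^\]]+\]\s+(ERROR|FATAL|WARN)\b")


def find_problem_lines(lines, levels=("ERROR", "FATAL")):
    """Return the lines whose log level is in `levels`, without trailing newlines.

    Continuation lines such as stack-trace frames do not start with a
    bracketed timestamp, so they are skipped.
    """
    hits = []
    for line in lines:
        match = LEVEL_RE.match(line)
        if match and match.group(1) in levels:
            hits.append(line.rstrip("\n"))
    return hits
```

Typical usage would be `with open("/var/log/kafka/server.log") as f: print("\n".join(find_problem_lines(f)))`. Messages that mention ISR shrinking or replica fetcher failures are the most relevant for this incident.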

Check the ISR count of the Kafka topic

Check the replication factor of the Kafka topic

Check the ISR count of the Kafka partition
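
The topic-level ISR count, the replication factor, and the per-partition ISR count can all be read from the output of `kafka-topics.sh --describe --topic <topic> --bootstrap-server <host:port>`. The sketch below parses that text, assuming the usual `Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ...` line shape; the exact layout varies between Kafka versions (newer releases add further fields), so treat the regular expressions as a starting point. All Python names here are illustrative.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed shape of one partition line in the describe output:
#   Topic: <name>  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3
PARTITION_RE = re.compile(
    r"Topic:\s*(?P<topic>\S+)\s+Partition:\s*(?P<partition>\d+)\s+"
    r"Leader:\s*(?P<leader>-?\d+|none)\s+"
    r"Replicas:\s*(?P<replicas>[\d,]*)\s+Isr:\s*(?P<isr>[\d,]*)"
)
# The topic summary line carries "ReplicationFactor: <n>".
RF_RE = re.compile(r"ReplicationFactor:\s*(\d+)")


@dataclass
class PartitionState:
    topic: str
    partition: int
    leader: Optional[int]  # None when the partition has no leader
    replicas: List[int]
    isr: List[int]

    @property
    def under_replicated(self) -> bool:
        # Fewer in-sync replicas than assigned replicas.
        return len(self.isr) < len(self.replicas)


def _ids(csv: str) -> List[int]:
    """Turn '1,2,3' into [1, 2, 3]; an empty string gives []."""
    return [int(x) for x in csv.split(",") if x]


def parse_partitions(describe_output: str) -> List[PartitionState]:
    """Extract one PartitionState per partition line; other lines are ignored."""
    states = []
    for line in describe_output.splitlines():
        m = PARTITION_RE.search(line)
        if not m:
            continue
        leader = None if m["leader"] in ("none", "-1") else int(m["leader"])
        states.append(PartitionState(m["topic"], int(m["partition"]), leader,
                                     _ids(m["replicas"]), _ids(m["isr"])))
    return states


def replication_factor(describe_output: str) -> Optional[int]:
    """Return the first ReplicationFactor value found, or None if absent."""
    m = RF_RE.search(describe_output)
    return int(m.group(1)) if m else None
```

With the parsed list in hand, `min(len(p.isr) for p in parts)` gives the lowest ISR count in the topic, and `[p for p in parts if p.under_replicated]` lists the affected partitions. For a quick cluster-wide view, `kafka-topics.sh --describe` also accepts `--under-replicated-partitions`, which prints only partitions whose ISR is smaller than the replica list.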

Check how many Kafka partitions are not led by their preferred replica
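
The preferred replica of a partition is the first broker in its replica list, and Kafka tries to keep leadership there. A partition led by a different broker often indicates that the preferred broker dropped out of the ISR at some point. The sketch below assumes you already have the leader and replica list per partition (for example from the describe output); the tuple layout and the function name are choices made for this illustration only.

```python
def non_preferred_partitions(partitions):
    """Return the ids of partitions whose leader is not the preferred replica.

    `partitions` is an iterable of (partition_id, leader_id, replica_ids)
    tuples. The preferred replica is replica_ids[0]; partitions with an
    empty replica list are skipped.
    """
    return [
        partition_id
        for partition_id, leader_id, replica_ids in partitions
        if replica_ids and leader_id != replica_ids[0]
    ]
```

For example, `non_preferred_partitions([(0, 1, [1, 2, 3]), (1, 3, [2, 3, 1])])` reports partition 1, because broker 3 leads it while broker 2 is first in the replica list. Once the preferred brokers are back in sync, leadership can be moved back with a preferred leader election (the `kafka-leader-election.sh` tool in recent Kafka releases).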

Repair

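Increase the replication factor of the affected topic(s) so that enough replicas exist to keep the ISR count at or above the minimum threshold (min.insync.replicas). The replication factor of an existing topic is not a simple topic setting: it is raised by reassigning each partition to a longer replica list. The added replicas join the ISR only after they have caught up with the leader, so confirm that any lagging or offline brokers are healthy first.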
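
Raising the replication factor is done by handing `kafka-reassign-partitions.sh` a JSON file that lists the full desired replica set for every partition. The following is a minimal sketch of generating that file; the placement strategy (append unused brokers, rotated by partition number to spread load) is an assumption of this example, not Kafka's own assignment algorithm, and `build_reassignment` is a hypothetical helper name.

```python
import json


def build_reassignment(topic, current, brokers, target_rf):
    """Build a reassignment plan that extends each partition to `target_rf` replicas.

    current: dict mapping partition id -> current replica list
    brokers: list of broker ids that may host new replicas
    Existing replicas keep their order, so the preferred leader is unchanged.
    """
    plan = []
    for partition in sorted(current):
        replicas = list(current[partition])
        candidates = [b for b in brokers if b not in replicas]
        if candidates:
            # Rotate so that different partitions pick different new brokers.
            offset = partition % len(candidates)
            candidates = candidates[offset:] + candidates[:offset]
        while len(replicas) < target_rf:
            if not candidates:
                raise ValueError(
                    f"not enough brokers to reach replication factor {target_rf}"
                )
            replicas.append(candidates.pop(0))
        plan.append({"topic": topic, "partition": partition, "replicas": replicas})
    return {"version": 1, "partitions": plan}


def write_plan(plan, path):
    """Write the plan as JSON for --reassignment-json-file."""
    with open(path, "w") as f:
        json.dump(plan, f, indent=2)
```

The resulting file is applied with `kafka-reassign-partitions.sh --bootstrap-server <host:port> --reassignment-json-file <file> --execute`, and progress can be checked by rerunning the command with `--verify` in place of `--execute`. Review the generated plan before executing it, since copying partition data to new brokers adds load to the cluster.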
