Runbook

Kafka Replication Factor Violation.

Back to Runbooks

Overview

This incident type refers to a situation where the replication factor of a Kafka cluster falls below the required threshold. Replication factor determines how many copies of a message are stored across the Kafka cluster to ensure data durability and fault tolerance. When the replication factor falls below the required level, it can lead to data loss, inconsistencies, and other issues. This incident requires immediate attention and resolution to prevent any further damage.

Parameters

Debug

Check the replication factor of a topic

Check the ISR (In-Sync Replicas) of all brokers for a topic

Check the status of all brokers in the Kafka cluster

Check the leader status of a partition for a topic

Check the number of partitions for a topic

Repair

Increase replication factor: The first step in resolving this incident is to increase the replication factor of the Kafka cluster to the required level. This can be done by adding new brokers to the cluster or reconfiguring the existing brokers to handle more replicas.

Learn more

Related Runbooks

Check out these related runbooks to help you debug and resolve similar issues.