Runbook

Dead Kafka Node Detection

Overview

A Dead Kafka Node Detection incident occurs when a node (broker) in a Kafka cluster goes down or fails. Kafka is an open-source distributed event streaming platform used to build real-time streaming data pipelines and applications. When a broker goes down, the partitions it hosts lose a replica, which can lead to data loss, message duplication, and reduced availability, depending on the replication configuration. Detect and resolve dead Kafka nodes as quickly as possible to limit the impact on the system's performance and data integrity.

Debug

Check if the Kafka service is running
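
A minimal Python sketch of this check, assuming the broker is managed by systemd under a unit named `kafka` (the unit name is an assumption; adjust it to your installation). It relies on `systemctl is-active --quiet`, which exits with status 0 only when the unit is active. The `run` parameter is a hypothetical hook so the function can be exercised on a host without systemd.

```python
import subprocess


def service_is_active(unit="kafka", run=subprocess.run):
    """Return True if systemd reports the given unit as active.

    `unit` defaults to "kafka", which is an assumed unit name.
    """
    # `is-active --quiet` prints nothing and signals state via the exit code.
    result = run(["systemctl", "is-active", "--quiet", unit], check=False)
    return result.returncode == 0
```

Calling `service_is_active()` on the suspect node and getting `False` suggests the service is stopped or failed. On hosts without `systemctl`, the call raises `FileNotFoundError` instead of returning a value.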

Check if the Kafka process is running
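
One way to sketch this in Python on Linux is to scan `/proc` for a JVM whose arguments include the broker main class `kafka.Kafka`. The marker and the reliance on `/proc` are assumptions about a typical Linux install; a wrapper script or container setup may launch the broker differently.

```python
from pathlib import Path


def iter_process_args(proc_root="/proc"):
    """Yield (pid, argument list) for each process visible under /proc."""
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a process directory
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:
            continue  # process exited or is not readable by this user
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        args = [a.decode(errors="replace") for a in raw.split(b"\0") if a]
        if args:
            yield int(entry.name), args


def find_kafka_pids(processes, marker="kafka.Kafka"):
    """Return PIDs whose argument list contains the marker as a whole argument."""
    return [pid for pid, args in processes if marker in args]
```

`find_kafka_pids(iter_process_args())` returns an empty list when no matching process is found, which would point to a crashed or never-started broker even if the service manager reports otherwise.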

Check if the Kafka node is reachable from other nodes in the cluster
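
A simple reachability probe, run from another node in the cluster, is to attempt a TCP connection to the broker's listener. The sketch below assumes the listener is on port 9092; the actual port depends on the `listeners` setting in the broker configuration.

```python
import socket


def port_is_reachable(host, port=9092, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A `False` result from several peers suggests the node is down or isolated at the network level. A `True` result only shows that the port accepts connections, not that the broker is healthy.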

Check if the Kafka node is able to communicate with ZooKeeper
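
For ZooKeeper-based clusters, one possible check is to send the `srvr` four-letter command from the Kafka node to a ZooKeeper server and inspect the reply. The sketch assumes the default client port 2181 and that `srvr` is permitted by the server's four-letter-word whitelist. The helper names are illustrative, not part of any Kafka or ZooKeeper API.

```python
import socket


def zk_four_letter(host, port=2181, command="srvr", timeout=3.0):
    """Send a four-letter command to ZooKeeper and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break  # server closes the connection after replying
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def reply_looks_healthy(reply):
    """Heuristic: a normal `srvr` reply reports a version and a server mode."""
    return "Zookeeper version" in reply and "Mode:" in reply


def zk_is_responding(host, port=2181, timeout=3.0):
    """Return True if ZooKeeper answers `srvr` with a plausible status reply."""
    try:
        return reply_looks_healthy(zk_four_letter(host, port, "srvr", timeout))
    except OSError:
        return False  # unreachable, refused, or timed out
```

Running `zk_is_responding` on the suspect Kafka node against each ZooKeeper host helps separate a broker fault from a network or ZooKeeper fault.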

Check if the Kafka node is able to access the Kafka data directory
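
The data directory is whatever `log.dirs` points to in the broker configuration. A rough Python sketch of the access check follows. It should be run as the same OS user as the broker, since permissions are evaluated for the calling user. The report keys are illustrative names chosen for this example.

```python
import os
import shutil


def check_data_dir(path):
    """Report existence, permissions, and free space for a Kafka data directory."""
    exists = os.path.isdir(path)
    return {
        "exists": exists,
        # Listing a directory requires both read and execute permission.
        "readable": os.access(path, os.R_OK | os.X_OK),
        "writable": os.access(path, os.W_OK),
        # Free bytes on the filesystem holding the directory, if it exists.
        "free_bytes": shutil.disk_usage(path).free if exists else None,
    }
```

A missing directory, a `False` permission flag, or a `free_bytes` value near zero each suggests a storage problem that can stop a broker from starting or staying up.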

Check if there are any Kafka logs indicating a potential issue with the node
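
A small sketch for scanning the broker log, assuming the common log4j layout in which each entry starts with a bracketed timestamp followed by the level, as in `[timestamp] ERROR message`. The pattern and the function name are assumptions for illustration; adjust the regular expression if your logging configuration differs.

```python
import re

# Matches lines such as "[2024-01-01 10:00:00,000] ERROR ..." (assumed layout).
LEVEL_RE = re.compile(r"^\[[^\]]+\]\s+(ERROR|FATAL)\b")


def find_problem_lines(lines, limit=50):
    """Return up to `limit` of the most recent ERROR or FATAL log entries.

    `limit` is expected to be a positive integer.
    """
    hits = [line.rstrip("\n") for line in lines if LEVEL_RE.match(line)]
    return hits[-limit:]
```

The function accepts any iterable of lines, so an open file handle for the broker's `server.log` can be passed directly. Stack-trace continuation lines are skipped because they do not start with a bracketed timestamp.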

Repair

Replace the failed node: If the checks above show that the node itself is defective, replace it with a new one.
