Runbook
Kafka Disk IO Latency Spike Incident
Overview
This incident type covers a sudden increase in disk input/output (I/O) latency on a Kafka broker. Because brokers persist messages to disk, slow I/O can degrade performance, delay message processing, and ultimately affect system availability. Possible causes include hardware failure, network issues (for example, when the broker's storage is network-attached), and software bugs. Identify and resolve the cause quickly to limit disruption and restore normal broker operation.
Parameters
Debug
Check the current CPU usage
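One way to run this check on a Linux broker host is to sample the kernel's aggregate CPU counters twice and compare them. The sketch below assumes `/proc/stat` is available; the function names `read_cpu_times` and `cpu_shares` are illustrative and not part of any Kafka tooling. It also reports the share of time spent waiting on I/O, which tends to rise when disks are slow.

```python
def read_cpu_times(path="/proc/stat"):
    """Return the aggregate CPU counters (clock ticks) from the first line of /proc/stat."""
    with open(path) as f:
        fields = f.readline().split()
    # fields[0] is the literal "cpu"; the rest are user, nice, system, idle, iowait, ...
    return [int(x) for x in fields[1:]]


def cpu_shares(before, after):
    """Return (busy_pct, iowait_pct) for the interval between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Sum only the first eight counters: guest time is already counted in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0, 0.0
    idle = deltas[3]
    iowait = deltas[4] if len(deltas) > 4 else 0
    busy = total - idle - iowait
    return 100.0 * busy / total, 100.0 * iowait / total
```

Calling `read_cpu_times()` twice, about a second apart, and passing both results to `cpu_shares` gives the percentages for that interval. A high I/O wait share with low busy time suggests the CPUs are stalled on storage rather than overloaded.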
Check if Kafka is running
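A quick liveness probe is to open a TCP connection to the broker's listener. This is only a sketch: the host and port are deployment-specific (9092 is a common listener port, but treat it as an assumption), and a successful connection shows that something is listening, not that the broker is healthy. The helper name `broker_reachable` is hypothetical.

```python
import socket


def broker_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, `broker_reachable("localhost", 9092)` would test a broker on the local host, assuming that listener address applies to your deployment.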
Check the current disk usage
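Disk usage for the volume holding the broker's data can be read with the Python standard library. In this sketch the path argument is whatever directory your broker has configured in `log.dirs`; the helper name `disk_usage_percent` is illustrative.

```python
import shutil


def disk_usage_percent(path):
    """Return the percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total
```

A nearly full volume is worth ruling out early, since it can coincide with slow writes and segment cleanup pressure.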
Check the disk I/O utilization
Check the disk I/O wait time
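The two disk I/O checks above (utilization and wait time) can be approximated on Linux from two snapshots of `/proc/diskstats`, in the same spirit as `iostat -x`. This is a minimal sketch, assuming the standard field layout of that file; the function names are illustrative.

```python
def parse_diskstats(text):
    """Map device name -> (ios_completed, ms_spent_on_ios, ms_device_busy)."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip blank or truncated lines
        reads, read_ms = int(f[3]), int(f[6])
        writes, write_ms = int(f[7]), int(f[10])
        busy_ms = int(f[12])  # time the device had I/O in flight
        stats[f[2]] = (reads + writes, read_ms + write_ms, busy_ms)
    return stats


def disk_latency(before, after, interval_ms):
    """Return device -> (util_pct, avg_wait_ms) between two parsed snapshots."""
    report = {}
    for dev, (ios1, ms1, busy1) in after.items():
        if dev not in before:
            continue
        ios0, ms0, busy0 = before[dev]
        d_ios = ios1 - ios0
        util = min(100.0, 100.0 * (busy1 - busy0) / interval_ms)
        avg_wait = (ms1 - ms0) / d_ios if d_ios > 0 else 0.0
        report[dev] = (util, avg_wait)
    return report
```

To use it, read `/proc/diskstats` twice a known interval apart, parse each snapshot, and pass both to `disk_latency`. Sustained utilization near 100% or a rising average wait on the device that backs the Kafka data directory points at the disk as the bottleneck.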
Check the current network usage
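Network throughput per interface can be estimated on Linux from two snapshots of `/proc/net/dev`. The sketch below assumes that file's usual layout (two header lines, then one line per interface with receive counters followed by transmit counters); the function names are illustrative.

```python
def parse_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 16:
            continue
        # fields[0] is received bytes, fields[8] is transmitted bytes
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput(before, after, interval_s):
    """Return interface -> (rx_bytes_per_s, tx_bytes_per_s) between two snapshots."""
    return {
        name: ((rx1 - before[name][0]) / interval_s,
               (tx1 - before[name][1]) / interval_s)
        for name, (rx1, tx1) in after.items()
        if name in before
    }
```

This matters most when the broker's storage is reached over the network, since a saturated link can then surface as disk latency.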
Check the Kafka logs for any errors or warnings
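A small scanner can surface warnings and errors from the broker log. This sketch assumes lines shaped like `[timestamp] LEVEL message`, which is how Kafka's log4j output is commonly configured; if your deployment uses a different pattern, adjust the regular expression. The log file location is deployment-specific and is left to the caller.

```python
import re
from collections import Counter

# Assumed line layout: "[timestamp] LEVEL message"
LINE_RE = re.compile(r"^\[[^\]]+\]\s+(WARN|ERROR|FATAL)\s+(.*)$")


def scan_log_lines(lines):
    """Return (counts per level, list of (level, message)) for WARN and above."""
    counts = Counter()
    hits = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            counts[m.group(1)] += 1
            hits.append((m.group(1), m.group(2)))
    return counts, hits
```

Any iterable of lines works, so an open file handle for the broker's server log can be passed directly. Stack trace continuation lines do not match the pattern and are skipped.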
Repair