Runbook

Network Saturation Check for Kafka Nodes

Overview

This runbook covers incidents in which the network bandwidth on one or more Kafka nodes is fully saturated, degrading the performance of the affected node(s) or causing them to fail. Saturation can delay or lose messages, and it can also affect other nodes that rely on the affected node(s) for message replication. Common causes include increased message traffic, misconfiguration, and hardware issues.

Parameters

Debug

Check network interface statistics
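
One way to approach this step on a Linux node is to sample the kernel's per-interface counters twice and compare the byte deltas with the link capacity. The sketch below assumes the counters come from /proc/net/dev and that the link speed is known in Mbit/s. The function names are illustrative and not part of any Kafka tooling.

```python
def parse_proc_net_dev(text):
    """Parse the contents of /proc/net/dev into {interface: counters}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        stats[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_drop": int(fields[3]),
            "tx_bytes": int(fields[8]),
            "tx_drop": int(fields[11]),
        }
    return stats


def utilization_percent(before, after, interval_s, link_mbps):
    """Return {interface: (rx_pct, tx_pct)} of link capacity used between two samples."""
    capacity_bits = link_mbps * 1_000_000 * interval_s
    result = {}
    for name, new in after.items():
        old = before.get(name)
        if old is None:
            continue  # interface appeared between samples
        rx = (new["rx_bytes"] - old["rx_bytes"]) * 8 / capacity_bits * 100
        tx = (new["tx_bytes"] - old["tx_bytes"]) * 8 / capacity_bits * 100
        result[name] = (rx, tx)
    return result
```

To use it, read /proc/net/dev, wait a known interval, read it again, and pass both parsed samples to utilization_percent. On many systems the link speed in Mbit/s can be read from /sys/class/net/<interface>/speed. Values near 100%, or drop counters that keep rising, suggest the interface is saturated.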

Check network connections
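
A possible way to inspect connections without extra tools is to count TCP sockets per state on the broker's listener port. The sketch below assumes a Linux node and the default Kafka listener port 9092; adjust the port if your listeners differ. The function name is illustrative.

```python
from collections import Counter

# TCP state codes as used in /proc/net/tcp and /proc/net/tcp6.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_net_tcp_text, port=9092):
    """Count sockets per TCP state whose local port equals `port`."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # Data rows have "hexaddr:hexport" in column 1; the header row does not.
        if len(fields) < 4 or ":" not in fields[1]:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            state = fields[3].upper()
            counts[TCP_STATES.get(state, state)] += 1
    return counts
```

Feed it the contents of /proc/net/tcp and, because Java processes often listen on IPv6 sockets, /proc/net/tcp6 as well. An unusually high ESTABLISHED count, or many sockets stuck in CLOSE_WAIT or TIME_WAIT, can point at the clients or replicas generating the load.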

Check network bandwidth usage for each process

Check network latency between nodes
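
As a rough stand-in for a ping-style check, the time needed to open a TCP connection to a peer broker approximates one network round trip. The sketch below is a minimal example using only the standard library. The host and port are placeholders you supply, and the function names are illustrative.

```python
import socket
import time


def tcp_connect_latency_ms(host, port, timeout=2.0):
    """Return the time in milliseconds taken to open a TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return (time.perf_counter() - start) * 1000.0


def median_latency_ms(host, port, samples=5):
    """Take several connect-time samples and return the median."""
    results = sorted(tcp_connect_latency_ms(host, port) for _ in range(samples))
    return results[len(results) // 2]
```

Run it from one broker against another broker's listener port and compare the result with a measurement taken when the cluster is healthy. A saturated link typically shows up as a much higher or more variable connect time, or as timeouts.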

Check network throughput between nodes
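
Throughput between two nodes is commonly measured with iperf3. Assuming iperf3 is installed, a server is running on the peer (iperf3 -s), and the client is run with JSON output (iperf3 -c <peer> -J), the sketch below extracts the end-of-test TCP summary. The function name is illustrative, and the JSON keys are those of a standard TCP test report.

```python
import json


def iperf3_throughput_mbps(json_text):
    """Extract sender and receiver throughput in Mbit/s from `iperf3 -J` output."""
    report = json.loads(json_text)
    end = report["end"]
    return {
        "sent_mbps": end["sum_sent"]["bits_per_second"] / 1e6,
        "received_mbps": end["sum_received"]["bits_per_second"] / 1e6,
    }
```

A result far below the nominal link speed while Kafka is running suggests that broker traffic is consuming most of the available bandwidth. Keep in mind that the test itself adds load to an already busy link.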

Check Kafka node status
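
One indicator of node health is whether partitions hosted on the node are still in sync, since a broker starved of bandwidth tends to fall out of the in-sync replica set. The sketch below assumes you have captured the text output of kafka-topics.sh --describe and that the partition lines follow the usual "Topic: ... Partition: ... Leader: ... Replicas: ... Isr: ..." layout. The function name is illustrative.

```python
import re

# Matches partition detail lines; topic summary lines do not match.
PARTITION_LINE = re.compile(
    r"Topic:\s*(\S+)\s+Partition:\s*(\d+)\s+Leader:\s*(\S+)"
    r"\s+Replicas:\s*([\d,]+)\s+Isr: ?([\d,]*)"
)


def under_replicated(describe_output):
    """Return (topic, partition, leader, missing_broker_ids) for lagging partitions."""
    found = []
    for line in describe_output.splitlines():
        match = PARTITION_LINE.search(line)
        if not match:
            continue
        topic, partition, leader, replicas, isr = match.groups()
        replica_ids = set(replicas.split(","))
        isr_ids = {broker for broker in isr.split(",") if broker}
        missing = sorted(replica_ids - isr_ids)
        if missing:
            found.append((topic, int(partition), leader, missing))
    return found
```

If the same broker ID appears repeatedly in the missing lists, that broker is the likely bottleneck.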

Check Kafka logs
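
To get a quick overview of the broker log, you can count warning and error lines per logger. The sketch below assumes the broker uses Kafka's default log4j layout, "[timestamp] LEVEL message (logger)". If your logging configuration differs, the pattern needs adjusting. The function name is illustrative.

```python
import re
from collections import Counter

# Assumed layout: [timestamp] LEVEL message (logger.name)
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]+) (?P<msg>.*?)"
    r"(?: \((?P<logger>[\w.$]+)\))?$"
)


def count_problem_lines(log_text, levels=("WARN", "ERROR", "FATAL")):
    """Count log lines per (level, logger) for the given severity levels."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group("level") in levels:
            logger = match.group("logger") or "unknown"
            counts[(match.group("level"), logger)] += 1
    return counts
```

Calling counts.most_common(10) on the result lists the noisiest loggers first. During a saturation incident, look in particular for entries from network and replication components.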

Check Kafka node configuration
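
When reviewing the broker configuration, it helps to pull out only the settings that influence network behaviour. The sketch below parses a server.properties file and reports a selection of network-related broker settings. The selection is a suggestion rather than an exhaustive list, and the function names are illustrative.

```python
# Broker settings that commonly matter for network load (a suggested subset).
NETWORK_KEYS = (
    "num.network.threads",
    "num.replica.fetchers",
    "socket.send.buffer.bytes",
    "socket.receive.buffer.bytes",
    "socket.request.max.bytes",
    "replica.fetch.max.bytes",
    "compression.type",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def network_settings(props):
    """Return the network-related settings; None means the broker default applies."""
    return {key: props.get(key) for key in NETWORK_KEYS}
```

Compare the reported values across brokers. A node whose settings differ from its peers is a candidate for the misconfiguration mentioned in the overview.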

Repair

Optimize network configuration: Tune the network settings on the nodes so they handle the traffic better, or change the network topology to reduce congestion.
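
A common starting point for tuning socket buffers is the bandwidth-delay product (BDP): the amount of data that must be in flight to keep a link full. The sketch below computes it from a link speed and a round-trip time. The function name and the example values are illustrative and should be replaced with your own measurements.

```python
def bdp_bytes(bandwidth_mbps, rtt_ms):
    """Bandwidth-delay product in bytes for a link speed (Mbit/s) and RTT (ms)."""
    # Mbit/s * 1e6 bits * (ms / 1e3 s) = bandwidth_mbps * rtt_ms * 1000 bits
    bits_in_flight = bandwidth_mbps * rtt_ms * 1000
    return round(bits_in_flight / 8)


# Example: a 10 Gbit/s link with a 1 ms round trip between brokers.
example_bdp = bdp_bytes(10_000, 1)
```

If the BDP is much larger than the configured socket buffers (for example socket.send.buffer.bytes and socket.receive.buffer.bytes on the broker, or the kernel's buffer limits), the buffers may be limiting throughput per connection. Whether raising them helps depends on the workload, so test any change on one node first.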
