Runbook

Slow Etcd GRPC Requests

Overview

The "Slow Etcd GRPC Requests" incident is triggered when etcd gRPC requests slow down, specifically when the 99th percentile of request latency exceeds 0.15s. Slow HTTP requests may be a contributing factor. Because etcd latency affects the overall performance of the system, the incident should be resolved promptly.
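To make the 0.15s threshold concrete, here is a minimal sketch of how a 99th percentile can be estimated from cumulative latency histogram buckets, using linear interpolation inside the bucket that holds the target rank (the same general idea as Prometheus's `histogram_quantile`). The helper `estimate_quantile` and the bucket values are hypothetical and purely illustrative.

```python
import math

THRESHOLD_SECONDS = 0.15  # alert threshold stated in the overview


def estimate_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    Interpolates linearly inside the bucket that contains the target rank.
    The last bucket is expected to have an upper bound of +Inf.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Target falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Made-up cumulative bucket counts (upper bound in seconds, requests <= bound).
SAMPLE_BUCKETS = [
    (0.05, 900),
    (0.1, 960),
    (0.25, 995),
    (0.5, 1000),
    (math.inf, 1000),
]

p99 = estimate_quantile(0.99, SAMPLE_BUCKETS)
print(f"estimated p99 = {p99:.3f}s, above threshold: {p99 > THRESHOLD_SECONDS}")
```

With these sample counts the 990th request falls in the 0.1s to 0.25s bucket, so the estimate lands above the 0.15s threshold even though 90% of requests finish within 0.05s.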

Debug

Check the current CPU usage

Check the current memory usage
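The CPU and memory checks above can be sketched as follows, assuming a Linux host where `/proc/meminfo` is available. The helper names are hypothetical, and the sample text is illustrative rather than captured from a real machine.

```python
import os


def cpu_load_per_core():
    """Return the 1-minute load average divided by the number of CPUs."""
    load_1m, _, _ = os.getloadavg()
    return load_1m / (os.cpu_count() or 1)


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_used_fraction(meminfo):
    """Fraction of memory in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    return (total - meminfo["MemAvailable"]) / total


# Illustrative sample. On a real host, read the file instead:
#   with open("/proc/meminfo") as f:
#       text = f.read()
SAMPLE_MEMINFO = """\
MemTotal:       16000000 kB
MemFree:         1000000 kB
MemAvailable:    4000000 kB
"""

info = parse_meminfo(SAMPLE_MEMINFO)
print(f"memory in use: {memory_used_fraction(info):.0%}")
```

A per-core load well above 1.0 or a used fraction close to 100% on an etcd member would be consistent with resource contention, though what counts as "too high" depends on the host.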

Check the network connections to the affected instance
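One way to approach the connection check above is to count TCP connections on the etcd client port by state. This sketch assumes a Linux host, the `/proc/net/tcp` layout, and the default client port 2379; the sample table is hand-written for illustration, and `count_connections` is a hypothetical helper.

```python
from collections import Counter

ETCD_CLIENT_PORT = 2379  # assumption: default etcd client port

# Subset of the Linux kernel TCP state codes used in /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def count_connections(proc_net_tcp_text, port=ETCD_CLIENT_PORT):
    """Count sockets whose local port matches `port`, grouped by TCP state."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address is HEXIP:HEXPORT
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


# Hand-written sample; on a real host, read /proc/net/tcp (and /proc/net/tcp6).
SAMPLE_PROC_NET_TCP = """\
  sl  local_address rem_address   st
   0: 0100007F:094B 00000000:0000 0A
   1: 0100007F:094B 0100007F:C350 01
   2: 0100007F:094B 0100007F:C351 01
   3: 0100007F:1F90 0100007F:C352 01
"""

print(dict(count_connections(SAMPLE_PROC_NET_TCP)))
```

An unusually large number of established or lingering connections on the client port can point to a noisy client rather than a problem in etcd itself.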

Check the network traffic on the affected network interface

Check the etcd logs for errors or warnings
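The log check above can be sketched as a small filter that counts warning and error messages. It assumes etcd is writing JSON-structured log lines with `level` and `msg` fields; if your deployment uses a different log format, the parsing would need to change. The sample lines are hand-written placeholders, not captured output, and `summarize_problems` is a hypothetical helper.

```python
import json
from collections import Counter

PROBLEM_LEVELS = {"warn", "warning", "error", "fatal", "panic"}


def summarize_problems(lines):
    """Count warning/error messages found in JSON-formatted log lines."""
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        level = str(entry.get("level", "")).lower()
        if level in PROBLEM_LEVELS:
            counts[(level, entry.get("msg", ""))] += 1
    return counts


# Hand-written example lines (assumed structure, illustrative content).
SAMPLE_LOG_LINES = [
    '{"level":"info","msg":"example informational message"}',
    '{"level":"warn","msg":"apply request took too long"}',
    '{"level":"warn","msg":"apply request took too long"}',
    '{"level":"error","msg":"example error message"}',
    "this line is not JSON and is skipped",
]

for (level, msg), n in summarize_problems(SAMPLE_LOG_LINES).most_common():
    print(f"{n:3d}  {level:5s}  {msg}")
```

Grouping by message makes repeated warnings stand out, which is usually more informative than reading the raw stream during an incident.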

Check the gRPC logs for errors or warnings

Check the Prometheus metrics for etcd and gRPC
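As a sketch of the metrics check above, the following builds an instant query against the Prometheus HTTP API (`/api/v1/query`) and picks out gRPC methods whose 99th percentile exceeds 0.15s. The Prometheus address is hypothetical, and the query assumes etcd exports the `grpc_server_handling_seconds_bucket` histogram with `grpc_service`, `grpc_method`, and `grpc_type` labels, which may depend on how metrics are configured. The sample response is hand-written in the API's response shape.

```python
import json
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"  # hypothetical Prometheus address

# Assumed metric and label names; adjust to what your etcd actually exports.
P99_QUERY = (
    "histogram_quantile(0.99, sum by (le, grpc_service, grpc_method) "
    '(rate(grpc_server_handling_seconds_bucket{grpc_type="unary"}[5m])))'
)


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{params}"


def slow_methods(response_body, threshold=0.15):
    """Return (service, method, p99) tuples above the threshold, slowest first."""
    data = json.loads(response_body)
    slow = []
    for series in data["data"]["result"]:
        value = float(series["value"][1])
        if value > threshold:  # NaN never compares greater, so it is skipped
            labels = series["metric"]
            slow.append(
                (labels.get("grpc_service", ""), labels.get("grpc_method", ""), value)
            )
    return sorted(slow, key=lambda row: row[2], reverse=True)


# Hand-written response in the shape returned by /api/v1/query.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"grpc_service": "etcdserverpb.KV", "grpc_method": "Range"},
             "value": [1700000000, "0.21"]},
            {"metric": {"grpc_service": "etcdserverpb.KV", "grpc_method": "Put"},
             "value": [1700000000, "0.04"]},
        ],
    },
})

print(build_query_url(PROMETHEUS_URL, P99_QUERY))
print(slow_methods(SAMPLE_RESPONSE))
```

Breaking the percentile down by service and method helps tell whether a single request type is slow or the whole member is struggling.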

Check the etcd cluster health

Check the etcd member list

Check the etcd endpoint status
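The cluster health, member list, and endpoint status checks above can be driven from a short script that shells out to `etcdctl` (v3 API). This is a sketch under assumptions: the endpoint is hypothetical, `etcdctl` must be on the PATH, and a TLS-enabled cluster would also need the `--cacert`, `--cert`, and `--key` flags, which are omitted here.

```python
import shlex
import subprocess

ENDPOINTS = "https://127.0.0.1:2379"  # hypothetical endpoint


def etcdctl_command(*args, endpoints=ENDPOINTS):
    """Build an etcdctl argument list for the given subcommand."""
    return ["etcdctl", f"--endpoints={endpoints}", *args]


CHECKS = {
    "cluster health": etcdctl_command("endpoint", "health", "--cluster"),
    "member list": etcdctl_command("member", "list", "--write-out=table"),
    "endpoint status": etcdctl_command(
        "endpoint", "status", "--cluster", "--write-out=table"
    ),
}


def run_checks(checks=CHECKS, timeout=10):
    """Run each check and return {name: (exit_code, output)}."""
    results = {}
    for name, cmd in checks.items():
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        results[name] = (proc.returncode, proc.stdout or proc.stderr)
    return results


# Print the commands so they can be reviewed or run by hand.
for name, cmd in CHECKS.items():
    print(f"{name}: {shlex.join(cmd)}")
```

Calling `run_checks()` on a host with access to the cluster executes all three commands; a non-zero exit code from the health check is the first thing to look for.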

Check the etcd key space size

Check the etcd watch counts

Check the etcd snapshot size
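The key space, watch count, and size checks above can be approximated by reading a member's `/metrics` output. The metric names below are assumptions that may vary between etcd versions, the sample text is hand-written, and the database size is used only as a rough stand-in for snapshot size. The parser assumes samples carry no trailing timestamp.

```python
# Assumed metric names; verify them against your member's /metrics output.
METRICS_OF_INTEREST = {
    "etcd_debugging_mvcc_keys_total": "keys",
    "etcd_debugging_mvcc_watcher_total": "watchers",
    "etcd_mvcc_db_total_size_in_bytes": "db size (bytes)",
}


def read_metrics(text, names=METRICS_OF_INTEREST):
    """Extract selected metrics from Prometheus text-format output.

    Values for the same metric name are summed across label sets.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        if name in names:
            values[name] = values.get(name, 0.0) + float(value)
    return values


# Hand-written sample in the Prometheus text exposition format.
SAMPLE_METRICS = """\
# TYPE etcd_debugging_mvcc_keys_total gauge
etcd_debugging_mvcc_keys_total 12500
# TYPE etcd_debugging_mvcc_watcher_total gauge
etcd_debugging_mvcc_watcher_total 340
# TYPE etcd_mvcc_db_total_size_in_bytes gauge
etcd_mvcc_db_total_size_in_bytes 2.5e+08
some_other_metric{label="x"} 1
"""

for name, value in read_metrics(SAMPLE_METRICS).items():
    print(f"{METRICS_OF_INTEREST[name]}: {value:,.0f}")
```

Tracking these three numbers over time is more useful than a single reading, since steady growth in keys, watchers, or database size tends to precede latency problems.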

Possible cause: resource contention on the etcd cluster.

Repair

Increase the number of etcd nodes to improve the capacity of the cluster.
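When choosing a new cluster size, it helps to keep the majority-quorum arithmetic in view: a cluster of n members needs n // 2 + 1 of them to agree, so moving from an odd to an even member count does not increase the number of failures the cluster can tolerate. The sketch below is a planning aid only, not a sizing recommendation, and the function names are hypothetical.

```python
def quorum(members):
    """Smallest majority of a cluster with the given number of members."""
    return members // 2 + 1


def failure_tolerance(members):
    """How many members can fail while a majority remains available."""
    return members - quorum(members)


print("members  quorum  tolerated failures")
for n in range(1, 8):
    print(f"{n:7d}  {quorum(n):6d}  {failure_tolerance(n):18d}")
```

For example, growing from 3 to 5 members raises the tolerated failures from 1 to 2, while growing from 3 to 4 leaves it at 1.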
