The incident type "Slow Etcd GRPC Requests" is triggered when etcd gRPC requests slow down, typically when the 99th percentile of request latency exceeds 0.15s. It may coincide with slow HTTP requests to etcd. This incident can degrade the overall performance of the system and should be resolved promptly.
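To make the 0.15s threshold concrete, here is a minimal sketch of how a 99th percentile can be estimated from cumulative latency histogram buckets, using linear interpolation in the style of PromQL's `histogram_quantile`. The function name and the sample bucket counts are illustrative assumptions, not values from a real cluster.

```python
import math

THRESHOLD_SECONDS = 0.15  # threshold quoted in the incident description


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound_seconds, count) buckets.

    Interpolates linearly inside the bucket that contains the target rank.
    """
    if not buckets:
        return math.nan
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            # Rank falls in the +Inf bucket or in an empty bucket:
            # report the previous (highest known) bound instead.
            if math.isinf(bound) or count == prev_count:
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical sample: 100 requests, 5 of them slower than 0.1s.
sample = [(0.05, 90), (0.1, 95), (0.2, 100), (math.inf, 100)]
p99 = histogram_quantile(0.99, sample)
is_slow = p99 > THRESHOLD_SECONDS
```

With this sample the estimate lands between 0.1s and 0.2s, so `is_slow` is true and the incident condition would be met.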
Parameters
Debug
Check the current CPU usage
Check the current memory usage
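The CPU and memory checks above could be sketched as follows, assuming the etcd member runs on a Linux host where `/proc/loadavg` and `/proc/meminfo` are available. The function names are hypothetical, and load average is used here only as a rough proxy for CPU pressure.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def parse_meminfo(text):
    """Return /proc/meminfo content as a dict of field name -> integer value."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_used_fraction(info):
    """Fraction of memory in use, based on MemAvailable and MemTotal."""
    return 1.0 - info["MemAvailable"] / info["MemTotal"]


def read_host_stats():
    """Read both files from the local host (Linux only)."""
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    return load, memory_used_fraction(mem)
```

Calling `read_host_stats()` on the affected instance returns the load averages and the used-memory fraction. A load average well above the core count, or a used fraction close to 1.0, suggests the host itself is under pressure.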
Check the network connections to the affected instance
Check the network traffic on the affected network interface
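The two network checks above could be sketched as below, again assuming a Linux host. The first function counts established IPv4 connections to a local port from `/proc/net/tcp` content (2379 is etcd's default client port; adjust if your deployment differs), and the second extracts per-interface byte counters from `/proc/net/dev` content. Function names are hypothetical, and IPv6 connections (`/proc/net/tcp6`) are not covered.

```python
def count_established(proc_net_tcp_text, port=2379):
    """Count ESTABLISHED connections whose local port matches `port`."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) < 4:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)  # port is hex-encoded
        if local_port == port and fields[3] == "01":  # state 01 is ESTABLISHED
            count += 1
    return count


def parse_net_dev(proc_net_dev_text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev content."""
    counters = {}
    for line in proc_net_dev_text.splitlines()[2:]:  # skip two header lines
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) >= 16:
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters
```

The byte counters are cumulative, so sampling `parse_net_dev` twice a few seconds apart and taking the difference gives an approximate traffic rate for the affected interface.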
Check the etcd logs for errors or warnings
Check the gRPC logs for errors or warnings
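Both log checks above come down to filtering log lines by severity. The sketch below assumes the logs are either JSON lines with a `level` field (structured logging) or plain text, in which case it falls back to a keyword match. Where the lines come from (a log file, or the output of something like `journalctl -u etcd` if etcd runs as a systemd unit with that name) is an assumption about your deployment, as are the function names.

```python
import json

PROBLEM_LEVELS = {"warn", "warning", "error", "fatal", "panic", "dpanic"}


def is_problem_line(line):
    """Return True if a log line looks like a warning or an error."""
    try:
        entry = json.loads(line)
    except ValueError:
        entry = None
    if isinstance(entry, dict) and "level" in entry:
        # Structured log line: trust the explicit level field.
        return str(entry["level"]).lower() in PROBLEM_LEVELS
    # Plain-text line: fall back to a simple keyword match.
    lowered = line.lower()
    return "error" in lowered or "warn" in lowered


def problem_lines(lines):
    """Return only the lines that look like warnings or errors."""
    return [line.rstrip("\n") for line in lines if is_problem_line(line)]
```

For example, `problem_lines(open("etcd.log"))` (with a hypothetical file name) returns the warning and error lines for review. The keyword fallback is deliberately loose and may produce false positives.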
Check the Prometheus metrics for etcd and gRPC
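One way to check the metrics is to ask Prometheus for the 99th percentile gRPC latency through its HTTP query API (`/api/v1/query`). In the sketch below, the Prometheus base URL, the label names in the query, and the function names are assumptions. The `grpc_server_handling_seconds_bucket` histogram is only present if your etcd exposes gRPC handling-time histograms, so treat the query as a starting point.

```python
import json
import urllib.parse
import urllib.request

# Assumed PromQL: p99 of unary gRPC handling time over the last 5 minutes.
P99_QUERY = (
    'histogram_quantile(0.99, sum(rate('
    'grpc_server_handling_seconds_bucket{grpc_type="unary"}[5m]'
    ')) by (grpc_service, grpc_method, le))'
)


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + query


def slow_series(response, threshold=0.15):
    """Return (labels, value) for each series above the threshold in seconds."""
    slow = []
    for sample in response["data"]["result"]:
        value = float(sample["value"][1])  # value is [timestamp, "string"]
        if value > threshold:
            slow.append((sample["metric"], value))
    return slow


def fetch(base_url, promql, timeout=10):
    """Run the query against a live Prometheus server and decode the JSON."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `slow_series(fetch("http://prometheus.example:9090", P99_QUERY))` (with a hypothetical host) lists the gRPC methods whose p99 latency currently exceeds 0.15s.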
Check the etcd cluster health
Check the etcd member list
Check the etcd endpoint status
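The three cluster checks above map onto `etcdctl` subcommands (`endpoint health`, `member list`, `endpoint status`). The sketch below wraps them with `subprocess`. It assumes `etcdctl` is on the PATH and speaks the v3 API. The endpoint list and function names are placeholders, and TLS-enabled clusters would also need the usual `--cacert`, `--cert` and `--key` options.

```python
import os
import subprocess


def etcdctl_commands(endpoints):
    """Return the argv list for each cluster check, keyed by a short name."""
    base = ["etcdctl", "--endpoints=" + ",".join(endpoints)]
    return {
        "health": base + ["endpoint", "health"],
        "members": base + ["member", "list", "--write-out=table"],
        "status": base + ["endpoint", "status", "--write-out=table"],
    }


def run_checks(endpoints, timeout=30):
    """Run each check and return {name: (exit_code, combined_output)}."""
    env = dict(os.environ, ETCDCTL_API="3")  # needed only by older etcdctl builds
    results = {}
    for name, argv in etcdctl_commands(endpoints).items():
        proc = subprocess.run(
            argv, env=env, capture_output=True, text=True, timeout=timeout
        )
        results[name] = (proc.returncode, proc.stdout + proc.stderr)
    return results
```

For example, `run_checks(["http://127.0.0.1:2379"])` returns the exit code and output of each command. A non-zero exit code from the health check, or a member missing from the list, narrows down which node to look at first.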
Check the etcd key space size
Check the etcd watch counts
Check the etcd snapshot size
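Key count, watcher count and database size can be read from the Prometheus-format text that etcd serves on its `/metrics` endpoint. The sketch below picks a few gauges out of that text. The metric names are assumptions that vary between etcd versions, so check them against your own `/metrics` output. The on-disk database size is used here only as an approximation of snapshot size.

```python
import urllib.request

# Assumed metric names; confirm them against your etcd version.
WANTED = (
    "etcd_debugging_mvcc_keys_total",     # number of keys
    "etcd_debugging_mvcc_watcher_total",  # number of active watchers
    "etcd_mvcc_db_total_size_in_bytes",   # database size on disk
)


def pick_metrics(metrics_text, wanted=WANTED):
    """Return {metric_name: value} for unlabeled metrics listed in `wanted`."""
    found = {}
    for line in metrics_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        if name in wanted:
            found[name] = float(value)
    return found


def fetch_metrics(url, timeout=10):
    """Download the raw metrics text from an etcd member."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `pick_metrics(fetch_metrics("http://127.0.0.1:2379/metrics"))` (the URL is a placeholder) returns the three values for one member. A large key count, many watchers, or a database approaching its quota are all plausible contributors to slow requests.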
Likely cause: resource contention on the etcd cluster.
Repair
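Increase the number of etcd nodes to improve the capacity of the cluster. Keep the member count odd to preserve quorum, and confirm that existing members have adequate CPU, memory, and disk, since each added member also adds replication work to every write.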
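Before adding members, it helps to check what the new cluster size does to quorum. The sketch below computes quorum and fault tolerance for a given member count and builds an `etcdctl member add` command line. The member name, peer URL, endpoints and function names are hypothetical placeholders.

```python
def quorum(members):
    """Number of members that must agree for the cluster to make progress."""
    return members // 2 + 1


def fault_tolerance(members):
    """Number of members that can fail while the cluster keeps quorum."""
    return members - quorum(members)


def member_add_command(name, peer_url, endpoints):
    """Build the argv for adding one member; it does not run anything."""
    return [
        "etcdctl",
        "--endpoints=" + ",".join(endpoints),
        "member", "add", name,
        "--peer-urls=" + peer_url,
    ]
```

Going from 3 to 4 members leaves fault tolerance at 1, while going from 3 to 5 raises it to 2, which is why odd sizes are usually preferred. A call such as `member_add_command("etcd-4", "http://10.0.0.4:2380", ["http://10.0.0.1:2379"])` only builds the command, so it can be reviewed before anyone runs it.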