Runbook

Etcd high fsync durations incident.

Overview

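The Etcd high fsync durations incident fires when the time Etcd takes to fsync its write-ahead log (WAL) to disk exceeds a configured threshold. Etcd must persist each WAL entry before acknowledging a write, so slow fsyncs raise write latency and, in severe cases, can cause missed heartbeats and leader elections. Typical causes are slow or contended disk I/O, high request load, and resource pressure on the host; network problems can produce similar symptoms and should be ruled out. The responsible team should diagnose and resolve the underlying issue promptly, because sustained high fsync durations degrade the performance and availability of the Etcd service.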

Debug

Check if Etcd service is running
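A minimal sketch of this check, assuming Etcd runs as a systemd unit named "etcd" (an assumption; static pods or containers need a different check). The helper names classify_state and check_etcd_service are illustrative and not part of any Etcd tooling.

```shell
#!/usr/bin/env bash
# Assumption: etcd is managed by systemd under the unit name "etcd".

# Map a systemd state string to a simple running / not running verdict.
classify_state() {
  case "$1" in
    active) echo "running" ;;
    *)      echo "not running" ;;
  esac
}

# Query systemd for the unit state and print a one-line summary.
check_etcd_service() {
  local unit="${1:-etcd}" state
  state="$(systemctl is-active "$unit" 2>/dev/null)"
  echo "$unit: $(classify_state "$state") (systemd state: ${state:-unknown})"
}
```

Call check_etcd_service on the affected node. Where etcdctl is available, "etcdctl endpoint health" (with --endpoints and, for TLS clusters, --cacert, --cert and --key) confirms that the member also answers requests.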

Check Etcd log for any errors or warnings
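One way to narrow the log down, assuming Etcd logs to the systemd journal under the unit name "etcd". The search patterns are assumptions based on messages commonly seen around slow disks; the exact wording varies between Etcd versions, so adjust them to what your logs contain.

```shell
#!/usr/bin/env bash
# Assumption: etcd logs go to journald under the unit "etcd".

# Keep only lines that look like warnings, errors, or slow-disk messages.
# The patterns are illustrative; message text differs across etcd versions.
filter_etcd_warnings() {
  grep -E -i 'slow fdatasync|sync duration|took too long|heartbeat on time|"level":"(warn|error)"'
}

# Show matching lines from the recent journal (default: the last hour).
etcd_recent_warnings() {
  journalctl -u "${1:-etcd}" --since "${2:-1 hour ago}" --no-pager | filter_etcd_warnings
}
```

For example, etcd_recent_warnings etcd "30 minutes ago" limits the search to the window around the alert.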

Check Etcd metrics for fsync duration
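A rough sketch for reading the fsync metric directly from a member, assuming the metrics endpoint is reachable without TLS at http://127.0.0.1:2379/metrics (an assumption; TLS clusters need client certificates or a separate metrics listener). It uses the etcd_disk_wal_fsync_duration_seconds histogram and prints the mean since the process started, which is coarser than a percentile.

```shell
#!/usr/bin/env bash
# Assumption: the metrics endpoint is served over plain HTTP on the client URL.

# Fetch the Prometheus text-format metrics from one etcd member.
fetch_etcd_metrics() {
  curl -s "${1:-http://127.0.0.1:2379}/metrics"
}

# Read metrics on stdin and print the mean WAL fsync duration in seconds.
mean_wal_fsync() {
  awk '
    $1 == "etcd_disk_wal_fsync_duration_seconds_sum"   { sum = $2 }
    $1 == "etcd_disk_wal_fsync_duration_seconds_count" { count = $2 }
    END {
      if (count > 0) printf "%.6f\n", sum / count
      else print "no samples"
    }'
}
```

Run fetch_etcd_metrics | mean_wal_fsync on each member. If Prometheus scrapes Etcd, the query histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) gives the recent 99th percentile; the Etcd documentation suggests this stays under roughly 10 ms on healthy storage.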

Check system load and resource usage
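A small sketch for a first look at host pressure, assuming a Linux node with /proc/loadavg, nproc, free and top available. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Assumption: Linux host with procfs and the usual procps/coreutils tools.

# Divide a load average by the core count to get load per core.
load_per_core() {
  awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# Print the 1-minute load per core, memory usage, and the busiest processes.
show_system_load() {
  local load1 cores
  read -r load1 _ < /proc/loadavg
  cores="$(nproc)"
  echo "1-minute load per core: $(load_per_core "$load1" "$cores")"
  free -m
  top -b -n 1 | head -n 15
}
```

A load per core well above 1.00 suggests the node is saturated; check whether the top consumers are Etcd itself or neighbours competing for CPU and disk.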

Check network connectivity and latency to Etcd server
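A sketch of a basic reachability and latency probe, assuming an IPv4 address or hostname endpoint such as http://10.0.0.5:2379 (a hypothetical address) and that ICMP is permitted. endpoint_host and check_etcd_latency are illustrative helpers.

```shell
#!/usr/bin/env bash
# Assumption: endpoints use a hostname or IPv4 address (IPv6 literals are not handled).

# Extract the host part from an endpoint URL such as https://10.0.0.5:2379.
endpoint_host() {
  local hostport="${1#*://}"   # drop the scheme, if any
  hostport="${hostport%%/*}"   # drop any path
  echo "${hostport%%:*}"       # drop the port
}

# Ping the host, then time a TCP connect to the endpoint's /health path.
check_etcd_latency() {
  local endpoint="${1:?usage: check_etcd_latency <endpoint-url>}"
  ping -c 5 "$(endpoint_host "$endpoint")"
  curl -s -o /dev/null -w 'connect time: %{time_connect}s\n' "$endpoint/health"
}
```

For TLS endpoints, curl needs the cluster's CA and client certificates. The etcd_network_peer_round_trip_time_seconds metric gives a member-to-member view of the same question.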

Check disk I/O performance
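A sketch for measuring the disk directly, assuming iostat (sysstat) and fio are installed. The fio parameters follow a commonly used test that imitates Etcd's small sequential writes followed by fdatasync. Point it at a scratch directory on the same device as the Etcd data, not at the live data directory, and delete the test file afterwards. The 10 ms cut-off in fsync_p99_ok reflects Etcd's general guidance and is an assumption to tune for your environment.

```shell
#!/usr/bin/env bash
# Assumption: fio and iostat are installed; the target dir is a scratch dir
# on the same device that holds the etcd data.

# Show extended per-device statistics: five samples, one second apart.
show_disk_stats() {
  iostat -x 1 5
}

# Write small blocks with an fdatasync after each write, as etcd's WAL does.
run_fsync_benchmark() {
  local dir="${1:?usage: run_fsync_benchmark <scratch-dir-on-etcd-disk>}"
  fio --rw=write --ioengine=sync --fdatasync=1 \
      --directory="$dir" --size=22m --bs=2300 --name=etcd-fsync-test
}

# Succeed if a 99th-percentile fsync latency (in microseconds) is at most 10 ms.
fsync_p99_ok() {
  [ "$1" -le 10000 ]
}
```

Read the 99.00th percentile from the fsync/fdatasync section of the fio output, convert it to microseconds if fio reports another unit, and pass it to fsync_p99_ok. In the iostat output, high await and utilisation on the Etcd device point to contention.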

Possible cause: high load. If the Etcd service is handling heavy traffic or a sudden spike in requests, the volume of WAL writes rises and the fsync duration can increase enough to trigger this incident.

Repair

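Reduce contention on the disk that holds the Etcd write-ahead log (WAL): Etcd fsyncs the WAL on every commit, so the fsync duration tracks that disk's write latency. Options include moving the Etcd data directory, or only the WAL via the --wal-dir flag, onto a dedicated SSD; raising the I/O priority of the Etcd process; and moving other I/O-heavy workloads off the node. If the slow disk is also causing leader elections, review --heartbeat-interval and --election-timeout, and check that --snapshot-count and the compaction settings are not adding avoidable write load.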
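One low-risk mitigation described in Etcd's tuning guidance is raising the disk I/O priority of the Etcd process with ionice. The sketch below assumes a Linux host where the process is named "etcd"; the wrapper functions are illustrative and print the commands by default, so nothing changes until you ask for it.

```shell
#!/usr/bin/env bash
# Assumption: the etcd process is named exactly "etcd" and ionice is available.

# Build the ionice command for one PID (best-effort class, highest priority).
ionice_cmd() {
  echo "ionice -c2 -n0 -p $1"
}

# Print the commands by default; pass "apply" to run them (usually needs root).
raise_etcd_io_priority() {
  local mode="${1:-print}" pid
  for pid in $(pgrep -x etcd); do
    if [ "$mode" = "apply" ]; then
      ionice -c2 -n0 -p "$pid"
    else
      ionice_cmd "$pid"
    fi
  done
}
```

Run raise_etcd_io_priority to review the commands, then raise_etcd_io_priority apply to execute them. The change lasts only until the process restarts, and it has an effect only with an I/O scheduler that honours priorities.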
