Runbook

Etcd Insufficient Members Incident

Overview

The Etcd Insufficient Members incident type indicates that an Etcd cluster has fewer running members than it needs. Etcd is a distributed key-value store used for shared configuration and service discovery. It stays available only while a majority (quorum) of its members is healthy: a cluster of n members needs floor(n/2) + 1 of them, so 3 members tolerate 1 failure and 5 members tolerate 2. An odd member count is recommended because adding one member to an odd-sized cluster raises the quorum without raising the number of failures the cluster can survive. When the number of healthy members falls below quorum, the cluster stops accepting writes, which can cause outages in the services that depend on it. This incident type requires immediate attention to restore normal operation.
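The quorum arithmetic can be made concrete with a short sketch. The function names below are illustrative and are not part of any Etcd tooling; the code only restates the majority rule.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for size in range(1, 8):
        print(
            f"{size} members: quorum={quorum(size)}, "
            f"tolerates {fault_tolerance(size)} failure(s)"
        )
```

A 4-member cluster tolerates the same single failure as a 3-member cluster, which is why even sizes add cost without adding resilience.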

Debug

Check the status of the Etcd cluster
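One way to script this step is to wrap `etcdctl endpoint status`. This is a minimal sketch that assumes `etcdctl` (v3 API) is on the PATH and that the client URLs are supplied by the operator; the helper names are hypothetical. A TLS-secured cluster would also need the certificate flags (`--cacert`, `--cert`, `--key`), which are omitted here.

```python
import subprocess


def endpoint_status_command(endpoints: list[str]) -> list[str]:
    """Build an `etcdctl endpoint status` invocation for the given client URLs."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    return [
        "etcdctl",
        "--endpoints=" + ",".join(endpoints),
        "endpoint",
        "status",
        "--write-out=table",
    ]


def run_endpoint_status(endpoints: list[str], timeout_s: float = 10.0) -> str:
    """Run the command and return stdout; raises if etcdctl exits non-zero."""
    result = subprocess.run(
        endpoint_status_command(endpoints),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout_s,
    )
    return result.stdout
```

The table output lists each endpoint with its member ID, version, database size, and leader flag, so a missing row or a missing leader is visible at a glance.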

View the Etcd log to check for any errors or warnings
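Where the log lives depends on how Etcd is deployed (a systemd journal, a container log, or a file), so the sketch below only filters lines that have already been collected. It assumes JSON-structured log lines with a `level` field and falls back to a crude keyword match for plain text; both the field name and the level names are assumptions to adjust for the local setup.

```python
import json

# Levels treated as worth a closer look; an assumed set, extend as needed.
PROBLEM_LEVELS = {"warn", "warning", "error", "fatal", "panic"}


def problem_lines(lines):
    """Yield log lines whose level indicates a warning or worse."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            entry = None
        if isinstance(entry, dict):
            level = str(entry.get("level", "")).lower()
            if level in PROBLEM_LEVELS:
                yield line
        elif any(word in line.lower() for word in ("warn", "error", "fatal", "panic")):
            # Fallback for plain-text logs: keyword match only.
            yield line
```

The function accepts any iterable of strings, so it can be fed an open file or `sys.stdin` when log output is piped in.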

Check the number of Etcd members
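The member count can be read from `etcdctl member list`. The sketch below parses the default comma-separated output, assuming the leading fields are ID, status, name, peer URLs, and client URLs in that order; the `Member` type and function names are hypothetical, and the field order should be checked against the installed `etcdctl` version.

```python
from dataclasses import dataclass


@dataclass
class Member:
    member_id: str
    status: str
    name: str
    peer_urls: str
    client_urls: str


def parse_member_list(output: str) -> list[Member]:
    """Parse `etcdctl member list` lines into Member records."""
    members = []
    for line in output.splitlines():
        # Fields are separated by ", "; URLs within one field use a bare ",".
        fields = [field.strip() for field in line.split(", ")]
        if len(fields) < 5:
            continue  # skip blank or unexpected lines
        members.append(Member(*fields[:5]))
    return members


def started_count(members: list[Member]) -> int:
    """Count the members that report the `started` status."""
    return sum(1 for member in members if member.status == "started")
```

Comparing `started_count` with the intended cluster size shows how many members are missing or have not yet joined.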

Check the health of the Etcd cluster
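A health check can be scripted per endpoint with `etcdctl endpoint health`, treating a non-zero exit code as unhealthy. This is a sketch under the assumption that the endpoint list covers every cluster member; the probe is injectable so the quorum logic can be exercised without a live cluster. All function names are illustrative.

```python
import subprocess
from typing import Callable, Iterable


def default_probe(endpoint: str) -> bool:
    """True if `etcdctl endpoint health` exits 0 for this endpoint."""
    try:
        result = subprocess.run(
            ["etcdctl", f"--endpoints={endpoint}", "endpoint", "health"],
            capture_output=True,
            text=True,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def health_report(
    endpoints: Iterable[str],
    probe: Callable[[str], bool] = default_probe,
) -> dict[str, bool]:
    """Map each endpoint to its health as reported by `probe`."""
    return {endpoint: probe(endpoint) for endpoint in endpoints}


def has_quorum(report: dict[str, bool]) -> bool:
    """True if a majority of the listed endpoints is healthy."""
    healthy = sum(report.values())
    return healthy >= len(report) // 2 + 1
```

If `has_quorum` returns False, the cluster cannot commit writes and the repair step below becomes urgent.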

Check the amount of available disk space on the Etcd nodes
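Free disk space can be checked from the node itself with the Python standard library. The data directory path is deployment-specific (for example `/var/lib/etcd` on many installations), and the 20% threshold below is an arbitrary illustration, not an Etcd recommendation.

```python
import shutil


def disk_free_fraction(path: str) -> float:
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return usage.free / usage.total


def disk_is_low(path: str, min_free_fraction: float = 0.2) -> bool:
    """True if free space has dropped below the chosen threshold."""
    return disk_free_fraction(path) < min_free_fraction
```

Running `disk_is_low` against the Etcd data directory on each node highlights members that are at risk of stalling on a full disk.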

Check the memory usage on the Etcd nodes
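On Linux nodes, memory pressure can be estimated from `/proc/meminfo`. The sketch assumes a kernel that reports `MemAvailable`; the helper names are hypothetical.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Map /proc/meminfo field names to their numeric values (kB for sizes)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def memory_used_fraction(meminfo: dict[str, int]) -> float:
    """Share of total memory that is not available for new allocations."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return (total - available) / total


def read_meminfo(path: str = "/proc/meminfo") -> dict[str, int]:
    """Read and parse the meminfo file on a Linux host."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

A value of `memory_used_fraction(read_meminfo())` close to 1.0 suggests the Etcd process may be competing for memory or at risk of being killed by the kernel.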

Check the CPU usage on the Etcd nodes
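A rough CPU check is the load average divided by the number of cores. The sketch relies on `os.getloadavg`, which is available on Unix-like systems only; what counts as too high is a local judgment call.

```python
import os


def load_per_core(load_average: float, cores: int) -> float:
    """Normalize a load average by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


def current_load_per_core() -> float:
    """One-minute load average per core on the local host (Unix only)."""
    one_minute, _, _ = os.getloadavg()
    return load_per_core(one_minute, os.cpu_count() or 1)
```

A sustained value above 1.0 means more runnable work than cores, which can delay Etcd heartbeats and trigger leader elections.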

Verify that the Etcd configuration has an odd number of members
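If the cluster is bootstrapped with the `--initial-cluster` setting (a comma-separated list of `name=peer-url` pairs), the configured member count can be checked as sketched below. The function names are illustrative. This reflects the bootstrap configuration only and may differ from the live membership.

```python
def initial_cluster_members(value: str) -> dict[str, list[str]]:
    """Parse an `--initial-cluster` string into {member name: [peer URLs]}."""
    members: dict[str, list[str]] = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, url = entry.partition("=")
        if not sep or not name or not url:
            raise ValueError(f"malformed entry: {entry!r}")
        # A member with several peer URLs appears once per URL.
        members.setdefault(name, []).append(url)
    return members


def has_odd_member_count(value: str) -> bool:
    """True if the configuration names an odd number of distinct members."""
    return len(initial_cluster_members(value)) % 2 == 1
```

For example, a value naming `infra0`, `infra1`, and `infra2` passes the check, while a two-member value does not.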

Insufficient resources like CPU, memory, or disk space on one or more Etcd members can cause them to become unresponsive or crash, leading to a reduced cluster size.

Repair

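Increase the number of Etcd members: Add members to the cluster until it is back at its intended odd size. This can be done manually or through automation. Add members one at a time and confirm that the cluster is healthy after each addition. If a failed member is still registered, consider removing it before adding its replacement, because every registered member counts toward the quorum size.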
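The manual path can be sketched as a pre-check followed by an `etcdctl member add` invocation. The helper names, member name, and URLs are hypothetical. The pre-check restates the majority rule: registering a new member raises the quorum immediately, before the new member has started.

```python
def quorum_survives_add(total_members: int, healthy_members: int) -> bool:
    """True if the current healthy members still form a majority
    once one more member has been registered."""
    new_quorum = (total_members + 1) // 2 + 1
    return healthy_members >= new_quorum


def member_add_command(
    endpoints: list[str], name: str, peer_urls: list[str]
) -> list[str]:
    """Build an `etcdctl member add` invocation for a new member."""
    if not endpoints or not name or not peer_urls:
        raise ValueError("endpoints, name, and peer_urls are all required")
    return [
        "etcdctl",
        "--endpoints=" + ",".join(endpoints),
        "member",
        "add",
        name,
        "--peer-urls=" + ",".join(peer_urls),
    ]
```

With 3 registered members of which 2 are healthy, `quorum_survives_add(3, 2)` is False, whereas after removing the failed member `quorum_survives_add(2, 2)` is True. This illustrates why removing a dead member first is usually the safer order.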
