The Etcd Insufficient Members incident type covers cases where an Etcd cluster has too few healthy members. Etcd is a distributed key-value store used for shared configuration and service discovery. It can only accept writes while a majority of its members (a quorum) is available, so clusters are normally sized with an odd number of members, such as 3 or 5; an even-sized cluster raises the quorum without tolerating any more failures. When the number of healthy members falls below quorum, the cluster stops accepting writes, which can cause outages in the services that depend on it. This incident type requires immediate attention to restore normal operation.
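The relationship between cluster size, quorum, and tolerated failures can be worked out directly. The sketch below is illustrative only; the function names are hypothetical and are not part of any Etcd tooling.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """Number of members that can be lost while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    # Print the sizing table for small clusters.
    for n in range(1, 8):
        print(f"members={n} quorum={quorum(n)} tolerated_failures={failure_tolerance(n)}")
```

Going from 3 to 4 members raises the quorum from 2 to 3 but still tolerates only one failure, which is why odd sizes are preferred.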
Parameters
Debug
Check the status of the Etcd cluster
View the Etcd log to check for any errors or warnings
Check the number of Etcd members
Check the health of the Etcd cluster
Check the amount of available disk space on the Etcd nodes
Check the memory usage on the Etcd nodes
Check the CPU usage on the Etcd nodes
Verify that the Etcd configuration has an odd number of members
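The member-count and odd-size checks above can be scripted. The following is a minimal sketch, assuming `etcdctl` (v3 API) is on the PATH and that `etcdctl member list --write-out=json` prints an object with a `members` array whose started entries carry a `name` field; verify that shape against your Etcd version. On older releases you may also need `ETCDCTL_API=3` in the environment. The function names and the sample payload are hypothetical.

```python
import json
import subprocess


def count_members(member_list_json: str) -> dict:
    """Summarise the JSON printed by `etcdctl member list --write-out=json`."""
    members = json.loads(member_list_json).get("members", [])
    # Assumption: a member that was added but never started has no name yet.
    started = [m for m in members if m.get("name")]
    total = len(members)
    return {
        "total": total,
        "started": len(started),
        "quorum": total // 2 + 1,
        "odd": total % 2 == 1,
    }


def fetch_member_list(endpoints: str) -> str:
    """Run etcdctl against comma-separated client URLs and return its stdout."""
    result = subprocess.run(
        ["etcdctl", f"--endpoints={endpoints}", "member", "list", "--write-out=json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical sample output; replace with fetch_member_list("http://127.0.0.1:2379").
    sample = '{"members": [{"ID": 1, "name": "etcd-1"}, {"ID": 2, "name": "etcd-2"}, {"ID": 3}]}'
    print(count_members(sample))
```

A `started` count below `quorum` suggests that the cluster cannot currently reach a majority. For a live health probe, `etcdctl endpoint health` and `etcdctl endpoint status` report per-endpoint results.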
Insufficient resources such as CPU, memory, or disk space on one or more Etcd members can cause them to become unresponsive or crash, leaving the cluster with fewer healthy members.
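The disk, memory, and CPU checks can be run on each Etcd node with a small script. This is a rough sketch using only the Python standard library; it assumes a Linux host (for `/proc/meminfo` and load averages) and that the Etcd data directory is `/var/lib/etcd`, which is a common but not universal location. The function names are hypothetical.

```python
import os
import shutil


def disk_free_fraction(path: str) -> float:
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def mem_available_fraction(meminfo_text: str) -> float:
    """Parse the contents of /proc/meminfo and return MemAvailable / MemTotal."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields["MemAvailable"] / fields["MemTotal"]


def load_per_cpu() -> float:
    """One-minute load average divided by the number of CPUs (Unix only)."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)


if __name__ == "__main__":
    data_dir = "/var/lib/etcd"  # assumption: adjust to your --data-dir
    if os.path.isdir(data_dir):
        print(f"disk free: {disk_free_fraction(data_dir):.0%}")
    with open("/proc/meminfo") as fh:
        print(f"memory available: {mem_available_fraction(fh.read()):.0%}")
    print(f"load per CPU: {load_per_cpu():.2f}")
```

What counts as too low is deployment-specific; the script only reports the numbers and leaves thresholds to your alerting policy.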
Repair
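Increase the number of Etcd members: To bring the cluster back to its intended odd size, add new Etcd members, either manually or through automation. Add one member at a time and confirm that the cluster is healthy before adding the next. If a failed member is being replaced, remove it from the cluster before adding its replacement so that the quorum size does not grow while the cluster is degraded.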
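Adding a member by hand is typically done with `etcdctl member add`. The sketch below wraps that command so it can be reviewed before it is run; it assumes `etcdctl` (v3 API) is installed, and the helper names, member name, and URLs are hypothetical placeholders.

```python
import subprocess


def member_add_command(endpoints: str, name: str, peer_url: str) -> list:
    """Build the etcdctl invocation that registers a new member."""
    return [
        "etcdctl",
        f"--endpoints={endpoints}",
        "member",
        "add",
        name,
        f"--peer-urls={peer_url}",
    ]


def add_member(endpoints: str, name: str, peer_url: str, dry_run: bool = True) -> str:
    """Return the command line when dry_run is set, otherwise run it and return stdout."""
    cmd = member_add_command(endpoints, name, peer_url)
    if dry_run:
        return " ".join(cmd)
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Placeholder values; substitute a healthy member's client URL and the new node's peer URL.
    print(add_member("http://10.0.0.1:2379", "etcd-4", "http://10.0.0.4:2380"))
```

Registering the member only updates the cluster's membership. The new Etcd process still has to be started on the new node with the initial-cluster settings that `etcdctl member add` prints, with the cluster state set to `existing`.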