Runbook

Elasticsearch Healthy Nodes Incident on Kubernetes

Overview

This incident indicates that one or more nodes in the Elasticsearch cluster are unhealthy, which can degrade performance or lead to data loss. It may be triggered automatically by monitoring software or raised manually by a team member. It typically requires immediate attention to find the underlying cause and restore the affected nodes to a healthy state.

Debug

1. Get the list of Elasticsearch cluster pods

2. Check the status of the Elasticsearch cluster pods

3. Check the Elasticsearch cluster health status

4. Check the Elasticsearch cluster node status
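
The four steps above can be sketched as small shell helpers. This is a minimal sketch, not the runbook's original commands: the namespace `elasticsearch`, the label selector `app=elasticsearch`, the pod name `elasticsearch-0`, and an unauthenticated HTTP endpoint on port 9200 are all assumptions. Clusters with security enabled need `https` and credentials, and the container image must include `curl`.

```shell
#!/usr/bin/env bash
# Assumed defaults; override them to match your deployment.
NAMESPACE="${NAMESPACE:-elasticsearch}"
SELECTOR="${SELECTOR:-app=elasticsearch}"
ES_URL="${ES_URL:-http://localhost:9200}"

# Step 1: list the Elasticsearch pods with their host nodes and IPs.
list_es_pods() {
  kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" -o wide
}

# Step 2: show status, container state, and recent events for one pod.
describe_es_pod() {
  kubectl describe pod "$1" -n "$NAMESPACE"
}

# Step 3: query cluster health from inside a pod (status: green, yellow, or red).
es_cluster_health() {
  kubectl exec -n "$NAMESPACE" "$1" -- curl -s "$ES_URL/_cluster/health?pretty"
}

# Step 4: list the nodes that have currently joined the cluster.
es_cat_nodes() {
  kubectl exec -n "$NAMESPACE" "$1" -- curl -s "$ES_URL/_cat/nodes?v"
}

# Example usage (the pod name is hypothetical):
#   list_es_pods
#   describe_es_pod elasticsearch-0
#   es_cluster_health elasticsearch-0
#   es_cat_nodes elasticsearch-0
```

If the node count from step 4 is lower than the pod count from step 1, some pods are running but have not joined the cluster.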

Possible cause: the Elasticsearch cluster is experiencing high CPU or memory usage.
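
One way to check for resource pressure is to compare the figures Kubernetes reports with those Elasticsearch reports. The sketch below makes several assumptions: `kubectl top` only works when metrics-server is installed, and the namespace, label selector, pod name, and unauthenticated HTTP endpoint are placeholders.

```shell
#!/usr/bin/env bash
# Assumed defaults; override them to match your deployment.
NAMESPACE="${NAMESPACE:-elasticsearch}"
SELECTOR="${SELECTOR:-app=elasticsearch}"
ES_URL="${ES_URL:-http://localhost:9200}"

# CPU and memory per pod, as reported by metrics-server.
es_pod_usage() {
  kubectl top pods -n "$NAMESPACE" -l "$SELECTOR"
}

# Per-node CPU, JVM heap, RAM, and 1-minute load, as seen by Elasticsearch.
es_node_usage() {
  kubectl exec -n "$NAMESPACE" "$1" -- \
    curl -s "$ES_URL/_cat/nodes?v&h=name,cpu,heap.percent,ram.percent,load_1m"
}

# Example usage (the pod name is hypothetical):
#   es_pod_usage
#   es_node_usage elasticsearch-0
```

A node whose `heap.percent` stays high while the others are idle usually points at uneven shard placement or a heavy query rather than a cluster-wide capacity problem.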

Possible cause: one or more Elasticsearch nodes are down or unresponsive.
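
Down or unresponsive nodes can often be traced through pod phase, logs from the previous container instance, and pod events. The following is a sketch under assumptions: the namespace, label selector, and pod name are placeholders for your deployment.

```shell
#!/usr/bin/env bash
# Assumed defaults; override them to match your deployment.
NAMESPACE="${NAMESPACE:-elasticsearch}"
SELECTOR="${SELECTOR:-app=elasticsearch}"

# Pods whose phase is not Running (Pending, Failed, Unknown, ...).
# Note: a pod in CrashLoopBackOff still reports phase Running, so also
# check the READY and RESTARTS columns of a plain `kubectl get pods`.
es_pods_not_running() {
  kubectl get pods -n "$NAMESPACE" -l "$SELECTOR" \
    --field-selector='status.phase!=Running'
}

# Last 100 log lines from the previous (crashed or restarted) container.
es_previous_logs() {
  kubectl logs "$1" -n "$NAMESPACE" --previous --tail=100
}

# Events for one pod, oldest first (OOM kills, failed probes, scheduling).
es_pod_events() {
  kubectl get events -n "$NAMESPACE" \
    --field-selector "involvedObject.name=$1" --sort-by=.lastTimestamp
}

# Example usage (the pod name is hypothetical):
#   es_pods_not_running
#   es_previous_logs elasticsearch-1
#   es_pod_events elasticsearch-1
```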

Repair

If there is a missing data node in the Elasticsearch cluster, add a new node or replace the missing one.
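
How a node is added or replaced depends on how the cluster is deployed. The sketch below assumes the data nodes run as a StatefulSet with the hypothetical name `elasticsearch-data` in the namespace `elasticsearch`. If an operator or Helm release manages the cluster, change the replica count through that tool instead, because it may revert manual scaling.

```shell
#!/usr/bin/env bash
# Assumed defaults; override them to match your deployment.
NAMESPACE="${NAMESPACE:-elasticsearch}"
STATEFULSET="${STATEFULSET:-elasticsearch-data}"

# Replace a broken pod: the StatefulSet controller recreates it with the
# same name and reattaches its existing PersistentVolumeClaim.
replace_es_pod() {
  kubectl delete pod "$1" -n "$NAMESPACE"
}

# Add capacity: set the StatefulSet to the desired number of data nodes.
scale_es_data_nodes() {
  kubectl scale statefulset "$STATEFULSET" -n "$NAMESPACE" --replicas="$1"
}

# Block until the StatefulSet pods are rolled out and ready.
wait_for_es_rollout() {
  kubectl rollout status "statefulset/$STATEFULSET" -n "$NAMESPACE"
}

# Example usage (the pod name and replica count are hypothetical):
#   replace_es_pod elasticsearch-data-1
#   scale_es_data_nodes 4
#   wait_for_es_rollout
```

After the pods are ready, rerun the cluster health check and confirm that the status returns to green and the node count matches the expected number.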
