---
id: 6a7654d7-a9a7-4500-8d23-378dbd5d86a7
---
# High number of pending tasks in ElasticSearch.
---

This incident is raised when a monitoring system reports a high number of pending tasks in ElasticSearch. Pending tasks are cluster-level changes, such as creating an index, updating a mapping, or allocating a shard, that are queued for the elected master node. A queue that keeps growing may indicate that the cluster is overloaded and cannot keep up with incoming changes, which can lead to performance degradation or even downtime. Investigate and resolve the incident promptly to keep the cluster functioning properly.

### Parameters
```shell
# Environment Variables
export ELASTICSEARCH_NODE="PLACEHOLDER"
export DESIRED_NODE_COUNT="PLACEHOLDER"
export CLUSTER_NAME="PLACEHOLDER"
export DESIRED_SHARDS_PER_NODE="PLACEHOLDER"
export DESIRED_CONCURRENT_REBALANCE="PLACEHOLDER"
export DESIRED_CONCURRENT_RECOVERIES="PLACEHOLDER"
export ELASTICSEARCH_LOG_FILE="PLACEHOLDER"
export ELASTICSEARCH_CONFIG_FILE="PLACEHOLDER"
```
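Every command below depends on these variables, so it can help to confirm that none of them still holds the literal `PLACEHOLDER` before running anything. The following is a minimal sketch; the helper name `check_params` is hypothetical and not part of any ElasticSearch tooling.

```shell
#!/bin/bash
# Hypothetical helper: return non-zero if any named variable is unset,
# empty, or still set to the literal string "PLACEHOLDER".
check_params() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ] || [ "${!name}" = "PLACEHOLDER" ]; then
      echo "parameter not set: ${name}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (variable names taken from the block above):
#   check_params ELASTICSEARCH_NODE DESIRED_CONCURRENT_REBALANCE || exit 1
```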

## Debug

### Check if ElasticSearch service is running
```shell
systemctl status elasticsearch.service
```

### Check ElasticSearch logs for any errors or warnings
```shell
journalctl -u elasticsearch.service
```
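The full journal can be long, so a filter that keeps only warning and error lines may make it easier to scan. The sketch below assumes the bracketed console log format (`[WARN ]`, `[ERROR]`) and would need adjusting if your nodes log JSON; `filter_warnings` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: keep only lines whose bracketed level is WARN, ERROR, or FATAL.
# Assumes the plain-text log layout, where the level is padded inside brackets.
filter_warnings() {
  # "|| true" keeps the exit status zero when nothing matches.
  grep -E '\[(WARN|ERROR|FATAL) *\]' || true
}

# Example against the service journal (not run here):
#   journalctl -u elasticsearch.service --no-pager --since "1 hour ago" | filter_warnings
```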

### Check the status of the ElasticSearch cluster
```shell
curl -XGET "${ELASTICSEARCH_NODE}:9200/_cat/health?v"
```
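For scripted checks, the health output can be reduced to a few columns and turned into an exit code. This is a sketch only: it assumes the `status`, `pending_tasks`, and `max_task_wait_time` columns of `_cat/health`, and the helper name and exit-code mapping are illustrative choices.

```shell
#!/bin/bash
# Hypothetical helper: interpret one line of
#   _cat/health?h=status,pending_tasks,max_task_wait_time
# and map the cluster status to an exit code (0 green, 1 yellow, 2 red, 3 unknown).
check_health_line() {
  local status pending wait_time
  read -r status pending wait_time <<<"$1"
  case "$status" in
    green)  echo "OK: status=green pending_tasks=${pending} max_wait=${wait_time}";    return 0 ;;
    yellow) echo "WARN: status=yellow pending_tasks=${pending} max_wait=${wait_time}"; return 1 ;;
    red)    echo "CRIT: status=red pending_tasks=${pending} max_wait=${wait_time}";    return 2 ;;
    *)      echo "UNKNOWN: could not parse '$1'";                                       return 3 ;;
  esac
}

# Example against a live cluster (not run here):
#   check_health_line "$(curl -s "${ELASTICSEARCH_NODE}:9200/_cat/health?h=status,pending_tasks,max_task_wait_time")"
```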

### Check the status of ElasticSearch nodes in the cluster
```shell
curl -XGET "${ELASTICSEARCH_NODE}:9200/_cat/nodes?v"
```
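Because pending tasks are processed by the elected master, it may be worth flagging nodes under heap or CPU pressure and noting which one is the master. The sketch below assumes the `name`, `heap.percent`, `cpu`, and `master` columns of `_cat/nodes`; the helper name and the default thresholds (85 and 90) are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `_cat/nodes?h=name,heap.percent,cpu,master` from stdin
# and print nodes whose heap or CPU percentage exceeds the given limits.
# The master column shows "*" for the elected master and "-" otherwise.
flag_busy_nodes() {
  local heap_limit="${1:-85}" cpu_limit="${2:-90}"
  awk -v heap="$heap_limit" -v cpu="$cpu_limit" '
    NF >= 4 && ($2 + 0 > heap || $3 + 0 > cpu) {
      printf "%s heap=%s%% cpu=%s%% master=%s\n", $1, $2, $3, $4
    }'
}

# Example against a live cluster (not run here):
#   curl -s "${ELASTICSEARCH_NODE}:9200/_cat/nodes?h=name,heap.percent,cpu,master" | flag_busy_nodes 85 90
```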

### Check the number of pending tasks in the ElasticSearch cluster
```shell
curl -XGET "${ELASTICSEARCH_NODE}:9200/_cluster/pending_tasks"
```
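The JSON response can be hard to read when the queue is long. As a rough sketch, the text variant `_cat/pending_tasks` (columns `insertOrder`, `timeInQueue`, `priority`, `source`) can be summarized per priority; `summarize_pending_tasks` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read header-less `_cat/pending_tasks` output from stdin
# and print the total number of queued tasks plus a per-priority breakdown.
summarize_pending_tasks() {
  awk 'NF { total++; by_priority[$3]++ }
       END {
         printf "total=%d\n", total
         for (p in by_priority) printf "%s=%d\n", p, by_priority[p]
       }'
}

# Example against a live cluster (not run here):
#   curl -s "${ELASTICSEARCH_NODE}:9200/_cat/pending_tasks" | summarize_pending_tasks
```

A queue dominated by one priority or one kind of source can hint at where the backlog originates, although the breakdown alone does not identify the cause.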

### Check the health and size of ElasticSearch indices
```shell
curl -XGET "${ELASTICSEARCH_NODE}:9200/_cat/indices?v"
```
```
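A quick tally of index health and total shard count can put the index listing in context, since each shard adds to the cluster state the master has to manage. This sketch assumes the `health`, `index`, `pri`, and `rep` columns of `_cat/indices`; the helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read `_cat/indices?h=health,index,pri,rep` from stdin,
# count indices per health colour, and total the shards (primaries plus replicas).
summarize_indices() {
  awk 'NF >= 4 { health[$1]++; shards += $3 * (1 + $4) }
       END {
         for (h in health) printf "%s=%d\n", h, health[h]
         printf "total_shards=%d\n", shards
       }'
}

# Example against a live cluster (not run here):
#   curl -s "${ELASTICSEARCH_NODE}:9200/_cat/indices?h=health,index,pri,rep" | summarize_indices
```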

### The ElasticSearch cluster may be lacking sufficient resources, such as memory or processing power, to handle the volume of tasks it is receiving.
```shell
#!/bin/bash

# Check the current memory usage on this ElasticSearch node.
free -h

# Check the current CPU usage on this node (batch mode, one snapshot, so the script does not block).
top -b -n 1 | head -n 20

# Check the ElasticSearch logs for any memory-related errors.
grep -iE "out of memory|OutOfMemoryError" "${ELASTICSEARCH_LOG_FILE}"

# Check the ElasticSearch configuration for any settings related to memory or processing limits.
grep -i "memory" "${ELASTICSEARCH_CONFIG_FILE}" | grep -i "limit"
grep -i "cpu" "${ELASTICSEARCH_CONFIG_FILE}" | grep -i "limit"

# Review the current cluster settings to confirm they suit the current workload.
curl -X GET "http://localhost:9200/_cluster/settings?pretty"
```
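When the memory check needs to feed an automated decision, a single percentage is easier to compare than the `free -h` table. The following sketch assumes a Linux host that exposes `MemTotal` and `MemAvailable` in `/proc/meminfo`; the helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read /proc/meminfo-style text from stdin and print
# available memory as a whole percentage of total memory.
mem_available_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END {
         if (total > 0) printf "%d\n", avail * 100 / total
         else exit 1
       }'
}

# Example on the ElasticSearch host (not run here):
#   mem_available_pct < /proc/meminfo
```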

## Repair

### Define variables
```shell
ES_NODE_COUNT=${DESIRED_NODE_COUNT}
```

### Scale the ElasticSearch cluster
```shell
# The request body is passed via a here-document so that ${DESIRED_SHARDS_PER_NODE} is expanded by the shell.
curl -XPUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d @- <<EOF
{
  "persistent": {
    "cluster.routing.allocation.total_shards_per_node": "${DESIRED_SHARDS_PER_NODE}"
  },
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
EOF
```

### Update the ElasticSearch cluster settings to change the concurrent rebalance limit.
```shell
# The request body is passed via a here-document so that ${DESIRED_CONCURRENT_REBALANCE} is expanded by the shell.
curl -XPUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d @- <<EOF
{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": ${DESIRED_CONCURRENT_REBALANCE}
  }
}
EOF
```

### Update the ElasticSearch cluster settings to set the desired number of concurrent recoveries.
```shell
# The request body is passed via a here-document so that ${DESIRED_CONCURRENT_RECOVERIES} is expanded by the shell.
curl -XPUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' -d @- <<EOF
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": ${DESIRED_CONCURRENT_RECOVERIES}
  }
}
EOF
```
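Both settings above expect an integer, and sending a leftover `PLACEHOLDER` or an empty value would produce an invalid request body. One way to guard against that, sketched here with a hypothetical helper named `build_transient_setting`, is to validate the value before building the JSON payload.

```shell
#!/bin/bash
# Hypothetical helper: print a JSON body that sets one transient cluster setting
# to an integer value, or fail if the value is not a non-negative integer.
build_transient_setting() {
  local key="$1" value="$2"
  case "$value" in
    ''|*[!0-9]*)
      echo "value for ${key} must be a non-negative integer, got '${value}'" >&2
      return 1
      ;;
  esac
  printf '{"transient":{"%s":%s}}\n' "$key" "$value"
}

# Example (not run here):
#   payload="$(build_transient_setting cluster.routing.allocation.node_concurrent_recoveries "${DESIRED_CONCURRENT_RECOVERIES}")" &&
#     curl -XPUT "http://localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d "${payload}"
```

Failing before the request is sent keeps a misconfigured parameter from reaching the cluster at all.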

