---
id: f7890ff0-ef42-4b86-90c3-8615432bfcf7
---

# Kubernetes - Pod count per node high
---

This incident is raised when the number of pods running on a single node in a Kubernetes cluster is high. Possible causes include misconfiguration, resource constraints, and problems in the application itself. Left unaddressed, it can lead to service disruptions or outages, so the DevOps team should investigate and resolve it to keep the cluster and the applications running on it working properly.

### Parameters
```shell
# Environment variables used by the commands in this runbook
export NAMESPACE="PLACEHOLDER"
export POD_NAME="PLACEHOLDER"
export NODE_NAME="PLACEHOLDER"
export DEPLOYMENT_NAME="PLACEHOLDER"
```
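Every command in this runbook silently misbehaves if one of these variables is left at its placeholder value. The following is a minimal sketch of a guard, assuming bash; the function name `require_vars` is hypothetical and not part of the original runbook.

```shell
#!/usr/bin/env bash
# require_vars NAME...: returns non-zero and reports each variable that is
# unset, empty, or still set to the literal string "PLACEHOLDER".
require_vars() {
  local name status=0
  for name in "$@"; do
    # ${!name:-} is bash indirect expansion: the value of the variable named by $name.
    if [[ -z "${!name:-}" || "${!name:-}" == "PLACEHOLDER" ]]; then
      echo "error: $name is not set to a real value" >&2
      status=1
    fi
  done
  return "$status"
}

# Example:
# require_vars NAMESPACE POD_NAME NODE_NAME DEPLOYMENT_NAME || exit 1
```

Calling it once at the top of a session catches a forgotten export before any `kubectl` command runs against the wrong target.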

## Debug

### List all nodes in the cluster
```shell
kubectl get nodes
```

### Check the pod count per node
```shell
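kubectl get pods --all-namespaces -o json | jq -r '.items[].spec.nodeName // empty' | sort | uniq -c   # pods currently on each node
kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, capacity: .status.capacity.pods}'   # maximum pods per node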
```
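To see at a glance which nodes are actually crowded, the per-node counts can be compared against a threshold. The sketch below assumes bash and awk; the function names and the threshold value of 100 are illustrative assumptions, not values taken from this runbook.

```shell
#!/usr/bin/env bash
# Reads one node name per line on stdin (one line per pod) and prints
# "<node> <pod count>" per node, sorted by node name. Unscheduled pods,
# which kubectl reports as "<none>", are skipped.
count_pods_per_node() {
  awk '$1 != "" && $1 != "<none>" { count[$1]++ }
       END { for (node in count) print node, count[node] }' | sort
}

# nodes_over_threshold MAX: prints only the nodes running more than MAX pods.
nodes_over_threshold() {
  local threshold="$1"
  count_pods_per_node | awk -v max="$threshold" '$2 > max'
}

# Example (requires cluster access):
# kubectl get pods --all-namespaces --no-headers \
#   -o custom-columns=NODE:.spec.nodeName | nodes_over_threshold 100
```

The functions only read text from stdin, so they can be tried on saved output before being pointed at a live cluster.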

### Check the status of the pods
```shell
kubectl get pods -n ${NAMESPACE}
```

### Check the logs of a pod
```shell
kubectl logs ${POD_NAME} -n ${NAMESPACE}
```

### Check the metrics for the node
```shell
kubectl top node ${NODE_NAME}
```
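`kubectl top node` reports CPU and memory usage (and needs the metrics server to be installed), but it does not show which workloads account for the pods on the node. A possible follow-up, sketched here with a hypothetical helper named `pods_by_namespace`, is to group the node's pods by namespace.

```shell
#!/usr/bin/env bash
# Reads one namespace per line on stdin (one line per pod) and prints
# "<pod count> <namespace>", largest count first.
pods_by_namespace() {
  awk 'NF { count[$1]++ } END { for (ns in count) print count[ns], ns }' |
    sort -k1,1nr -k2,2
}

# Example (requires cluster access):
# kubectl get pods --all-namespaces --field-selector "spec.nodeName=${NODE_NAME}" \
#   --no-headers -o custom-columns=NS:.metadata.namespace | pods_by_namespace
```

The namespace at the top of the output is usually the first place to look when deciding which deployment to scale down.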

## Repair

### Define variables
```shell
NODE_SELECTOR="PLACEHOLDER"

POD_SELECTOR="PLACEHOLDER"

DESIRED_REPLICAS="PLACEHOLDER"
```

### Check the deployment status
```shell
kubectl rollout status deployment ${DEPLOYMENT_NAME} -n ${NAMESPACE}
```

### Scale down the pods
```shell
kubectl scale deployment "${DEPLOYMENT_NAME}" --replicas="${DESIRED_REPLICAS}" -n "${NAMESPACE}"
```
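Scaling with an unset or non-numeric replica count fails or, worse, does something unintended. As a hedged sketch, the replica count can be validated and the command previewed before it is run; `is_valid_replicas` and `scale_command` are hypothetical helper names, and the helper only prints the command rather than executing it.

```shell
#!/usr/bin/env bash
# is_valid_replicas VALUE: succeeds only if VALUE is a non-negative integer.
is_valid_replicas() {
  [[ "$1" =~ ^[0-9]+$ ]]
}

# scale_command DEPLOYMENT NAMESPACE REPLICAS: prints the kubectl command that
# would be run, or fails if REPLICAS is not a valid count. Nothing is executed.
scale_command() {
  local deployment="$1" namespace="$2" replicas="$3"
  if ! is_valid_replicas "$replicas"; then
    echo "error: invalid replica count: '$replicas'" >&2
    return 1
  fi
  printf 'kubectl scale deployment %s --replicas=%s -n %s\n' \
    "$deployment" "$replicas" "$namespace"
}

# Example:
# scale_command "$DEPLOYMENT_NAME" "$NAMESPACE" "$DESIRED_REPLICAS"
```

Reviewing the printed command first is a cheap way to confirm the target deployment and namespace during an incident.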

### Wait for pods to terminate
```shell
# Poll until no more than DESIRED_REPLICAS pods match the selector
while [[ $(kubectl get pods -n "${NAMESPACE}" -l "${POD_SELECTOR}" --no-headers 2>/dev/null | wc -l) -gt "${DESIRED_REPLICAS}" ]]; do sleep 1; done
```
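A polling loop without a time limit can hang indefinitely if a pod is stuck terminating. The following is a sketch of the same wait with a timeout, assuming bash; `wait_until` and `pods_at_most` are hypothetical helpers, and the 120-second limit in the example is an arbitrary choice.

```shell
#!/usr/bin/env bash
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Runs COMMAND once per second until it succeeds. Returns 0 on success and
# 1 if TIMEOUT_SECONDS elapse first.
wait_until() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout ))
  until "$@"; do
    if (( SECONDS >= deadline )); then
      echo "error: timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
  done
}

# pods_at_most MAX NAMESPACE SELECTOR: succeeds when at most MAX pods match.
pods_at_most() {
  local max="$1" namespace="$2" selector="$3" count
  count=$(kubectl get pods -n "$namespace" -l "$selector" --no-headers 2>/dev/null | wc -l)
  (( count <= max ))
}

# Example (requires cluster access):
# wait_until 120 pods_at_most "$DESIRED_REPLICAS" "$NAMESPACE" "$POD_SELECTOR"
```

A non-zero exit from `wait_until` signals that the pods did not terminate in time and need manual inspection.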

### Identify and terminate unnecessary or redundant pods running on the node
```shell
#!/bin/bash
set -euo pipefail

# Namespace and node to check; both must already be exported (see Parameters).
: "${NAMESPACE:?NAMESPACE must be set}"
: "${NODE_NAME:?NODE_NAME must be set}"

# Pods whose labels contain this string are treated as necessary and kept.
NECESSARY_LABEL="PLACEHOLDER"

# Get the names of all pods scheduled on the specified node.
PODS=$(kubectl get pods -n "$NAMESPACE" --field-selector "spec.nodeName=$NODE_NAME" --no-headers -o custom-columns=NAME:.metadata.name)

# Delete every pod whose labels do not contain NECESSARY_LABEL.
# Note: pods owned by a controller (for example a Deployment) are recreated after deletion.
for POD_NAME in $PODS; do
  LABELS=$(kubectl get pod -n "$NAMESPACE" "$POD_NAME" -o jsonpath='{.metadata.labels}')
  if [[ "$LABELS" != *"${NECESSARY_LABEL}"* ]]; then
    echo "Terminating unnecessary pod $POD_NAME"
    kubectl delete pod -n "$NAMESPACE" "$POD_NAME"
  fi
done
```
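Because deleting pods is destructive, it can be worth listing the candidates before removing anything. The sketch below assumes bash and awk and uses a hypothetical helper, `pods_missing_label`, that matches on the exact label key rather than on a substring of the label text; the label key `keep` in the example is an assumption for illustration.

```shell
#!/usr/bin/env bash
# pods_missing_label KEY: reads "<pod name> <comma-separated labels>" lines on
# stdin and prints the names of pods that do not carry a label with key KEY.
pods_missing_label() {
  local key="$1"
  awk -v key="$key" '
    NF {
      found = 0
      n = split($2, labels, ",")
      for (i = 1; i <= n; i++) {
        split(labels[i], kv, "=")
        if (kv[1] == key) found = 1
      }
      if (!found) print $1
    }'
}

# Example (requires cluster access); --show-labels puts the labels in the last column:
# kubectl get pods -n "$NAMESPACE" --field-selector "spec.nodeName=$NODE_NAME" \
#   --no-headers --show-labels | awk '{ print $1, $NF }' | pods_missing_label keep
```

The printed names can be reviewed by hand, and `kubectl delete pod` accepts `--dry-run=client` for a final check before the real deletion.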




