---
id: a709ac57-c9c8-42db-ba7c-cd1fe5a5382f
---

# High CPU Usage on Kubernetes DNS Pods
---

This incident type covers high CPU usage on the Kubernetes DNS pods in a test environment. It is raised when a query alert monitor reports that the average CPU usage of the DNS pods has exceeded its configured threshold. Sustained high CPU on the DNS pods can slow down or break name resolution for workloads across the cluster, so the cause should be investigated and remediated before it affects other services.

## Parameters
```shell
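# Environment variables used by the commands below

# Namespace of the DNS pods (usually kube-system)
export NAMESPACE="PLACEHOLDER"

# Label selector for the DNS pods (CoreDNS pods usually carry k8s-app=kube-dns)
export SELECTOR="PLACEHOLDER"

# Name of one DNS pod, used when reading logs
export POD_NAME="PLACEHOLDER"

# Keyword used to filter Kubernetes events
export INCIDENT_KEYWORD="PLACEHOLDER"

# kubeconfig context of the affected cluster
export CONTEXT_NAME="PLACEHOLDER"

# CPU and memory thresholds (units depend on the alert monitor)
export CPU_THRESHOLD="PLACEHOLDER"

export MEMORY_THRESHOLD="PLACEHOLDER"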
```
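Before running the steps below, it can help to stop early if any parameter still holds its placeholder value. This is a minimal bash sketch; the function name `check_params` is hypothetical and not part of any tool.

```shell
# Hypothetical helper: report every named variable that is unset, empty,
# or still set to the literal "PLACEHOLDER". Returns 1 if any were found.
# Usage: check_params NAMESPACE SELECTOR POD_NAME INCIDENT_KEYWORD CONTEXT_NAME || exit 1
check_params() {
  local name value missing=0
  for name in "$@"; do
    # ${!name} is bash indirect expansion: the value of the variable called $name
    value="${!name:-}"
    if [ -z "${value}" ] || [ "${value}" = "PLACEHOLDER" ]; then
      echo "parameter not set: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```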

## Debug

### 1. Check the CPU usage of Kubernetes DNS pods
```shell
kubectl top pods --namespace ${NAMESPACE} --selector ${SELECTOR} # replace ${NAMESPACE} and ${SELECTOR} with appropriate values
```
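To mirror the alert condition (average CPU across the DNS pods above a threshold), the output of this step can be piped into a small awk filter. This sketch assumes that `CPU_THRESHOLD` is expressed in millicores and that `kubectl top pods --no-headers` prints CPU in the `250m` form. The function name `check_dns_cpu` is hypothetical.

```shell
# Hypothetical helper: read `kubectl top pods --no-headers` lines on stdin,
# print each pod's CPU in millicores plus the average, and return 1 when the
# average exceeds the threshold given as $1 (assumed to be in millicores).
# Usage: kubectl top pods --namespace "${NAMESPACE}" --selector "${SELECTOR}" --no-headers | check_dns_cpu "${CPU_THRESHOLD}"
check_dns_cpu() {
  awk -v threshold="$1" '
    NF >= 2 {
      cpu = $2
      # kubectl top normally reports millicores ("250m"); a bare number is read as cores
      if (cpu ~ /m$/) { sub(/m$/, "", cpu); milli = cpu + 0 } else { milli = cpu * 1000 }
      printf "%s %dm\n", $1, milli
      total += milli
      n++
    }
    END {
      if (n == 0) { print "no pod metrics on stdin" > "/dev/stderr"; exit 2 }
      avg = total / n
      printf "average %.0fm (threshold %sm)\n", avg, threshold
      if (avg > threshold + 0) exit 1
    }'
}
```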

### 2. Check the resource limits and requests of Kubernetes DNS pods
```shell
kubectl describe pods --namespace ${NAMESPACE} --selector ${SELECTOR} | grep -E -A 5 "Limits|Requests" # replace ${NAMESPACE} and ${SELECTOR} with appropriate values
```
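Comparing the usage from step 1 with the CPU limit from this step shows whether the pods are running close to their limit, where the CFS quota throttles them. This sketch uses hypothetical helper names and handles only the common CPU quantity forms (`250m`, `0.5`, `2`).

```shell
# Hypothetical helper: convert a Kubernetes CPU quantity ("250m", "0.5", "2")
# to integer millicores. An empty string (no limit set) becomes 0.
to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  awk -v cores="$1" 'BEGIN { printf "%d\n", cores * 1000 + 0.5 }' ;;
  esac
}

# Hypothetical helper: print CPU usage ($1) as a whole-number percentage of the limit ($2).
# Usage: cpu_percent_of_limit 90m 100m
cpu_percent_of_limit() {
  local usage limit
  usage=$(to_millicores "$1")
  limit=$(to_millicores "$2")
  if [ "${limit}" -eq 0 ]; then
    echo "no CPU limit set" >&2
    return 1
  fi
  echo $(( usage * 100 / limit ))
}
```

A result close to 100 suggests that the CPU limit, rather than the DNS load alone, is capping the pods.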

### 3. Check the status of the Kubernetes cluster
```shell
kubectl get nodes # check if any nodes are in NotReady state
```
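On larger clusters, scanning the node list by eye is slow. This sketch, built around a hypothetical `list_unready_nodes` function, prints every node whose STATUS column is anything other than plain `Ready`, such as `NotReady` or a cordoned `Ready,SchedulingDisabled`.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` on stdin
# (columns NAME STATUS ROLES AGE VERSION) and print nodes that are not plain "Ready".
# Returns 1 if any such node is found.
# Usage: kubectl get nodes --no-headers | list_unready_nodes
list_unready_nodes() {
  awk '$2 != "Ready" { print $1, $2; found = 1 } END { if (found) exit 1 }'
}
```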

### 4. Check the logs of the Kubernetes DNS pods
```shell
kubectl logs ${POD_NAME} --namespace ${NAMESPACE} # replace ${POD_NAME} and ${NAMESPACE} with appropriate values
```
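If the CoreDNS `log` plugin is enabled in the Corefile, each query is logged with a quoted section such as `"A IN example.com. udp 29 false 512"`. Counting the most frequent names can show which lookup is driving the load. This sketch assumes that default log format; `top_dns_queries` is a hypothetical name.

```shell
# Hypothetical helper: read CoreDNS query-log lines on stdin and print the N most
# frequently queried (type, name) pairs with their counts. Lines without a quoted
# query section (startup messages, most errors) are skipped.
# Usage: kubectl logs "${POD_NAME}" --namespace "${NAMESPACE}" | top_dns_queries 10
top_dns_queries() {
  local n="${1:-10}"
  # Split on double quotes: field 2 is "TYPE CLASS NAME PROTO SIZE DO BUFSIZE"
  awk -F'"' 'NF >= 3 { split($2, q, " "); print q[1], q[3] }' \
    | sort | uniq -c | sort -rn | head -n "${n}"
}
```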

### 5. Check the status of the Kubernetes DNS service
```shell
kubectl describe services kube-dns --namespace ${NAMESPACE} # the CoreDNS service is usually named kube-dns; replace ${NAMESPACE} with appropriate value
```
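A DNS service with no endpoints, or fewer endpoints than DNS replicas, means some pods are not passing readiness, which can happen when they are starved of CPU. This sketch counts the addresses on the first `Endpoints:` line of the describe output, including the `+ N more...` suffix kubectl adds when it truncates a long list. The name `count_service_endpoints` is hypothetical.

```shell
# Hypothetical helper: read `kubectl describe services ...` output on stdin and
# print the number of endpoints on the first "Endpoints:" line (the first port).
# "<none>" or a missing line counts as 0; a trailing "+ N more..." adds N.
# Usage: kubectl describe services kube-dns --namespace "${NAMESPACE}" | count_service_endpoints
count_service_endpoints() {
  awk '
    /^Endpoints:/ {
      seen = 1
      line = $0
      sub(/^Endpoints:[ \t]*/, "", line)
      if (line == "" || line == "<none>") { print 0; exit }
      extra = 0
      if (match(line, /[+] [0-9]+ more/)) {
        # the match is "+ N more": N starts 2 characters in and is RLENGTH - 7 long
        extra = substr(line, RSTART + 2, RLENGTH - 7) + 0
        line = substr(line, 1, RSTART - 1)
      }
      print split(line, parts, ",") + extra
      exit
    }
    END { if (!seen) print 0 }'
}
```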

### 6. Check the CPU usage of the node(s) hosting the Kubernetes DNS pods
```shell
kubectl get pods --namespace ${NAMESPACE} --selector ${SELECTOR} -o wide # the NODE column shows where each DNS pod runs
kubectl top nodes # check the CPU usage of those nodes
```
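To narrow `kubectl top nodes` down to the nodes that actually host DNS pods, the two outputs can be joined. This sketch takes the node names as arguments. The function name `dns_node_cpu` is hypothetical, and the usage line assumes the pods' `spec.nodeName` is populated, which is the case once they are scheduled.

```shell
# Hypothetical helper: read `kubectl top nodes --no-headers` on stdin
# (columns NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%) and print the name and
# CPU% of each node listed in the arguments (space-separated names are accepted).
# Usage:
#   dns_nodes=$(kubectl get pods --namespace "${NAMESPACE}" --selector "${SELECTOR}" -o jsonpath='{.items[*].spec.nodeName}')
#   kubectl top nodes --no-headers | dns_node_cpu "${dns_nodes}"
dns_node_cpu() {
  awk -v nodes="$*" '
    BEGIN { n = split(nodes, list, " "); for (i = 1; i <= n; i++) want[list[i]] = 1 }
    ($1 in want) { print $1, $3 }'
}
```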

### 7. Check the Kubernetes events related to the incident
```shell
kubectl get events --namespace ${NAMESPACE} --sort-by='.metadata.creationTimestamp' | grep -i "${INCIDENT_KEYWORD}" # replace ${NAMESPACE} and ${INCIDENT_KEYWORD} with appropriate values
```
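When no single keyword stands out, summarizing warning events by reason gives a quick overview, for example repeated `Unhealthy` (failed probes) or `BackOff` events on DNS pods. This is a sketch with a hypothetical function name. It assumes the default namespaced column layout `LAST SEEN  TYPE  REASON  OBJECT  MESSAGE`.

```shell
# Hypothetical helper: read `kubectl get events --no-headers` lines on stdin
# and count Warning events by reason, most frequent first.
# Usage: kubectl get events --namespace "${NAMESPACE}" --no-headers | count_warning_reasons
count_warning_reasons() {
  awk '$2 == "Warning" { print $3 }' | sort | uniq -c | sort -rn
}
```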

## Repair

### Optimize the DNS pod configuration

Reduce the resource consumption of the DNS pods and improve their efficiency, for example by adjusting their resource requests and limits or by switching to a more lightweight DNS solution.
```shell
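#!/bin/bash
set -euo pipefail

# Set the Kubernetes context and make NAMESPACE its default namespace
kubectl config use-context "${CONTEXT_NAME}"
kubectl config set-context --current --namespace="${NAMESPACE}"

# Update the resource requests and limits of the DNS container.
# The container name must match the deployment: "coredns" for CoreDNS. If the
# cluster still runs kube-dns, patch deployment kube-dns with container "kubedns".
# A name that does not match makes the patch try to add a new container instead.
# The values are examples; if the pods are being CPU-throttled, raise the CPU
# limit rather than lowering it.
kubectl patch deployment coredns --patch '{"spec": {"template": {"spec": {"containers": [{"name": "coredns", "resources": {"requests": {"cpu": "50m", "memory": "100Mi"}, "limits": {"cpu": "100m", "memory": "200Mi"}}}]}}}}'

# Changing the pod template already starts a rolling update of the DNS pods,
# so no separate restart is needed; wait for the rollout to finish
kubectl rollout status deployment coredns --timeout=5m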
```
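Hand-editing the JSON patch is error-prone, especially the container name, which must match the container in the deployment. This sketch builds the same patch from arguments so the values can be reviewed before they are applied. The function `build_resources_patch` and its argument order are hypothetical.

```shell
# Hypothetical helper: print a strategic-merge patch that sets CPU/memory
# requests and limits for one named container of a deployment.
# Arguments: CONTAINER CPU_REQUEST MEMORY_REQUEST CPU_LIMIT MEMORY_LIMIT
# Usage:
#   patch=$(build_resources_patch coredns 50m 100Mi 100m 200Mi)
#   echo "${patch}"   # review before applying
#   kubectl patch deployment coredns --namespace "${NAMESPACE}" --patch "${patch}"
build_resources_patch() {
  if [ "$#" -ne 5 ]; then
    echo "usage: build_resources_patch CONTAINER CPU_REQ MEM_REQ CPU_LIM MEM_LIM" >&2
    return 1
  fi
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"%s","resources":{"requests":{"cpu":"%s","memory":"%s"},"limits":{"cpu":"%s","memory":"%s"}}}]}}}}\n' \
    "$1" "$2" "$3" "$4" "$5"
}
```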
