---
id: 06af2f32-e5ab-4933-8698-c6d29639dc8d
---
# High Request Duration on CoreDNS
---

This incident type covers alerts triggered by high request duration on CoreDNS, the DNS server that translates domain names to IP addresses. The alert means CoreDNS is taking longer than expected to process DNS requests. Slow processing can delay or fail domain name resolution, which can lead to service disruptions or outages.

### Parameters
```shell
# Environment Variables

export POD_NAME="PLACEHOLDER"

export SERVICE_NAME="PLACEHOLDER"

export NAMESPACE="PLACEHOLDER"

export DEPLOYMENT_NAME="PLACEHOLDER"

export CPU_LIMIT="PLACEHOLDER"

export MEMORY_LIMIT="PLACEHOLDER"

export CPU_REQUEST="PLACEHOLDER"

export MEMORY_REQUEST="PLACEHOLDER"

export REPLICAS="PLACEHOLDER"
```

## Debug

### Get the name(s) of the pod(s) running CoreDNS
```shell
kubectl get pods -l k8s-app=kube-dns -n kube-system
```

### Check the logs of the CoreDNS pod(s) for errors
```shell
kubectl logs ${POD_NAME} -n kube-system
```
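CoreDNS logs can be noisy, so it may help to narrow them to lines that look like failures. The helper below is a minimal sketch: `filter_dns_errors` is a hypothetical name, and the patterns (`[ERROR]`, `SERVFAIL`, `timeout`) are assumptions about what your logs contain, which depends on the plugins enabled in your Corefile.

```shell
# Hypothetical helper: keep only log lines that look like failures.
# The patterns are assumptions; adjust them to match your own logs.
filter_dns_errors() {
  # "|| true" keeps the exit status at 0 when nothing matches,
  # so the helper is safe to use in scripts running with "set -e".
  grep -iE '\[ERROR\]|SERVFAIL|timeout' || true
}

# Usage (requires cluster access):
#   kubectl logs "${POD_NAME}" -n kube-system | filter_dns_errors
```

An empty result only means none of the assumed patterns matched. It does not prove the pod is healthy.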

### Check the CPU and memory usage of the CoreDNS pod(s)
```shell
kubectl top pods -n kube-system | grep ${POD_NAME}
```

### Check the Kubernetes events related to the CoreDNS pod(s)
```shell
kubectl get events -n kube-system --sort-by=.metadata.creationTimestamp | grep ${POD_NAME}
```

### Check the status of the container(s) in the CoreDNS pod(s)
```shell
kubectl get pod ${POD_NAME} -n kube-system -o jsonpath='{.status.containerStatuses}'
```

### Check the network latency between the CoreDNS pod(s) and other pods/services
```shell
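# The CoreDNS image typically ships without nslookup, so run the lookup from a temporary pod (the busybox image is an assumption).
kubectl run dns-test --rm -i --restart=Never --image=busybox:1.28 -- nslookup ${SERVICE_NAME}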
```

### Check the Kubernetes services and endpoints related to CoreDNS
```shell
kubectl get svc,endpoints -n kube-system | grep kube-dns
```
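To put a number on request duration, one option is to read the metrics CoreDNS exposes itself. The sketch below assumes the `prometheus` plugin is enabled and serving on port 9153, and that the histogram is named `coredns_dns_request_duration_seconds`. The helper name `mean_request_duration` is hypothetical. It computes the mean since the process started, not the current latency, so treat the result as a rough indicator.

```shell
# Hypothetical helper: mean request duration in seconds, computed from
# Prometheus text-format metrics on stdin. It sums the histogram's _sum
# and _count series across all label sets. Assumes lines carry no
# trailing timestamp, so the last field is the sample value.
mean_request_duration() {
  awk '
    /^coredns_dns_request_duration_seconds_sum/   { sum += $NF }
    /^coredns_dns_request_duration_seconds_count/ { count += $NF }
    END {
      if (count > 0) printf "%.6f\n", sum / count
      else print "no samples"
    }
  '
}

# Usage (requires cluster access and curl):
#   kubectl port-forward "${POD_NAME}" 9153:9153 -n kube-system &
#   curl -s localhost:9153/metrics | mean_request_duration
```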

## Repair

### Scale the CoreDNS deployment to handle the increased load.
```shell
kubectl scale deployment $DEPLOYMENT_NAME --replicas=$REPLICAS -n $NAMESPACE
```
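Scaling only helps once the new replicas are ready, so it can be useful to wait for the rollout before checking the alert again. The following is a minimal sketch: `scale_and_wait` is a hypothetical helper, and the 120 second timeout is an arbitrary assumption.

```shell
# Hypothetical helper: scale a deployment, then block until the rollout
# completes or the timeout expires.
# Arguments: deployment name, namespace, replica count.
scale_and_wait() {
  deployment="$1"; namespace="$2"; replicas="$3"
  # Reject anything that is not a non-negative integer before calling kubectl.
  case "$replicas" in
    ''|*[!0-9]*) echo "replicas must be a non-negative integer" >&2; return 1 ;;
  esac
  kubectl scale deployment "$deployment" --replicas="$replicas" -n "$namespace" &&
    kubectl rollout status deployment/"$deployment" -n "$namespace" --timeout=120s
}

# Usage (requires cluster access):
#   scale_and_wait "${DEPLOYMENT_NAME}" "${NAMESPACE}" 4
```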

### Update resource limits for the CoreDNS deployment to handle the increased load.
```shell
kubectl patch deployment ${DEPLOYMENT_NAME} -n ${NAMESPACE} --type=json -p="[{\"op\": \"replace\", \"path\": \"/spec/template/spec/containers/0/resources\", \"value\": {\"limits\": {\"cpu\": \"${CPU_LIMIT}\", \"memory\": \"${MEMORY_LIMIT}\"}, \"requests\": {\"cpu\": \"${CPU_REQUEST}\", \"memory\": \"${MEMORY_REQUEST}\"}}}]"
```
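Quoting inside a JSON patch is easy to get wrong, because the shell does not expand variables inside single quotes. One way to sidestep this is to build the patch with `printf` and inspect it before applying. This is a sketch only: `build_resources_patch` is a hypothetical helper, and it does not escape or validate the values passed in.

```shell
# Hypothetical helper: print a JSON patch that replaces the resources of
# the first container in the pod template.
# Arguments: cpu_limit memory_limit cpu_request memory_request
build_resources_patch() {
  printf '[{"op": "replace", "path": "/spec/template/spec/containers/0/resources", "value": {"limits": {"cpu": "%s", "memory": "%s"}, "requests": {"cpu": "%s", "memory": "%s"}}}]\n' "$1" "$2" "$3" "$4"
}

# Usage (requires cluster access):
#   patch="$(build_resources_patch "${CPU_LIMIT}" "${MEMORY_LIMIT}" "${CPU_REQUEST}" "${MEMORY_REQUEST}")"
#   echo "$patch"   # review the patch before applying it
#   kubectl patch deployment "${DEPLOYMENT_NAME}" -n "${NAMESPACE}" --type=json -p "$patch"
```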

### Increase the resources allocated to the affected system to handle the increased load.
```shell
kubectl set resources deployment ${DEPLOYMENT_NAME} -n ${NAMESPACE} --limits=cpu=${CPU_LIMIT},memory=${MEMORY_LIMIT}
```






