Runbook

CoreDNS Excessive Cache Utilization Incident

Overview

This incident occurs when the CoreDNS cache grows too large. CoreDNS is a flexible, extensible DNS server that Kubernetes clusters use for service discovery and load balancing. When cache utilization climbs too high, the CoreDNS pods can consume excessive memory, which can lead to degraded performance, service disruptions, and, in the worst case, crashes. The DevOps team should respond immediately, identify the root cause, and implement a fix that prevents a recurrence.

Parameters
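
The debug and repair steps depend on a few cluster-specific values. A minimal sketch of those parameters as shell variables follows. The variable names and defaults are assumptions based on a typical kubeadm-style install (CoreDNS in kube-system, pods labeled k8s-app=kube-dns, a Deployment and a ConfigMap both named coredns), so adjust them to match your cluster.

```shell
# Assumed defaults for a kubeadm-style cluster; override any of them
# by exporting the variable before running this snippet.
export NAMESPACE="${NAMESPACE:-kube-system}"      # namespace where CoreDNS runs
export SELECTOR="${SELECTOR:-k8s-app=kube-dns}"   # label selector for CoreDNS pods
export DEPLOYMENT="${DEPLOYMENT:-coredns}"        # name of the CoreDNS Deployment
export CONFIGMAP="${CONFIGMAP:-coredns}"          # ConfigMap that holds the Corefile
```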

Debug

List the CoreDNS pods. In most clusters they run in the kube-system namespace rather than default
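
A minimal sketch of this step, assuming a kubeadm-style install where the CoreDNS pods live in the kube-system namespace and carry the k8s-app=kube-dns label. Both values are assumptions, and the function name is hypothetical.

```shell
# Assumed defaults; adjust for your cluster.
NAMESPACE="${NAMESPACE:-kube-system}"
SELECTOR="${SELECTOR:-k8s-app=kube-dns}"

# List the CoreDNS pods with their status, restart count, and node.
list_coredns_pods() {
  kubectl get pods --namespace "$NAMESPACE" --selector "$SELECTOR" --output wide
}
```

After pasting the definition into a shell, run list_coredns_pods. A high restart count is a hint that the pods are being killed and recreated.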

Check the logs of the CoreDNS pods to see if there are any errors or warnings
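
One way to do this is to pull recent logs from every CoreDNS pod and keep only the lines that CoreDNS tags as warnings or errors. The sketch below assumes the kube-system namespace and the k8s-app=kube-dns label. The function name and the TAIL_LINES variable are hypothetical.

```shell
# Assumed defaults; adjust for your cluster.
NAMESPACE="${NAMESPACE:-kube-system}"
SELECTOR="${SELECTOR:-k8s-app=kube-dns}"
TAIL_LINES="${TAIL_LINES:-500}"   # how many recent lines to read per pod

# Print only the recent log lines tagged [WARNING], [ERROR], or [FATAL],
# plus any Go panics. Prints nothing when no such lines exist.
coredns_log_problems() {
  kubectl logs --namespace "$NAMESPACE" --selector "$SELECTOR" \
    --tail="$TAIL_LINES" --prefix \
    | grep -E '\[(ERROR|WARNING|FATAL)\]|panic' || true
}
```

The --prefix flag labels each line with the pod it came from, which helps when only one replica is misbehaving.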

Check the resource usage of the CoreDNS pods to see if they are consuming too much memory or CPU
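
A sketch of this check, assuming the metrics-server add-on is installed (kubectl top needs it) and that CoreDNS uses the kube-system namespace and the k8s-app=kube-dns label. The function names are hypothetical.

```shell
# Assumed defaults; adjust for your cluster.
NAMESPACE="${NAMESPACE:-kube-system}"
SELECTOR="${SELECTOR:-k8s-app=kube-dns}"

# Show current CPU and memory usage per CoreDNS container.
coredns_usage() {
  kubectl top pods --namespace "$NAMESPACE" --selector "$SELECTOR" --containers
}

# Show the configured memory requests and limits for comparison.
coredns_limits() {
  kubectl get pods --namespace "$NAMESPACE" --selector "$SELECTOR" \
    --output custom-columns='POD:.metadata.name,MEM_REQUEST:.spec.containers[*].resources.requests.memory,MEM_LIMIT:.spec.containers[*].resources.limits.memory'
}
```

Memory usage that sits close to the limit reported by coredns_limits suggests the pods are at risk of being OOM-killed.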

Check the CoreDNS configuration file (the Corefile) for misconfigurations that could be causing the excessive cache utilization
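
In many clusters the configuration is stored in a ConfigMap rather than a file on disk. The sketch below assumes a ConfigMap named coredns in kube-system with the configuration under the Corefile key, as in kubeadm-style installs. The function names are hypothetical.

```shell
# Assumed defaults; adjust for your cluster.
NAMESPACE="${NAMESPACE:-kube-system}"
CONFIGMAP="${CONFIGMAP:-coredns}"

# Print the full Corefile stored in the ConfigMap.
show_corefile() {
  kubectl get configmap "$CONFIGMAP" --namespace "$NAMESPACE" \
    --output jsonpath='{.data.Corefile}'
}

# Print only the cache directives, with line numbers and a few
# following lines so that any block options are visible too.
show_cache_settings() {
  show_corefile | grep -n -A 4 'cache' || true
}
```

Look for an unusually large capacity or a very long TTL on the cache directive, since both let the cache hold more entries for longer.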

Restart the CoreDNS pods to see whether that resolves the issue temporarily. A restart empties the in-memory cache, so it relieves the symptom without fixing the cause
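
A rolling restart replaces the pods one at a time, which keeps DNS available while they restart. The sketch below assumes CoreDNS runs as a Deployment named coredns in kube-system. The function name and the 120-second timeout are hypothetical choices.

```shell
# Assumed defaults; adjust for your cluster.
NAMESPACE="${NAMESPACE:-kube-system}"
DEPLOYMENT="${DEPLOYMENT:-coredns}"

# Trigger a rolling restart, then wait until the new pods are ready.
restart_coredns() {
  kubectl rollout restart deployment "$DEPLOYMENT" --namespace "$NAMESPACE" &&
    kubectl rollout status deployment "$DEPLOYMENT" --namespace "$NAMESPACE" --timeout=120s
}
```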

Repair

Increase the memory or CPU available to CoreDNS, either by raising the resource requests and limits on the CoreDNS pods or by adding capacity to the Kubernetes cluster, to accommodate the increased cache utilization.
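
For the CoreDNS pods themselves, one option is to raise the memory request and limit on the Deployment. This is a sketch under assumptions: the Deployment and container are both assumed to be named coredns, and the 128Mi and 256Mi values are purely illustrative, not recommendations. The function name is hypothetical.

```shell
# Assumed defaults; adjust for your cluster.
NAMESPACE="${NAMESPACE:-kube-system}"
DEPLOYMENT="${DEPLOYMENT:-coredns}"
CONTAINER="${CONTAINER:-coredns}"
MEMORY_REQUEST="${MEMORY_REQUEST:-128Mi}"   # illustrative value
MEMORY_LIMIT="${MEMORY_LIMIT:-256Mi}"       # illustrative value

# Update the memory request and limit; this triggers a rolling update.
raise_coredns_memory() {
  kubectl set resources deployment "$DEPLOYMENT" --namespace "$NAMESPACE" \
    --containers="$CONTAINER" \
    --requests="memory=$MEMORY_REQUEST" --limits="memory=$MEMORY_LIMIT"
}
```

On managed clusters, an add-on manager may revert manual changes to the CoreDNS Deployment, in which case the change has to go through the provider's own add-on configuration.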

Configure CoreDNS to limit the maximum size of the cache to prevent it from consuming too many resources.
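
As a sketch of what such a limit could look like, the cache plugin in the Corefile accepts a separate capacity for positive and negative responses. The fragment below assumes a single server block on port 53 that already uses cache 30, as in kubeadm-style installs. The capacities are illustrative assumptions, not recommendations.

```text
.:53 {
    # Other plugins stay unchanged.
    cache 30 {
        # Keep at most 4096 positive responses.
        success 4096
        # Keep at most 1024 negative (NXDOMAIN or NODATA) responses.
        denial 1024
    }
}
```

CoreDNS defaults to a capacity of 9984 entries for each of the two caches and enforces a minimum of 1024, so values below the default shrink the cache. If the reload plugin is enabled in the Corefile, CoreDNS picks up the edited ConfigMap without a restart.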
