This incident type covers a high pod count per node in a Kubernetes cluster. Common causes include misconfiguration, resource constraints, and problems in the application itself. A node that reaches its pod limit (the kubelet default is 110 pods per node) cannot accept new pods, which can lead to service disruptions or outages if the condition is not addressed promptly. The DevOps team needs to investigate and resolve it so that the cluster and the applications running on it keep working properly.
Parameters
Debug
List all nodes in the cluster
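One way to list the nodes and see which are Ready is sketched below. It assumes `kubectl` is installed and already pointed at the affected cluster; the helper names `get_nodes` and `node_readiness` are illustrative, not part of any library.

```python
import json
import subprocess


def get_nodes():
    """Fetch all nodes as parsed JSON (assumes kubectl is configured)."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["items"]


def node_readiness(nodes):
    """Map each node name to True when its Ready condition is "True"."""
    readiness = {}
    for node in nodes:
        conditions = node.get("status", {}).get("conditions", [])
        readiness[node["metadata"]["name"]] = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
    return readiness
```

Calling `node_readiness(get_nodes())` returns a dictionary keyed by node name, which gives a quick view of nodes that are present but not Ready.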
Check the pod count per node
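A minimal sketch of the per-node count follows. It assumes `kubectl` access to the cluster; the function names and the `limit` parameter are hypothetical, and the threshold you compare against should match the pod limit actually configured on your nodes.

```python
import json
import subprocess
from collections import Counter


def get_pods():
    """Fetch pods from every namespace as parsed JSON (assumes kubectl is configured)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)["items"]


def pods_per_node(pods):
    """Count pods by the node they are scheduled on; unscheduled pods are skipped."""
    counts = Counter()
    for pod in pods:
        node = pod.get("spec", {}).get("nodeName")
        if node:
            counts[node] += 1
    return counts


def nodes_over_limit(counts, limit):
    """Return the node names whose pod count is at or above the given limit, sorted."""
    return sorted(node for node, count in counts.items() if count >= limit)
```

For example, `nodes_over_limit(pods_per_node(get_pods()), 100)` would list nodes running 100 or more pods; the value 100 is only a placeholder.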
Check the status of the pods
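The status check can be sketched as a filter over pod phases. The input is assumed to be the `items` list from `kubectl get pods --all-namespaces -o json`, already parsed; `pods_not_running` is an illustrative name, and treating `Succeeded` as healthy is a choice made here, not something the runbook prescribes.

```python
def pods_not_running(pods):
    """Return (namespace, name, phase) for pods that are neither Running nor Succeeded."""
    problems = []
    for pod in pods:
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            meta = pod["metadata"]
            problems.append((meta.get("namespace", "default"), meta["name"], phase))
    return problems
```

Pods reported as `Pending` on a crowded cluster are often the ones that could not be scheduled, so they are a reasonable place to start.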
Check the logs of a pod
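As a sketch, logs for a single pod can be pulled with `kubectl logs`. The helper names and default values below are assumptions for illustration; `--tail` and `--previous` are standard `kubectl logs` flags.

```python
import subprocess


def logs_command(pod, namespace, tail=100, previous=False):
    """Build the kubectl command that prints the last `tail` log lines of a pod."""
    cmd = ["kubectl", "logs", pod, "-n", namespace, f"--tail={tail}"]
    if previous:
        # Show logs from the previous container instance, useful after a restart.
        cmd.append("--previous")
    return cmd


def fetch_logs(pod, namespace, tail=100, previous=False):
    """Run the command and return the log text (assumes kubectl is configured)."""
    result = subprocess.run(
        logs_command(pod, namespace, tail, previous),
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```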
Check the metrics for the node
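Node metrics can be read with `kubectl top nodes`, which only works when a metrics source such as metrics-server is installed in the cluster. The sketch below assumes the usual five-column output (name, CPU, CPU%, memory, memory%); the function names are illustrative.

```python
import subprocess


def top_nodes_output():
    """Return raw `kubectl top nodes` output without the header row."""
    result = subprocess.run(
        ["kubectl", "top", "nodes", "--no-headers"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def parse_top_nodes(text):
    """Parse the five-column output into a dict keyed by node name."""
    usage = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip lines that do not match the assumed layout
        name, cpu, cpu_pct, memory, memory_pct = fields[:5]
        usage[name] = {
            "cpu": cpu,
            "cpu_pct": cpu_pct,
            "memory": memory,
            "memory_pct": memory_pct,
        }
    return usage
```

Values are kept as the strings kubectl prints, since the exact units can vary.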
Repair
Define variables
Check the deployment status
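A possible way to check a deployment before scaling it is to compare desired and ready replicas. The deployment name and namespace are whatever you defined as variables for this repair; the helper names are hypothetical.

```python
import json
import subprocess


def get_deployment(name, namespace):
    """Fetch one deployment as parsed JSON (assumes kubectl is configured)."""
    result = subprocess.run(
        ["kubectl", "get", "deployment", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def deployment_summary(deployment):
    """Summarize desired, ready, and available replica counts."""
    desired = deployment.get("spec", {}).get("replicas", 1)
    status = deployment.get("status", {})
    # The API omits these status fields when the count is zero.
    ready = status.get("readyReplicas", 0)
    available = status.get("availableReplicas", 0)
    return {
        "desired": desired,
        "ready": ready,
        "available": available,
        "all_ready": ready == desired,
    }
```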
Scale down the pods
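Scaling down can be sketched with `kubectl scale`. The function names are illustrative, and the target replica count is an input you must choose for your workload; nothing here decides it for you.

```python
import subprocess


def scale_command(deployment, namespace, replicas):
    """Build the kubectl command that sets a deployment's replica count."""
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return [
        "kubectl", "scale", "deployment", deployment,
        f"--replicas={replicas}", "-n", namespace,
    ]


def scale_deployment(deployment, namespace, replicas):
    """Apply the new replica count (assumes kubectl is configured)."""
    subprocess.run(scale_command(deployment, namespace, replicas), check=True)
```

If a HorizontalPodAutoscaler manages the deployment, it may change the replica count again after a manual scale.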
Wait for pods to terminate
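Waiting for termination can be done by polling the pod count until it drops to the target. This is a sketch under assumptions: the label selector, timeout, and interval are placeholders, and `count_pods` and `wait_for_pod_count` are hypothetical helper names.

```python
import json
import subprocess
import time


def count_pods(namespace, selector):
    """Count pods matching a label selector (assumes kubectl is configured)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-l", selector, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return len(json.loads(result.stdout)["items"])


def wait_for_pod_count(get_count, target, timeout_s=300, interval_s=5, sleep=time.sleep):
    """Poll get_count() until it is <= target; return False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if get_count() <= target:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

A call such as `wait_for_pod_count(lambda: count_pods("prod", "app=web"), 3)` would block until at most three matching pods remain or five minutes pass; the namespace and label are examples only.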
Identify and terminate any unnecessary or redundant pods running on the nodes.
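What counts as unnecessary depends on the workload. The sketch below makes one assumption explicit: it treats pods whose phase is `Succeeded` or `Failed` as cleanup candidates. The input is the parsed `items` list from `kubectl get pods --all-namespaces -o json`, and the helper names are illustrative. Review the candidates before deleting anything.

```python
import subprocess


def finished_pods(pods):
    """Return (namespace, name) for pods in the Succeeded or Failed phase."""
    candidates = []
    for pod in pods:
        if pod.get("status", {}).get("phase") in ("Succeeded", "Failed"):
            meta = pod["metadata"]
            candidates.append((meta.get("namespace", "default"), meta["name"]))
    return candidates


def delete_command(namespace, name):
    """Build the kubectl command that deletes a single pod."""
    return ["kubectl", "delete", "pod", name, "-n", namespace]


def delete_pods(candidates):
    """Delete each candidate pod in turn (assumes kubectl is configured)."""
    for namespace, name in candidates:
        subprocess.run(delete_command(namespace, name), check=True)
```

Pods owned by a controller such as a Deployment are recreated after deletion, so redundant replicas are better removed by scaling the owner down.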