Runbook

Kubernetes Memory Usage Alert

Overview

This incident type covers an alert that fires when memory usage on a Kubernetes node exceeds a configured threshold (in this case, 90%). The alert monitors the memory usage percentage and notifies the relevant teams when the threshold is breached. It is treated as critical because sustained memory pressure on a node can lead to pod evictions and out-of-memory kills, so the cause should be identified and resolved promptly.

Parameters

Debug

Get the list of Kubernetes nodes
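
A minimal sketch of this step, assuming kubectl is already configured for the affected cluster. The function names list_nodes and node_usage are hypothetical helpers introduced here for illustration, and node_usage only works if a Metrics API provider such as metrics-server is installed.

```shell
# List all nodes with their status, roles, version, and addresses.
list_nodes() {
  kubectl get nodes -o wide
}

# Show current CPU and memory usage per node.
# Assumption: metrics-server (or another Metrics API provider) is running.
node_usage() {
  kubectl top nodes
}
```

Run list_nodes first to confirm which nodes are Ready, then node_usage to see which ones are close to the memory threshold.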

Describe a specific node to check its resource usage
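
One way to do this is sketched below. The helper name describe_node is hypothetical, and the node name is passed as its only argument.

```shell
# Print detailed information for one node, including its conditions,
# capacity, allocatable resources, and the pods scheduled on it.
# $1: node name (for example, taken from `kubectl get nodes`).
describe_node() {
  kubectl describe node "$1"
}
```

In the output, the Conditions section reports whether MemoryPressure is set, and the Allocated resources section summarizes the memory requests and limits of the pods on the node.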

Get the list of Kubernetes pods in a specific node
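
A possible command for this step, wrapped in a hypothetical helper named pods_on_node. It filters pods across all namespaces by the node they are scheduled on.

```shell
# List every pod scheduled on the given node, across all namespaces.
# $1: node name.
pods_on_node() {
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$1" -o wide
}
```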

Describe a specific pod to check its resource usage
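
The following sketch shows two ways to inspect a pod. The helper names describe_pod and pod_usage are hypothetical, and pod_usage assumes a Metrics API provider such as metrics-server is installed.

```shell
# Show the pod's spec and status, including container memory requests,
# limits, restart counts, and recent events.
# $1: pod name, $2: namespace.
describe_pod() {
  kubectl describe pod "$1" -n "$2"
}

# Show current CPU and memory usage for each container in the pod.
# Assumption: metrics-server (or another Metrics API provider) is running.
pod_usage() {
  kubectl top pod "$1" -n "$2" --containers
}
```

Comparing the usage reported by pod_usage with the limits shown by describe_pod helps spot containers running close to their memory limit.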

Get the logs of a specific container in a specific pod
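
A minimal sketch for fetching logs. The helper names and the 200-line tail are illustrative choices, not values from this runbook.

```shell
# Print the most recent log lines of one container in a pod.
# $1: pod name, $2: namespace, $3: container name.
container_logs() {
  kubectl logs "$1" -n "$2" -c "$3" --tail=200
}

# Print logs from the previous instance of the container, which is
# useful when the container was restarted (for example, after an OOM kill).
previous_container_logs() {
  kubectl logs "$1" -n "$2" -c "$3" --previous --tail=200
}
```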

Repair

Remove the impacted node from the cluster, draining its workloads first.
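
A hedged sketch of a node removal sequence follows. The helper names drain_node and delete_node are hypothetical. The drain flags shown assume a reasonably recent kubectl (--delete-emptydir-data replaced the older --delete-local-data), and draining evicts pods, so confirm the remaining nodes have capacity before running it.

```shell
# Mark the node unschedulable, then evict its pods.
# DaemonSet-managed pods are skipped; emptyDir data on evicted pods is lost.
# $1: node name (required).
drain_node() {
  : "${1:?node name required}"
  kubectl cordon "$1" &&
    kubectl drain "$1" --ignore-daemonsets --delete-emptydir-data
}

# Remove the node object from the cluster. Run only after a successful drain.
# $1: node name (required).
delete_node() {
  : "${1:?node name required}"
  kubectl delete node "$1"
}
```

Keeping the drain and the delete as separate steps leaves room to verify that workloads were rescheduled before the node is removed.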

Identify and terminate any resource-intensive pods on the impacted node(s) to free up memory.
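
The sketch below shows one way to find and terminate heavy memory consumers. The helper names are hypothetical, top_memory_pods assumes a Metrics API provider such as metrics-server is installed, and its output covers the whole cluster, so cross-check it against the pods actually scheduled on the impacted node.

```shell
# List pods across all namespaces, sorted by memory usage (highest first).
# Assumption: metrics-server (or another Metrics API provider) is running.
top_memory_pods() {
  kubectl top pods --all-namespaces --sort-by=memory
}

# Delete a single pod.
# $1: pod name (required), $2: namespace (required).
delete_pod() {
  : "${1:?pod name required}" "${2:?namespace required}"
  kubectl delete pod "$1" -n "$2"
}
```

Pods managed by a controller such as a Deployment or StatefulSet are recreated after deletion, possibly on the same node, so a lasting fix may also require adjusting memory requests and limits or scaling the workload down.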