Kubernetes

Pods Stuck in Terminating

When Kubernetes pods won’t leave the terminating state, they must be identified and their nodes safely drained.
Customer experience impact: High (apps fail to schedule, causing unavailability)
Occurrence frequency: Weekly (for fleets with hundreds of nodes)
Time to repair manually: High (2-4 hours)
Shoreline time to repair: Low (0 minutes)
Time to diagnose manually
Cost impact
Security

The problem

When Kubernetes pods won’t leave the terminating state, the underlying node is likely broken. Apps may then fail to schedule, causing unavailability. The issue can also become a financial drain on your organization because it can lead to unnecessary scaling.

This issue is difficult for many teams to diagnose because pods routinely pass through the terminating state, so it’s tricky to know which ones have been there too long. Fixing it is also complex, since node draining in Kubernetes must be configured to work for your environment, taking into account time-out periods, pod disruption budgets, and other cluster-wide configurations.
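To make the diagnosis concrete, here is a minimal Python sketch of one way to tell long-terminating pods apart from ones that are shutting down normally. It assumes pod objects shaped like the `items` in the output of `kubectl get pods --all-namespaces -o json`, and it relies on `metadata.deletionTimestamp`, which Kubernetes sets on a pod once deletion has been requested. The function name and the 10-minute threshold are illustrative assumptions, not part of any Shoreline or Kubernetes API.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(value):
    """Parse a Kubernetes timestamp such as "2024-01-01T12:00:00Z" (RFC 3339, UTC)."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stuck_terminating_pods(pods, now, threshold=timedelta(minutes=10)):
    """Return (namespace, name, node) for pods terminating longer than `threshold`.

    `threshold` is an assumed cutoff; pick one that fits your grace periods.
    """
    stuck = []
    for pod in pods:
        meta = pod.get("metadata", {})
        deleted_at = meta.get("deletionTimestamp")
        if deleted_at is None:
            continue  # deletion was never requested, so the pod is not terminating
        if now - parse_k8s_time(deleted_at) > threshold:
            node = pod.get("spec", {}).get("nodeName")
            stuck.append((meta.get("namespace"), meta.get("name"), node))
    return stuck
```

Grouping the results by node name is a reasonable next step, since several stuck pods on the same node point more strongly at a broken node than a single slow pod does.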

The solution

Shoreline’s Pods Stuck in Terminating Op Pack talks to the Kubernetes master, checks pod states, and determines whether a pod has been terminating for too long. When it finds one, it cordons, drains, and then terminates the affected node so that the node is safely cleaned up and no longer impacts other software.
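The cordon, drain, terminate sequence can be sketched in Python as follows. This is not Shoreline's implementation. It is an illustration that shells out to the standard `kubectl cordon` and `kubectl drain` commands. The function names, the 300-second drain timeout, and the `terminate_instance` callback are assumptions. Terminating the underlying machine is specific to your cloud provider, so it is left to the caller.

```python
import subprocess


def remediation_commands(node, drain_timeout_seconds=300):
    """Build the kubectl commands that cordon and then drain `node`."""
    return [
        # Mark the node unschedulable so no new pods land on it.
        ["kubectl", "cordon", node],
        # Evict the pods; DaemonSet pods are skipped and emptyDir data is discarded.
        ["kubectl", "drain", node,
         "--ignore-daemonsets",
         "--delete-emptydir-data",
         f"--timeout={drain_timeout_seconds}s"],
    ]


def remediate_node(node, terminate_instance, run=subprocess.run,
                   drain_timeout_seconds=300):
    """Cordon and drain `node`, then hand it to `terminate_instance`.

    `terminate_instance` is a hypothetical caller-supplied function that
    shuts down the machine behind the node. `run` is injectable for testing.
    """
    for argv in remediation_commands(node, drain_timeout_seconds):
        # check=True raises if a step fails, so termination is never
        # reached after a failed cordon or drain.
        run(argv, check=True)
    terminate_instance(node)
```

Because `kubectl drain` evicts pods through the eviction API, it honors pod disruption budgets, and the timeout bounds how long a blocked eviction can hold up the sequence.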