Kubernetes

Delete Old Argo Pods

Argo makes declaratively managing workflows easy, but it can leave behind many stale pods after workflow execution.
Medium
Customer experience impact
You may exhaust the IP addresses on worker nodes, which can lead to downtime.
High
Occurrence frequency
Depends on customer usage. Can occur weekly depending on how frequently nodes are rotated.
Time to repair manually
Low
Shoreline time to repair
~ 0
Time to diagnose manually
Medium
Cost impact
Over-provisioning hardware leads to higher costs
Security

The problem

Many infrastructure groups deploy Argo for workflow orchestration on Kubernetes. While Argo makes declaratively managing workflows easy, it can leave behind many stale pods after workflow execution.

In Kubernetes, each of these stale pods consumes one IP address, whether it is running or not. Because every leftover Argo pod keeps holding its IP address, all of them must eventually be deleted. When the IPs on a node are exhausted, Kubernetes cannot schedule new pods there, so any free CPU and memory on that node goes unused. In an autoscaled environment, this means that Argo IP exhaustion can trigger the provisioning of new capacity prematurely. Most customers overcome this hurdle either by over-provisioning hardware, which leads to higher costs, or by implementing custom logic for cluster autoscaling and workflow clean-up.
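The accounting described here can be illustrated with a minimal Python sketch. It is not taken from Kubernetes or Argo: the `Pod` record, the helper names, and the per-node limit of four IPs are all hypothetical, and it simply assumes one IP per pod regardless of phase, as stated above.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    phase: str  # e.g. "Running", "Succeeded", "Failed"


def ips_in_use(pods):
    """Count IPs held on a node: one per pod, finished or not (assumption from the text)."""
    return len(pods)


def can_schedule(pods, max_ips):
    """A new pod fits only if an IP is still free, whatever CPU and memory remain."""
    return ips_in_use(pods) < max_ips


# Hypothetical node that can hand out 4 pod IPs.
MAX_IPS = 4
node_pods = [
    Pod("wf-step-1", "Succeeded"),
    Pod("wf-step-2", "Succeeded"),
    Pod("wf-step-3", "Failed"),
    Pod("api-server", "Running"),
]

# Three of the four IPs are held by pods that have already finished.
blocked = not can_schedule(node_pods, MAX_IPS)

# After deleting the finished pods, the node can accept new pods again.
live_pods = [p for p in node_pods if p.phase == "Running"]
unblocked = can_schedule(live_pods, MAX_IPS)
```

In this toy example the node is blocked even though only one pod is doing any work, which is the situation that leads to premature scale-out.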

The solution

Shoreline’s Argo Op Pack dramatically reduces the operational burden of administering Argo, decreasing wasteful overcapacity and lowering operating costs. It continuously monitors the local node, comparing the number of allocated IPs against a configurable maximum threshold. When the number of assigned IPs exceeds that threshold, Shoreline automatically cleans up old, stale Argo pods.
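The threshold check described above might look roughly like the following Python sketch. This is not Shoreline's implementation: `PodInfo`, `pods_to_delete`, the threshold, and the minimum age are hypothetical names and values chosen for illustration, and the sketch assumes each pod holds exactly one IP.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class PodInfo:
    name: str
    phase: str                       # "Running", "Succeeded", "Failed", ...
    finished_at: Optional[datetime]  # None while the pod is still running


def pods_to_delete(pods: List[PodInfo], ip_threshold: int,
                   min_age: timedelta, now: datetime) -> List[str]:
    """Return names of finished pods to clean up, oldest first.

    Nothing is selected while the number of assigned IPs (one per pod,
    by assumption) is at or below the threshold.
    """
    if len(pods) <= ip_threshold:
        return []
    stale = [
        p for p in pods
        if p.phase in ("Succeeded", "Failed")
        and p.finished_at is not None
        and now - p.finished_at >= min_age
    ]
    stale.sort(key=lambda p: p.finished_at)
    return [p.name for p in stale]


# Hypothetical snapshot of one node.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
pods = [
    PodInfo("wf-a", "Succeeded", now - timedelta(hours=3)),
    PodInfo("wf-b", "Failed", now - timedelta(minutes=10)),
    PodInfo("wf-c", "Running", None),
    PodInfo("wf-d", "Succeeded", now - timedelta(hours=5)),
]

selected = pods_to_delete(pods, ip_threshold=3,
                          min_age=timedelta(hours=1), now=now)
# selected == ["wf-d", "wf-a"]
```

Keeping the selection logic as a pure function separates the decision (which pods are stale enough) from the action. In a real cluster the returned names would then be passed to whatever deletion mechanism is in use, for example `kubectl delete pod`.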

The Argo Op Pack comes with several additional features, including:

  • Configurable job and workflow state rules
  • Configurable job and workflow age rules
  • Automatic capacity provisioning
  • Extra Argo management functions