Runbook
Kubernetes Cronjob Failure
Overview
A Kubernetes CronJob failure incident occurs when a scheduled task (a CronJob) in a Kubernetes cluster fails to execute as expected. Possible causes include misconfiguration, resource constraints, and software bugs. The incident requires investigation and debugging to identify the root cause and restore normal operation. Kubernetes also has a limitation that can permanently stop a CronJob after too many (e.g. 100) missed schedules.
Debug
Check if the cronjob is still active
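One way to sketch this check is to inspect the CronJob object as JSON, for example the output of `kubectl get cronjob <name> -n <namespace> -o json`. The helper name `cronjob_activity` is hypothetical; the fields it reads (`spec.suspend`, `status.active`, `status.lastScheduleTime`) are part of the CronJob API.

```python
def cronjob_activity(cronjob):
    """Summarize whether a CronJob is suspended or has Jobs running right now."""
    spec = cronjob.get("spec", {})
    status = cronjob.get("status", {})
    return {
        # spec.suspend is treated as false when the field is absent
        "suspended": bool(spec.get("suspend", False)),
        # status.active holds references to Jobs that are currently running
        "active_jobs": [ref.get("name") for ref in status.get("active", [])],
        # None here means the CronJob has never been scheduled
        "last_schedule_time": status.get("lastScheduleTime"),
    }
```

A suspended CronJob, or one whose `last_schedule_time` is far older than its schedule implies, is a likely starting point for the rest of the investigation.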
Check if the pods created by the cronjob are still running
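As a rough sketch, the pods of one run can be listed with `kubectl get pods -n <namespace> -l job-name=<job> -o json` (pods created by a Job carry the `job-name` label) and grouped by phase. The function name `pods_by_phase` is an illustrative assumption.

```python
from collections import defaultdict


def pods_by_phase(pod_list):
    """Group pod names by status.phase (Pending, Running, Succeeded, Failed, Unknown)."""
    phases = defaultdict(list)
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        phases[phase].append(pod["metadata"]["name"])
    return dict(phases)
```

Pods that stay in `Running` long after the expected run time, or in `Pending` at all, usually deserve a closer look.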
Check the logs of the pods created by the cronjob
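A minimal sketch of fetching logs for one run, assuming `kubectl` is on the PATH and already configured for the cluster. The helper names `logs_command` and `fetch_logs` are hypothetical; `kubectl logs job/<name>` reads from one of the Job's pods, and `--all-containers` and `--tail` are standard `kubectl logs` flags.

```python
import subprocess


def logs_command(job_name, namespace, tail=200):
    """Build the kubectl argv that reads recent logs from a pod of the given Job."""
    return [
        "kubectl", "logs", f"job/{job_name}",
        "--namespace", namespace,
        "--all-containers=true",
        f"--tail={tail}",
    ]


def fetch_logs(job_name, namespace, tail=200):
    """Run kubectl and return stdout plus stderr, so kubectl errors stay visible."""
    result = subprocess.run(
        logs_command(job_name, namespace, tail),
        capture_output=True, text=True, check=False,
    )
    return result.stdout + result.stderr
```

Keeping the command builder separate from the call makes it easy to print the exact command for use in a terminal.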
Check if the cronjob schedule is correct
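The schedule is stored in `spec.schedule` and uses the five-field cron format (minute, hour, day of month, month, day of week) or a macro such as `@hourly`. The validator below is only a sketch: the name `schedule_problems` is hypothetical, and it deliberately does not handle month or weekday names (`JAN`, `MON`) or the `?` wildcard, so a reported problem should be confirmed by hand.

```python
FIELD_RANGES = [
    ("minute", 0, 59), ("hour", 0, 23), ("day of month", 1, 31),
    ("month", 1, 12), ("day of week", 0, 6),
]
MACROS = {"@yearly", "@annually", "@monthly", "@weekly", "@daily", "@midnight", "@hourly"}


def _part_ok(part, low, high):
    """Validate one comma-separated part such as '5', '1-5', '*/15' or '0-30/10'."""
    base, slash, step = part.partition("/")
    if slash and (not step.isdigit() or int(step) == 0):
        return False
    if base == "*":
        return True
    start, dash, end = base.partition("-")
    if not start.isdigit() or (dash and not end.isdigit()):
        return False
    first = int(start)
    last = int(end) if dash else first
    return low <= first <= last <= high


def schedule_problems(schedule):
    """Return a list of problems found; an empty list means this sketch found none."""
    schedule = schedule.strip()
    if schedule in MACROS:
        return []
    fields = schedule.split()
    if len(fields) != 5:
        return [f"expected 5 fields, found {len(fields)}"]
    problems = []
    for value, (name, low, high) in zip(fields, FIELD_RANGES):
        if not all(_part_ok(p, low, high) for p in value.split(",")):
            problems.append(f"{name} field {value!r} is outside {low}-{high} or malformed")
    return problems
```

A syntactically valid schedule can still be wrong for the intent (for example, a time zone mismatch), so compare it against `status.lastScheduleTime` as well.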
Check if there are any errors in the cronjob events
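Events can be read as JSON with `kubectl get events -n <namespace> -o json` and filtered down to warnings that reference the CronJob itself. This is a sketch; the function name `cronjob_warnings` is an assumption, while `involvedObject`, `type`, `reason`, `message`, and `lastTimestamp` are fields of the core Event object.

```python
def cronjob_warnings(event_list, cronjob_name):
    """Return (timestamp, reason, message) for Warning events on the named CronJob."""
    warnings = []
    for event in event_list.get("items", []):
        obj = event.get("involvedObject", {})
        if (
            obj.get("kind") == "CronJob"
            and obj.get("name") == cronjob_name
            and event.get("type") == "Warning"
        ):
            warnings.append(
                (event.get("lastTimestamp"), event.get("reason"), event.get("message"))
            )
    return warnings
```

Events expire after a retention period, so an empty result on an old failure does not prove that nothing went wrong.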
Check if the cronjob image exists in the container registry
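Checking the registry directly depends on the registry's own tooling, which is not shown here. As an indirect sketch, the image references can be read from the CronJob's pod template, and pull failures show up in pod status as `ErrImagePull` or `ImagePullBackOff`. Both helper names are hypothetical; inputs are the JSON from `kubectl get cronjob <name> -o json` and `kubectl get pods -l job-name=<job> -o json`.

```python
def cronjob_images(cronjob):
    """List the distinct image references used by the CronJob's pod template."""
    pod_spec = (
        cronjob.get("spec", {})
        .get("jobTemplate", {}).get("spec", {})
        .get("template", {}).get("spec", {})
    )
    containers = pod_spec.get("initContainers", []) + pod_spec.get("containers", [])
    return sorted({c["image"] for c in containers if "image" in c})


def image_pull_failures(pod_list):
    """Return (pod, image, reason) for containers waiting on a failed image pull."""
    failures = []
    for pod in pod_list.get("items", []):
        for cs in pod.get("status", {}).get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") in {"ErrImagePull", "ImagePullBackOff"}:
                failures.append((pod["metadata"]["name"], cs.get("image"), waiting["reason"]))
    return failures
```

A pull failure can also mean missing registry credentials rather than a missing image, so the event message is worth reading before retagging anything.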
Check the status of the last cronjob run
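Each run of a CronJob creates a Job whose `ownerReferences` point back at the CronJob. The sketch below takes the JSON from `kubectl get jobs -n <namespace> -o json`, picks the most recently created Job owned by the CronJob, and reads its terminal condition. The function name `last_job_status` and the `"InProgress"` label are illustrative assumptions.

```python
def last_job_status(job_list, cronjob_name):
    """Describe the newest Job owned by the CronJob, or return None if there is none."""
    owned = [
        job for job in job_list.get("items", [])
        if any(
            ref.get("kind") == "CronJob" and ref.get("name") == cronjob_name
            for ref in job.get("metadata", {}).get("ownerReferences", [])
        )
    ]
    if not owned:
        return None
    # RFC 3339 UTC timestamps compare correctly as plain strings
    latest = max(owned, key=lambda job: job["metadata"].get("creationTimestamp", ""))
    status = latest.get("status", {})
    outcome, reason = "InProgress", None  # no terminal condition recorded yet
    for cond in status.get("conditions", []):
        if cond.get("status") == "True" and cond.get("type") in {"Complete", "Failed"}:
            outcome, reason = cond["type"], cond.get("reason")
    return {
        "job": latest["metadata"]["name"],
        "outcome": outcome,
        "reason": reason,
        "failed_pods": status.get("failed", 0),
    }
```

Older Jobs may already have been removed by the CronJob's history limits, so this only reflects the runs that are still present in the cluster.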
Check if the cronjob is running on the expected node
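The node of each pod is visible in the NODE column of `kubectl get pods -n <namespace> -l job-name=<job> -o wide`, or as `spec.nodeName` in the JSON output. A small sketch that compares it against a set of expected node names (the function name and the `expected_nodes` parameter are assumptions for illustration):

```python
def pods_off_expected_nodes(pod_list, expected_nodes):
    """Return {pod name: node name} for pods not placed on one of expected_nodes."""
    unexpected = {}
    for pod in pod_list.get("items", []):
        # nodeName is absent (None here) until the scheduler has placed the pod
        node = pod.get("spec", {}).get("nodeName")
        if node not in expected_nodes:
            unexpected[pod["metadata"]["name"]] = node
    return unexpected
```

A value of `None` in the result means the pod has not been scheduled at all, which points at scheduling constraints rather than a wrong node.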
Check if the pod has sufficient resources
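Two common signs of insufficient resources are visible in pod status: a `PodScheduled` condition with reason `Unschedulable` (no node had room) and a container terminated with reason `OOMKilled` (memory limit exceeded). The sketch below scans the JSON from `kubectl get pods -n <namespace> -l job-name=<job> -o json` for both; the function name `resource_problems` is hypothetical.

```python
def resource_problems(pod_list):
    """Return (pod, problem, detail) tuples for unschedulable pods and OOM kills."""
    problems = []
    for pod in pod_list.get("items", []):
        name = pod["metadata"]["name"]
        status = pod.get("status", {})
        for cond in status.get("conditions", []):
            if (
                cond.get("type") == "PodScheduled"
                and cond.get("status") == "False"
                and cond.get("reason") == "Unschedulable"
            ):
                problems.append((name, "Unschedulable", cond.get("message", "")))
        for cs in status.get("containerStatuses", []):
            # check the current state first, then the previous termination
            for key in ("state", "lastState"):
                terminated = cs.get(key, {}).get("terminated") or {}
                if terminated.get("reason") == "OOMKilled":
                    problems.append((name, "OOMKilled", cs.get("name", "")))
                    break
    return problems
```

The scheduler's message for an unschedulable pod usually names the resource that was short, which helps decide between lowering requests and adding capacity.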
Check if there are any errors in the pod events
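For a single pod, `kubectl describe pod <pod> -n <namespace>` prints its events at the end of the output. To get an overview across runs, a sketch like the following tallies warning reasons for all pods whose names start with the CronJob name, using the JSON from `kubectl get events -n <namespace> -o json`. The function name `pod_warning_reasons` is an assumption, as is matching pods by name prefix.

```python
from collections import Counter


def pod_warning_reasons(event_list, pod_name_prefix):
    """Count Warning events per reason for pods whose name starts with the prefix."""
    counts = Counter()
    for event in event_list.get("items", []):
        obj = event.get("involvedObject", {})
        if (
            obj.get("kind") == "Pod"
            and obj.get("name", "").startswith(pod_name_prefix)
            and event.get("type") == "Warning"
        ):
            # repeated events are aggregated by Kubernetes into a single count
            counts[event.get("reason", "Unknown")] += event.get("count") or 1
    return counts
```

Calling `pod_warning_reasons(events, "backup-").most_common()` lists the most frequent reason first, which is often the quickest hint at a recurring cause.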
Repair