Runbook

Airflow worker node outage.


Overview

An Airflow worker node is the component of an Apache Airflow deployment that executes the tasks handed to it by the scheduler. When a worker node goes down, the tasks assigned to that node cannot run on it until the issue is resolved. This delays or disrupts scheduled workflows and may require urgent attention from system administrators to restore functionality. Typical causes include hardware or software failures, connectivity issues, and resource exhaustion on the node.

Parameters

Debug

Check if the Airflow worker node is responding to ping requests
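A minimal sketch of this check, assuming a Linux host with the iputils `ping` binary on the PATH. The function names and the host name used in the usage note are hypothetical and not part of the original runbook.

```python
import subprocess


def build_ping_command(host: str, count: int = 3, timeout_s: int = 2) -> list[str]:
    """Build a Linux ping command: -c is the packet count, -W the per-reply timeout in seconds."""
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def host_responds_to_ping(host: str, count: int = 3, timeout_s: int = 2) -> bool:
    """Return True if ping exits with status 0, i.e. at least one reply came back."""
    cmd = build_ping_command(host, count, timeout_s)
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=count * timeout_s + 5,  # hard cap so the check never hangs
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # ping binary is missing, or the command overran the hard cap
        return False
    return result.returncode == 0
```

For example, `host_responds_to_ping("airflow-worker-1.example.internal")` would return `False` for an unreachable node. Some networks block ICMP, so a failed ping alone does not prove the node is down.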

Check if the Airflow worker node is listening on the expected port
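One way to sketch this check is a plain TCP connection attempt. The function name is hypothetical, and the expected port depends on your deployment: 8793 is the usual default for the worker log server, but treat that value as an assumption and confirm it against your own configuration.

```python
import socket


def port_is_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers connection refused, unreachable host, DNS failure, and timeout
        return False
```

For example, `port_is_open("airflow-worker-1.example.internal", 8793)` would report whether anything accepts connections on that port. A successful connection only shows that a process is listening, not that the worker is healthy.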

Check the logs of the Airflow worker process
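The log review can be sketched as reading the last lines of the worker log and keeping those that look like errors. The log location varies by installation, so the path is left to the caller. The marker strings and function names below are illustrative assumptions.

```python
from collections import deque
from pathlib import Path

# Substrings treated as signs of trouble; adjust to your log format (assumption)
ERROR_MARKERS = ("ERROR", "CRITICAL", "Traceback")


def tail_lines(path, max_lines=200):
    """Return the last max_lines lines of a text file without loading it all into memory."""
    with Path(path).open("r", encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=max_lines)]


def error_lines(lines):
    """Keep only the lines that contain one of the error markers."""
    return [line for line in lines if any(marker in line for marker in ERROR_MARKERS)]
```

Calling `error_lines(tail_lines(log_path))` with the path to your worker log gives a short list to read first, before opening the full file.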

Check the status of the Airflow worker service
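If the worker runs as a systemd unit, the status check might look like the sketch below. The unit name `airflow-worker` in the usage note is an assumption, as are the function names. Deployments that run the worker in a container or under another supervisor need a different check.

```python
import subprocess


def service_state(unit: str) -> str:
    """Return the state printed by `systemctl is-active`, or "unknown" if it cannot be read."""
    try:
        result = subprocess.run(
            ["systemctl", "is-active", unit],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # systemctl is not installed, or it did not answer in time
        return "unknown"
    return result.stdout.strip() or "unknown"


def is_healthy(state: str) -> bool:
    """Only the "active" state counts as healthy."""
    return state == "active"
```

For example, `is_healthy(service_state("airflow-worker"))` returns `False` when the unit is stopped, failed, or cannot be queried.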

Check for insufficient resources on the Airflow worker node, such as memory or CPU exhaustion.
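Resource pressure on the node can be sketched with the standard library alone, assuming a Linux host that exposes `/proc/meminfo`. The function names are hypothetical, and any thresholds you compare the results against are for you to choose.

```python
import os
import shutil


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content into a mapping of field name to value in kB."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_available_fraction(meminfo: dict):
    """Return MemAvailable / MemTotal, or None if MemTotal is missing."""
    total = meminfo.get("MemTotal", 0)
    if total <= 0:
        return None
    return meminfo.get("MemAvailable", 0) / total


def load_per_cpu() -> float:
    """Return the 1-minute load average divided by the CPU count (Unix only)."""
    return os.getloadavg()[0] / (os.cpu_count() or 1)


def disk_free_fraction(path: str = "/") -> float:
    """Return the fraction of free space on the filesystem that holds path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

On the worker node, `memory_available_fraction(parse_meminfo(open("/proc/meminfo").read()))` gives the share of memory still available. A value near zero, or a `load_per_cpu()` well above 1, suggests the node is starved of resources.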

Repair

If the issue persists, consider increasing the resources allocated to the worker node, such as CPU, memory, or disk space.
