Runbook

Spark executor failure during job execution.


Overview

This incident type covers the failure of one or more Spark executors while a job is running. Executors are worker processes that run tasks and store data in memory or on disk. Spark retries the tasks of a lost executor, so a single failure often only degrades performance, but repeated failures can cause the entire job to fail. Common causes include hardware or network faults, out-of-memory errors, and software bugs.

Parameters

Debug

Check the status of the Spark application
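One way to check application status is the Spark monitoring REST API, which is served under /api/v1 by the running application's UI (port 4040 by default) and by the history server (port 18080 by default). The sketch below is a minimal example under those assumptions. BASE_URL and the helper names are hypothetical, and the state labels are derived only from the "completed" flag of each attempt.

```python
import json
from urllib.request import urlopen

# Assumption: the Spark UI or history server is reachable at this address.
BASE_URL = "http://localhost:4040/api/v1"  # hypothetical host and port


def fetch_json(url, timeout=10):
    """Fetch a URL and decode its JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_applications(apps):
    """Return (id, name, state) tuples for a list of application records.

    `apps` is the parsed response of GET <BASE_URL>/applications.
    """
    summary = []
    for app in apps:
        attempts = app.get("attempts", [])
        if not attempts:
            state = "unknown"
        elif any(not attempt.get("completed", False) for attempt in attempts):
            state = "running"
        else:
            state = "completed"
        summary.append((app.get("id"), app.get("name"), state))
    return summary
```

Against a live cluster this could be called as summarize_applications(fetch_json(BASE_URL + "/applications")). On YARN, the resource manager's own application report is an alternative source for the same information.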

View the logs for the failed executor
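Once the executor log has been retrieved (where it lives depends on the cluster manager), scanning it for well-known failure signatures can narrow down the cause quickly. The following is a rough sketch. The pattern list is illustrative rather than exhaustive, and the function and dictionary names are hypothetical.

```python
import re

# Signatures commonly seen in executor or driver logs when an executor dies.
# Assumption: this list is a starting point, not a complete catalogue.
FAILURE_PATTERNS = {
    "jvm_oom": re.compile(r"java\.lang\.OutOfMemoryError"),
    "yarn_memory_kill": re.compile(
        r"Container killed by YARN for exceeding (physical )?memory limits"
    ),
    "executor_lost": re.compile(r"ExecutorLostFailure"),
    "fetch_failed": re.compile(r"FetchFailedException"),
}


def classify_log_lines(lines):
    """Map each matching pattern name to a list of (line_number, text) hits."""
    hits = {}
    for number, line in enumerate(lines, start=1):
        for name, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                hits.setdefault(name, []).append((number, line.rstrip()))
    return hits
```

For a log saved to a local file, passing the open file object works because it iterates line by line: classify_log_lines(open(path, errors="replace")), where path is whatever location the log was saved to.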

Check the resource usage of the executor
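Executor resource usage can be read from the executor summaries that the Spark REST API returns at /api/v1/applications/<app-id>/executors. The sketch below assumes that response has already been fetched and parsed into a list of dictionaries. It relies on the memoryUsed, maxMemory, diskUsed and failedTasks fields. The function name and the 0.9 threshold are hypothetical choices. Note that memoryUsed reflects storage memory, so it is only a partial view of heap pressure.

```python
def flag_pressured_executors(executors, memory_threshold=0.9):
    """Return executors with high storage-memory usage or failed tasks.

    `executors` is the parsed list of executor summaries from the REST API.
    """
    flagged = []
    for ex in executors:
        if ex.get("id") == "driver":
            continue  # the driver is listed alongside executors; skip it
        max_mem = ex.get("maxMemory", 0)
        ratio = ex.get("memoryUsed", 0) / max_mem if max_mem else 0.0
        failed = ex.get("failedTasks", 0)
        if ratio >= memory_threshold or failed > 0:
            flagged.append(
                {
                    "id": ex.get("id"),
                    "memory_ratio": round(ratio, 2),
                    "failed_tasks": failed,
                    "disk_used": ex.get("diskUsed", 0),
                }
            )
    return flagged
```

Executors that appear in the result are candidates for a closer look at their logs and at host-level metrics such as CPU and disk utilization.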

Check the system logs for any relevant error messages
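On Linux hosts, one relevant system-log entry is the kernel OOM killer terminating the executor's JVM. A minimal sketch for spotting those entries follows, assuming the kernel log lines have already been collected (for example from dmesg or journalctl -k output). The function name is hypothetical, and the regular expression covers the two message variants I am aware of.

```python
import re

# Matches both "Killed process" (newer kernels) and "Kill process" (older ones).
OOM_KILL = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return (pid, process_name) for each OOM-killer entry in the given lines."""
    kills = []
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

A java process in the result whose kill time lines up with the executor loss suggests that the host, not Spark, ended the executor.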

Possible cause: insufficient resources were allocated to the Spark executor, causing it to fail during job execution.

Repair

Check whether the executor has sufficient memory, CPU cores, and disk space, and increase the allocation where it falls short.
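Increasing executor resources usually means resubmitting the job with larger values for spark.executor.memory, spark.executor.cores and, on YARN or Kubernetes, spark.executor.memoryOverhead. The sketch below shows one way to derive the new values and build the matching spark-submit arguments. The helper names and the scaling factors are hypothetical, and suitable values depend on the workload and on what the cluster can actually provide.

```python
import re


def scale_memory(value, factor):
    """Scale a Spark memory string such as '4g' or '512m'; result is in MiB."""
    match = re.fullmatch(r"(\d+)([mg])", value.strip().lower())
    if not match:
        raise ValueError(f"unsupported memory value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    mebibytes = amount * (1024 if unit == "g" else 1)
    return f"{int(mebibytes * factor)}m"


def spark_submit_conf(memory, cores, overhead):
    """Build the --conf arguments for the executor resource settings."""
    return [
        "--conf", f"spark.executor.memory={memory}",
        "--conf", f"spark.executor.cores={cores}",
        "--conf", f"spark.executor.memoryOverhead={overhead}",
    ]
```

For example, spark_submit_conf(scale_memory("4g", 1.5), 4, scale_memory("512m", 2)) yields arguments requesting 6144m of executor memory, 4 cores and 1024m of overhead, which can be appended to the existing spark-submit command.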