Runbook

Cassandra Node Unavailability

Overview

This runbook covers incidents in which a Cassandra node becomes unavailable. An unavailable node can interrupt normal operation of the system and, in the worst case, lead to data loss or corruption. Common causes include hardware failure, network issues, and software bugs. Resolve the issue as quickly as possible to limit the impact on the application.

Debug

Check Cassandra service status

Check Cassandra process status
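
The two checks above (service and process) could be scripted roughly as follows. This is a minimal sketch, assuming a systemd-managed host where the unit is named `cassandra` and the JVM command line contains the main class name `CassandraDaemon`; both names are assumptions and should be adjusted to the local install. The `runner` parameter is a hypothetical hook that keeps the functions testable without a live node.

```python
import subprocess


def run(cmd):
    """Run a command and return (exit_code, stdout) without raising on failure."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return 127, str(exc)
    return proc.returncode, proc.stdout.strip()


def service_is_active(unit="cassandra", runner=run):
    # `systemctl is-active` exits 0 and prints "active" only for a running unit.
    # Assumption: the unit is called "cassandra" on this host.
    code, out = runner(["systemctl", "is-active", unit])
    return code == 0 and out == "active"


def cassandra_pids(runner=run):
    # `pgrep -f` matches against the full command line and prints one PID per line.
    # Assumption: the command line contains "CassandraDaemon".
    code, out = runner(["pgrep", "-f", "CassandraDaemon"])
    return [int(pid) for pid in out.split()] if code == 0 else []
```

A service reported as active with no matching PID, or the reverse, usually points at a unit-file or packaging mismatch rather than at Cassandra itself.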

Check Cassandra system logs
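
For a first look at the log, reading only the most recent lines is usually enough. The sketch below assumes the log lives at `/var/log/cassandra/system.log`, which is a common default for package installs but may differ on your hosts; `last_lines` and `tail_file` are illustrative helper names.

```python
from collections import deque

# Assumption: default log location for package installs; adjust as needed.
DEFAULT_LOG = "/var/log/cassandra/system.log"


def last_lines(lines, n=100):
    """Return the final n lines of any iterable of lines, newline stripped."""
    # deque with maxlen keeps only the newest n items while streaming the input.
    return [line.rstrip("\n") for line in deque(lines, maxlen=n)]


def tail_file(path=DEFAULT_LOG, n=100):
    """Read the last n lines of a log file without loading it all into memory."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return last_lines(fh, n)
```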

Check Cassandra node health
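
Node health is commonly read from `nodetool status`, where each node line starts with a two-letter code: `U` or `D` for up or down, then `N`, `L`, `J`, or `M` for normal, leaving, joining, or moving. The following is a sketch of a parser for that output, assuming the usual column layout with the address in the second column; the exact layout can vary between Cassandra versions, and the function names are illustrative.

```python
import re

# Two-letter status/state code followed by the node address.
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def parse_status(text):
    """Extract one record per node line from `nodetool status` output."""
    nodes = []
    for line in text.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            nodes.append({
                "up": match.group(1) == "U",
                "state": match.group(2),
                "address": match.group(3),
            })
    return nodes


def down_nodes(text):
    """Return the addresses of all nodes reported as down."""
    return [node["address"] for node in parse_status(text) if not node["up"]]
```

The input would typically come from something like `subprocess.run(["nodetool", "status"], capture_output=True, text=True).stdout`, run on a node that is still healthy.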

Check Cassandra node ring information
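
The output of `nodetool ring` lists one row per token, so a node with many tokens appears many times. A small summary per address can make a down node easier to spot. This sketch assumes the row layout `Address Rack Status State ...` with `Up`/`Down` in the third column; treat that layout as an assumption and check it against your version's output.

```python
from collections import defaultdict

STATES = {"Normal", "Leaving", "Joining", "Moving"}


def summarize_ring(text):
    """Collapse per-token rows of `nodetool ring` into one entry per address."""
    summary = defaultdict(lambda: {"status": None, "state": None, "tokens": 0})
    for line in text.splitlines():
        cols = line.split()
        # Data rows have Up/Down in column 3 and a known state in column 4;
        # headers, separators, and the lone leading token row are skipped.
        if len(cols) >= 4 and cols[2] in ("Up", "Down") and cols[3] in STATES:
            entry = summary[cols[0]]
            entry["status"], entry["state"] = cols[2], cols[3]
            entry["tokens"] += 1
    return dict(summary)
```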

Check Cassandra node gossip information
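
`nodetool gossipinfo` prints one block per endpoint: an unindented line with the address, followed by indented `KEY:value` lines such as `generation`, `heartbeat`, and `STATUS`. The sketch below parses that shape. Some versions prefix each value with a numeric version (for example `STATUS:18:NORMAL,...`) and others do not, so the helper strips such a prefix if present; this handling is an assumption based on commonly seen output, not a guarantee for every release.

```python
import re


def parse_gossipinfo(text):
    """Map each endpoint address to its gossip fields (raw string values)."""
    endpoints, current = {}, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Endpoint header, e.g. "/10.0.0.1" or "hostname/10.0.0.1".
            current = line.strip().split("/")[-1]
            endpoints[current] = {}
        elif current is not None and ":" in line:
            key, value = line.strip().split(":", 1)
            endpoints[current][key] = value
    return endpoints


def gossip_status(fields):
    """Return the status word (e.g. NORMAL, shutdown) from one endpoint's fields."""
    raw = re.sub(r"^\d+:", "", fields.get("STATUS", ""))  # drop optional version
    return raw.split(",")[0]
```

A peer whose status is not `NORMAL`, or whose heartbeat stops increasing between two runs, is a reasonable candidate for closer inspection.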

Check Cassandra node log for errors

Check Cassandra node log for warnings
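
The error and warning checks above can be combined into a single pass over the log. This is a sketch, assuming the default logging pattern in which each entry begins with the level followed by the thread name in brackets (for example `ERROR [main] ...`); a customized `logback.xml` would need a different pattern. The function name `scan_levels` is illustrative.

```python
import re

# Level at the start of the line, then whitespace, then "[thread]".
LEVEL = re.compile(r"^(ERROR|WARN)\s+\[")


def scan_levels(lines):
    """Group log lines by level, keeping only ERROR and WARN entries."""
    hits = {"ERROR": [], "WARN": []}
    for line in lines:
        match = LEVEL.match(line)
        if match:
            hits[match.group(1)].append(line.rstrip("\n"))
    return hits
```

Because only the start of each line is matched, stack-trace continuation lines and messages that merely mention the word "ERROR" are not counted.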

Check for any network connectivity issues

Check for any firewall issues
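
Connectivity and firewall problems can often be told apart by how a TCP connection attempt fails. As a rough heuristic, a refused connection means the host answered but nothing is listening on the port, while a timeout suggests that packets are being dropped (often by a firewall) or that the host is down. The sketch below assumes Cassandra's default ports (7000 inter-node, 7199 JMX, 9042 CQL native transport); these can be changed in the node's configuration. The `connect` parameter is a hypothetical hook for testing.

```python
import socket

# Assumption: default ports; check the node's configuration if they were changed.
CASSANDRA_PORTS = {"inter-node": 7000, "JMX": 7199, "CQL native": 9042}


def probe(host, port, timeout=3.0, connect=socket.create_connection):
    """Classify a TCP connection attempt as open, refused, timeout, or unreachable."""
    try:
        conn = connect((host, port), timeout=timeout)
    except ConnectionRefusedError:
        return "refused"      # host reachable, no listener on this port
    except (socket.timeout, TimeoutError):
        return "timeout"      # no answer; possibly filtered by a firewall
    except OSError:
        return "unreachable"  # DNS failure, no route, or similar
    conn.close()
    return "open"


def probe_all(host, ports=CASSANDRA_PORTS, **kwargs):
    """Probe every named port on a host and return {name: result}."""
    return {name: probe(host, port, **kwargs) for name, port in ports.items()}
```

Running `probe_all` from a healthy peer against the affected node gives a quick picture of which ports are reachable from inside the cluster.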

Repair

Attempt to repair the Cassandra installation or reinstall it if necessary.

Restore from a recent backup if data loss or corruption has occurred.

Reboot the node to clear any transient software issues that may be causing the unavailability.
