Runbook

Overheating reported by Prometheus exporter at {{ $labels.instance }} for Cassandra service.

Overview

This incident fires when the Prometheus exporter reports overheating on a specific instance that serves Cassandra. Possible causes include a high volume of traffic or a problem with the server itself. The incident requires immediate attention to avoid downtime or data loss. The assigned engineer investigates and resolves the issue as quickly as possible.

Parameters
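
The runbook does not list its parameters, so the following is only a sketch of what they might look like. Every variable name and default value here is an assumption made for illustration; replace them with the values used in your cluster.

```shell
# Hypothetical parameters; none of these names or defaults come from the runbook.
export NAMESPACE="${NAMESPACE:-default}"                            # namespace of the Cassandra pods
export CASSANDRA_SELECTOR="${CASSANDRA_SELECTOR:-app=cassandra}"    # label selector for Cassandra pods
export MONITORING_NAMESPACE="${MONITORING_NAMESPACE:-monitoring}"   # namespace of Prometheus and Node Exporter
export PROMETHEUS_SERVICE="${PROMETHEUS_SERVICE:-prometheus}"       # name of the Prometheus Service
export NODE_EXPORTER_SELECTOR="${NODE_EXPORTER_SELECTOR:-app.kubernetes.io/name=node-exporter}"
```

Values already set in the environment take precedence over the defaults.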

Debug

Check the status of the Cassandra pods
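
A minimal sketch of this step, assuming the Cassandra pods run on Kubernetes and carry the label `app=cassandra`. The function name, the `NAMESPACE` and `CASSANDRA_SELECTOR` variables, and their defaults are illustrative assumptions.

```shell
# Assumption: Cassandra pods are labelled app=cassandra in $NAMESPACE.
cassandra_pod_status() {
  # -o wide adds the node name, which helps match pods to the alerting instance.
  kubectl get pods -n "${NAMESPACE:-default}" \
    -l "${CASSANDRA_SELECTOR:-app=cassandra}" -o wide
}
```

Run `cassandra_pod_status` and look for pods that are not `Running`, are not fully ready, or show a rising restart count.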

Check the CPU and memory usage of the Cassandra pods
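
One way to do this is with `kubectl top`, which only works if the cluster runs the metrics-server (an assumption here). The function name, variables, and label selector are also assumptions.

```shell
# Assumption: metrics-server is installed; pods are labelled app=cassandra.
cassandra_pod_usage() {
  # --containers breaks usage down per container inside each pod.
  kubectl top pods -n "${NAMESPACE:-default}" \
    -l "${CASSANDRA_SELECTOR:-app=cassandra}" --containers
}
```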

Check the CPU and memory usage of the Prometheus exporter pod
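
The same check can be pointed at the exporter pod. This sketch assumes the exporter runs as a pod in the same namespace as Cassandra and that metrics-server is available; the function name is hypothetical, and the pod name is passed in because the runbook does not give it.

```shell
# Assumption: the exporter pod lives in $NAMESPACE; its name is supplied by the caller.
exporter_pod_usage() {
  kubectl top pod "${1:?usage: exporter_pod_usage <pod-name>}" \
    -n "${NAMESPACE:-default}" --containers
}
```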

View the logs of the Prometheus exporter pod
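
A sketch of the log check, under the same assumptions as above (exporter pod in `NAMESPACE`, pod name supplied by the caller). The limit of 200 lines is an arbitrary choice.

```shell
# Assumption: the exporter pod lives in $NAMESPACE; its name is supplied by the caller.
exporter_pod_logs() {
  # --tail limits the output; --timestamps helps line entries up with the alert time.
  kubectl logs "${1:?usage: exporter_pod_logs <pod-name>}" \
    -n "${NAMESPACE:-default}" --tail=200 --timestamps
}
```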

Check the status of the Prometheus service
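
Assuming "service" here means the Kubernetes Service in front of Prometheus, a check might look like the sketch below. The Service name `prometheus` and the namespace `monitoring` are assumed defaults, not values from the runbook.

```shell
# Assumption: Prometheus is exposed by a Service named "prometheus" in "monitoring".
prometheus_service_status() {
  kubectl get service "${PROMETHEUS_SERVICE:-prometheus}" \
    -n "${MONITORING_NAMESPACE:-monitoring}" -o wide &&
  kubectl get endpoints "${PROMETHEUS_SERVICE:-prometheus}" \
    -n "${MONITORING_NAMESPACE:-monitoring}"
}
```

An empty endpoints list would suggest that no ready Prometheus pod is backing the Service.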

Check the status of the Prometheus Node Exporter pods
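
Node Exporter usually runs as a DaemonSet with one pod per node. The sketch below assumes the pods are labelled `app.kubernetes.io/name=node-exporter` in a `monitoring` namespace; both values, like the function name, are assumptions that depend on how it was installed.

```shell
# Assumption: Node Exporter pods are labelled app.kubernetes.io/name=node-exporter.
node_exporter_pod_status() {
  kubectl get pods -n "${MONITORING_NAMESPACE:-monitoring}" \
    -l "${NODE_EXPORTER_SELECTOR:-app.kubernetes.io/name=node-exporter}" -o wide
}
```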

View the logs of the Prometheus Node Exporter pod
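
Because there is one Node Exporter pod per node, it helps to pick the pod on the node that raised the alert. The following sketch looks the pod up by node name and then prints its logs; the function name, label selector, and namespace are assumptions.

```shell
# Assumption: Node Exporter pods are labelled app.kubernetes.io/name=node-exporter.
node_exporter_logs() {
  ne_node="${1:?usage: node_exporter_logs <node-name>}"
  # Find the Node Exporter pod scheduled on the given node.
  ne_pod="$(kubectl get pods -n "${MONITORING_NAMESPACE:-monitoring}" \
    -l "${NODE_EXPORTER_SELECTOR:-app.kubernetes.io/name=node-exporter}" \
    --field-selector "spec.nodeName=${ne_node}" -o name | head -n 1)"
  if [ -z "$ne_pod" ]; then
    echo "no Node Exporter pod found on node ${ne_node}" >&2
    return 1
  fi
  kubectl logs "$ne_pod" -n "${MONITORING_NAMESPACE:-monitoring}" --tail=200 --timestamps
}
```

Pass the node name of the affected instance, for example the node shown by `kubectl get pods -o wide` for the alerting Cassandra pod.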

A likely cause: the server hosting the Prometheus exporter is overloaded with traffic, which drives up its temperature and triggers the alert.

Repair

If the overheating is caused by high traffic, consider scaling up the server or optimizing the Cassandra service to handle the load.
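
If Cassandra runs as a Kubernetes StatefulSet, one possible form of scaling is to add replicas, as sketched below. This is an assumption about the deployment: the StatefulSet name `cassandra` and the function name are hypothetical, and if an operator manages the cluster, change the operator's resource instead. New Cassandra nodes stream data while they join, so add them one at a time and watch the load.

```shell
# Assumption: Cassandra is a StatefulSet named "cassandra" in $NAMESPACE.
scale_cassandra() {
  kubectl scale statefulset "${CASSANDRA_STATEFULSET:-cassandra}" \
    -n "${NAMESPACE:-default}" \
    --replicas="${1:?usage: scale_cassandra <replica-count>}"
}
```

For example, `scale_cassandra 4` requests four replicas in total.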
