This incident type covers overheating reported by a Prometheus exporter on a specific instance of the Cassandra service. It can be caused by a high volume of traffic or by a problem with the server itself, and it needs immediate attention to avoid downtime or data loss. The incident is assigned to an engineer, who investigates and resolves it as quickly as possible.
Debug
Check the status of the Cassandra pods
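As a minimal sketch, assume the pod list has been captured from something like `kubectl get pods -l app=cassandra`. The `app=cassandra` label is a hypothetical selector, so use whatever labels your deployment applies. The helper flags pods that are not `Running` or are not fully ready. It reads only the NAME, READY, and STATUS columns, because newer kubectl versions print RESTARTS as values such as `7 (2m ago)`, which do not split cleanly on whitespace.

```python
def unhealthy_pods(get_pods_output: str) -> list[tuple[str, str, str]]:
    """Return (name, ready, status) for pods that are not Running or not fully ready."""
    problems = []
    lines = get_pods_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or truncated rows
        name, ready, status = fields[0], fields[1], fields[2]
        ready_now, _, ready_total = ready.partition("/")  # "0/1" -> "0", "1"
        if status != "Running" or ready_now != ready_total:
            problems.append((name, ready, status))
    return problems


# Hypothetical output of: kubectl get pods -l app=cassandra
sample = """NAME          READY   STATUS             RESTARTS     AGE
cassandra-0   1/1     Running            0            12d
cassandra-1   0/1     CrashLoopBackOff   7 (2m ago)   12d
cassandra-2   1/1     Running            0            12d
"""

for pod in unhealthy_pods(sample):
    print(pod)
```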
Check the CPU and memory usage of the Cassandra pods
Check the CPU and memory usage of the Prometheus exporter pod
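`kubectl top pods` needs the Metrics API, which is typically served by metrics-server. It prints CPU in whole cores or millicores and memory with binary suffixes. The sketch below normalizes those columns and flags pods above a threshold, and it applies equally to the Cassandra pods and to the exporter pod. The 1500m CPU and 6144 MiB limits and the pod names in the sample are hypothetical placeholders. Substitute the requests and limits your pods actually use.

```python
# Binary memory suffixes used by kubectl top, expressed in MiB.
_MEM_UNITS_MIB = {"Ki": 1 / 1024, "Mi": 1.0, "Gi": 1024.0, "Ti": 1024.0 ** 2}


def cpu_millicores(value: str) -> float:
    """Convert "250m" (millicores) or "2" (whole cores) to millicores."""
    if value.endswith("m"):
        return float(value[:-1])
    return float(value) * 1000


def memory_mib(value: str) -> float:
    """Convert "512Mi", "7Gi", etc. to MiB; a bare number is treated as bytes."""
    for suffix, factor in _MEM_UNITS_MIB.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor
    return float(value) / (1024 * 1024)


def hot_pods(top_output: str, cpu_limit_m: float, mem_limit_mib: float) -> list[tuple[str, float, float]]:
    """Return (name, cpu_millicores, mem_mib) for pods above either threshold."""
    flagged = []
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue
        name, cpu, mem = fields[:3]
        cpu_m, mem_mib = cpu_millicores(cpu), memory_mib(mem)
        if cpu_m > cpu_limit_m or mem_mib > mem_limit_mib:
            flagged.append((name, cpu_m, mem_mib))
    return flagged


# Hypothetical output of: kubectl top pods
sample = """NAME                      CPU(cores)   MEMORY(bytes)
cassandra-0               1850m        7Gi
cassandra-1               400m         3072Mi
cassandra-exporter-7d9f   35m          48Mi
"""

print(hot_pods(sample, cpu_limit_m=1500, mem_limit_mib=6144))
```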
View the logs of the Prometheus exporter pod
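Exporter log formats vary. Assume the exporter writes logfmt-style lines with a `level=` key, which is an assumption to check against the actual output of `kubectl logs <exporter-pod>`. Under that assumption, a quick tally by level shows whether warnings and errors cluster around the time of the alert. The sample lines are invented for illustration.

```python
import re
from collections import Counter

# Assumes logfmt-style lines such as: ts=... level=error msg="..."
LEVEL_RE = re.compile(r"\blevel=(\w+)")


def level_counts(log_text: str) -> Counter:
    """Count log lines per level; lines without a level= key are skipped."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = LEVEL_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


def problem_lines(log_text: str, levels=("warn", "error")) -> list[str]:
    """Return only the lines whose level is in `levels`."""
    return [
        line
        for line in log_text.splitlines()
        if (match := LEVEL_RE.search(line)) and match.group(1).lower() in levels
    ]


# Invented sample lines for illustration only.
sample = """ts=2024-05-01T10:00:00Z level=info msg="listening on :8080"
ts=2024-05-01T10:05:12Z level=warn msg="scrape took longer than expected"
ts=2024-05-01T10:05:13Z level=error msg="connection refused"
"""

print(level_counts(sample))
for line in problem_lines(sample):
    print(line)
```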
Check the status of the Prometheus service
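Confirming that the Service object exists does not show whether Prometheus is scraping successfully. A further check is the built-in `up` series, which can be queried through the Prometheus HTTP API at `/api/v1/query?query=up`. The minimal sketch below assumes the JSON response has already been fetched and lists the targets whose `up` value is 0. The instance addresses and job name in the sample are hypothetical.

```python
import json


def down_targets(query_response: str) -> list[str]:
    """Return the instance label of every series whose `up` value is 0."""
    payload = json.loads(query_response)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    down = []
    for series in payload["data"]["result"]:
        _timestamp, value = series["value"]  # instant vector: [unix_time, "string value"]
        if float(value) == 0:
            down.append(series["metric"].get("instance", "<unknown>"))
    return down


# Hypothetical response from GET /api/v1/query?query=up
sample_response = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "instance": "10.0.3.11:9500", "job": "cassandra"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "instance": "10.0.3.12:9500", "job": "cassandra"},
             "value": [1700000000.0, "0"]},
        ],
    },
})

print(down_targets(sample_response))
```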
Check the status of the Prometheus Node Exporter pods
View the logs of the Prometheus Node Exporter pod
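The Node Exporter's hwmon collector exposes hardware temperatures as `node_hwmon_temp_celsius`, labeled by `chip` and `sensor`. The minimal sketch below assumes the text output of the exporter's `/metrics` endpoint has already been fetched and picks out sensors above a threshold. The 85 °C cutoff is a hypothetical default, not a vendor limit, so use the rated maximum for your hardware. The simple regex does not handle label values that contain `}`.

```python
import re

# metric_name{labels} value  (an optional timestamp after the value is ignored)
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def hot_sensors(metrics_text: str, threshold_c: float = 85.0,
                metric: str = "node_hwmon_temp_celsius") -> list[tuple[str, float]]:
    """Return (labels, celsius) for samples of `metric` above `threshold_c`."""
    hot = []
    for line in metrics_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match and match.group(1) == metric:
            value = float(match.group(3))
            if value > threshold_c:
                hot.append((match.group(2) or "", value))
    return hot


# Hypothetical excerpt of a node exporter /metrics page
sample = """# TYPE node_hwmon_temp_celsius gauge
node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp1"} 92
node_hwmon_temp_celsius{chip="platform_coretemp_0",sensor="temp2"} 61
node_load1 3.2
"""

for labels, celsius in hot_sensors(sample):
    print(labels, celsius)
```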
The server hosting the Prometheus exporter may be overloaded with traffic, causing it to overheat and trigger the incident.
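To test whether traffic is the cause, compare the current request rate with a baseline. The sketch below computes a per-second rate from two samples of a cumulative counter and treats a decrease as a counter reset, similar in spirit to PromQL's `rate()` (which additionally extrapolates). It flags a spike when the rate reaches a chosen multiple of the baseline. Which counter to sample and the 2x factor are assumptions to tune for your environment.

```python
def per_second_rate(earlier: tuple[float, float], later: tuple[float, float]) -> float:
    """Per-second increase of a cumulative counter between two (timestamp, value) samples."""
    (t0, v0), (t1, v1) = earlier, later
    if t1 <= t0:
        raise ValueError("samples must be in time order")
    # A drop in value means the counter was reset (e.g. a process restart);
    # assume it restarted from zero, so the whole later value is new traffic.
    increase = v1 - v0 if v1 >= v0 else v1
    return increase / (t1 - t0)


def traffic_spike(current_rate: float, baseline_rate: float, factor: float = 2.0) -> bool:
    """True when the current rate is at least `factor` times a positive baseline."""
    return baseline_rate > 0 and current_rate >= factor * baseline_rate


# Hypothetical samples of a request counter taken 60 seconds apart.
rate = per_second_rate((0.0, 1000.0), (60.0, 7000.0))
print(rate, traffic_spike(rate, baseline_rate=40.0))
```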
Repair
If the overheating is caused by high traffic, consider scaling up the server or optimizing the Cassandra service to handle the load.
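When scaling up, one rough first estimate of cluster size keeps total load constant and divides it by a target per-node utilization. This sketch assumes load spreads evenly across nodes, which in a real Cassandra cluster depends on token distribution and replication settings. The 60% target is a hypothetical headroom choice. Utilization is given as integer percentages so that the ceiling division is exact.

```python
def nodes_needed(current_nodes: int, current_util_pct: int, target_util_pct: int) -> int:
    """Estimate the node count that brings average utilization down to the target.

    Assumes load is spread evenly and total load stays constant. Never suggests
    shrinking, since this is a scale-up check.
    """
    if current_nodes <= 0 or not 0 < target_util_pct <= 100:
        raise ValueError("need a positive node count and a target in (0, 100]")
    total_load = current_nodes * current_util_pct  # in "node-percent" units
    needed = -(-total_load // target_util_pct)  # integer ceiling division
    return max(current_nodes, needed)


# Hypothetical: 3 nodes averaging 90% CPU, aiming for 60%.
print(nodes_needed(3, 90, 60))
```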