Runbook

Datadog Agent Not Running in Kubernetes Cluster


Overview

This incident type covers an alert indicating that the Datadog Agent has stopped running, or stopped reporting, in a particular Kubernetes cluster. While the Agent is down, the telemetry it collects from the affected nodes (for example metrics, logs, and traces) stops arriving in Datadog. This removes visibility into cluster performance, leaves gaps in security monitoring, and lets other operational problems go unnoticed. The incident requires prompt attention: restore the Agent and confirm that the cluster itself is healthy.

Parameters

Debug

Check whether the Datadog Agent is deployed in the Kubernetes cluster
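
A minimal Python sketch of this check, assuming `kubectl` is installed and pointed at the affected cluster. It lists DaemonSets in all namespaces and keeps those whose name contains `datadog`. That name filter is an assumption: the real DaemonSet name depends on how the Agent was installed (Helm chart, Datadog Operator, or plain manifests).

```python
import json
import subprocess


def kubectl_json(args):
    """Run kubectl with '-o json' appended and return the parsed output."""
    result = subprocess.run(
        ["kubectl", *args, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def find_datadog_daemonsets(daemonset_list, name_hint="datadog"):
    """Return (namespace, name) for every DaemonSet whose name contains name_hint."""
    matches = []
    for item in daemonset_list.get("items", []):
        meta = item.get("metadata", {})
        if name_hint in meta.get("name", ""):
            matches.append((meta.get("namespace"), meta.get("name")))
    return matches
```

For example, `find_datadog_daemonsets(kubectl_json(["get", "daemonsets", "--all-namespaces"]))` returns `(namespace, name)` pairs. An empty list suggests the Agent is not deployed, or is deployed under a name the hypothetical `name_hint` filter does not match.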

Check the logs of the Datadog Agent pods
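
A sketch for pulling recent Agent logs and keeping only warning and error lines. It assumes the main container is named `agent` and that log lines use a pipe-separated layout with the level as its own field (for example `<timestamp> | CORE | ERROR | ...`); both are assumptions to confirm against your Agent version. The `--previous` flag of `kubectl logs` returns logs from the last terminated container, which helps when the pod is crash-looping.

```python
import re
import subprocess

# Assumed log layout: "<timestamp> | <LOGGER> | <LEVEL> | (<source>) | <message>".
LEVEL_PATTERN = re.compile(r"\|\s*(ERROR|CRITICAL|WARN)\s*\|")


def agent_logs_command(namespace, pod, container="agent", tail=200, previous=False):
    """Build a kubectl logs command for one container of a Datadog Agent pod."""
    cmd = ["kubectl", "logs", "-n", namespace, pod, "-c", container, f"--tail={tail}"]
    if previous:
        # Read logs from the previous (crashed) container instance instead.
        cmd.append("--previous")
    return cmd


def problem_lines(log_text):
    """Keep only lines whose level field is WARN, ERROR, or CRITICAL."""
    return [line for line in log_text.splitlines() if LEVEL_PATTERN.search(line)]


def fetch_problem_lines(namespace, pod, **kwargs):
    """Fetch logs with kubectl and return the filtered warning/error lines."""
    out = subprocess.run(
        agent_logs_command(namespace, pod, **kwargs),
        capture_output=True, text=True, check=True,
    ).stdout
    return problem_lines(out)
```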

Check whether the Datadog Agent is running on all nodes in the Kubernetes cluster
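
The DaemonSet status already records how many Agent pods should exist and how many are ready, so this check can be sketched by comparing those counters. The namespace and DaemonSet name passed to `daemonset_status` are placeholders for your installation.

```python
import json
import subprocess


def daemonset_status(namespace, name):
    """Fetch a DaemonSet object as parsed JSON via kubectl."""
    out = subprocess.run(
        ["kubectl", "get", "daemonset", name, "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


def coverage_gaps(daemonset):
    """Report how far scheduled/ready/available pod counts fall short of desired."""
    status = daemonset.get("status", {})
    desired = status.get("desiredNumberScheduled", 0)
    gaps = {}
    for field in ("currentNumberScheduled", "numberReady", "numberAvailable"):
        # Zero-valued counters may be omitted from the JSON, so default to 0.
        actual = status.get(field, 0)
        if actual < desired:
            gaps[field] = desired - actual
    if status.get("numberMisscheduled", 0):
        gaps["numberMisscheduled"] = status["numberMisscheduled"]
    return gaps
```

An empty result means every node the DaemonSet targets has a ready Agent pod. Note that `desiredNumberScheduled` only counts nodes the DaemonSet is eligible to run on, so nodes excluded by a node selector or an untolerated taint will not show up as gaps; compare against the node list if coverage still looks incomplete.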

Check whether the Datadog Agent can connect to the Datadog backend
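
One way to inspect backend connectivity is the Agent's own `agent status` report, run inside the Agent container with `kubectl exec`. This sketch pulls out lines mentioning API keys, errors, or endpoints for quick review. The container name `agent` and the keyword list are assumptions, and the report's wording varies between Agent versions, so read the full output whenever the filtered lines are inconclusive.

```python
import subprocess


def agent_status_command(namespace, pod, container="agent"):
    """Build a command that runs 'agent status' inside the Agent container."""
    return ["kubectl", "exec", "-n", namespace, pod, "-c", container,
            "--", "agent", "status"]


def connectivity_lines(status_text, keywords=("api key", "error", "endpoint")):
    """Return stripped lines that mention any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        line.strip()
        for line in status_text.splitlines()
        if any(k in line.lower() for k in lowered)
    ]


def check_backend(namespace, pod):
    """Run 'agent status' and return (exit code, lines worth reviewing)."""
    result = subprocess.run(agent_status_command(namespace, pod),
                            capture_output=True, text=True)
    return result.returncode, connectivity_lines(result.stdout + result.stderr)
```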

Check the status of the Kubernetes nodes in the cluster
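
A sketch that reads `kubectl get nodes -o json` and flags nodes that are not Ready, report memory, disk, or PID pressure, or are cordoned (`spec.unschedulable`). An Agent pod cannot run or report from a node in any of these states.

```python
import json
import subprocess

PRESSURE_CONDITIONS = ("MemoryPressure", "DiskPressure", "PIDPressure")


def get_nodes():
    """Return all nodes as parsed JSON via kubectl."""
    out = subprocess.run(["kubectl", "get", "nodes", "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def node_problems(node_list):
    """Map node name to a list of detected problems; healthy nodes are omitted."""
    problems = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        conditions = {c.get("type"): c.get("status")
                      for c in node.get("status", {}).get("conditions", [])}
        issues = []
        if conditions.get("Ready") != "True":
            issues.append("NotReady")
        issues += [p for p in PRESSURE_CONDITIONS if conditions.get(p) == "True"]
        if node.get("spec", {}).get("unschedulable"):
            issues.append("Cordoned")
        if issues:
            problems[name] = issues
    return problems
```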

Check the status of the Kubernetes pods in the cluster
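
A sketch for spotting unhealthy pods from `kubectl get pods -o json`: pods outside the Running or Succeeded phase, containers stuck in common failure states such as `CrashLoopBackOff`, and containers with many restarts. The optional label selector (for example a hypothetical `app=datadog-agent`) and the restart threshold of 5 are assumptions; the labels on Agent pods depend on the installation method.

```python
import json
import subprocess

# Waiting reasons that usually mean a container cannot start.
BAD_WAITING_REASONS = {"CrashLoopBackOff", "ImagePullBackOff",
                       "ErrImagePull", "CreateContainerConfigError"}


def get_pods(namespace, selector=None):
    """Return pods in a namespace as parsed JSON, optionally filtered by label."""
    cmd = ["kubectl", "get", "pods", "-n", namespace, "-o", "json"]
    if selector:
        cmd += ["-l", selector]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def unhealthy_pods(pod_list, restart_threshold=5):
    """Map pod name to a list of issues; healthy pods are omitted."""
    report = {}
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        issues = []
        if status.get("phase") not in ("Running", "Succeeded"):
            issues.append(f"phase={status.get('phase')}")
        for cs in status.get("containerStatuses", []):
            reason = cs.get("state", {}).get("waiting", {}).get("reason")
            if reason in BAD_WAITING_REASONS:
                issues.append(f"{cs['name']}: {reason}")
            if cs.get("restartCount", 0) >= restart_threshold:
                issues.append(f"{cs['name']}: {cs['restartCount']} restarts")
        if issues:
            report[pod["metadata"]["name"]] = issues
    return report
```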

Check the status of the Kubernetes services in the cluster
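
A Service with no ready backing pods cannot route traffic, which is a quick signal that a Datadog-related Service (for example one in front of the Cluster Agent, if your installation uses it) is broken. This sketch relies on each Endpoints object sharing its Service's name and lists Services whose Endpoints contain no ready addresses. The namespace argument is a placeholder.

```python
import json
import subprocess


def get_endpoints(namespace):
    """Return Endpoints objects in a namespace as parsed JSON via kubectl."""
    out = subprocess.run(["kubectl", "get", "endpoints", "-n", namespace, "-o", "json"],
                         capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def services_without_ready_endpoints(endpoints_list):
    """Return Service names whose Endpoints list no ready addresses."""
    empty = []
    for ep in endpoints_list.get("items", []):
        subsets = ep.get("subsets") or []
        # Ready pods appear under "addresses"; unready ones under "notReadyAddresses".
        if not any(s.get("addresses") for s in subsets):
            empty.append(ep["metadata"]["name"])
    return empty
```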

One possible cause is a network or communication problem between the Datadog Agent and the Kubernetes cluster, for example the Agent being unable to reach the kubelet or the Kubernetes API server.

Repair

Verify connectivity between the Datadog Agent and the Kubernetes cluster.
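
Connectivity can be spot-checked at the TCP level with a small Python helper (the function names are hypothetical). Run it from the same network position as the Agent, such as a debug pod on the affected node. Example targets, both assumptions to adapt: the kubelet on a node's IP at its default port 10250, and `api.datadoghq.com` on port 443 for the US1 Datadog site. A successful TCP connection does not prove that TLS or authentication will succeed.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_targets(targets, timeout=3.0):
    """Map 'host:port' to reachability for each (host, port) pair."""
    return {f"{host}:{port}": can_connect(host, port, timeout) for host, port in targets}
```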

Perform a rolling restart of the Datadog Agent DaemonSet.
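
A sketch of the restart using `kubectl rollout restart`, followed by `kubectl rollout status` to wait until the new pods are up. The namespace, DaemonSet name, and five-minute timeout are placeholders.

```python
import subprocess


def restart_commands(namespace, name, timeout="5m"):
    """Build the kubectl commands for a rolling restart and for waiting on it."""
    target = f"daemonset/{name}"
    return [
        ["kubectl", "rollout", "restart", target, "-n", namespace],
        ["kubectl", "rollout", "status", target, "-n", namespace, f"--timeout={timeout}"],
    ]


def rolling_restart(namespace, name, timeout="5m"):
    """Trigger the restart, then block until the rollout completes or times out."""
    for cmd in restart_commands(namespace, name, timeout):
        # check=True raises CalledProcessError if kubectl exits non-zero.
        subprocess.run(cmd, check=True)
```

The pace of the restart is governed by the DaemonSet's update strategy (for `RollingUpdate`, its `maxUnavailable` setting), so large clusters may need a longer timeout.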
