Datadog Incident Repair Kit

Find it with Datadog. Fix it with Shoreline.
Get started now

Fix issues right from Datadog

Shoreline is built right into the Datadog UI so you never need to leave the Datadog UI to repair issues.

Get started in minutes

Shoreline comes with many standard repair actions that you can configure and deploy in minutes.

Build custom repair actions in an afternoon

Need a custom repair action?  Build even the most complex repair actions in just a few hours.

Find it with Datadog. Fix it with Shoreline.

Shoreline’s Datadog Incident Repair Kit allows customers to connect up to 20 pre-built Shoreline repair actions with Datadog Monitors to enable incident self-healing for up to 500 nodes. Additionally, the Shoreline team will build up to 5 additional custom actions that are tailored for your environment.
Get started now

Remediate with Pre-built Repair Actions

Instead of documenting playbooks in wikis, create live, best-practice remediations with role-based access controls. Give your on-call team the flexibility to apply these remediations to one or many hosts at the same time. Built-in blast controls and circuit breakers safeguard against mistakes.

Create Remediations and Tie Them to Datadog Monitors

For your most common incidents, create automated remediations and eliminate interruptions and 2 am wake up calls. Create fine grained monitors that determine the root cause with confidence. Tie a Shoreline automation to a Datadog monitor in just a few clicks.

View Audit Logs In The Datadog Event Stream

Every repair action run is automatically published to the Datadog Event Stream.  See when actions are run and the results they generate, regardless of whether they are run automatically or manually.

Ready to get started?

Turn on Shoreline in Datadog

Getting started with Shoreline’s Datadog Incident Repair Kit is quick and easy. Simply fill this form and we will take care of the rest. We can even help you set up your first action.

Shoreline Solutions

Streamline Kubernetes Operations

  • Kubernetes node retirement - Gracefully terminate nodes when marked for retirement.
  • Kubernetes pod out of memory (OOM) - Generate diagnostic information and restart pods that ran out of memory.
  • Kubernetes pods stuck in terminating - Safely drain, and restart stuck pods.
  • Kubernetes pods restarting too often - Capture diagnostics to identify the root cause.
  • IP exhaustion - Clear away failed jobs/pods that are consuming too many IP addresses.
  • Stuck Argo workflows - Delete stale pods after workflow execution.
  • Restart CoreDNS service - Execute a rolling restart when latency increases.

Reduce Toil (VMs and/or Kubernetes)

  • Disk resize / disk clean - Clear out log files and/or resize disks
  • Intermittent JVM issues - Capture diagnostic information for intermittent issues that are hard to reproduce and debug.
  • Server drift - Restore uniformity when configuration files, databases, and data sources differ.
  • Config drift - Ensure observed state matches desired state on your system configuration, e.g. Kubernetes yaml, Cloud config, etc.
  • Disk failures in kern.log - Capture diagnostics and recycle the VM
  • Network failures in kern.log - Capture diagnostics and recycle the VM
  • Kafka data processing lag - Restart slow/broken consumers when systems are falling behind
  • Process list check - Restart missing processes or kill old versions of processes
  • Certificate rotation - Rotate expiring certificates
  • Integrate and send updates to all your DevOps tools, including:  Slack, Jira, PagerDuty, ServiceNow or OpsGenie

Reduce Toil (VMs and/or Kubernetes)

  • Need a custom action?  Build it yourself or have Shoreline do it for you.  Up to 5 custom actions are included in the Datadog Incident Repair Kit

Learn more about Shoreline’s platform

Ready to dig deeper? Explore our platform