Shoreline Enhancements Improve Safety for Cloud Production Operations

Shoreline announces customer-driven enhancements that provide enterprise customers with critical safeguards against human errors when executing large-scale automations across their multi-cloud infrastructure.

When we first launched in 2020, we provided developers and operators a powerful way to resolve production incidents with the world’s fastest incident resolution platform. Users flocked to Shoreline to reduce their incident resolution costs by using our hosted manager, Op Pack catalog, Notebooks, and monitoring platform. Since then, we have partnered with our enterprise customers and co-design partners to further simplify the production-scale adoption of Shoreline.

Today, we’re excited to announce these customer-driven enhancements to Shoreline. These improvements provide enterprise customers with critical safeguards against human errors when executing large-scale automations across their multi-cloud infrastructure.

Circuit Breakers

A popular request from our customers is the ability to configure and manage guardrails against human errors when executing automated incident resolution against their fleet of Kubernetes or VM clusters. Ensuring that the actions of a single individual do not impact resources beyond their control is a core security and resiliency requirement for our enterprise customers. By controlling the frequency and scope with which automated actions run against their infrastructure, customers can avoid costly incidents such as the outage at Joyent Cloud, when an administrator was able to simultaneously reboot all virtual servers in a datacenter.

In our latest release, we are introducing a new Shoreline capability: Circuit Breakers. Circuit Breakers allow users to set global execution constraints on the automated Actions they have configured on Shoreline across different resource clusters and lines of business. Users can configure these constraints for a specific set of resources, for specific automated Actions, and for a specific amount of time. Each Circuit Breaker represents a combination of the automated Action, a resource query, the desired execution limits of the Action, and the duration for which these limits should be in effect.
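To make the four parts of a Circuit Breaker concrete, here is a minimal Python sketch of the idea. It is not Shoreline's API or configuration syntax: the `CircuitBreaker` class, its field names, and the `allow` method are all hypothetical, and the sketch models only an upper execution limit within a sliding time window.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CircuitBreaker:
    """Hypothetical model of a circuit breaker; not Shoreline's actual API."""
    action: str            # name of the automated action, e.g. "restart_pod"
    resource_query: str    # description of the resources the limit covers
    max_executions: int    # execution limit within the window
    window_seconds: float  # duration for which the limit is in effect
    _timestamps: List[float] = field(default_factory=list)

    def allow(self, now: float) -> bool:
        """Return True and record the run if the limit has not been reached."""
        # Forget executions that have aged out of the window.
        self._timestamps = [
            t for t in self._timestamps if now - t < self.window_seconds
        ]
        if len(self._timestamps) >= self.max_executions:
            return False  # breaker is open: refuse further executions
        self._timestamps.append(now)
        return True


# Assumed example values: at most 10 pod restarts in any 10-minute window.
breaker = CircuitBreaker(
    action="restart_pod",
    resource_query="pods using more than 40% of memory",
    max_executions=10,
    window_seconds=600,
)
results = [breaker.allow(now=float(i)) for i in range(12)]
```

In this sketch the first ten calls are allowed and the remaining two are refused until earlier executions fall outside the window. A real implementation would also need to evaluate the resource query against live infrastructure, which is out of scope here.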

Circuit Breakers not only limit the blast radius of human errors when remediating costly incidents and reduce operational toil, but they also allow operations teams to build resiliency in their automations so that they can be executed even for the most complicated incidents.

Example of a Circuit Breaker that limits the number of pods to restart to between 5 and 10, targeting pods that are consuming more than 40% of memory.

When combined with the new Notebook Auto-run capability, Circuit Breakers provide a complete control loop for your automations when remediating incidents without any manual intervention.

Notebook Auto-run

Since their introduction on the Shoreline platform in November 2021, we have seen rapid adoption of Notebooks by customers to automate entire workflows of incident remediation. These workflows would traditionally reside in runbooks, many of which were outdated and had multiple manual steps to execute. Notebooks have enabled businesses to increase their customer satisfaction by reducing the Mean Time to Recovery (MTTR) of incidents, removing over 70% of the manual tasks from traditional runbooks. Our customers have told us that they want their monitoring and incident remediation tools to be further integrated so that actions can be taken transparently with minimal disturbance to on-call teams.

With the current release of Shoreline, we are further expanding the automation capabilities of Notebooks by enabling their execution when an incident alarm is triggered. By linking the execution of Notebooks with alarms, customers can now further their goal of automating away any manual remediation for incidents. A Notebook can be linked to one or more alarms from within the Notebook itself. By linking to multiple alarms, customers can ensure that incidents do not fall through the cracks and that appropriate remediation action is taken.
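The linking described above can be pictured as a many-to-many mapping from alarms to Notebooks. The following Python sketch is only an illustration of that idea: the `AutoRunRegistry` class, its methods, and the alarm and Notebook names are hypothetical and do not reflect how Shoreline implements or exposes Auto-run.

```python
from collections import defaultdict
from typing import Dict, List, Set


class AutoRunRegistry:
    """Hypothetical registry of alarm-to-Notebook links; not Shoreline's API."""

    def __init__(self) -> None:
        # Each alarm name maps to the set of Notebooks linked to it.
        self._links: Dict[str, Set[str]] = defaultdict(set)

    def link(self, notebook: str, alarms: List[str]) -> None:
        """Link one Notebook to one or more alarms."""
        for alarm in alarms:
            self._links[alarm].add(notebook)

    def on_alarm(self, alarm: str) -> List[str]:
        """Return the Notebooks that should auto-run when the alarm fires."""
        return sorted(self._links.get(alarm, set()))


registry = AutoRunRegistry()
# Assumed names, chosen only for illustration.
registry.link("restart-high-memory-pods", ["pod_memory_high", "pod_oom_killed"])
registry.link("collect-diagnostics", ["pod_oom_killed"])

to_run = registry.on_alarm("pod_oom_killed")
```

Here a single alarm triggers both linked Notebooks, while an alarm with no links triggers nothing. Linking one Notebook to several alarms is what keeps related incidents from slipping past the automation.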

Choose to link a Notebook to an alarm within the Notebook screen.

Pop-up to select the alarms to link the Notebook for auto-run.