Why You Should Automate Production Ops

Most on-call issues are commonplace, which means they happen again and again. It’s worth automating the fixes for them because automation is a one-time investment, doesn’t make mistakes, and stays with you forever.
Summary

Let’s talk about the value of automation for production operations.

Most on-call issues are commonplace, which means they happen again and again.

So if you’re trying to fix them manually, you run into the following problems:

- People are less efficient.

It can take them an hour to register that something happened, find the right runbook, and make the fix, which wastes their time and causes unavailability.

- People make mistakes.

They make even more mistakes when the work is routine, because they don't have their head in the game.

- People leave.

You might put in a lot of resources to train your people, but when they leave, they take that expertise with them.

That’s why it’s important to automate the fixes for these issues in software: it’s a one-time investment, doesn’t make mistakes (unless there’s a bug), and stays with you forever.

Here are the two main reasons people don’t automate their commonplace incidents:

1. It takes a long time.

When I was at AWS, each automation would take about a month to build, which is a long time.

So we’d go through the cost-benefit analysis to decide whether to focus on that or some other dev tasks.

But if an automation takes just a couple of hours to build (as it does at Shoreline), the cost is always low.

So the cost-benefit analysis stops mattering. You just build the automation, because it takes about the same amount of time as fixing the issue once.

2. It may run amok.

People know how to build the solution for an individual box, but they often make mistakes when scaling it across the fleet.

At Shoreline, we're distributed systems people. We work with circuit breakers, leases, etc., to ensure that the automations are safe and fast.

That’s how we help you build automations that enable you to:
- be less dependent on expensive, high-churn labor
- improve your availability to the customers
- sleep stress-free at night
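
To make the "run amok" concern concrete, here is a minimal sketch of a circuit breaker guarding a fleet-wide fix. It is an illustration only, not Shoreline's implementation: the names `CircuitBreaker`, `remediate_fleet`, and `max_failures`, and the host-by-host rollout, are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class CircuitBreaker:
    """Hypothetical breaker: trips once too many remediation attempts fail."""
    max_failures: int
    failures: int = 0

    @property
    def is_open(self) -> bool:
        # "Open" means the breaker has tripped and no more work should run.
        return self.failures >= self.max_failures

    def record(self, succeeded: bool) -> None:
        if not succeeded:
            self.failures += 1


def remediate_fleet(
    hosts: Iterable[str],
    fix: Callable[[str], bool],
    breaker: CircuitBreaker,
) -> List[str]:
    """Apply `fix` host by host and halt as soon as the breaker trips.

    Returns the hosts that were actually attempted.
    """
    attempted: List[str] = []
    for host in hosts:
        if breaker.is_open:
            break  # stop the rollout instead of repeating a bad fix fleet-wide
        attempted.append(host)
        try:
            succeeded = fix(host)
        except Exception:
            succeeded = False  # an exception counts as a failed attempt
        breaker.record(succeeded)
    return attempted


if __name__ == "__main__":
    # Hypothetical fix that fails everywhere, simulating a buggy automation.
    def broken_fix(host: str) -> bool:
        return False

    fleet = [f"host-{i}" for i in range(100)]
    touched = remediate_fleet(fleet, broken_fix, CircuitBreaker(max_failures=3))
    print(len(touched))  # 3
```

With a fix that fails on every host, the rollout stops after three hosts instead of touching all 100, which is the kind of blast-radius limit a circuit breaker is meant to provide.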
