“Automation takes us too much time. We're way too busy fighting fires to think about it.”
I get it. But here's the problem with that approach:
48% of incidents are straightforward and repetitive.
So instead of having your people fix them manually every time, you can simply teach the computer how to do it.
And in my experience working on different projects at AWS and then with various companies at Shoreline, 50% of all your incidents are probably due to just 4 issues.
Another quarter of the problems are due to the next 10.
So the benefit of automation is clearly there: automate about 14 issues and you cover roughly three quarters of your incidents.
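If you want to check whether your own incidents follow this pattern, a minimal sketch like the one below can help. It assumes you can export one root-cause label per incident from your ticketing system; the function name and the sample labels are hypothetical, made up for illustration.

```python
from collections import Counter

def coverage_by_top_issues(incident_causes, top_n):
    """Return the fraction of incidents explained by the top_n most frequent causes."""
    counts = Counter(incident_causes)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(top_n))
    return covered / total

# Hypothetical incident log: one root-cause label per incident.
incidents = ["disk_full"] * 5 + ["cert_expired"] * 3 + ["oom"] + ["dns"]

# Share of incidents covered by the two most common causes.
print(coverage_by_top_issues(incidents, 2))
```

Run it with `top_n` set to 4 and then 14 on your real data to see how close you are to the 50% and 75% figures.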
The challenge is the cost.
Here’s what I’d say to that:
If it takes you a month to automate something, there are not a lot of things that are worth automating.
But if it takes you just an afternoon, almost everything is.
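The month-versus-afternoon point is just break-even arithmetic, and a rough sketch makes it concrete. All of the numbers here are assumptions for illustration: a month is taken as 160 working hours, an afternoon as 4 hours, and a manual fix as 30 minutes.

```python
import math

def incidents_to_break_even(build_hours, manual_fix_hours):
    """Number of recurrences after which the automation has paid for itself."""
    if manual_fix_hours <= 0:
        raise ValueError("manual_fix_hours must be positive")
    return math.ceil(build_hours / manual_fix_hours)

MANUAL_FIX_HOURS = 0.5  # assumed: 30 minutes per manual fix

# Assumed: a month of work is about 160 hours, an afternoon about 4.
month_build = incidents_to_break_even(160, MANUAL_FIX_HOURS)
afternoon_build = incidents_to_break_even(4, MANUAL_FIX_HOURS)

print(month_build, afternoon_build)
```

Under these assumptions the month-long build needs the issue to recur 320 times before it pays off, while the afternoon build needs only 8.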
To do that, you must build a culture around continuous improvement for resiliency.
Just like you have everyone write unit tests and get their code reviewed to ensure quality, you need to apply that same methodology to resiliency.
For example, for every issue you run into on call, you must have a plan to automate it away over time (at least for the ones that aren’t one in a million).
Then you can find tools that help you quickly build automations so that you can run that flywheel really fast.