“How do I do continuous improvement in operations?” You do it by building a culture around it. An analogy to Agile and quality improvement makes this clearer.
Much of Agile is about continuous improvement and automation, and it comes down to identifying three things:
- an output metric
- an input metric that drives the output metric
- the work item that drives the input metric
The output metric = The number of defects that escaped your QA and testing process and made it into the wild
The input metric (my preference) = The percentage of your tests that are automated, or your code coverage
The work item = Building test cases
Similarly, in operations:
The output metric (simple) = The number of tickets
The output metric (weighted) = The number of tickets x the duration of the event x the number of people impacted, i.e. the sum over all tickets of duration x people impacted.
I prefer the weighted version because something that affects many people is more important than something that affects just one person.
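To make the two output metrics concrete, here is a minimal Python sketch that computes both from a week of tickets. It assumes each ticket is a dict with hypothetical `duration_hours` and `people_impacted` fields; real ticketing systems name and store these differently.

```python
def ticket_count(tickets: list[dict]) -> int:
    """Simple output metric: the raw number of tickets."""
    return len(tickets)


def weighted_impact(tickets: list[dict]) -> float:
    """Weighted output metric: sum of duration x people impacted per ticket.

    The field names `duration_hours` and `people_impacted` are hypothetical.
    """
    return sum(t["duration_hours"] * t["people_impacted"] for t in tickets)


if __name__ == "__main__":
    week = [
        {"category": "password-reset", "duration_hours": 1.0, "people_impacted": 1},
        {"category": "disk-full", "duration_hours": 2.0, "people_impacted": 50},
        {"category": "password-reset", "duration_hours": 0.5, "people_impacted": 1},
    ]
    print(ticket_count(week))     # 3
    print(weighted_impact(week))  # 101.5 (1.0 + 100.0 + 0.5)
```

The raw count treats all three tickets equally, while the weighted score is dominated by the single outage that hit 50 people, which is exactly the prioritization the weighted metric is meant to capture.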
The input metric = The number of automations you've built. It's hard to go back and fix the root cause of every issue in your code, so you have to automate the remediation instead. Let the machine fix an issue in a few seconds rather than have a human spend an hour or more on it, especially when many people are impacted.
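A minimal sketch of what letting the machine do the fix could look like: a registry that maps a ticket category to an automated remediation and falls back to a human when none exists. The input metric is then simply the size of that registry. The categories, handlers, and the `automation_coverage` helper (the share of tickets that have an automation, by analogy with code coverage) are hypothetical illustrations, not any specific tool's API.

```python
from typing import Callable

# A remediation takes a ticket and returns a description of what it did.
Remediation = Callable[[dict], str]


def clear_temp_files(ticket: dict) -> str:
    # Hypothetical placeholder: a real automation would call your infrastructure here.
    return f"cleared temp files on {ticket['host']}"


def reset_password(ticket: dict) -> str:
    # Hypothetical placeholder for an identity-system call.
    return f"sent reset link to {ticket['user']}"


# Hypothetical registry: ticket category -> automated fix.
AUTOMATIONS: dict[str, Remediation] = {
    "disk-full": clear_temp_files,
    "password-reset": reset_password,
}


def handle(ticket: dict) -> str:
    """Let the machine fix known issues; route everything else to a human."""
    fix = AUTOMATIONS.get(ticket["category"])
    if fix is None:
        return "escalated to on-call engineer"
    return fix(ticket)


def automation_coverage(tickets: list[dict]) -> float:
    """Share of tickets whose category has an automation (assumed companion metric)."""
    if not tickets:
        return 0.0
    covered = sum(1 for t in tickets if t["category"] in AUTOMATIONS)
    return covered / len(tickets)


if __name__ == "__main__":
    print(len(AUTOMATIONS))  # input metric: number of automations built
    print(handle({"category": "disk-full", "host": "web-01"}))
    print(handle({"category": "vpn-down", "user": "ana"}))
```

Each new automation is one more entry in the registry, so the input metric grows by one every time a recurring ticket type stops needing a human.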
The work item = Building the automations. How do you do that? The good news is that you get ~100 new tickets every week, so you never run out of candidates. Just automate one per week. Keep running that loop, and things will get better and better over time.
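The weekly loop can be sketched in Python under one assumption about how to choose: each week, automate the not-yet-automated ticket category with the largest total impact (duration x people impacted, mirroring the weighted output metric). The function names and ticket fields are hypothetical, and this ranking rule is one reasonable choice rather than the only one.

```python
from collections import defaultdict
from typing import Optional


def pick_next_automation(tickets: list[dict], automated: set[str]) -> Optional[str]:
    """Return the un-automated category with the largest total impact, or None."""
    impact: dict[str, float] = defaultdict(float)
    for t in tickets:
        if t["category"] not in automated:
            # Hypothetical fields; impact per ticket = duration x people impacted.
            impact[t["category"]] += t["duration_hours"] * t["people_impacted"]
    if not impact:
        return None  # every category seen this week is already automated
    return max(impact, key=impact.get)


def run_weekly_loop(weekly_tickets: list[list[dict]]) -> set[str]:
    """Build at most one automation per week and return everything automated."""
    automated: set[str] = set()
    for week in weekly_tickets:
        choice = pick_next_automation(week, automated)
        if choice is not None:
            automated.add(choice)
    return automated
```

Because each week's choice excludes what is already automated, the loop works its way down the list of most painful ticket types, which is the "better and better over time" effect in practice.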
That's how you do continuous improvement in operations.