“Should I separate development from operations to manage incidents?”
This question comes from a deep pain point.
No one wants to do on-call.
That’s because you can't control when an incident happens. It might happen over a weekend or overnight. And if it's on your shift, you're the one carrying the load. But the notion of separating Dev and Ops misses the point.
It's like separating development from QA. Yes, you have QA people, but their job is to ensure that:
- deployments go cleanly,
- regression testing is done properly,
- things that escaped the dev testing process get handled, etc.
But you still have developers writing tests.
Similarly, you need to have developers taking on-call shifts because:
- That way, the load gets shared and becomes easier to manage.
- More importantly, you share the problem instead of hiding it in one community. This incentivizes you to solve it for yourself and your customers.
How do you solve it?
You do it by building automations that eliminate common production incidents.
That’s what we do at Shoreline: we enable your DevOps engineers to build, in an afternoon, automations that fix issues forever. As a result, you get fewer incidents and better on-call.
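
To make "automating away a common incident" concrete, here's a minimal sketch in plain Python. It is not Shoreline's product or API. It assumes a hypothetical but familiar incident (a disk filling up with old log files), and the function names, the 90% threshold, and the 7-day cutoff are all illustrative assumptions.

```python
import shutil
import time
from pathlib import Path


def disk_used_fraction(path):
    """Return the fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def remove_stale_files(directory, max_age_seconds, now=None):
    """Delete regular files in `directory` older than `max_age_seconds`.

    Returns the names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            entry.unlink()
            removed.append(entry.name)
    return removed


def remediate_disk_pressure(directory, threshold=0.9,
                            max_age_seconds=7 * 86400,
                            used_fraction=disk_used_fraction):
    """Clean up only when disk usage crosses the threshold (hypothetical policy).

    `used_fraction` is injectable so the check can be tested without
    actually filling a disk.
    """
    if used_fraction(directory) < threshold:
        return []  # No disk pressure, so nothing to do and nobody to page.
    return remove_stale_files(directory, max_age_seconds)
```

Run on a schedule or triggered by an alert, a check like this handles the routine case on its own, so the on-call engineer only gets paged when cleanup isn't enough.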