Transitioning from SRE to Backend Engineering: Key Strategies and Insights

Recently there was a Reddit post asking for advice about moving from Site Reliability Engineering (SRE) to Backend Engineering. I started writing a response, the response got long, and so I turned it into a blog post.

In the post, OP mentions a couple of things driving the motivation for the transition. One is a concern that they may be losing development skills because they’re spending so much time creating scripts and automating. The other reason is that they’re having trouble adjusting to on-call life.

There are a few things we’ll look at. The first is how dynamic Ops and SRE are as fields: new technologies mean these engineers need to retool their skillset regularly, and doing so builds marketable skills. The second is what may be a misconception, namely that backend engineering is more creative than SRE, which isn’t necessarily the case.

The third is the on-call rotation, which is getting more difficult to avoid these days.

The ever-changing SRE skillset

If you stay in Ops for more than 5 years, you’ll end up needing to retool your skillset when the platform inevitably changes. If you want to stay at the cutting edge, there’s no getting around this. Everyone who once called themselves a SysAdmin retooled for DevOps, and everyone who got used to calling themselves DevOps is retooling for SRE. And I'm just recalling what's happened in my own lifetime.

As soon as we got great at administering Linux servers, we needed to figure out how to get great at doing cloud. Now we’re all figuring out how to get great at Kubernetes. Serverless or something else is looming on the horizon.

Retooling is part of almost every job, but it feels particularly extreme for SREs and Ops engineers. There are exceptions, of course. Things like shell are here to stay for the foreseeable future.

But if you compare Ops with programming, the contrast becomes clear. The most popular programming languages stick around for decades, and that gives programmers more sticking power in their work, and it fosters a particular type of creativity that feels distinct from the scripting and automation work of DevOps and SRE. But that misses a lot of the creativity that is present in SRE work.

Some of the best developers I know are fluent SREs who could swap between roles just fine, and one of the things that make them great developers is that it's so much faster to develop when you know how to actually deploy the software.

Developers who are familiar with SRE avoid "it works on my box" problems because they package their code up correctly from the get-go.

As a field, SRE is very knowledge oriented. For example, to use Kubernetes well, you simply have to build up a lot of knowledge. You more or less have to read the manual. Whereas a lot of programming is intuitive, and once you've picked up the fundamental syntax and primitives you can start building.

The iterative and process-driven nature of SRE work feels, at first, a bit less creative, but it inspires a very valuable strain of creativity.

The creativity of SRE isn’t about creating more code, it’s about accomplishing a task with the bare minimum of work. Some of the most beautiful engineering solutions I've seen involved an SRE making minor changes to the packaging of something pre-existing and applying it to a brand-new use case.

Litestream, a replication engine for SQLite, is a great example. Its authors didn't touch SQLite to build it, but they understood how SQLite writes its log files to disk. You can drop it in and create backups for SQLite without changing your application code at all. It's a zero-line change.

This is SRE-oriented creativity. And it requires being intimately familiar with the software. If you just intuited a solution, you'd come up with a solution that required you to write a bunch of new code. It might be a brilliant technical solution, but it would be inefficient.
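To make the Litestream point a little more concrete, here is a minimal Python sketch of the underlying idea. It is not Litestream's code or API. It only shows that when SQLite runs in write-ahead-log mode, committed changes land in a side file next to the database, which an outside tool can observe without the application changing. The function name `wal_sidecar_demo` and the `notes` table are made up for illustration.

```python
import os
import sqlite3
import tempfile


def wal_sidecar_demo() -> bool:
    """Return True if SQLite leaves a non-empty "-wal" file beside the database.

    Hypothetical helper for illustration only; it does not replicate anything.
    """
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "app.db")
        conn = sqlite3.connect(db_path)
        try:
            # Ask SQLite to use a write-ahead log; it reports the mode it chose.
            mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

            # Ordinary application writes, unaware of any backup tooling.
            conn.execute("CREATE TABLE notes (body TEXT)")
            conn.execute("INSERT INTO notes VALUES ('hello')")
            conn.commit()

            # While the connection is open, the committed pages sit in this file.
            wal_path = db_path + "-wal"
            return (
                mode == "wal"
                and os.path.exists(wal_path)
                and os.path.getsize(wal_path) > 0
            )
        finally:
            conn.close()


if __name__ == "__main__":
    print("WAL side file visible:", wal_sidecar_demo())
```

The application part of the sketch is just normal SQL. Everything a backup tool would care about happens on the filesystem, which is why knowing the storage format matters more here than writing new code.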

Similarly, one of the smarter things that the Kubernetes project did was avoid developing its own volume APIs, because the cloud providers already had them in AWS EBS and Google Persistent Disk. They understood that the competitive advantage was in organizing and creating a common interface to disk. The existing mechanisms to provide disk were already great.

It may be because the field evolves so quickly that SREs learn to identify the smallest piece of marginal work that will accomplish a task. They don’t have the luxury of starting from scratch every time. But this is a disciplined way of thinking that makes engineers much more effective and valuable.

SRE and Backend are converging

The work you’re doing as an SRE will partly depend on your company culture. Without a doubt, some organizations will relegate their SREs to driving existing processes: watching the on-call, making sure there are no tickets, running deployments, and so on. This can make folks feel like they aren’t progressing.

However, today there are a lot more things you can do as an SRE than you once could. You used to just have Bash. Now you have many automation opportunities that will hone your programming skills. You can configure Kubernetes and Terraform. There are a bunch of code-oriented tools that you can use. You can orchestrate your stuff in Python. You could also use something like Shoreline if you want, which is “programming for operations.” It lets you think of the world in terms of control loops and how you can automate with them.
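If "control loops" sounds abstract, here is a rough Python sketch of the pattern: observe the current state, compare it with the desired state, take one small corrective action, and repeat. This is a generic illustration, not Shoreline's or Kubernetes' API. The `Service` class, `reconcile`, and `run_loop` are hypothetical names, and the "action" just changes a counter instead of touching real infrastructure.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Toy stand-in for a deployed service (hypothetical, for illustration)."""
    desired_replicas: int
    running_replicas: int


def reconcile(svc: Service) -> str:
    """One pass of the loop: observe, compare, and take one corrective step."""
    if svc.running_replicas < svc.desired_replicas:
        svc.running_replicas += 1  # pretend we started a replica
        return "started one replica"
    if svc.running_replicas > svc.desired_replicas:
        svc.running_replicas -= 1  # pretend we stopped a replica
        return "stopped one replica"
    return "no change"


def run_loop(svc: Service, max_passes: int = 10) -> list[str]:
    """Repeat reconcile until the service converges or we run out of passes."""
    actions = []
    for _ in range(max_passes):
        action = reconcile(svc)
        actions.append(action)
        if action == "no change":
            break
    return actions


if __name__ == "__main__":
    svc = Service(desired_replicas=3, running_replicas=1)
    for line in run_loop(svc):
        print(line)
```

A real loop would poll metrics or an API instead of reading a field, but the shape of the code is the same, and writing it exercises the same skills as backend work.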

DevOps has also increased the Venn diagram overlap between SRE and Backend engineering. Previously, engineers used version control and package managers, while SREs separately used deployment systems and Linux administration tools.

Now everybody's using version control, everyone's looking at the output of test runs, and everyone cares about containers. The interface is no longer an executable passed from development to SRE. Instead there’s a build pipeline, and SREs, DevOps, and engineers are all looking at it. I think that's another thing that could help people transition: the tooling is converging too.

It does matter which company you're at. In a more ideal world, especially a GitOps world, as an SRE you're going to be checking everything into version control and making sure everything is root caused and fixed. You're going to be debugging all the time, which is actually what engineers are doing with their code all day.

In the end, being a software engineer isn’t about typing, it's not even really about writing code. The actual act of writing code is not how you’ll spend a large percentage of your time. Instead, you’ll be debugging the code, testing the code, figuring out why your assumptions were wrong and resulted in bad code, and then editing the bad code to be less bad code.

From that perspective, I think the roles have a lot in common. You’re either debugging at the system level or you’re debugging at the unit and functional level.

Now, you may have read all this and still want to bail on SRE because you just want to get away from on-call, which is fine. On-call isn’t for everyone. But don’t miss out on the opportunities your SRE rotation affords you.