Software Engineering Daily: Fleet Automation with Anurag Gupta
Anurag had the opportunity to chat with Jeff Meyerson on his podcast, Software Engineering Daily, today. They discussed why we're still in the "dial-up" age of cloud computing, how he thought about strategy when he was at AWS, and the operational pain that consumed his team's time at AWS and served as the inspiration for starting Shoreline.
Because Jeff is so passionate about the industry, he usually does a good job with his interviews, and this was no exception. If you're curious about AWS, or interested in learning more about Shoreline's thesis and technology, this interview is well worth a listen.
Here are some excerpts from the podcast.
Shoreline's inspiration: making production automation simple is a huge problem and an underserved market.
(At AWS) half my overall teams were doing the operations and the control planes and the work necessary to make the products simple to use, scale easily, and deal with outages and stuff like that. And so I think people underestimate as they go from being product companies to being SaaS companies how much operations work there is.
What got me excited was extinguishing tickets once and for all and through automation. So we're really an incident automation company. We think of that as the third big area in production ops. What we're trying to do is say that, “Hey, for the things that happen again, and again, and again, why is the human being doing it?”
Anurag on the lack of innovation in troubleshooting software over the last few decades.
30 years ago, if you were an operator, you'd basically get a page and you'd go and crack open your Compaq laptop or something and then you'd go and try to VPN into some box and fix it. How is that meaningfully different than what you're doing today? It's not. That's nuts.
Shoreline has a lot of tech under the covers, but this is a good elevator pitch.
So in one way you can think about Shoreline as it's kind of like Splunk, except it doesn't have any lag. And it lets you change stuff, not just look at it. On the automation side of things, you take what you built debugging, and then you convert it into an action in a bot so that you can generate an alarm and just automatically repair.
We give you a console that lets you run across your boxes. Filter the resources that you want to operate on. And it's basically a simple connection between resources, metrics, and Linux commands, which can just be pipe delimited together.
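To make the alarm, action, and bot idea above concrete, here is a minimal Python sketch. It is not Shoreline's language or API: every name in it (Resource, Bot, select, disk_full, clean_tmp) is hypothetical, and the "repair" is a stand-in for running a Linux command on a host. It only illustrates filtering resources, checking a metric, and repairing automatically when the alarm fires.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Resource:
    """A hypothetical host with tags and the latest metric values."""
    name: str
    tags: dict
    metrics: dict
    log: list = field(default_factory=list)


@dataclass
class Bot:
    """Ties an alarm (a condition) to an action (a repair step)."""
    alarm: Callable[[Resource], bool]
    action: Callable[[Resource], None]

    def run(self, resources):
        """Run the action on every resource whose alarm fires."""
        repaired = []
        for resource in resources:
            if self.alarm(resource):
                self.action(resource)
                repaired.append(resource.name)
        return repaired


def select(resources, **tags):
    """Filter resources down to those matching every given tag."""
    return [r for r in resources
            if all(r.tags.get(k) == v for k, v in tags.items())]


def disk_full(resource):
    """Alarm: fires when disk usage is above 90 percent."""
    return resource.metrics["disk_used_pct"] > 90.0


def clean_tmp(resource):
    """Action: stand-in for a Linux cleanup command run on the host."""
    resource.log.append("cleaned /tmp")
    resource.metrics["disk_used_pct"] = 40.0


fleet = [
    Resource("web-1", {"app": "web"}, {"disk_used_pct": 93.0}),
    Resource("web-2", {"app": "web"}, {"disk_used_pct": 55.0}),
    Resource("db-1", {"app": "db"}, {"disk_used_pct": 97.0}),
]

bot = Bot(alarm=disk_full, action=clean_tmp)
repaired = bot.run(select(fleet, app="web"))
print(repaired)
```

Only the web hosts are considered, and only the one over the threshold is repaired; the database host is left alone because the filter excluded it. What you work out by hand while debugging (the filter, the check, the fix) becomes the bot, which is the point of the excerpt.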
Aren't SREs suspicious of tools that automatically make changes to production?
One of the challenges with Shoreline is that you're actually changing your environment, right? It's not just looking at it. It's much easier to sell an observability tool, I think, or a process improvement tool. So I think it will take some time. But 10 years ago, no one would have let Kubernetes do an OOM killer. And yet it's an obvious piece of infrastructure that we do right now.