Shoreline enables highly targeted fleet-wide debugging and repair.
With it, you can:
- run a command across all your boxes in parallel
- then decide whether to run a second command that gives you more detail or go in a different direction
It's similar to how you'd debug an individual box, except that you're debugging the whole fleet in about the same amount of time.
This model lets you do many things that you couldn't do through dashboards.
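To make the loop above concrete (run one command everywhere, look at the results, then drill in or change direction), here is a minimal Python sketch. It is not Shoreline's API or implementation. It assumes plain key-based `ssh` access to each box, and the names `ssh_runner`, `run_across_fleet`, and `failing_hosts` are hypothetical helpers made up for this illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_runner(host, command, timeout=30):
    """Run `command` on `host` over ssh; return (exit_code, stdout).

    Assumption: key-based ssh access works for every host.
    """
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout.strip()


def run_across_fleet(hosts, command, runner=ssh_runner, max_workers=32):
    """Run one command on every host in parallel.

    Returns {host: (exit_code, output)}. A host whose runner raises
    (timeout, ssh missing, ...) is recorded with exit code -1.
    """
    hosts = list(hosts)

    def safe(host):
        try:
            return runner(host, command)
        except Exception as exc:
            return -1, f"error: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so zip pairs each host
        # with its own result.
        return dict(zip(hosts, pool.map(safe, hosts)))


def failing_hosts(results):
    """Pick the hosts whose command exited non-zero."""
    return [host for host, (code, _) in results.items() if code != 0]
```

A first pass might be `results = run_across_fleet(hosts, "systemctl is-active nginx")`, followed by a more detailed second command on only `failing_hosts(results)`. The `runner` parameter is there so the fan-out logic can be exercised with a fake runner instead of real boxes.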
At AWS, a BIOS upgrade once caused a large-scale event.
There was no way we could have had a log file or a dashboard for it.
The only way out was to log into the boxes and find out what the heck was going on.
So I had about 20 people do this by hand, each working through a share of the boxes in parallel (which is obviously ridiculous).
But that was the only way back then.
Today, you can use Shoreline to safely run individual commands across a lot of boxes simultaneously, all by yourself.
The commands execute in a parallel, distributed framework (like everything else we do at Shoreline).
That’s how our fleet-wide debugging and repair works.
Have you ever done fleet-wide debugging? Could you use this capability?