
How to Reduce Waste for Unexpected Demands

Shoreline's back ends run at low utilization most of the time. But once an hour, we pull telemetry data from all agents, causing a spike in CPU, memory, and network utilization. See how we treat the resources over-provisioned for these demand spikes as waste and eliminate it.
3 min
Summary

Let’s see how we can treat some of the resources over-provisioned for demand spikes as waste and eliminate it.

At Shoreline, our back ends run at pretty low utilization most of the time.

But once an hour, we pull telemetry data from all of our agents, resulting in a big spike in CPU, memory, and network utilization.

So for about a minute, the system is running all out.

Here are some of the ways we smooth out the load:

- Rather than making a request to all of the agents every hour, we make a request to 20% of the agents for 1 minute every 5 minutes.

This drops the peak load to roughly 20% of what it was.

With the spike flattened, we can shrink our instances to about a quarter of their original size, which saves a lot of money.
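A minimal sketch of this cohort-style staggering (the hash-based cohort assignment and names here are illustrative assumptions, not Shoreline's actual implementation): agents are split into 5 stable cohorts, and each 5-minute window polls only the cohort whose turn it is, so any single window touches about 20% of the fleet.

```python
import hashlib

NUM_COHORTS = 5       # 5 cohorts x 20% of agents each
WINDOW_SECONDS = 300  # one cohort is polled per 5-minute window

def cohort_of(agent_id: str) -> int:
    """Stable cohort assignment so an agent always lands in the same window."""
    digest = hashlib.sha256(agent_id.encode()).digest()
    return digest[0] % NUM_COHORTS

def agents_due(agent_ids, now_epoch: float):
    """Return the ~20% of agents whose cohort matches the current window."""
    window = int(now_epoch // WINDOW_SECONDS) % NUM_COHORTS
    return [a for a in agent_ids if cohort_of(a) == window]
```

Because the assignment is a deterministic hash, every agent is polled in exactly one window per cycle and no coordination between back ends is needed.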

- Since Shoreline is a multi-tenant system that runs pods inside EKS, we can have different back-end pods issue these requests at different times, which smooths things out further.
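One simple way to stagger the pods, sketched here with a hypothetical `pod_id` and a hash-derived offset (an assumption for illustration, not Shoreline's actual scheduler), is to give each pod its own deterministic pull time within the hour:

```python
import zlib

PERIOD_SECONDS = 3600  # hourly telemetry pull

def pull_offset(pod_id: str) -> int:
    """Deterministic offset within the hour derived from the pod identity,
    so back-end pods pull at different times instead of in lockstep."""
    return zlib.crc32(pod_id.encode()) % PERIOD_SECONDS

def next_pull_time(pod_id: str, now: float) -> float:
    """Next scheduled pull for this pod: the next occurrence of its offset."""
    offset = pull_offset(pod_id)
    period_start = (now // PERIOD_SECONDS) * PERIOD_SECONDS
    candidate = period_start + offset
    return candidate if candidate > now else candidate + PERIOD_SECONDS
```

Deriving the offset from the pod's identity means the schedule survives restarts with no shared state: each pod independently computes the same slot every time.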

- Even better, we can have the agents push the data themselves whenever it's ready, rather than shipping it on request.

The back end can store the data and process it whenever it has spare cycles, decoupling foreground request handling from background work.
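A toy sketch of this push model, assuming a simple in-process queue standing in for the real ingestion path: agents push whenever a sample is ready, and a background worker drains the buffer during spare cycles instead of everything arriving in one hourly burst.

```python
import queue
import threading

# In-memory buffer standing in for the back end's ingestion store.
buffer: "queue.Queue[dict]" = queue.Queue()
processed = []

def agent_push(sample: dict) -> None:
    """Called by an agent whenever a telemetry sample is ready (non-blocking)."""
    buffer.put(sample)

def background_worker(stop: threading.Event) -> None:
    """Drain the buffer opportunistically rather than on a fixed hourly pull."""
    while not stop.is_set() or not buffer.empty():
        try:
            sample = buffer.get(timeout=0.1)
        except queue.Empty:
            continue
        processed.append(sample)  # stand-in for real aggregation work
```

Because ingestion is just an enqueue, foreground request handling never blocks on telemetry processing; the worker catches up whenever the system is idle.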

- Most of our customers don't need long-term data retention from us, because they already have observability systems for that (even though we do a pretty good job at it).

They only need the real-time per second data when something goes wrong.

So we co-locate the customers who want us to handle long-term observability with those who don’t, which smooths things out further.
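The co-location idea can be sketched as a greedy placement that always puts the next tenant on the least-loaded back end, so retention-heavy tenants end up mixed with real-time-only ones (tenant names and load numbers below are made up for illustration):

```python
def place_tenants(tenants, num_backends):
    """tenants: list of (tenant_id, retention_load) pairs.
    Returns a mapping of backend index -> list of tenant ids."""
    assignments = {b: [] for b in range(num_backends)}
    loads = {b: 0.0 for b in range(num_backends)}
    # Place the heaviest tenants first (classic greedy load balancing),
    # then lighter tenants fill in around them.
    for tenant_id, load in sorted(tenants, key=lambda t: -t[1]):
        target = min(loads, key=loads.get)
        assignments[target].append(tenant_id)
        loads[target] += load
    return assignments
```

Sorting heaviest-first keeps any one back end from accumulating several retention-heavy tenants, which is what flattens the aggregate load.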

