Let’s look at how over-provisioning for demand spikes turns into waste, and how we can eliminate it.
At Shoreline, our back ends are pretty low utilization most of the time.
But once an hour, we pull telemetry data from all of our agents, resulting in a big spike in CPU, memory, and network utilization.
So for about a minute, the system is running all out.
Here are some of the ways we smooth out the load:
- Rather than requesting data from all of the agents once an hour, we request it from 20% of the agents every 5 minutes (each pull still takes about a minute).
This drops the peak load to roughly 20% of what it was.
With the peak lowered, we can shrink our instances to about a quarter of their original size, which saves a lot of money.
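One way to sketch this staggering is to split the agent fleet into fixed cohorts and poll one cohort per 5-minute tick, so every agent is still pulled once an hour-equivalent cycle but only 20% are active at any moment. This is a minimal illustration, not Shoreline's actual scheduler; the cohort count and agent names are assumptions.

```python
def make_cohorts(agents, num_cohorts=5):
    """Split agents into fixed cohorts so each 5-minute tick polls ~20%."""
    cohorts = [[] for _ in range(num_cohorts)]
    for i, agent in enumerate(agents):
        cohorts[i % num_cohorts].append(agent)
    return cohorts

def cohort_for_tick(cohorts, tick):
    """Return the cohort to poll on this tick (tick = minute // 5)."""
    return cohorts[tick % len(cohorts)]

agents = [f"agent-{i}" for i in range(10)]  # hypothetical fleet
cohorts = make_cohorts(agents)
# Over 5 consecutive ticks, every agent is polled exactly once.
polled = [cohort_for_tick(cohorts, t) for t in range(5)]
```

Because cohort membership is fixed, each agent sees a steady polling cadence while the back end's load per tick stays flat.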
- Since Shoreline is a multi-tenant system running pods inside EKS, we can have different back-end pods issue these requests at different times, which smooths things out further.
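A common way to stagger pods like this is to derive each tenant's offset deterministically, e.g. by hashing the tenant ID, so different pods pull at different minutes without any coordination. This is a sketch under that assumption, not Shoreline's actual scheduling code; the tenant names are made up.

```python
import hashlib

def poll_offset_minutes(tenant_id, period_minutes=60):
    """Map a tenant ID to a stable, roughly uniform offset within the
    polling period, so each tenant's back-end pod starts its pull at a
    different minute instead of all firing at the top of the hour."""
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % period_minutes

# Hypothetical tenants; each gets a stable offset in [0, 60).
offsets = {t: poll_offset_minutes(t) for t in ["acme", "globex", "initech"]}
```

Hashing keeps the schedule stable across pod restarts, so no shared state is needed to remember who polls when.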
- Even better, we can have the agents push the data themselves whenever it's ready, rather than waiting for a request.
The back end can store the raw data as it arrives and process it whenever it has spare cycles, avoiding contention between foreground and background activity.
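The split between a cheap foreground ingest path and a deferred background processing path can be sketched with a simple queue. This is an illustrative toy, assuming the back end buffers raw payloads and processes them in idle slices; the class and payload format are invented for the example.

```python
from collections import deque

class Backend:
    """Agents push telemetry when it's ready; ingestion just buffers the
    raw payload (cheap, foreground), and processing runs later when the
    back end has spare cycles (background), so the two never contend."""

    def __init__(self):
        self._raw = deque()
        self.processed = []

    def ingest(self, payload):
        # Foreground path: append and return immediately.
        self._raw.append(payload)

    def process_when_idle(self, budget=10):
        # Background path: drain up to `budget` payloads per idle slice.
        while self._raw and budget > 0:
            payload = self._raw.popleft()
            self.processed.append(payload.upper())  # stand-in for real parsing
            budget -= 1

backend = Backend()
for payload in ["cpu=40", "mem=71"]:
    backend.ingest(payload)   # agents push as data becomes ready
backend.process_when_idle()   # later, with spare cycles
```

The `budget` parameter caps how much background work runs per slice, so a backlog never starves the foreground path.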
- Most of our customers don't need long-term data retention from us, because they already have observability systems for that (even though we do a pretty good job of it).
They only need the real-time, per-second data when something goes wrong.
So we colocate the customers that want us to handle observability with those that don't, which smooths the load out further.
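One simple way to mix heavy (full observability) and light (real-time only) tenants is greedy least-loaded placement: sort tenants by expected load and put each on the currently lightest back end. This is a sketch of the general idea, not Shoreline's placement logic; the tenant names and load numbers are assumptions.

```python
def assign_tenants(tenants, num_backends):
    """Greedy least-loaded placement: heavier tenants are placed first,
    each onto the back end with the lowest running total, so heavy and
    light tenants end up interleaved and peak load stays balanced."""
    loads = [0.0] * num_backends
    placement = {}
    for name, load in sorted(tenants.items(), key=lambda kv: -kv[1]):
        target = loads.index(min(loads))  # lightest back end so far
        placement[name] = target
        loads[target] += load
    return placement, loads

# Hypothetical expected loads: two observability-heavy, two light tenants.
tenants = {"acme": 8.0, "globex": 7.0, "initech": 2.0, "umbrella": 1.0}
placement, loads = assign_tenants(tenants, num_backends=2)
```

Here both back ends end up with a heavy and a light tenant, keeping their totals equal instead of stacking the heavy ones together.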