Our videos
Approaches and tips
Learn from our quick explanations, demos, and best practices.
Featured video
Automation Anywhere Connects Sumo Logic with Shoreline for Auto-remediation
Automation Anywhere links Sumo Logic's data and log monitoring with Shoreline's automated incident repairs to improve customer experiences and save Dev time.
6 min
4 min
Shoreline on Shoreline: Idle EC2 Cost Savings Op Pack
Hear from Shoreline Op Pack Engineer, Kaustubh Prabhakar, on how valuable it is to use our Idle EC2 Cost Savings Op Pack.
Company
7 min
Debugging an eCommerce Microservice - High Request Latency Debugging with Shoreline
Charles Carey, Shoreline CTO, walks us through Shoreline's automated runbook experience.
Product
4 min
Shoreline on Shoreline: Unauthorized Root Access Detector
Hear from Shoreline Op Pack Engineer, Kaustubh Prabhakar, on how valuable it is to use Shoreline Unauthorized Root Access Detector.
Company
1 min
"The Power is Huge" with Shoreline
Hear how TigerGraph VP of Product and Innovation, Dr. Jay Yu, used Shoreline to drive continuous improvement and bring up the productivity of his DevOps teams.
Incident Automation
14 min
theCUBE Interviews Shoreline CEO Anurag Gupta at AWS re:Invent
Anurag Gupta joined John Walls to discuss innovation in the cloud with DevOps teams for the Global Startup Program at AWS re:Invent 2022.
Company
2 min
Shoreline Incident Insights
A quick overview video that shows automated categorization, filtering, and analysis of incidents.
Product
1 min
Shoreline on Shoreline: Alarms & Actions for Release Testing
Hear from Senior Director, Haritha Gongalore, on how rewarding it is to use Shoreline Alarms and Actions to test and certify our own releases.
Company
17 min
[Training] Debugging Kubernetes with Runbooks
In this training, we walk you through the common issues and challenges in troubleshooting Kubernetes, and through Shoreline's pre-built K8s debugging runbooks.
Incident Automation
7 min
The Surprising Cost of On-Call at Trace3 Evolve 2022
Ashley Stirrup dives into the hidden costs of on-call and discusses how one team saved 20 hours of DevOps time per month thanks to Shoreline.io automations.
Incident Automation
3 min
Shoreline Datadog Incident Repair Kit Demo
Create a library of best-practice debugging tools and pre-built remediation actions so that everyone on-call is as good as your best SRE with Shoreline's Datadog Incident Repair Kit.
Product
3 min
Is Automation Too Time-Consuming?
"Automation takes too much of our time." The problem with that view is that 48% of incidents are straightforward and repetitive. Don't have people fix them manually. Teach the computer how to do it.
Incident Automation
2 min
Risks of Automation vs. Human Errors
Automation is risky. Errors in the remediation code could worsen an outage. While that’s true, we also know that human error causes 5x more incidents than automation. You can fix code. You can't fix people.
Incident Automation
1 min
Shoreline Customer Spotlight: TigerGraph
Automating mundane tasks and debugging were just a few of the DevOps requirements TigerGraph VP of Product and Innovation, Dr. Jay Yu, needed to scale in the cloud with his small team. Shoreline delivered.
Incident Automation
2 min
How to Reduce On-Call Incidents
Shoreline's recent survey found that 48% of incidents are straightforward and repetitive while 55% of them escalate beyond the 1st line on call. If your on-call sucks, you must find a path to make incidents incidental.
Incident Automation
3 min
How to Manage Failure without Wasting Resources
How can you better utilize the resources you set aside for failover? Here's how we put those resources to work on useful background activity that can be paused or deferred when a failure happens.
Reliability
3 min
How to Reduce Waste for Unexpected Demands
Shoreline's back ends run at low utilization most of the time. But once an hour, we pull telemetry data from all agents, resulting in a spike in CPU, memory, and network utilization. See how we treat resources over-provisioned for demand spikes as waste and eliminate it.
Reliability
3 min
Actively Managing Systems to Improve Utilization
We're all being asked to do more with less nowadays. For those of us in production operations, one of the best ways we can do that is to eliminate waste with automation to drive higher utilization.
Incident Automation
2 min
Slack vs. Waste
Waste is when resources are deeply over-provisioned, underutilized, or not utilized at all. Slack looks like the same thing, but you create it with purpose. It's important to understand the difference to drive costs down.
Reliability
2 min
Why You Should Automate Production Ops
Most on-call issues are commonplace, which means they happen again and again. It’s important to automate these issues because automation is a one-time investment, doesn’t make mistakes, and stays with you forever.
Incident Automation
2 min
How Notebooks Empower Your On-Call Teams
Some issues can't be automated. For things that require human judgment, we provide on-call teams with notebooks that are optimized for operations. That way you know what action to take and when.
Incident Automation
2 min
Our Community-Driven Library of Shared Automations
We're all sitting on the same infrastructure in Production Ops, but we build our systems as if we’re starting from scratch. Insane! That's why Shoreline Op Packs are available for free.
Incident Automation
2 min
About Shoreline’s Fleet-Wide Debugging and Repair
Shoreline enables highly targeted fleet-wide debugging and repair, allowing you to debug across the fleet in about the same amount of time as an individual box.
Reliability
2 min
The Best Way to Improve Your On-Call
No one wants to do on-call because you can't control when the incident happens. Improve your on-call by building automations that eliminate common production incidents.
Reliability
2:40 min
How to Do Continuous Improvement in Operations
Things that enabled me to do more with lower cloud computing costs
Reliability
3 min
3 Hacks to Reduce Your Cloud Computing Bill
Things that enabled me to do more with lower cloud computing costs
Reliability
1 min
Shoreline on Shoreline: Open Port Check
It's critical to close ports like 22 and 3389 that can be opened unintentionally in a development environment
Incident Automation
1 min
How to Safely Fix Issues Without Escalation
The only real solution is incident automation.
Incident Automation
2 min
About Company Values
Part of the reason to create a company is to create the environment you want to be in. So it’s important that you reflect your values in your interview process. Otherwise, the sheer number of people joining will dilute things.
Company
1 min
Role of Empathy in Building a Great Company Culture
Obsessing about customers is important, but so is creating a culture where people take care of others and feel cared for. That’s why we put our values right on our website.
Company
2 min
How to Fix an Incident Before It Happens
It requires predictive maintenance, including monitoring for brownouts and performing control actions
Reliability
2 min
Debugging a Fleet as Easily as an Individual Box
Underneath the covers, the underpinning technology is a lot like a parallel SQL database.
Product
2 min
Why We Leverage Wavelets for Data Compression
Wavelets are the best way to deal with errors in the underlying data stream
Product
3 min
How to Manage Your Operational Data Efficiently
"How long should we keep operational data?"
Reliability
3 min
How to Boost Reliability Without Hiring More SREs
How can companies increase reliability without hiring an army of engineers?
Reliability
2 min
Automate Based on Frequency not Recency
Beware of recency bias when automating incidents!
Incident Automation
3 min
Shoreline Makes Production-Ops Smarter and Faster
Often people try to build a solution like Shoreline on their own. Here's why they fail.
Product
2 min
What We Do at Shoreline (In 140 Seconds)
Shoreline helps on-call operators reduce incidents, resulting in a better on-call experience and better availability for their customers.
Company
4 min
Why I Started Shoreline
Companies spend more on the people managing their cloud infrastructure than on the cloud infrastructure itself.
Company
1 min
Shoreline Operations Notebooks
Record, curate, and publish incident debug and repair best practices to safely empower on-call teams.
Product
1 min
Shoreline End-to-end Automation
Easily and safely automate incident remediations with a few lines of code.
Product
1 min
Shoreline Fleetwide Debugging
Run a single command across the entire fleet to diagnose incidents more quickly.
Product
1 min
Shoreline Fleetwide Repairs
Safely fix incidents across your entire fleet, with less overhead, and with fewer errors.
Product
1 min
Shoreline Actionable Alarms
Shoreline Alarms identify issues with high specificity so that they are immediately actionable.
Product
2 min
Shoreline Incident Automation Overview
Shoreline’s Incident Automation Platform was built to reduce manual and repetitive work, so that you can repair issues faster, increase team productivity, and eliminate thousands of hours of degraded service.
Product
9 min
Datadog + Shoreline Integration Demo
See issues and act in real-time, directly from Datadog
Product
13 min
Shoreline Incident Automation Demo
See Shoreline in action, debugging an incident and automating remediations in a fraction of the usual time.
Product
1 min
Using Shoreline.io to root-cause transient issues (like JVM garbage collection)
Shoreline makes it easy to collect diagnostic information when you're doing a root-cause analysis of an issue.
Product
2 min
Niall Murphy on his experience with Shoreline's Incident Automation Platform
Niall Murphy, former SRE at Google and Microsoft and author of the O'Reilly book, Site Reliability Engineering, shares his experience of using Shoreline's Incident Automation Platform.
Product