Solutions

Pre-built automations.

A head start on fixing the common issues that impact availability. Shoreline’s Op Pack library offers open-source blueprints for automating away your most common incidents.
Metrics
Monitor the right data in real time so that you can proactively address incidents.
Bots
Bots make it easy to tie repair actions to very specific alarms.
Alarms
Very specific, granular alarms so that you can have the confidence to automate repair actions.
Notebooks
Step-by-step recipes for both diagnosing and repairing incidents.
Actions
Pre-written actions for both diagnosis and repair.
Tests
Included test actions let you trigger incidents in development so that you can test the alarm and repair actions.
Scripts
Scripts come in multiple languages, like Shell and Python, and can be used on-box to either diagnose or repair incidents.
Disk Resize/Disk Clean
Disk-full incidents can lead to widespread outages and data loss that can damage customer experiences and cost revenue.
Major outage
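As an illustration of the kind of check a disk alarm might run, here is a minimal Python sketch that flags a filesystem once usage crosses a threshold. The function names and the 90% threshold are assumptions chosen for the example and are not taken from the Op Pack itself.

```python
import shutil


def disk_usage_fraction(path: str = "/") -> float:
    """Return the fraction (0.0 to 1.0) of the filesystem holding `path` that is used."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def is_disk_nearly_full(path: str = "/", threshold: float = 0.9) -> bool:
    """True when usage is at or above `threshold` (hypothetical default: 90%)."""
    return disk_usage_fraction(path) >= threshold
```

A repair action such as cleaning temporary files or resizing the volume would then be tied to this condition.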
Intermittent JVM Memory Issues
JVMs often face memory issues that can lead to hours of SSH-ing into box after box trying to catch the issue as it happens.
Major outage
Kubernetes Debugging
There are a million things that can break within your Kubernetes cluster. Don’t waste time searching for that needle in the haystack.
Kubernetes
Networking Issues
Network-related issues are often hard to diagnose and can lead to a very bad experience for customers.
Networking
Pod Out of Memory (OOM)
Many different types of application errors can lead to out-of-memory errors (OOMs) in Kubernetes.
Kubernetes
Certificate Rotation
Sooner or later, every company gets bitten by expired certificates, and when that happens, it can cause a catastrophic outage.
Major outage
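One way to catch an expiring certificate before it causes an outage is to compare its expiry date against a warning window. The sketch below uses Python's standard `ssl.cert_time_to_seconds`, which parses the `notAfter` string format returned by `SSLSocket.getpeercert()`. The function names and the 30-day window are assumptions for illustration only.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86400.0


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining until `not_after` (e.g. "Jan  5 09:34:43 2018 GMT").

    `now` is seconds since the epoch; it defaults to the current time.
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / SECONDS_PER_DAY


def needs_rotation(not_after: str, warn_days: float = 30, now: Optional[float] = None) -> bool:
    """True when the certificate expires within `warn_days` (hypothetical default)."""
    return days_until_expiry(not_after, now) <= warn_days
```

Passing `now` explicitly keeps the check deterministic, which makes it easy to exercise in a test environment.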
Detect Cryptocurrency Mining Operations
Unauthorized cryptocurrency miners must be stopped from abusing free tiers of cloud service providers.
Networking
Pods Stuck in Terminating
When Kubernetes pods won’t leave the terminating state, they must be identified and safely drained.
Kubernetes
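The identification half of this problem can be sketched in Python. The example assumes the input is the parsed JSON from `kubectl get pods -o json` and relies on the `metadata.deletionTimestamp` field that Kubernetes sets on pods being deleted. The function name and the 10-minute cutoff are hypothetical, and draining is deliberately left out.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def stuck_terminating(pod_list: dict, now: datetime,
                      max_age: timedelta = timedelta(minutes=10)) -> List[str]:
    """Return "namespace/name" for pods whose deletionTimestamp is more than
    `max_age` in the past (assumed cutoff; tune for your workloads)."""
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        stamp = meta.get("deletionTimestamp")
        if stamp is None:
            continue  # pod is not being deleted
        # Kubernetes serializes timestamps as RFC 3339 in UTC, e.g. 2024-01-01T11:00:00Z
        deleted_at = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        if now - deleted_at > max_age:
            stuck.append(f'{meta.get("namespace", "default")}/{meta.get("name", "")}')
    return stuck
```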
Kafka Lag
Restart slow or broken consumers when systems are falling behind in processing messages through a queue.
Major outage
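Consumer lag is commonly measured per partition as the newest offset in the log minus the offset the consumer group has committed. The Python sketch below shows that arithmetic on plain dictionaries. How the offsets are collected, the function names, and the restart threshold are all assumptions for illustration.

```python
from typing import Dict


def partition_lag(end_offsets: Dict[int, int],
                  committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: log end offset minus committed offset, floored at 0."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption: a partition with no commit yet is treated as committed at 0.
        committed = committed_offsets.get(partition, 0)
        lag[partition] = max(end - committed, 0)
    return lag


def should_restart_consumer(end_offsets: Dict[int, int],
                            committed_offsets: Dict[int, int],
                            max_total_lag: int = 10_000) -> bool:
    """True when total lag across partitions exceeds a hypothetical threshold."""
    return sum(partition_lag(end_offsets, committed_offsets).values()) > max_total_lag
```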
Kubernetes Node Retirement
When AWS Systems Manager marks a node for retirement, companies must gracefully terminate work on that node.
Kubernetes
Kubernetes Pods Restarting Too Often
Detect pod restart loops and capture diagnostics to identify the root cause.
Kubernetes
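A rough way to spot restart loops is to sum the `restartCount` of each pod's containers, as reported under `status.containerStatuses` in `kubectl get pods -o json`. The sketch below assumes that parsed JSON as input. The function name and the threshold are hypothetical, and because `restartCount` is cumulative, a real alarm would likely compare counts across a time window instead.

```python
from typing import Dict


def pods_restarting_too_often(pod_list: dict, max_restarts: int = 5) -> Dict[str, int]:
    """Map "namespace/name" to total container restarts for pods above `max_restarts`."""
    flagged = {}
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        statuses = pod.get("status", {}).get("containerStatuses", [])
        restarts = sum(status.get("restartCount", 0) for status in statuses)
        if restarts > max_restarts:
            key = f'{meta.get("namespace", "default")}/{meta.get("name", "")}'
            flagged[key] = restarts
    return flagged
```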
Delete Old Argo Pods
Argo makes declaratively managing workflows easy, but it can leave behind many stale pods after workflow execution.
Kubernetes
Restart CoreDNS Service
CoreDNS, the default Kubernetes DNS service, can degrade under too many calls, causing massive latency.
Kubernetes
Log Processing at the Edge
Many production incidents are caused by issues that can be identified by analyzing log files. Unfortunately, centralized logging can be very expensive.
Debugging
Privileged Container Check
Flag any container or pod running in privileged mode.
Security
Delete Unused EBS Volumes / Snapshots
Eliminate costs from unused resources.
Cost savings
Kafka Topic Management
When your Kafka topic grows too long, applications may begin to break.
Major outage
Manage Data Transfer Costs
Detect increased data transfer volumes, and pinpoint the reasons.
Cost savings
Excessive Use of On-Demand Hosts
Determine if converting on-demand VMs to reserved instances would create substantial savings.
Cost savings
Open Port Check
Ports can easily be opened unintentionally in a development environment, especially port 22 for SSH and port 3389 for Remote Desktop (RDP) login.
Security
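A basic open-port probe can be sketched in Python with a TCP connection attempt. The example below checks ports 22 and 3389 by default. The function names and the one-second timeout are assumptions, and a probe like this should only be pointed at hosts you are authorized to test.

```python
import socket
from typing import Iterable, List


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def open_ports(host: str, ports: Iterable[int] = (22, 3389),
               timeout: float = 1.0) -> List[int]:
    """Return the subset of `ports` that accept TCP connections on `host`."""
    return [port for port in ports if is_port_open(host, port, timeout)]
```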
Reclaim Idle Hosts
Mark low-utilization virtual machine instances as inactive, then terminate them.
Cost savings
Connections from Unexpected Ports
Detect network connections on ports that are not found on an approved list.
Security
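At its core, this check is a comparison between the ports seen in use and an approved list. The Python sketch below shows that comparison only. How the observed ports are gathered is left open, and the function name is hypothetical.

```python
from typing import Iterable, List


def unexpected_ports(observed: Iterable[int], approved: Iterable[int]) -> List[int]:
    """Return observed ports that are not on the approved list, sorted and de-duplicated."""
    return sorted(set(observed) - set(approved))
```

For example, with observed ports 22, 443, and 8080 and an approved list of 80 and 443, the function reports 22 and 8080.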
Elastic Sharding Replica Management
Determine when your Elasticsearch clusters have too few replicas per shard, and automatically kick off healing.
Major outage
Memory Exhaustion
Running out of memory rapidly degrades customer experience and must be preempted.
Kubernetes
Disk Failures in kern.log
Detect when a disk has errors or has entirely failed by inspecting the OS’s kern.log. Automatically capture these events and kick off fixes such as recycling the VM.
Debugging
Config Drift
Ensure observed state matches desired state on your system configuration.
Debugging
Processes Consuming Too Many Resources
Determine if the system is using too much memory or CPU at the process level.
Debugging
Network Failures in kern.log
Detect when a network interface has errors or has entirely failed by inspecting the OS’s kern.log. Automatically capture these events and initiate fixes such as recycling the VM.
Debugging
Server Drift
Restore uniformity when configuration files, databases and data sources on your VMs and containers differ.
Debugging
Rightsize Pod CPU & Memory Allocations
Automatically reduce pod CPU and/or memory limits that are set too high.
Security
Users with Root Access Check
Flag any VM or container which has server processes running as a user with root permissions.
Security
Process List
Server environments can often be challenging to run. Sometimes processes silently die. Other times old versions of processes are left running.
Security
Endpoints Unreachable
Determine when there are no endpoints behind your Kubernetes service or when those endpoints have become unreachable.
Kubernetes
DNS Lag
Trigger rolling restarts of the DNS servers when they are responding slowly and causing widespread system issues.
Major outage
IP Exhaustion
Clear away failed jobs or pods that are consuming too many IP addresses.
Networking