Runbook

High replication delay in PostgreSQL service

Overview

This runbook covers high replication delay in a PostgreSQL service. Replication delay is the time between a change being made on the primary database and that change being applied on the standby database. An abnormally high delay usually indicates a problem with the replication process or with the database itself. While the delay persists, reads from the standby return stale data, and a failover to the standby can lose recent changes, which affects the performance and availability of the service. Resolving the incident requires investigating the root cause of the delay and then applying a fix.

Debug

Connect to the database server and open a psql session
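
A minimal sketch of this step, assuming the `psql` client is installed on the host you connect from. The function name `connect_psql`, the `DB_HOST`, `DB_PORT`, `DB_USER`, and `DB_NAME` variables, and their defaults are hypothetical placeholders, not values taken from this runbook.

```shell
# Open a psql session against the server under investigation.
# DB_HOST, DB_PORT, DB_USER and DB_NAME are hypothetical placeholders;
# the defaults below are assumptions, so override them for your environment.
connect_psql() {
  # -X skips any local ~/.psqlrc so output stays predictable during an incident.
  psql -X \
    -h "${DB_HOST:-localhost}" \
    -p "${DB_PORT:-5432}" \
    -U "${DB_USER:-postgres}" \
    -d "${DB_NAME:-postgres}" \
    "$@"
}
```

Calling `connect_psql` with no arguments opens an interactive session; `connect_psql -c 'SELECT version();'` runs a single statement and exits.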

Check the replication status on the primary
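
One way to do this is to query the `pg_stat_replication` view. The sketch below assumes PostgreSQL 10 or later (earlier releases use different column names) and that the standard libpq environment variables such as `PGHOST` and `PGUSER` point at the primary. The function name `check_primary_replication` is hypothetical.

```shell
# Show each connected standby as seen from the primary.
# Assumes PostgreSQL 10+ (the *_lsn and *_lag column names) and that the
# libpq variables (PGHOST, PGUSER, ...) point at the primary server.
check_primary_replication() {
  # -x prints one column per line, which is easier to read for wide rows.
  psql -X -x -c "
    SELECT application_name,
           client_addr,
           state,
           sync_state,
           sent_lsn,
           replay_lsn,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_gap_bytes,
           write_lag,
           flush_lag,
           replay_lag
    FROM pg_stat_replication;"
}
```

A healthy streaming standby normally shows `state = streaming`. No rows at all means that no standby is currently connected to this server.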

Check the replication status on the standby
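
On the standby, the WAL receiver state and the received and replayed positions can be inspected as sketched below. This assumes PostgreSQL 10 or later and that the libpq environment variables point at the standby. The function name `check_standby_replication` is hypothetical.

```shell
# Inspect replication state from the standby's side.
# Assumes PostgreSQL 10+ and that PGHOST, PGUSER, ... point at the standby.
check_standby_replication() {
  # Prints 't' when the server is running in recovery, i.e. as a standby.
  psql -X -At -c "SELECT pg_is_in_recovery();"
  # State of the WAL receiver plus the last received and replayed positions.
  psql -X -x -c "
    SELECT status,
           last_msg_receipt_time,
           latest_end_lsn,
           latest_end_time,
           pg_last_wal_receive_lsn() AS receive_lsn,
           pg_last_wal_replay_lsn()  AS replay_lsn
    FROM pg_stat_wal_receiver;"
}
```

If the second query returns no rows, the WAL receiver process is not running and the standby is not streaming from the primary.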

Check the replication lag time on the standby
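
A common way to express the lag as a time is to compare the current time with `pg_last_xact_replay_timestamp()`. On an idle primary that difference keeps growing even though nothing is missing, so the sketch below reports 0 when every received WAL record has already been replayed. It assumes PostgreSQL 10 or later and a connection to the standby. The function name `standby_lag_seconds` is hypothetical.

```shell
# Print the replay lag on the standby in whole seconds.
# Assumes PostgreSQL 10+ and that PGHOST, PGUSER, ... point at the standby.
standby_lag_seconds() {
  # -At gives bare, unaligned output that is easy to use in scripts.
  psql -X -At -c "
    SELECT CASE
             WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
             ELSE round(EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp()))
           END;"
}
```

An empty result means the standby has not replayed any transaction since it started, so no lag can be computed yet.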

Repair

Restart the PostgreSQL service
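
A minimal sketch, assuming a systemd-managed host. `PG_SERVICE` and `restart_postgres` are hypothetical names, and the actual unit name differs between distributions and versions (for example `postgresql`, `postgresql-15`, or `postgresql@15-main`).

```shell
# Restart PostgreSQL through systemd and confirm that it came back.
# PG_SERVICE is a hypothetical variable; set it to your unit name.
restart_postgres() {
  service="${PG_SERVICE:-postgresql}"
  # A restart drops every open connection to this server.
  sudo systemctl restart "$service" || return 1
  # Prints the unit state and fails unless the unit is active.
  sudo systemctl is-active "$service"
}
```

Restarting the standby is usually the less disruptive choice, because restarting the primary interrupts all client traffic.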

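Restart the replication process by rebuilding the standby server from the primary server. Stop the standby server, remove all files in its PostgreSQL data directory, take a fresh base backup from the primary (for example with pg_basebackup), and start the standby again. Removing the files alone is not enough, because PostgreSQL will not start on an empty data directory.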
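
The rebuild can be sketched as follows. This is destructive and rests on several assumptions: a systemd-managed standby, GNU `find`, an operating system user named `postgres`, a replication role that can authenticate without a prompt, and no replication slot or extra tablespaces to handle. `PG_SERVICE`, `PRIMARY_HOST`, `REPL_USER`, and `rebuild_standby` are hypothetical names.

```shell
# Re-seed a standby from the primary. Run this on the standby host only.
# DESTRUCTIVE: deletes everything under PGDATA on the machine it runs on.
# PG_SERVICE, PRIMARY_HOST and REPL_USER are hypothetical names; PGDATA must
# be the standby's data directory.
rebuild_standby() {
  # Refuse to run if the critical variables are unset or empty.
  : "${PGDATA:?set PGDATA to the standby data directory}"
  : "${PRIMARY_HOST:?set PRIMARY_HOST to the primary server}"
  service="${PG_SERVICE:-postgresql}"

  sudo systemctl stop "$service" || return 1
  # Empty the data directory (including dotfiles) but keep the directory itself.
  sudo -u postgres find "$PGDATA" -mindepth 1 -delete || return 1
  # Copy a fresh base backup, stream the WAL written meanwhile (-X stream),
  # and write the standby connection settings (-R).
  sudo -u postgres pg_basebackup -h "$PRIMARY_HOST" -U "${REPL_USER:-replicator}" \
    -D "$PGDATA" -X stream -R -P || return 1
  sudo systemctl start "$service"
}
```

Double-check `PGDATA` and the host you are logged in to before calling `rebuild_standby`; running it on the primary would destroy the only current copy of the data.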

Verify that the standby server is up to date with the primary server by checking the WAL files on the standby server. If there are any discrepancies, restore the missing files from the primary server.
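
One way to check this without inspecting individual WAL files is to compare WAL positions on both servers. The sketch below assumes PostgreSQL 10 or later and that `psql` can log in to both hosts. The function name `wal_gap_bytes` and its two positional arguments (primary host, standby host) are hypothetical.

```shell
# Report how many bytes of WAL the standby still has to replay.
# $1 = primary host, $2 = standby host (both hypothetical arguments).
# Assumes PostgreSQL 10+ and that psql can log in to both servers.
wal_gap_bytes() {
  primary_lsn=$(psql -X -At -h "$1" -c "SELECT pg_current_wal_lsn();") || return 1
  standby_lsn=$(psql -X -At -h "$2" -c "SELECT pg_last_wal_replay_lsn();") || return 1
  # pg_wal_lsn_diff returns the byte distance between two WAL positions.
  psql -X -At -h "$1" -c "SELECT pg_wal_lsn_diff('$primary_lsn', '$standby_lsn');"
}
```

A result of 0 means the standby has replayed everything the primary had written when it was sampled. A small value that shrinks between runs indicates a standby that is still catching up rather than one that is missing files.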
