Runbook

MongoDB Oplog Growth Impacting Replication

Overview

This incident type covers cases where the MongoDB oplog, a capped collection that records every write operation applied to a replica set's data, turns over or grows faster than the secondaries can replicate from it. Common causes are high write volume and oplog size or retention settings that do not match the workload. The result is replication lag and, if a secondary falls behind the oldest retained oplog entry, replication failure that requires a resync. Both reduce the availability and consistency of the database.

Debug

Check the current oplog size and usage
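A minimal sketch of this check, assuming the statistics document for `local.oplog.rs` has already been fetched (for example with pymongo via `client.local.command("collStats", "oplog.rs")`, or read from `rs.printReplicationInfo()` in the shell). The helper name `oplog_usage` and the sample values are hypothetical; `maxSize` and `size` are the byte counts reported for a capped collection.

```python
def oplog_usage(coll_stats):
    """Summarize how full the oplog is from a collStats-style document."""
    max_bytes = coll_stats["maxSize"]   # configured cap, in bytes
    used_bytes = coll_stats["size"]     # uncompressed data size, in bytes
    return {
        "max_mb": max_bytes / 1024 ** 2,
        "used_mb": used_bytes / 1024 ** 2,
        "used_pct": 100.0 * used_bytes / max_bytes,
    }


# Hypothetical sample shaped like collStats output for local.oplog.rs.
sample = {"capped": True, "maxSize": 1024 * 1024 * 1024, "size": 768 * 1024 * 1024}
usage = oplog_usage(sample)
print(usage)
```

A usage percentage near 100 is normal for a busy replica set, because a capped collection stays full once it wraps. The more useful signal is how many hours of writes that space covers.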

Check the current replication status and lag
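One way to read lag from the replica set status, sketched under the assumption that the status document comes from the `replSetGetStatus` admin command (with pymongo, `client.admin.command("replSetGetStatus")`). The helper name and the sample members are hypothetical; `stateStr` and `optimeDate` are fields of each entry in `members`.

```python
from datetime import datetime, timedelta


def secondary_lag_seconds(status):
    """Return {member name: seconds behind the primary} for each secondary."""
    members = status["members"]
    primary = next((m for m in members if m["stateStr"] == "PRIMARY"), None)
    if primary is None:
        raise ValueError("no PRIMARY found in replica set status")
    return {
        m["name"]: (primary["optimeDate"] - m["optimeDate"]).total_seconds()
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical sample shaped like replSetGetStatus output.
now = datetime(2024, 1, 1, 12, 0, 0)
sample_status = {
    "members": [
        {"name": "db0:27017", "stateStr": "PRIMARY", "optimeDate": now},
        {"name": "db1:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=4)},
        {"name": "db2:27017", "stateStr": "SECONDARY",
         "optimeDate": now - timedelta(seconds=90)},
    ]
}
lag = secondary_lag_seconds(sample_status)
print(lag)
```

A secondary whose lag approaches the oplog window is at risk of falling off the oplog and needing a full resync.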

Check the current and historical oplog growth rate
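Because the oplog is capped, its size alone does not show growth. A rough proxy is the oplog window: the time between the oldest and newest entries, combined with the space those entries occupy. The sketch below assumes the two timestamps were read from the first and last documents of `local.oplog.rs` (in pymongo, the `ts` field is a `bson.Timestamp` and `ts.as_datetime()` converts it). The helper name and sample values are hypothetical.

```python
from datetime import datetime


def oplog_growth(used_bytes, first_entry, last_entry):
    """Estimate the oplog window and the average write rate over it."""
    window_hours = (last_entry - first_entry).total_seconds() / 3600.0
    if window_hours <= 0:
        raise ValueError("last entry must be newer than first entry")
    return {
        "window_hours": window_hours,
        "mb_per_hour": used_bytes / 1024 ** 2 / window_hours,
    }


# Hypothetical sample: 768 MB of oplog data covering six hours of writes.
growth = oplog_growth(
    768 * 1024 ** 2,
    datetime(2024, 1, 1, 0, 0),
    datetime(2024, 1, 1, 6, 0),
)
print(growth)
```

Recording these two numbers on a schedule gives the historical view: a shrinking window at constant oplog size means the write rate is rising.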

Check the oplog retention settings
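A sketch of reading the time-based retention setting, assuming MongoDB 4.4 or later, where `serverStatus` reports `oplogTruncation.oplogMinRetentionHours`. The helper name and the sample document are hypothetical. Treating a missing field as 0 (size-based truncation only) is an assumption to verify against the server version in use.

```python
def oplog_retention(server_status):
    """Extract the minimum oplog retention period from a serverStatus document."""
    truncation = server_status.get("oplogTruncation", {})
    # Assumption: an absent field means no minimum retention period is set.
    hours = float(truncation.get("oplogMinRetentionHours", 0))
    return {
        "min_retention_hours": hours,
        "size_based_only": hours == 0,
    }


# Hypothetical sample shaped like a fragment of serverStatus output.
sample_status = {"oplogTruncation": {"oplogMinRetentionHours": 24.0}}
retention = oplog_retention(sample_status)
print(retention)
```

When a minimum retention period is set, the oplog may grow past its configured maximum size to keep that many hours of entries, which is one way an oplog can consume more disk than expected.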

Optimize the oplog retention settings to reduce oplog growth
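Retention can be changed at runtime with the `replSetResizeOplog` admin command, which takes `size` in megabytes and, on MongoDB 4.4+, `minRetentionHours`. The builder below is a hypothetical helper that only assembles the command document. It assumes the server accepts either field on its own, and that the result is sent to each replica set member separately, for example with pymongo's `client.admin.command(cmd)`.

```python
def build_resize_command(size_mb=None, min_retention_hours=None):
    """Assemble a replSetResizeOplog command document (does not send it)."""
    if size_mb is None and min_retention_hours is None:
        raise ValueError("specify size_mb, min_retention_hours, or both")
    cmd = {"replSetResizeOplog": 1}  # command name must be the first key
    if size_mb is not None:
        if size_mb < 990:
            raise ValueError("oplog size must be at least 990 MB")
        cmd["size"] = float(size_mb)
    if min_retention_hours is not None:
        if min_retention_hours < 0:
            raise ValueError("min_retention_hours cannot be negative")
        cmd["minRetentionHours"] = float(min_retention_hours)
    return cmd


# Hypothetical example: drop the time-based minimum so only the size cap applies.
cmd = build_resize_command(min_retention_hours=0)
print(cmd)
```

Lowering `minRetentionHours` lets the server truncate the oplog back toward its configured size. Do this only after confirming that no secondary is lagging by more than the window that remains.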

Repair

Increase the size of the oplog and adjust the retention settings to accommodate the write volume and replication needs of the system.
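A back-of-the-envelope sizing sketch for this step. It assumes a measured write rate in MB per hour and a target oplog window in hours. The function name, the 1.5x headroom factor, and the sample numbers are illustrative assumptions, not recommendations. The 990 MB floor is the minimum that `replSetResizeOplog` accepts.

```python
import math


def required_oplog_mb(mb_per_hour, target_window_hours, headroom=1.5):
    """Estimate an oplog size in MB that covers the target window with headroom."""
    raw_mb = mb_per_hour * target_window_hours * headroom
    return max(990, math.ceil(raw_mb))


# Hypothetical example: 128 MB/hour of oplog writes, 24-hour window wanted.
target_mb = required_oplog_mb(128.0, 24)
print(target_mb)
```

The resulting value can be passed as `size` to `replSetResizeOplog` on each member. Choose the target window to exceed the longest maintenance or outage a secondary is expected to survive without a resync.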

Implement sharding (horizontal scaling) to distribute writes across multiple shards, each with its own oplog, and so reduce the write volume on individual nodes.
