This incident occurs when one or more members of a MongoDB replica set are marked as unhealthy. Common causes include network issues, hardware failures, and configuration problems. An unhealthy member reduces the redundancy of the replica set and can degrade the availability and performance of the database. If enough members are affected, the set can lose its primary and stop accepting writes, and the risk of data loss rises if further members fail. Resolve the incident promptly to restore normal operation of the replica set.
Parameters
Debug
Check if MongoDB is running
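A minimal sketch of this check in Python, assuming a driver client that behaves like pymongo's `MongoClient`. The `ping` admin command is a lightweight way to confirm that `mongod` responds. The function name and the catch-all error handling are illustrative choices, not part of the original runbook.

```python
def is_mongod_alive(client):
    """Return True if the server answers the `ping` admin command.

    `client` is assumed to behave like pymongo's MongoClient, for example
    MongoClient("mongodb://localhost:27017", serverSelectionTimeoutMS=2000).
    """
    try:
        reply = client.admin.command("ping")
    except Exception:
        # Any driver or network error is treated as "not running" here.
        return False
    return reply.get("ok") == 1
```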
Check the replica set status
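The replica set status is returned by the `replSetGetStatus` admin command (what `rs.status()` shows in the shell). The sketch below condenses that reply into one row per member. The helper name is hypothetical, and the commented usage assumes pymongo is installed and that the host is a placeholder.

```python
def summarize_status(status):
    """Return (set_name, rows) from a replSetGetStatus reply document.

    Each row is (member name, state string, health flag).
    """
    rows = []
    for member in status.get("members", []):
        rows.append((member.get("name"), member.get("stateStr"), member.get("health")))
    return status.get("set"), rows

# Usage with pymongo (assumed installed; host is a placeholder):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   print(summarize_status(client.admin.command("replSetGetStatus")))
```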
Check the replica set configuration
Check the replica set members
Get the MongoDB log file
Check the disk usage
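A full data volume is a common reason for a member to fall out of the set. As a rough sketch, the standard library can report how full the filesystem holding the data directory is. The path you pass is an assumption about your deployment (it should be the `storage.dbPath` from the member's configuration file).

```python
import shutil

def disk_usage_percent(path):
    """Return the used share (0-100) of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

# Example: disk_usage_percent("/var/lib/mongodb")  # placeholder dbPath
```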
Check the memory usage
Check the MongoDB process ID
Check the CPU usage
Check the MongoDB replica set members status
Check the MongoDB replica set members health
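In the `replSetGetStatus` reply, each member carries a `health` flag (1 for reachable, 0 for not) and a `stateStr`. The sketch below lists members that look unhealthy. Treating every state other than PRIMARY, SECONDARY, and ARBITER as unhealthy is an assumption made for illustration; a member in STARTUP2 or RECOVERING may simply still be syncing.

```python
# States treated as healthy in this sketch (an assumption, adjust as needed).
HEALTHY_STATES = {"PRIMARY", "SECONDARY", "ARBITER"}

def unhealthy_members(status):
    """Return names of members whose health flag or state looks wrong."""
    bad = []
    for member in status.get("members", []):
        if member.get("health") != 1 or member.get("stateStr") not in HEALTHY_STATES:
            bad.append(member.get("name"))
    return bad
```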
Check the MongoDB replica set members state
Check the MongoDB version
Check the MongoDB storage engine
Check the MongoDB memory usage
Check the MongoDB network usage
Check the MongoDB oplog size
Check the MongoDB oplog window
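The oplog window is the time span between the oldest and newest entries in `local.oplog.rs`; a secondary that is offline longer than this window cannot catch up from the oplog alone. A small sketch of the arithmetic follows, with the pymongo calls for fetching the two timestamps shown as commented, assumed usage.

```python
def oplog_window_hours(first_ts_seconds, last_ts_seconds):
    """Return the oplog window in hours from two epoch-second timestamps."""
    if last_ts_seconds < first_ts_seconds:
        raise ValueError("last oplog entry is older than the first")
    return (last_ts_seconds - first_ts_seconds) / 3600.0

# Fetching the timestamps with pymongo (assumed installed):
#   oplog = client.local["oplog.rs"]
#   first = oplog.find().sort("$natural", 1).limit(1).next()["ts"].time
#   last = oplog.find().sort("$natural", -1).limit(1).next()["ts"].time
#   print(oplog_window_hours(first, last))
```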
Check the MongoDB oplog length
Check the MongoDB oplog utilization
Check the MongoDB oplog capacity
Check the MongoDB oplog status
Check the MongoDB oplog sync status
Check the MongoDB oplog lag time
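Replication lag can be estimated from the `replSetGetStatus` reply as the difference between the primary's `optimeDate` and each secondary's `optimeDate`. The helper below is a sketch of that calculation; its name is hypothetical, and it returns an empty result when no primary is present.

```python
from datetime import datetime

def replication_lag_seconds(status):
    """Return {secondary name: lag in seconds} relative to the primary."""
    members = status.get("members", [])
    primary = next((m for m in members if m.get("stateStr") == "PRIMARY"), None)
    if primary is None:
        return {}  # no primary, so there is no reference point for lag
    lags = {}
    for member in members:
        if member.get("stateStr") == "SECONDARY":
            delta = primary["optimeDate"] - member["optimeDate"]
            lags[member["name"]] = delta.total_seconds()
    return lags
```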
Check the MongoDB oplog sync source
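Each member entry in the `replSetGetStatus` reply reports the host it replicates from in `syncSourceHost` (older server versions used `syncingTo`). As an illustrative sketch, the helper below maps each member to its sync source, using `None` where the field is empty, as it is for a primary.

```python
def sync_sources(status):
    """Return {member name: sync source host, or None if it has none}."""
    sources = {}
    for member in status.get("members", []):
        # Older servers report this under "syncingTo" instead.
        source = member.get("syncSourceHost") or member.get("syncingTo") or None
        sources[member.get("name")] = source
    return sources
```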
Check the MongoDB oplog sync state
Repair
Define the IP addresses of the MongoDB replica members
Check the network connectivity between MongoDB replica members
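One way to sketch this check is a plain TCP connection attempt from the current host to each member's address and port. The addresses in the usage comment are placeholders, and the script only tests reachability from the machine it runs on, so it would need to be run on each member to cover every pair.

```python
import socket

def check_members(members, timeout=2.0):
    """Return {(host, port): True/False} for TCP reachability of each member."""
    results = {}
    for host, port in members:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # Covers refused connections, timeouts, and name resolution errors.
            results[(host, port)] = False
    return results

# Example (placeholder addresses):
#   check_members([("192.0.2.10", 27017), ("192.0.2.11", 27017)])
```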
Verify that the replica set name (replication.replSetName) in each member's MongoDB configuration file is correct, and that the replica set configuration (as returned by rs.conf()) has the correct members list and priority settings.
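The replica set configuration document (from `rs.conf()`, or the `config` field of the `replSetGetConfig` admin command) holds the set name in `_id` and a `members` array with `host` and `priority` fields. The sketch below compares it against expected values. The function name and the expected inputs are hypothetical.

```python
def config_problems(conf, expected_name, expected_hosts):
    """Return a list of human-readable problems found in a replica set config."""
    problems = []
    if conf.get("_id") != expected_name:
        problems.append("set name is %r, expected %r" % (conf.get("_id"), expected_name))
    members = conf.get("members", [])
    hosts = {m.get("host") for m in members}
    for host in sorted(set(expected_hosts) - hosts):
        problems.append("missing member %s" % host)
    for host in sorted(hosts - set(expected_hosts)):
        problems.append("unexpected member %s" % host)
    for m in members:
        priority = m.get("priority", 1)  # priority defaults to 1 when omitted
        if not 0 <= priority <= 1000:
            problems.append("priority %r out of range for %s" % (priority, m.get("host")))
    return problems

# With pymongo (assumed installed):
#   conf = client.admin.command("replSetGetConfig")["config"]
```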
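Restart the MongoDB replica set members one at a time. Restart each secondary first and wait for it to return to the SECONDARY state and catch up on replication, then step down the primary and restart it last. This keeps a majority of members available throughout the restart.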
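A common ordering for such a restart is to take the non-primary members first and the primary last, after stepping it down. The sketch below derives that order from a `replSetGetStatus` reply. It only computes the order; how each member is actually restarted (service manager, orchestration tool) is left out because it depends on the deployment.

```python
def restart_order(status):
    """Return member names with non-primary members first and the primary last."""
    members = status.get("members", [])
    others = [m["name"] for m in members if m.get("stateStr") != "PRIMARY"]
    primary = [m["name"] for m in members if m.get("stateStr") == "PRIMARY"]
    return others + primary
```

Between restarts, wait until the restarted member reports a healthy state again before moving on to the next one.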