---
id: ffe2f692-a079-402e-bb01-7306cd49cb74
---

# MongoDB replica member unhealthy incident.
---

This incident type covers a MongoDB replica set in which one or more members have been marked as unhealthy. Common causes include network issues, hardware failures, and configuration problems. An unhealthy member reduces the redundancy of the set and can degrade availability and performance; if so many members are affected that no majority remains, the set cannot elect a primary and stops accepting writes. Prompt resolution restores normal operation and reduces the risk of data loss.

## Parameters
```shell
# Environment Variables

export REPLICA_SET_NAME="PLACEHOLDER"

export MEMBER_HOSTNAME="PLACEHOLDER"

export PATH_TO_CONFIG_FILE="PLACEHOLDER"

export REPLICA_SET_PRIORITIES="PLACEHOLDER"

export REPLICA_SET_MEMBERS="PLACEHOLDER"
```
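
Before running the commands below, it can help to confirm that every parameter has been replaced with a real value. The following is a minimal sketch, assuming the variables above are set in the current shell; the function name `check_params` is hypothetical and not part of any MongoDB tooling.

```shell
# Report any parameter that is unset, empty, or still the literal "PLACEHOLDER".
# Returns 0 when all parameters look usable, 1 otherwise.
check_params() {
    missing=0
    for name in REPLICA_SET_NAME MEMBER_HOSTNAME PATH_TO_CONFIG_FILE \
                REPLICA_SET_PRIORITIES REPLICA_SET_MEMBERS; do
        # Look up the value of the variable whose name is stored in $name
        eval "value=\${$name}"
        if [ -z "$value" ] || [ "$value" = "PLACEHOLDER" ]; then
            echo "Parameter $name is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A typical call would be `check_params || echo "Fix the parameters before continuing"`.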

## Debug

### Check if MongoDB is running
```shell
systemctl status mongod
```

### Check the replica set status
```shell
mongo --eval "rs.status()"
```

### Check the replica set configuration
```shell
mongo --eval "rs.conf()"
```

### Check the replica set members
```shell
mongo --eval "rs.isMaster()"
```

### Get the MongoDB log file
```shell
tail -f /var/log/mongodb/mongod.log
```

### Check the disk usage
```shell
df -h
```
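
A full data volume is a common reason for a member to stop, so the `df` output can be turned into a simple threshold check. This is a sketch only: the helper names are hypothetical, and the path and threshold passed to `disk_over_threshold` are assumptions to adapt to the actual `dbPath` of the deployment.

```shell
# Print the usage percentage (without "%") from `df -P` output read on stdin.
parse_df_percent() {
    awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when the filesystem holding the given path is at or above the threshold.
disk_over_threshold() {
    path="$1"; threshold="$2"
    used=$(df -P "$path" | parse_df_percent)
    [ "$used" -ge "$threshold" ]
}
```

For example, `disk_over_threshold /var/lib/mongo 90 && echo "Data volume is almost full"` would warn at 90% usage, assuming the data directory lives under that path.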

### Check the memory usage
```shell
free -h
```

### Check the MongoDB process ID
```shell
pgrep mongod
```

### Check the CPU usage
```shell
top
```

### Check the MongoDB replica set members status
```shell
mongo --eval "rs.status().members"
```

### Check the MongoDB replica set members health
```shell
mongo --eval "rs.status().members.forEach(function(member) { print(member.name + ' health: ' + member.health); })"
```

### Check the MongoDB replica set members state
```shell
mongo --eval "rs.status().members.forEach(function(member) { print(member.name + ' is ' + member.stateStr); })"
```
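
The per-member state lines printed above can be filtered so that only members outside the normal states are shown. The sketch below assumes input lines of the form `<member> is <STATE>`, as produced by the state command above; `find_unhealthy_members` is a hypothetical helper name.

```shell
# Read "<member> is <STATE>" lines on stdin and print members whose state is
# not PRIMARY, SECONDARY, or ARBITER. Exit status is 1 if any such member exists.
find_unhealthy_members() {
    awk '$2 == "is" {
             state = $0
             sub(/^[^ ]+ is /, "", state)   # keep only the state text
             if (state != "PRIMARY" && state != "SECONDARY" && state != "ARBITER") {
                 print $1 ": " state
                 bad = 1
             }
         }
         END { exit bad + 0 }'
}
```

It is meant to be used at the end of a pipeline, for example by piping the output of the state command (run with `--quiet` to suppress the shell banner) into `find_unhealthy_members`.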

### Check the MongoDB version
```shell
mongo --eval "db.version()"
```

### Check the MongoDB storage engine
```shell
mongo --eval "db.serverStatus().storageEngine"
```

### Check the MongoDB memory usage
```shell
mongo --eval "db.serverStatus().mem"
```

### Check the MongoDB network usage
```shell
mongo --eval "db.serverStatus().network"
```

### Check the MongoDB oplog size
```shell
mongo --eval "db.getReplicationInfo().logSizeMB"
```

### Check the MongoDB oplog window
```shell
mongo --eval "db.getReplicationInfo().timeDiff"
```

### Check the MongoDB oplog length
```shell
# Oplog length in hours (time between the first and last oplog entries)
mongo --eval "db.getReplicationInfo().timeDiffHours"
```

### Check the MongoDB oplog utilization
```shell
mongo --eval "db.getReplicationInfo().usedMB"
```

### Check the MongoDB oplog capacity
```shell
# Maximum oplog size in bytes
mongo --eval "db.getSiblingDB('local').oplog.rs.stats().maxSize"
```

### Check the MongoDB oplog status
```shell
mongo --eval "db.printReplicationInfo()"
```

### Check the MongoDB oplog sync status
```shell
# On newer shells this helper is named rs.printSecondaryReplicationInfo()
mongo --eval "rs.printSlaveReplicationInfo()"
```

### Check the MongoDB oplog lag time
```shell
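# Lag of each secondary behind the primary, in seconds
mongo --eval "var s = rs.status(); var p = s.members.filter(function(m) { return m.state === 1; })[0]; s.members.forEach(function(m) { if (p && m.state === 2) { print(m.name + ' lag: ' + (p.optimeDate - m.optimeDate) / 1000 + 's'); } })"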
```
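
Lag is most meaningful when compared with the oplog window: a secondary that falls further behind than the window covers can no longer catch up from the oplog and may need a full resync. The sketch below classifies a lag value against the window, with both values in whole seconds (the `timeDiff` field is reported in seconds). The helper name `classify_lag` and the 50% warning threshold are illustrative assumptions, not MongoDB defaults.

```shell
# Classify replication lag against the oplog window; both arguments are whole seconds.
# The 50% warning threshold is an illustrative assumption.
classify_lag() {
    lag="$1"; window="$2"
    if [ "$lag" -ge "$window" ]; then
        echo "CRITICAL: lag exceeds the oplog window; the member may need a full resync"
    elif [ $((lag * 2)) -ge "$window" ]; then
        echo "WARNING: lag is more than half of the oplog window"
    else
        echo "OK"
    fi
}
```

For example, `classify_lag 120 86400` compares a two-minute lag with a 24-hour oplog window.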

### Check the MongoDB oplog sync source
```shell
mongo --eval "rs.status().members.forEach(function(m) { print(m.name + ' syncs from ' + (m.syncSourceHost || m.syncingTo || 'none')); })"
```

### Check the MongoDB oplog sync state
```shell
mongo --eval "rs.status().members.forEach(function(m) { print(m.name + ': ' + m.stateStr + ', last applied ' + m.optimeDate); })"
```

## Repair

### Define the IP addresses of the MongoDB replica members
```shell
PRIMARY="PLACEHOLDER"

SECONDARY="PLACEHOLDER"

ARBITER="PLACEHOLDER"
```

### Check the network connectivity between MongoDB replica members
```shell
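# Ping each member and report every host that does not respond
unreachable=0
for host in "$PRIMARY" "$SECONDARY" "$ARBITER"; do
    if ! ping -c 3 "$host" > /dev/null; then
        echo "Cannot reach MongoDB replica member: $host"
        unreachable=1
    fi
done

if [ "$unreachable" -eq 0 ]; then
    echo "Network connectivity between MongoDB replica members is OK"
else
    echo "Network connectivity between MongoDB replica members is not OK"
    # Restarting the network service drops existing connections on this host.
    # Run it only after confirming that the fault is local to this host:
    # service network restart
fi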
```
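
A successful ping does not prove that the MongoDB port is reachable, and some networks block ICMP entirely. As a complementary check, the sketch below tries to open a TCP connection to each member. It assumes bash (for the `/dev/tcp` pseudo-device), the coreutils `timeout` command, and the default MongoDB port 27017; `check_mongo_port` is a hypothetical helper name.

```shell
# Succeed when a TCP connection to host:port can be opened within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
check_mongo_port() {
    host="$1"; port="${2:-27017}"
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}
```

For example, `check_mongo_port "$PRIMARY" || echo "Port 27017 on $PRIMARY is not reachable"` reports a member whose port is closed or filtered.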

### Verify the replica set name in the MongoDB configuration file, and the members list and priority settings in the replica set configuration.
```shell
#!/bin/bash

# The replica set name, members list, and priority settings come from the Parameters section
CONFIG_FILE="${PATH_TO_CONFIG_FILE:-/etc/mongod.conf}"

# Verify the replica set name in the MongoDB configuration file (YAML key replication.replSetName)
if grep -Eq "^[[:space:]]+replSetName:[[:space:]]*\"?${REPLICA_SET_NAME}\"?[[:space:]]*$" "$CONFIG_FILE"
then
    echo "Replica set name is already set correctly"
elif grep -q "^#replication:" "$CONFIG_FILE"
then
    # Keep a backup copy, then enable the commented-out replication section
    sed -i.bak "s/^#replication:/replication:\n  replSetName: $REPLICA_SET_NAME/" "$CONFIG_FILE"
    echo "Replica set name has been updated"

    # Restart MongoDB service to apply changes
    systemctl restart mongod
    echo "MongoDB service has been restarted"
else
    echo "Review the replication section of $CONFIG_FILE manually"
fi

# Members and priorities are stored in the replica set configuration, not in mongod.conf.
# Print the current values for comparison with the expected ones.
echo "Expected members: $REPLICA_SET_MEMBERS"
echo "Expected priorities: $REPLICA_SET_PRIORITIES"
mongo --quiet --eval "rs.conf().members.forEach(function(m) { print(m._id + ' ' + m.host + ' priority=' + m.priority); })"
```

### Restart the MongoDB replica set members one by one, waiting for each member to rejoin the set before restarting the next.
```shell
#!/bin/bash

# Define the replica set members (list the secondaries first and the primary last)
MEMBER1="PLACEHOLDER"
MEMBER2="PLACEHOLDER"
MEMBER3="PLACEHOLDER"

replica_set_members=("$MEMBER1" "$MEMBER2" "$MEMBER3")

# Loop through each replica set member and restart them one at a time
for member in "${replica_set_members[@]}"
do
    echo "Restarting MongoDB replica set member: $member"
    ssh "$member" "sudo systemctl restart mongod"

    # Wait until the member reports PRIMARY (1) or SECONDARY (2) again.
    # Interrupt with Ctrl+C if the member does not recover.
    until ssh "$member" "mongo --quiet --eval 'rs.status().myState'" | grep -Eq '^[12]$'
    do
        sleep 5
    done
done

echo "MongoDB replica set members have been restarted."
```
