Operational Defect Database

BugZero found this defect 58 days ago.

MongoDB | 2620209

A normal secondary with hidden = false does not receive queries sent with readPreference = secondaryPreferred

Last update date:

3/22/2024

Affected products:

MongoDB Server

Affected releases:

4.4.30

6.0.15

7.0.8

5.0.27

Fixed releases:

No fixed releases provided.

Description:

Info

We observed unexpected read routing and, after in-depth investigation, traced it to a bug in the implementation of ScanningReplicaSetMonitor.

First, prepare a single-shard cluster with one primary and two secondaries, and a script that queries the secondaries, like this:

```
from pymongo import MongoClient

# Connect through mongos and prefer secondaries for reads.
c = MongoClient("mongodb://xxxxx/admin?readPreference=secondaryPreferred")
while True:
    # Drain the cursor so that every pass issues a real query.
    for _ in c.db.coll.find():
        pass
```

Then let's look at a series of common operations and the behavior that follows each one.

* Set secondary 1 to hidden = true and, a few seconds later, set it back to hidden = false. At this point only node 2 receives queries, and node 1 shows a very large replicaSetPingTimesMillis value:

    mongos> db.adminCommand("getDiagnosticData").data.connPoolStats.replicaSetPingTimesMillis
    {
        "mongo109" : {
            "x1:27017" : 2.459,
            "x2:27017" : 2.289,
            "x3:27017" : 9223372036854776
        }
    }

* Restart node 1. Everything returns to normal: queries are distributed across the secondaries and replicaSetPingTimesMillis is normal.
* Set secondary 1 and secondary 2 both to hidden = true and, a few seconds later, revert both to hidden = false. At this point queries are distributed normally, but replicaSetPingTimesMillis is very large for all secondaries.
* Restart node 1. After the restart, only secondary 1 receives queries.

The key reasons behind this behavior are:

* ServerPingMonitor::onTopologyDescriptionChangedEvent only removes monitors for servers that are missing from the topology; it does not add monitors for new servers.
* In struct LatencyWindow, the following code makes a window of (max(), max()) possible:

    upper = (lowerBound == HelloRTT::max()) ? lowerBound : lowerBound + windowWidth;

I think the ServerPingMonitor behavior is a bug, while the LatencyWindow behavior is a feature.
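The routing outcomes described above can be reproduced with a small model of the latency window. This is only a sketch under stated assumptions, not the server's implementation: it assumes that a server without a ping monitor keeps a sentinel RTT equal to the largest signed 64-bit microsecond value (a stand-in for HelloRTT::max()), that the window is anchored at the lowest RTT among the candidates, and that the window width is 15 ms. The names `RTT_MAX_US`, `latency_window`, and `eligible` are hypothetical.

```python
# Assumed stand-in for HelloRTT::max(): largest signed 64-bit value, in microseconds.
RTT_MAX_US = 2**63 - 1


def latency_window(lower_bound_us, window_width_us):
    """Model of the quoted expression: the upper bound is not widened when
    the lower bound is already the sentinel maximum."""
    if lower_bound_us == RTT_MAX_US:
        upper = lower_bound_us
    else:
        upper = lower_bound_us + window_width_us
    return lower_bound_us, upper


def eligible(rtts_us, window_width_us=15_000):
    """Return the hosts whose RTT lies inside the window anchored at the
    lowest RTT (simplified: inclusive bounds, secondaries only)."""
    lower, upper = latency_window(min(rtts_us.values()), window_width_us)
    return sorted(host for host, rtt in rtts_us.items() if lower <= rtt <= upper)


# The sentinel expressed in milliseconds, rounded to the nearest integer.
sentinel_ms = round(RTT_MAX_US / 1000)

after_first_toggle = eligible({"node1": RTT_MAX_US, "node2": 2289})
after_both_toggled = eligible({"node1": RTT_MAX_US, "node2": RTT_MAX_US})
after_node1_restart = eligible({"node1": 2459, "node2": RTT_MAX_US})
```

Under these assumptions, `after_first_toggle` contains only node2, `after_both_toggled` contains both nodes because the window collapses to (max, max), and `after_node1_restart` contains only node1. `sentinel_ms` evaluates to 9223372036854776, the value shown in the diagnostic output.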

Steps to Reproduce

Prerequisites: a single-shard cluster with one primary and two secondaries, and a script that queries the secondaries.

1. Set secondary 1 to hidden = true and, a few seconds later, set it back to hidden = false.
2. Restart node 1.
3. Set secondary 1 and secondary 2 both to hidden = true and, a few seconds later, revert both to hidden = false.
4. Restart node 1.
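The hidden toggling in these steps could be scripted roughly as follows. This is a sketch, not the reporter's tooling: the helper names `with_hidden` and `set_hidden` are hypothetical, `client` is assumed to be a pymongo MongoClient connected to the replica set primary, and the host strings are placeholders. It relies on the `replSetGetConfig` and `replSetReconfig` admin commands. Because MongoDB requires hidden members to have priority 0, the sketch forces priority to 0 when hiding and does not restore the earlier priority when un-hiding.

```python
import copy


def with_hidden(config, host, hidden):
    """Return a copy of a replica set config with one member's hidden flag
    changed and the config version incremented."""
    new_config = copy.deepcopy(config)
    for member in new_config["members"]:
        if member["host"] == host:
            member["hidden"] = hidden
            if hidden:
                # Hidden members must have priority 0.
                member["priority"] = 0
            break
    else:
        raise ValueError(f"no member with host {host!r}")
    new_config["version"] += 1
    return new_config


def set_hidden(client, host, hidden):
    """Fetch the current config from the primary and apply the change.
    `client` is assumed to be a pymongo.MongoClient (hypothetical usage)."""
    config = client.admin.command("replSetGetConfig")["config"]
    client.admin.command("replSetReconfig", with_hidden(config, host, hidden))
```

Step 1 would then be `set_hidden(client, "x3:27017", True)`, a pause of a few seconds, and `set_hidden(client, "x3:27017", False)`, with the host replaced by the actual address of secondary 1.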

Status

Needs Verification
