MongoDB | 2602527

Queries which run on secondaries and exceed orphanCleanupDelaySecs may miss documents which were donated by chunk migration

Last update date: 3/15/2024

Affected products: MongoDB Server

Affected releases: 4.2.0, 4.4.0, 5.0.0, 6.0.0, 7.0.0

Fixed releases: No fixed releases provided.

Description:

The shard version protocol guarantees that a query sees each document only from the shard that owned it at the very beginning of query execution, and that the query does not see the same document from other shards even if the chunk range is later migrated to them. This means a query in a sharded cluster never returns the same document twice.

However, range deletion removes the stale copy of the document from the donor shard 15 minutes (the default value of the orphanCleanupDelaySecs server parameter) after the last remaining query that was using the placement information from before the chunk migration has finished running on the primary of the donor shard. Because this wait only accounts for queries running on the primary, a query in a sharded cluster may return incomplete results in the following situations:

- The query runs on a secondary for longer than 15 minutes (orphanCleanupDelaySecs) and a chunk migration occurs after the query started.
- The query begins running on a primary and the primary steps down. The query then runs on the former primary, now a secondary, for longer than 15 minutes (orphanCleanupDelaySecs) and a chunk migration occurs after the query started.
- The query runs on a secondary for any amount of time and a chunk migration is run with _waitForDelete == true, either manually or by the balancer.

Setting the _waitForDelete option to true causes range deletion to remove the stale copy of the document from the donor shard without waiting for 15 minutes (orphanCleanupDelaySecs). Instead, the range deleter waits only until the last remaining query that was using the placement information from before the chunk migration has finished running on the primary of the donor shard. Note, however, that the _waitForDelete option is documented as being meant only for internal testing purposes.

https://www.mongodb.com/docs/manual/reference/command/moveChunk/
https://www.mongodb.com/docs/manual/tutorial/manage-sharded-cluster-balancer/#wait-for-delete
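The timing conditions described above can be sketched as a small model. This is an illustrative simplification, not MongoDB code: the function names, the use of plain seconds as timestamps, and the assumption that the deletion reaches the secondary with no replication lag are all hypothetical choices made for the example. The 900-second default comes from the description.

```python
# Illustrative model only; none of these names exist in MongoDB.
ORPHAN_CLEANUP_DELAY_SECS = 900  # default of 15 minutes, per the description


def range_deletion_time(migration_commit, last_primary_query_end,
                        wait_for_delete=False,
                        delay=ORPHAN_CLEANUP_DELAY_SECS):
    """Earliest time (seconds) at which the donor may delete the range.

    Assumption: the range deleter becomes eligible once the migration has
    committed AND the last primary query using the old placement is done.
    Queries on secondaries are not part of this wait.
    """
    eligible = max(migration_commit, last_primary_query_end)
    # _waitForDelete skips the orphanCleanupDelaySecs grace period.
    return eligible if wait_for_delete else eligible + delay


def may_miss_documents(query_start, query_end, migration_commit, deletion_time):
    """True if a secondary query that started before the migration is still
    running when the stale documents are deleted (replication lag ignored)."""
    started_before_migration = query_start < migration_commit
    still_running_at_deletion = query_end > deletion_time
    return started_before_migration and still_running_at_deletion


# A 20-minute secondary query; a migration commits 60 s after it starts.
deletion = range_deletion_time(migration_commit=60, last_primary_query_end=0)
long_query_at_risk = may_miss_documents(0, 1200, 60, deletion)

# A 2-minute secondary query with _waitForDelete == true on the migration.
fast_deletion = range_deletion_time(60, 0, wait_for_delete=True)
short_query_at_risk = may_miss_documents(0, 120, 60, fast_deletion)
```

In this model the step-down case is treated the same as the first case: once the former primary becomes a secondary, its query is assumed to no longer hold back the range deleter.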

Status: Investigating
