Operational Defect Database

BugZero found this defect 1171 days ago.

MongoDB | 1640575

Calling move/split/mergeChunk one after another from different MongoS instances is not causally consistent

Last update date:

3/11/2024

Affected products:

MongoDB Server

Affected releases:

3.6.22

4.2.12

4.0.23

4.4.4

4.9.0-alpha4

5.0.1

Fixed releases:

No fixed releases provided.

Description:

Info

Note: This is not a correctness bug, just an annoyance for tests and for people who perform manual chunk operations outside of the Balancer.

The move/split/mergeChunk commands involve only the chunk's owner shard and the config server, and, unlike causally consistent writes for example, they do not return any kind of causality token to the client. As a result, if one issues a split through one MongoS and then a move through another, the move may not see the effects of the split and may fail with an error saying that no chunk with the exact specified bounds exists.

This is not a problem for the Balancer, because (a) it always runs on the config server primary, which is as up to date as possible, and (b) it almost always runs on the same node.
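To make the failure mode concrete, here is a toy, in-memory sketch (Python 3.9+ assumed). It is not MongoDB code: ConfigPrimary, ConfigSecondary, Router and the after_op_time parameter are hypothetical names, op times are plain integer counters, and "waiting for replication" is simulated by advancing the lagging secondary. mongos-B refreshes its cached chunk map from a lagging config node and rejects the move; passing the op time returned by the split along as a causality token (loosely in the spirit of an afterClusterTime read, which is an assumption about how such a token could be consumed) lets the second router see the split.

```python
from __future__ import annotations

# Hypothetical toy model of the report's scenario; none of these names are
# MongoDB APIs. Chunk metadata is a dict {(min, max): owning_shard}.
Bounds = tuple[int, int]


class ConfigPrimary:
    """Authoritative chunk metadata; each commit produces a new op time."""

    def __init__(self, chunks: dict[Bounds, str]):
        self.history = [dict(chunks)]  # history[t] = metadata as of op time t

    @property
    def op_time(self) -> int:
        return len(self.history) - 1

    def read(self, after_op_time: int | None = None) -> dict[Bounds, str]:
        return dict(self.history[-1])  # the primary is always up to date

    def commit(self, chunks: dict[Bounds, str]) -> int:
        self.history.append(dict(chunks))
        return self.op_time  # handed back to the client as a causality token


class ConfigSecondary:
    """A replica that reflects the primary only up to op time `applied`."""

    def __init__(self, primary: ConfigPrimary):
        self.primary = primary
        self.applied = primary.op_time

    def read(self, after_op_time: int | None = None) -> dict[Bounds, str]:
        if after_op_time is not None and self.applied < after_op_time:
            # Stand-in for blocking until replication reaches after_op_time.
            self.applied = min(after_op_time, self.primary.op_time)
        return dict(self.primary.history[self.applied])


class Router:
    """A MongoS-like router that validates bounds against a cached chunk map."""

    def __init__(self, name: str, primary: ConfigPrimary, read_node):
        self.name, self.primary, self.read_node = name, primary, read_node
        self.cache = read_node.read()

    def _refresh_and_check(self, bounds: Bounds, after_op_time: int | None):
        self.cache = self.read_node.read(after_op_time)
        if bounds not in self.cache:
            raise LookupError(f"{self.name}: no chunk with bounds {bounds}")

    def split(self, bounds: Bounds, at: int, after_op_time: int | None = None) -> int:
        self._refresh_and_check(bounds, after_op_time)
        chunks = self.primary.read()  # the commit itself uses authoritative data
        owner = chunks.pop(bounds)
        chunks[(bounds[0], at)] = owner
        chunks[(at, bounds[1])] = owner
        return self.primary.commit(chunks)

    def move(self, bounds: Bounds, to_shard: str, after_op_time: int | None = None) -> int:
        self._refresh_and_check(bounds, after_op_time)
        chunks = self.primary.read()
        chunks[bounds] = to_shard
        return self.primary.commit(chunks)


primary = ConfigPrimary({(0, 100): "shard0"})
lagging = ConfigSecondary(primary)
mongos_a = Router("mongos-A", primary, read_node=primary)  # reads like the Balancer
mongos_b = Router("mongos-B", primary, read_node=lagging)  # reads a lagging node

token = mongos_a.split((0, 100), at=50)
try:
    mongos_b.move((0, 50), "shard1")  # refresh does not yet see the split
except LookupError as err:
    print(err)  # mongos-B: no chunk with bounds (0, 50)

mongos_b.move((0, 50), "shard1", after_op_time=token)  # waits for the split
print(primary.read())  # {(0, 50): 'shard1', (50, 100): 'shard0'}
```

The design choice being modeled is where the token lives: the client carries it from the first router to the second, so neither router has to read from the primary on every refresh.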

Top User Comments

xgen-internal-githook commented on Sun, 8 Aug 2021 10:46:38 +0000:
Author: {'name': 'Simon Graetzer', 'email': 'simon.gratzer@mongodb.com'}
Message: SERVER-54979 Let chunkSplit+ splitVector participate in the shard versioning protocol
Branch: master
https://github.com/mongodb/mongo/commit/8974dbdec0286ac47086b794c49214a9f26677bc

kaloian.manassiev commented on Thu, 22 Jul 2021 16:13:55 +0000: Passing this on to simon.gratzer to confirm that, with his changes making split/merge participate in the shard versioning protocol, this issue has now gone away.

kaloian.manassiev commented on Thu, 1 Apr 2021 08:52:28 +0000: There is really no good way to fix this other than making all refreshes from the MongoS linearisable, so that they always read from the latest primary and are therefore guaranteed to see all previous effects. Given that this is just a rare annoyance for tests, I am putting this up for the sharding/product sync to get permission to close it as Won't Fix.
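The alternative named in the 1 Apr 2021 comment, making every router refresh linearisable, can be sketched with a similarly hypothetical toy model (ConfigReplicaSet, refresh_routing_table and the linearizable flag are illustrative names, not MongoDB APIs). A refresh with linearizable=True always reads the primary's latest state, so the move sees the split without any token; the primary_reads counter tracks the cost, since every such refresh has to go to the primary.

```python
# Hypothetical toy model; ConfigReplicaSet, refresh_routing_table and the
# `linearizable` flag are illustrative names, not MongoDB APIs.


class ConfigReplicaSet:
    """Chunk metadata on a config primary plus one secondary that may lag."""

    def __init__(self, chunks):
        self.history = [dict(chunks)]  # history[t] = chunk map at op time t
        self.secondary_applied = 0     # last op time the secondary has applied
        self.primary_reads = 0         # how many refreshes had to hit the primary

    def commit(self, chunks):
        self.history.append(dict(chunks))

    def read(self, linearizable):
        if linearizable:
            self.primary_reads += 1
            return dict(self.history[-1])  # latest committed state
        return dict(self.history[self.secondary_applied])  # possibly stale


def refresh_routing_table(rs, bounds, linearizable):
    """Refresh a router's chunk map, then check the requested bounds exist."""
    cache = rs.read(linearizable)
    if bounds not in cache:
        raise LookupError(f"no chunk with bounds {bounds}")


def split_chunk(rs, bounds, at, linearizable):
    refresh_routing_table(rs, bounds, linearizable)
    chunks = dict(rs.history[-1])
    owner = chunks.pop(bounds)
    chunks[(bounds[0], at)] = owner
    chunks[(at, bounds[1])] = owner
    rs.commit(chunks)


def move_chunk(rs, bounds, to_shard, linearizable):
    refresh_routing_table(rs, bounds, linearizable)
    chunks = dict(rs.history[-1])
    chunks[bounds] = to_shard
    rs.commit(chunks)


rs = ConfigReplicaSet({(0, 100): "shard0"})
split_chunk(rs, (0, 100), at=50, linearizable=True)  # issued via one router
try:
    move_chunk(rs, (0, 50), "shard1", linearizable=False)  # stale secondary read
except LookupError as err:
    print("stale refresh:", err)  # stale refresh: no chunk with bounds (0, 50)
move_chunk(rs, (0, 50), "shard1", linearizable=True)  # primary read sees split
print(rs.history[-1], rs.primary_reads)  # {(0, 50): 'shard1', (50, 100): 'shard0'} 2
```

The counter makes the trade-off in the comment visible: here correctness comes from sending every refresh to the primary rather than from a token the client carries between routers.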

Status

Backlog
