Operational Defect Database

BugZero found this defect 61 days ago.

MongoDB | 2613932

checkMetadataConsistency interleaves with collMod during upgrade / downgrade

Last update date:

3/19/2024

Affected products:

MongoDB Server

Affected releases:

No affected releases provided.

Fixed releases:

No fixed releases provided.

Description:


The aggregate commands used by checkMetadataConsistency [1] [2] do not set a snapshot read concern with an atClusterTime, that is, {readConcern: {level: 'snapshot', atClusterTime: ...}}. As a result, the reference metadata that the shard captured earlier may be stale if the metadata is modified before the aggregate command runs.

A collMod (or any other metadata-modifying operation) that acts on a shard directly without taking the DDL lock, as happens during an upgrade / downgrade, can therefore interleave with the checkMetadataConsistency command and create a situation where the previously captured metadata does not match the new metadata, even for the same shard.

It is not clear whether the bug is that checkMetadataConsistency does not use a snapshot, or that collMod during upgrade / downgrade does not take the DDL lock.

Reproducer: I issue a collMod to the shard directly so that it interleaves with checkMetadataConsistency, and checkMetadataConsistency then complains that shard 0's metadata does not match its own metadata:

    // Shard the coll
    mongos> db.adminCommand({ shardCollection: 'test.mycoll', key: {_id: 1} })

    // On the shard that the collection lives, set a failpoint here:
    // https://github.com/mongodb/mongo/blob/aadd0e171ac7aa8982618db9aad0dab283d7cdeb/src/mongo/db/s/metadata_consistency_util.cpp#L649
    shard-rs0:primary> db.adminCommand({ configureFailPoint: "pauseBeforeAgg", mode: "alwaysOn" });

    // Try to check metadata consistency - this will hang on the failpoint.
    mongos> db.checkMetadataConsistency();

    // Collmod on the shard directly. This is something that upgrading / downgrading
    // would usually trigger:
    shard-rs0:primary> db.runCommand({collMod: 'mycoll', validator: {a: {$gt: -10}}});

    // Turn off the failpoint to let checkMetadataConsistency complete
    shard-rs0:primary> db.adminCommand({ configureFailPoint: "pauseBeforeAgg", mode: "off" });

    // checkMetadataConsistency would have errored:
    {
      "cursor" : {
        "id" : NumberLong(0),
        "ns" : "test.$cmd.aggregate",
        "firstBatch" : [
          {
            "type" : "CollectionOptionsMismatch",
            "description" : "Collection registered on the sharding catalog not found on the given shards",
            "details" : {
              "namespace" : "test.mycoll",
              "options" : [
                {
                  "shards" : [ "shard-rs0" ],
                  "options" : {
                    "uuid" : UUID("095a4222-0ba3-4d22-b295-fbbf010ce6f9"),
                    "validator" : { "a" : { "$gt" : -10 } },
                    "validationLevel" : "strict",
                    "validationAction" : "error"
                  }
                },
                {
                  "shards" : [ "shard-rs0" ],
                  "options" : {
                    "uuid" : UUID("095a4222-0ba3-4d22-b295-fbbf010ce6f9")
                  }
                }
              ]
            }
          }
        ]
      },
      "ok" : 1,
      ...
    }
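The telltale sign in the output above is that the same shard ("shard-rs0") is listed in two different option groups of one CollectionOptionsMismatch entry. A minimal sketch of a triage helper that flags this pattern is shown below. The function name findSelfMismatchedShards and the sample data are hypothetical and not part of MongoDB; the helper only assumes the result shape shown in the reproducer, and a hit is a hint that an interleaving may have occurred, not proof of it.

```javascript
// Hypothetical triage helper (not a MongoDB API): scan the documents returned by
// checkMetadataConsistency and report shards that appear in more than one
// "options" group of the same CollectionOptionsMismatch entry.
function findSelfMismatchedShards(firstBatch) {
  const findings = [];
  for (const entry of firstBatch) {
    if (entry.type !== "CollectionOptionsMismatch") continue;
    const groups = (entry.details && entry.details.options) || [];
    const seen = new Map(); // shard name -> number of option groups listing it
    for (const group of groups) {
      // A Set guards against a shard being repeated inside a single group.
      for (const shard of new Set(group.shards || [])) {
        seen.set(shard, (seen.get(shard) || 0) + 1);
      }
    }
    const repeated = [...seen].filter(([, n]) => n > 1).map(([shard]) => shard);
    if (repeated.length > 0) {
      findings.push({ namespace: entry.details.namespace, shards: repeated });
    }
  }
  return findings;
}

// Sample data shaped like the output in the report (UUID written as a plain string
// so the snippet runs outside the mongo shell).
const sample = [
  {
    type: "CollectionOptionsMismatch",
    details: {
      namespace: "test.mycoll",
      options: [
        {
          shards: ["shard-rs0"],
          options: {
            uuid: "095a4222-0ba3-4d22-b295-fbbf010ce6f9",
            validator: { a: { $gt: -10 } },
            validationLevel: "strict",
            validationAction: "error",
          },
        },
        {
          shards: ["shard-rs0"],
          options: { uuid: "095a4222-0ba3-4d22-b295-fbbf010ce6f9" },
        },
      ],
    },
  },
];

console.log(JSON.stringify(findSelfMismatchedShards(sample)));
// prints [{"namespace":"test.mycoll","shards":["shard-rs0"]}]
```

In the mongo shell, the same function could presumably be fed the array of documents from the checkMetadataConsistency cursor. An entry whose groups name different shards is not flagged, since that is the kind of mismatch the check is meant to report.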



Status

Needs Scheduling
