MongoDB | 2575483

DDL coordinators can re-commit their changes in a stepdown

Last update date:

3/14/2024

Affected products:

MongoDB Server

Affected releases:

No affected releases provided.

Fixed releases:

No fixed releases provided.

Description:

Info

Suppose a shard is attempting to commit a DDL operation. Before committing, it may refresh data from the config shard to check whether a previous primary already performed the operation on the config shard and then failed. This check is unreliable if it depends on the gossiped Vector Clock: the check can mistakenly fail, and the same operation is then performed twice. This can occur in the following scenario:

1. Shard S1 has three nodes. Config Shard CS has three nodes.
2. S1's primary commits the DDL operation on CS with majority writeConcern and steps down before it persists the new Vector Clock.
3. S1's newly elected primary has the previous Vector Clock.
4. S1's new primary refreshes its catalog metadata by contacting a stale CS node that is still observing the old Vector Clock and is at a stale majority timestamp. This can happen because the read does not use a PrimaryOnly readPreference.
5. S1's new primary fails the check because, from its perspective, the cluster is still in the old pre-commit state.
6. S1's new primary then re-commits the DDL operation.
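
The scenario above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not MongoDB server code: the names ConfigNode, read_committed_ops, commit_if_needed and run are hypothetical, logical time is modeled as the number of operations a config node has applied, and a causal read is modeled as the node catching up to the requested time before it answers.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigNode:
    """Hypothetical config shard node holding the DDL op ids it has applied."""
    ops: list = field(default_factory=list)

    @property
    def time(self) -> int:
        # Toy logical time: the number of operations this node has applied.
        return len(self.ops)


def read_committed_ops(node, primary, after_time):
    """Toy causal read: the node catches up to after_time before answering."""
    while node.time < after_time:
        node.ops.append(primary.ops[node.time])  # replicate one missing entry
    return list(node.ops)


def commit_if_needed(op_id, known_config_time, read_from, cs_primary):
    """Recovery check: commit op_id unless the config shard already shows it."""
    seen = read_committed_ops(read_from, cs_primary, known_config_time)
    if op_id in seen:
        return False  # a previous primary already committed; nothing to do
    cs_primary.ops.append(op_id)  # check failed, so the operation is committed
    return True


def run(new_primary_clock, read_from_primary):
    """Replay the scenario and return how many times the DDL op was committed."""
    cs_primary, cs_stale = ConfigNode(), ConfigNode()
    # The old S1 primary commits on CS. The majority is reached without
    # cs_stale, and the old primary steps down before persisting its clock.
    cs_primary.ops.append("ddl-1")
    target = cs_primary if read_from_primary else cs_stale
    # The new S1 primary runs the check with whatever clock it has.
    commit_if_needed("ddl-1", new_primary_clock, target, cs_primary)
    return cs_primary.ops.count("ddl-1")


if __name__ == "__main__":
    print(run(new_primary_clock=0, read_from_primary=False))  # stale clock, stale node
    print(run(new_primary_clock=1, read_from_primary=False))  # up-to-date clock
    print(run(new_primary_clock=0, read_from_primary=True))   # read from CS primary
```

In this model, only the combination of a stale clock and a read from the stale config node commits the operation twice. Either an up-to-date clock or a read served by the config primary lets the check see the earlier commit.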

Status

Closed