BugZero found this defect 39 days ago.

MongoDB | 2638371

Unsatisfied write concern waiters can grow the replication waiter list unbounded

Last update date:

4/10/2024

Affected products:

MongoDB Server

Affected releases:

No affected releases provided.

Fixed releases:

No fixed releases provided.

Description:

Info

When waiting for write concern, replication calls awaitReplication, which inserts a new 'waiter' into a std::multimap called _replicationWaiterList. Waiters are sorted by opTime. When a primary advances its commit point, it checks whether any write concern waiters in the map can be satisfied by the new optime. It iterates through the map until it reaches a waiter whose optime is greater than the new optime, at which point the check ends.

Waiters are only removed from the list once their write concern is satisfied or the function called on the waiter returns an error. Even if a request with write concern times out (the future hits its deadline), the waiter remains in the list until it is satisfied. As a result, unsatisfiable write concerns can sit in the waiter list for an extended period, and every call to _wakeReadyWaiters has to iterate through a large number of them. This iteration happens under the replication coordinator mutex, so it slows down any operation waiting on that mutex.

Consider a write concern value greater than w: majority. In a 3-node replica set with 1 node down, writes with w:3 time out with an unsatisfied write concern. Each such waiter still exists in the list until the third node is brought back up. Any new majority write that moves the primary's timestamp forward must iterate through the list of timed-out write concern waiters.

A possible solution is to remove a waiter from the list when its future's deadline is exceeded.
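
The proposed fix can be illustrated with a minimal sketch. This is not MongoDB's code: `WaiterList`, `Waiter`, `add`, `removeOnTimeout`, and `wakeReady` are hypothetical names, the optime is reduced to a plain integer, and "number of nodes that have replicated up to the commit point" is passed in as a single number. A real implementation would also hold the replication coordinator mutex around every call. The sketch only shows that a waiter erased at its deadline no longer costs anything in later scans.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-in for an optime; the real type is richer.
using OpTime = std::uint64_t;

// Hypothetical waiter: only the fields this sketch needs.
struct Waiter {
    int requiredNodes = 1;   // e.g. 3 for w:3
    bool satisfied = false;
    bool timedOut = false;
};

class WaiterList {
public:
    using Map = std::multimap<OpTime, std::shared_ptr<Waiter>>;
    using Handle = Map::iterator;

    // Register a waiter, keyed (and therefore sorted) by optime.
    // The returned handle stays valid until this entry is erased.
    Handle add(OpTime opTime, int requiredNodes) {
        auto waiter = std::make_shared<Waiter>();
        waiter->requiredNodes = requiredNodes;
        return _waiters.emplace(opTime, std::move(waiter));
    }

    // Proposed behavior: drop the waiter as soon as its deadline passes.
    // Must not be called for a waiter that wakeReady already erased.
    void removeOnTimeout(Handle handle) {
        assert(!handle->second->satisfied);
        handle->second->timedOut = true;
        _waiters.erase(handle);
    }

    // Scan waiters whose optime is <= the new commit point and erase the
    // ones that are satisfied. Returns how many entries were visited,
    // which is the cost paid while the mutex would be held.
    std::size_t wakeReady(OpTime committed, int nodesAtCommitPoint) {
        std::size_t visited = 0;
        auto it = _waiters.begin();
        while (it != _waiters.end() && it->first <= committed) {
            ++visited;
            if (it->second->requiredNodes <= nodesAtCommitPoint) {
                it->second->satisfied = true;
                it = _waiters.erase(it);
            } else {
                ++it;  // unsatisfied waiter stays and is rescanned next time
            }
        }
        return visited;
    }

    std::size_t size() const { return _waiters.size(); }

private:
    Map _waiters;
};
```

Without `removeOnTimeout`, every w:3 waiter in the 3-node, 1-node-down scenario is revisited on each call to `wakeReady`. With it, the scan only touches waiters whose requests are still outstanding.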

Status

Needs Scheduling