Operational Defect Database

BugZero found this defect 68 days ago.

MongoDB | 2606647

Capped collection write workload can negatively impact other workloads in the system

Last update date:

3/13/2024

Affected products:

MongoDB Server

Affected releases:

No affected releases provided.

Fixed releases:

No fixed releases provided.

Description:

Info

Capped collection writers (insert, update, and delete operations) are serialized using a special resource metadata lock to maintain insertion (natural) order across the replica set. This lock is acquired after acquiring the write ticket and reserving the oplog slot. As a result, heavy write workloads on capped collections can negatively impact other workloads in the system by depleting write tickets or causing replication lag (due to oplog holes).

After PM-2983, which ensures natural ordering for all collections, including capped collections, the serialization of capped collection writers and the use of the resource mutex lock are no longer necessary. This feature is targeted for 8.1 and will not be backported. However, we've recently encountered two incidents on older mongod versions (5.0 and 7.0) with different customers where heavy capped write workloads led to ticket depletion and affected other workloads in the system. This isn't great.

I think we should acquire the resource metadata lock before acquiring the write ticket. This change can easily be backported all the way to version 5.0. It ensures that the serialized capped collection design only affects the performance of capped collections, and not the other workloads in the system.

Note: We aimed to improve capped collection performance by enabling concurrent capped writer support through SERVER-82863. However, while implementing it, we realized that without substantial changes it could potentially introduce correctness issues. Therefore, we've decided to close the ticket and await the release of PM-2983 in version 8.1 or improve TTL. I believe this change should be straightforward and reduce the blast radius for existing customers.
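The effect of the acquisition order described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not MongoDB server code: the write ticket pool is modeled as a `threading.Semaphore`, the resource metadata lock as a `threading.Lock`, and the function name `peak_tickets_held` and its parameters are hypothetical. Oplog slot reservation is left out entirely.

```python
import threading
import time


def peak_tickets_held(lock_first: bool, writers: int = 8, tickets: int = 4,
                      hold_seconds: float = 0.01) -> int:
    """Return the peak number of tickets held at once by simulated capped writers."""
    ticket_pool = threading.Semaphore(tickets)  # stand-in for the write ticket pool
    capped_mutex = threading.Lock()             # stand-in for the resource metadata lock
    stats_guard = threading.Lock()              # protects the counters below
    start = threading.Barrier(writers)          # release all writers together
    held = 0
    peak = 0

    def take_ticket() -> None:
        nonlocal held, peak
        ticket_pool.acquire()
        with stats_guard:
            held += 1
            peak = max(peak, held)

    def return_ticket() -> None:
        nonlocal held
        with stats_guard:
            held -= 1
        ticket_pool.release()

    def capped_write() -> None:
        start.wait()
        if lock_first:
            # Proposed order: serialize first, so a writer waiting on the
            # mutex never holds a ticket.
            with capped_mutex:
                take_ticket()
                try:
                    time.sleep(hold_seconds)  # simulated write
                finally:
                    return_ticket()
        else:
            # Current order: take a ticket, then queue on the mutex while
            # still holding that ticket.
            take_ticket()
            try:
                with capped_mutex:
                    time.sleep(hold_seconds)  # simulated write
            finally:
                return_ticket()

    threads = [threading.Thread(target=capped_write) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak


if __name__ == "__main__":
    print("ticket first:", peak_tickets_held(lock_first=False))
    print("lock first:  ", peak_tickets_held(lock_first=True))
```

With the lock taken first, at most one simulated capped writer can hold a ticket at any time, so the rest of the pool stays available to other workloads. With the ticket taken first, writers queued on the mutex each keep a ticket, and the peak typically climbs toward the size of the pool.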

Status

Needs Scheduling
