Operational Defect Database

BugZero found this defect 33 days ago.

MongoDB | 2643535

Server stalled after a CPU spike

Last update date:

4/16/2024

Affected products:

MongoDB Server

Affected releases:

4.2.12

Fixed releases:

No fixed releases provided.

Description:

Info

We have a PSA replica set; each data-bearing node has 32 cores, 64 GB of memory, and a 3 TB SSD. It has been running fine for over two years, but recently, as the data size keeps growing, we have run into a strange problem twice in one month: under high traffic, the primary's CPU usage (we use the primaryPreferred read preference) first rose to around 90%, then dropped below 50%, and all queries slowed down after the drop.

We have examined the sysctl parameters, ulimit settings, filesystem configuration (XFS, THP disabled), WiredTiger cache usage (around 80%), disk limits (throughput and IOPS), and the WiredTiger cache dirty percentage (around 5%), among other things, but could not work out the reason behind the stall.

Please help us confirm whether this is a bug, or give us a clue about what we are doing wrong. See the attachments for the related FTDC files. We know version 4.2.12 has reached end of life, so apologies in advance if this issue is out of scope. Many thanks!
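The cache usage and dirty percentages quoted above can be derived from the `wiredTiger.cache` section of `serverStatus`. The sketch below is a minimal, hypothetical helper (`wiredtiger_cache_ratios` is an illustrative name, not part of any MongoDB tool) that assumes both percentages are taken relative to the configured maximum cache size. The input is a plain dict, so it runs without a server; against a live node, the document could be fetched with e.g. `pymongo.MongoClient(uri).admin.command("serverStatus")`. The sample numbers are made up and are not taken from the attached FTDC data.

```python
from typing import Any, Mapping


def wiredtiger_cache_ratios(server_status: Mapping[str, Any]) -> tuple[float, float]:
    """Return (cache used %, dirty %) from a serverStatus document.

    Assumption: both values are expressed as a percentage of the
    configured maximum WiredTiger cache size.
    """
    cache = server_status["wiredTiger"]["cache"]
    max_bytes = cache["maximum bytes configured"]
    used_bytes = cache["bytes currently in the cache"]
    dirty_bytes = cache["tracked dirty bytes in the cache"]
    if max_bytes <= 0:
        raise ValueError("maximum bytes configured must be positive")
    return 100.0 * used_bytes / max_bytes, 100.0 * dirty_bytes / max_bytes


if __name__ == "__main__":
    # Illustrative numbers only, chosen to match the ~80% / ~5% reported above.
    sample = {
        "wiredTiger": {
            "cache": {
                "maximum bytes configured": 1000,
                "bytes currently in the cache": 800,
                "tracked dirty bytes in the cache": 50,
            }
        }
    }
    used_pct, dirty_pct = wiredtiger_cache_ratios(sample)
    print(f"cache used: {used_pct:.1f}%  dirty: {dirty_pct:.1f}%")
```

Sampling these two ratios repeatedly, before and after a CPU drop, would show whether the cache state changes around the stall.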

Steps to Reproduce

We could not find reliable steps to reproduce the issue.

Status

Needs Verification