Operational Defect Database

BugZero found this defect 48 days ago.

MongoDB | 2628491

concurrent start+stop causes mongostream crash due to invariant "line":983,"expr":"!status.isOK()","file":"src/mongo/util/future.h"

Last update date:

4/3/2024

Affected products:

MongoDB Server

Affected releases:

No affected releases provided.

Fixed releases:

No fixed releases provided.

Description:

Info

A stop request during an ongoing start flow can cause an invariant:

    "msg":"Invariant failure","svc":"-","attr":{"line":983,"expr":"!status.isOK()","file":"src/mongo/util/future.h"

The relevant catch block:

    } catch (const SPException& e) {
        // This catch block gets hit with an SPException that has an OK status.
        // This leads to the invariant when we call setError
        LOGV2_WARNING(75900,
                      "encountered stream processor exception, exiting runLoop(): {error}",
                      "context"_attr = _context,
                      "errorCode"_attr = e.code(),
                      "reason"_attr = e.reason(),
                      "unsafeErrorMessage"_attr = e.unsafeReason(),
                      "error"_attr = e.what());
        _promise.setError(e.toStatus());
        promiseFulfilled = true;
    } catch (const DBException& e) {

Example 1 (staging):

https://splunk.corp.mongodb.com/en-US/app/streams/search?earliest=-4h%40m&latest=now&q=s[…]ype=events&display.events.type=raw&sid=1712030881.9694837

    4:34:59.650 AM  Agent starting stream processor
    4:34:59.651 AM  About to start stream processor
    // SPM sends an errant stop request due to bug in heartbeat rejection logic.
    4:35:01.523 AM  Stopping stream processor
    4:35:01.568 AM  started operator dag
    4:35:01.568 AM  encountered stream processor exception, exiting runLoop(): {error}
    4:35:01.568 AM  Invariant failure

Example 2 (prod):

    8:10:17.832 AM  Starting stream processor
    // This is the k8s shutdown flow. A side question is, why is it happening now?
    8:10:18.745 AM  Stopping all streamProcessors
    8:10:18.745 AM  Stopping stream processor
    8:10:18.833 AM  encountered stream processor exception, exiting runLoop(): {error}
    8:10:18.833 AM  expr: !status.isOK(), file: src/mongo/util/future.h, line: 983

https://splunk.corp.mongodb.com/en-US/app/streams/search?earliest=1712045408.833&latest=1712045428.834&q=search%20index%3Dmhouse%20(66043c5a834b6c388c081dd0%20OR%20%22Stopping%20all%20streamProcessors%22)%20host%3Dstreams-spp-56b79c874d-znrps%20source%3Dstreams-spp%20c%3DSTREAMS%20((attr.errorCode%3D0%20AND%20exception)%20OR%20%22Stopping%22%20OR%20%22Starting%22)&display.page.search.mode=smart&dispatch.sample_ratio=1&display.page.search.tab=events&display.general.type=events&sid=1712067177.9766529

Example 3 (prod):

https://splunk.corp.mongodb.com/en-US/app/streams/search?earliest=1711580331.209&latest=1711580351.21&q=search%20index%3Dmhouse%20source%3Dstreams-spp%2065f418f9e00ced3c072f9e58%20host%3Dstreams-spp-56b79c874d-dnsng%20(c%3DSTREAMS%20OR%20%22Agent%20starting%20stream%20processor%22)&display.page.search.mode=smart&dispatch.sam
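The failure mode described above, an exception that carries an OK status being passed to setError, can be illustrated with a small standalone sketch. Everything in it is a hypothetical stand-in: Status, ToyPromise, ensureError and kSketchInternalError are not the MongoDB types, and the guard is only one possible defensive pattern, not the fix adopted for this ticket. The toy setError throws where the real invariant would abort the process, so that the behavior can be observed in a test.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>

// Toy stand-in for a status type: code 0 means OK. Hypothetical, not mongo::Status.
struct Status {
    int code = 0;
    std::string reason;

    Status() = default;
    Status(int c, std::string r) : code(c), reason(std::move(r)) {}

    bool isOK() const { return code == 0; }
};

// Toy promise that mirrors the failing check: setError() requires a non-OK status.
class ToyPromise {
public:
    void setError(Status status) {
        // Stand-in for the reported invariant "!status.isOK()".
        // The real invariant terminates the process; this sketch throws instead.
        if (status.isOK()) {
            throw std::logic_error("invariant failure: !status.isOK()");
        }
        _error = std::move(status);
        _fulfilled = true;
    }

    bool fulfilled() const { return _fulfilled; }
    const Status& error() const { return _error; }

private:
    Status _error;
    bool _fulfilled = false;
};

// Hypothetical error code used only in this sketch.
constexpr int kSketchInternalError = 1;

// Hypothetical guard: never hand an OK status to setError().
// A non-OK status passes through unchanged; an OK status is replaced
// with a generic error so the promise can still be completed.
Status ensureError(Status status) {
    if (!status.isOK()) {
        return status;
    }
    return Status(kSketchInternalError, "exception carried an OK status");
}
```

In this sketch, calling promise.setError(ensureError(status)) always completes the promise with an error, whereas promise.setError(Status()) trips the stand-in invariant. A guard of this kind would only mask the symptom; why the exception carries an OK status during a concurrent stop would still need to be explained.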

Status

In Progress
