Operational Defect Database

BugZero found this defect 52 days ago.

MongoDB | 2625769

Backlogged $merge causes 30min stop

Last update date:

3/28/2024

Affected products:

MongoDB Server

Affected releases:

No affected releases provided.

Fixed releases:

No fixed releases provided.

Description:

This is the customer's pipeline:

  [
    { "$source": { "connectionName": "KafkaConfluent", "topic": "OutputTopic" } },
    { "$merge": {
        "into": { "connectionName": "LyricsCluster", "db": "streamingvectors", "coll": "lyrics" },
        "on": "_id",
        "whenMatched": "merge",
        "whenNotMatched": "insert"
    } }
  ]

Root cause (see the linked Splunk logs): When the customer issued the stop, roughly 4,163,319,725 bytes had been input but only 2,670,960,838 bytes had been output; the sink was backlogged. As part of the stop, we start writing a checkpoint. The $source processed the checkpoint at 3/28/24 5:03:37.180 PM, but the sink did not finish processing it until 3/28/24 5:44:11.283 PM.

A related question: why did the backlog get up to ~2 GB? Our code should prevent that, limiting the backlog in the sink to ~100 MB.

==== Customer report ====

It has failed again with the same error in the new stream processing cluster I have created.

  {
    id: '6603d8c486a1abd293b773c5',
    name: 'lyrics_destination_cluster',
    lastModified: ISODate('2024-03-27T08:28:52.546Z'),
    state: 'STARTED',
    errorMsg: '',
    workers: [ 'worker-56b79c874d-9wjr2' ],
    pipeline: [
      { '$source': { connectionName: 'KafkaConfluent', topic: 'OutputTopic' } },
      { '$merge': {
          into: { connectionName: 'LyricsCluster', db: 'streamingvectors', coll: 'lyrics' },
          on: '_id',
          whenMatched: 'merge',
          whenNotMatched: 'insert'
      } }
    ],
    lastStateChange: ISODate('2024-03-28T17:03:31.206Z')
  }

The processor subscribed to the Kafka topic has stopped working. It still shows a STARTED state and I can't stop it.
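For reference, a pipeline like the one above is normally managed from mongosh connected to an Atlas Stream Processing instance. The following is a minimal sketch, not part of the defect record: the connection names come from the report, while the processor name "lyricsMerge" is hypothetical and the exact fields returned by stats() may vary by version. Checking stats() before stopping is one way to see how far the sink has fallen behind the source.

  // Minimal sketch (mongosh, connected to an Atlas Stream Processing instance).
  // Connection names are taken from the report; the processor name "lyricsMerge" is hypothetical.
  const pipeline = [
    { $source: { connectionName: "KafkaConfluent", topic: "OutputTopic" } },
    { $merge: {
        into: { connectionName: "LyricsCluster", db: "streamingvectors", coll: "lyrics" },
        on: "_id",
        whenMatched: "merge",
        whenNotMatched: "insert"
    } }
  ];

  sp.createStreamProcessor("lyricsMerge", pipeline);
  sp.lyricsMerge.start();

  // Inspect processor statistics; comparing input vs. output byte counts here is how
  // the large sink backlog in the root-cause analysis would show up.
  sp.lyricsMerge.stats();

  // Stopping triggers a checkpoint; with a large backlog the sink can take a long time
  // to drain before the checkpoint completes, which is the delay described above.
  sp.lyricsMerge.stop();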

Top User Comments


Steps to Reproduce


Additional Resources / Links

BugZero® Risk Score

Coming soon

Status

Open
