Operational Defect Database

BugZero found this defect 2411 days ago.

MongoDB | 405776

[SERVER-30217] applyOps doesn't wait for replication on the last op if it's a noop

Last update date:

12/6/2022

Affected products:

MongoDB Server

Affected releases:

No affected releases provided.

Fixed releases:

No fixed releases provided.

Description:

Info

When a write concern is provided to the applyOps command, we normally wait on the OpTime of whichever operation successfully completed last. This is wrong, however, if the last operation in the array is a no-op write and thus isn’t assigned an OpTime. Let the second-to-last operation in the applyOps array be write A and the last operation be write B. Suppose B is a no-op write, and let C be the operation that caused B to be a no-op. If C has an OpTime ahead of A’s, then we wait only for A to be replicated, not C, so C could be rolled back even though B was acknowledged. To fix this, we should wait for replication of the node’s last applied OpTime if the last write operation was a no-op write.
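The A/B/C scenario above can be sketched as a small standalone model. This is only an illustration under stated assumptions, not the server's code: the `OpTime` struct, its term-then-timestamp ordering, and the `currentWaitTarget` and `proposedWaitTarget` functions are hypothetical names invented here, and the real logic in the server may be structured quite differently.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for an OpTime: a (term, timestamp) pair.
struct OpTime {
    std::int64_t term;
    std::uint64_t timestamp;
};

// Assumed ordering for this sketch: compare terms first, then timestamps.
constexpr bool operator<(const OpTime& a, const OpTime& b) {
    return a.term != b.term ? a.term < b.term : a.timestamp < b.timestamp;
}

constexpr bool operator==(const OpTime& a, const OpTime& b) {
    return a.term == b.term && a.timestamp == b.timestamp;
}

// Behaviour described in the ticket: always wait on the OpTime of the last
// operation that was actually assigned one (write A in the scenario), even
// when the final operation (write B) was a no-op.
constexpr OpTime currentWaitTarget(OpTime lastAssignedOpTime,
                                   bool /*lastOpWasNoop*/,
                                   OpTime /*lastAppliedOpTime*/) {
    return lastAssignedOpTime;
}

// Fix proposed in the ticket: if the final operation was a no-op write, wait
// on the node's last applied OpTime instead, which covers write C.
constexpr OpTime proposedWaitTarget(OpTime lastAssignedOpTime,
                                    bool lastOpWasNoop,
                                    OpTime lastAppliedOpTime) {
    return lastOpWasNoop ? lastAppliedOpTime : lastAssignedOpTime;
}
```

With A at a made-up OpTime of (1, 10) and C at (1, 12), `currentWaitTarget` yields A's OpTime, which is behind C, while `proposedWaitTarget` yields the last applied OpTime, which in this model equals C's.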

Top User Comments

greg.mckeon commented on Tue, 19 Jun 2018 18:32:36 +0000:

If we fix any applyOps correctness bugs, we want to fix this one.

cramaechi commented on Mon, 1 Jan 2018 01:08:32 +0000:

Still wrapping my head around this, but if this issue is only related to the non-atomic form of applyOps, which I suspect is _applyOps() in src/mongo/db/repl/apply_ops.cpp, then I suppose the first step in resolving this issue would be to prevent _applyOps() from ignoring no-op write operations by removing the following fragment of code:

    const char* opType = opObj["op"].valuestrsafe();
    if (*opType == 'n')
        continue;

I would then proceed cautiously by adding the following block to the lambda expression passed to writeConflictRetry():

    {
        repl::UnreplicatedWritesBlock uwb(opCtx);
        uassertStatusOK(_applyOps(opCtx, dbName, applyOpCmd, oplogApplicationMode, &result, &numApplied, opsBuilder.get()));
    }

I believe the first line of code in the above block would suppress replication for non-atomic operations until the last successfully completed operation in the array. In other words, it would wait for replication of the last op, even if it's a no-op write. Not sure if any of this even makes sense, but this is as far as I've gotten. Please share your thoughts!

spencer commented on Thu, 14 Dec 2017 18:12:58 +0000:

This only applies to the non-atomic form of applyOps

Status

Backlog
