...
BugZero found this defect 2660 days ago.
When a write concern is provided to the applyOps command, we normally wait on the OpTime of whichever operation successfully completed last. This is incorrect, however, if the last operation in the array is a no-op write and thus isn't assigned an OpTime. Let the second-to-last operation in the applyOps array be write A and the last operation be write B. Suppose B is a no-op write, and let C be the operation that caused B to be a no-op. If C has an OpTime ahead of A's, then we wait only for A and not for C, so C could be rolled back even though B was acknowledged. To fix this, we should wait for replication of the node's last applied OpTime if the last write operation was a no-op write.
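The difference between the current and the proposed wait target can be sketched in a small standalone model. This is only an illustration of the reasoning above, not server code: `OpTime`, `AppliedOp`, `currentWaitTarget`, and `proposedWaitTarget` are hypothetical names, `OpTime` is simplified to a (term, timestamp) pair, and returning the node's last applied OpTime for an empty array is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <tuple>
#include <vector>

// Hypothetical, simplified stand-in for the server's OpTime,
// ordered by (term, timestamp).
struct OpTime {
    std::int64_t term = 0;
    std::uint64_t timestamp = 0;
};

inline bool operator<(const OpTime& a, const OpTime& b) {
    return std::tie(a.term, a.timestamp) < std::tie(b.term, b.timestamp);
}

// Hypothetical result of applying one applyOps entry.
// opTime is empty when the write turned out to be a no-op.
struct AppliedOp {
    std::optional<OpTime> opTime;
};

// Behaviour described in the ticket: wait on the OpTime of the last
// operation that was actually assigned one.
std::optional<OpTime> currentWaitTarget(const std::vector<AppliedOp>& ops) {
    std::optional<OpTime> last;
    for (const auto& op : ops) {
        if (op.opTime) {
            last = op.opTime;
        }
    }
    return last;
}

// Proposed behaviour: if the last operation was a no-op write, wait on the
// node's last applied OpTime instead, which covers whatever write (C in the
// description) made it a no-op.
OpTime proposedWaitTarget(const std::vector<AppliedOp>& ops,
                          const OpTime& nodeLastApplied) {
    if (ops.empty() || !ops.back().opTime) {
        // Assumption: an empty array is treated like a trailing no-op.
        return nodeLastApplied;
    }
    // Model invariant: the node's last applied OpTime is never behind an
    // OpTime that this command itself was assigned.
    assert(!(nodeLastApplied < *ops.back().opTime));
    return *ops.back().opTime;
}
```

With A at timestamp 10, C at timestamp 20, and B as a trailing no-op, `currentWaitTarget` yields A's OpTime while `proposedWaitTarget` yields C's, so in this model the acknowledgement of B implies that C has been replicated.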
greg.mckeon commented on Tue, 19 Jun 2018 18:32:36 +0000:
If we fix any applyOps correctness bugs, we want to fix this one.

cramaechi commented on Mon, 1 Jan 2018 01:08:32 +0000:
Still wrapping my head around this, but if this issue is only related to the non-atomic form of applyOps, which I suspect is _applyOps() in src/mongo/db/repl/apply_ops.cpp, then I suppose the first step in resolving this issue would be to prevent _applyOps() from ignoring no-op write operations by removing the following fragment of code:

    const char* opType = opObj["op"].valuestrsafe();
    if (*opType == 'n') continue;

I would then proceed cautiously by adding the following block to the lambda expression passed to writeConflictRetry():

    {
        repl::UnreplicatedWritesBlock uwb(opCtx);
        uassertStatusOK(_applyOps(opCtx, dbName, applyOpCmd, oplogApplicationMode, &result, &numApplied, opsBuilder.get()));
    }

I believe the first line of code in the above block would suppress replication for non-atomic operations until the last successfully completed operation in the array. In other words, it would wait for replication of the last op, even if it's a no-op write. Not sure if any of this even makes sense, but this is as far as I've gotten. Please share your thoughts!

spencer commented on Thu, 14 Dec 2017 18:12:58 +0000:
This only applies to the non-atomic form of applyOps