BugZero found this defect 2743 days ago.
Setup: a replica set with two data-bearing nodes and an arbiter, with one data node down. When performing a w:"majority" write that would otherwise fail with a duplicate key error, I expect it to fail with a write concern timeout error after the wtimeout expires. However, after waiting it still reports only the duplicate key error:

    replset:PRIMARY> db.test.insertOne({_id:1})
    { "acknowledged" : true, "insertedId" : 1 }
    replset:PRIMARY> db.test.insertOne({_id:1},{writeConcern:{w:"majority",wtimeout:5000}})
    ... WAIT ...
    2017-04-26T18:09:57.685-0400 E QUERY [thread1] WriteError: E11000 duplicate key error collection: test.test index: _id_ dup key: { : 1.0 } :
    WriteError({
        "index" : 0,
        "code" : 11000,
        "errmsg" : "E11000 duplicate key error collection: test.test index: _id_ dup key: { : 1.0 }",
        "op" : { "_id" : 1 }
    })
    WriteError@src/mongo/shell/bulk_api.js:469:48
    Bulk/mergeBatchResults@src/mongo/shell/bulk_api.js:836:49
    Bulk/executeBatch@src/mongo/shell/bulk_api.js:906:13
    Bulk/this.execute@src/mongo/shell/bulk_api.js:1150:21
    DBCollection.prototype.insertOne@src/mongo/shell/crud_api.js:242:9
    @(shell):1:1
    replset:PRIMARY> rs.status()
    {
        "set" : "replset",
        "date" : ISODate("2017-04-26T22:10:05.449Z"),
        "myState" : 1,
        "term" : NumberLong(3),
        "heartbeatIntervalMillis" : NumberLong(2000),
        "optimes" : {
            "lastCommittedOpTime" : { "ts" : Timestamp(1493242117, 1), "t" : NumberLong(1) },
            "appliedOpTime" : { "ts" : Timestamp(1493244598, 1), "t" : NumberLong(3) },
            "durableOpTime" : { "ts" : Timestamp(1493244598, 1), "t" : NumberLong(3) }
        },
        "members" : [
            {
                "_id" : 0,
                "name" : "AD-MAC10G.local:27017",
                "health" : 0,
                "state" : 8,
                "stateStr" : "(not reachable/healthy)",
                "uptime" : 0,
                "optime" : { "ts" : Timestamp(0, 0), "t" : NumberLong(-1) },
                "optimeDurable" : { "ts" : Timestamp(0, 0), "t" : NumberLong(-1) },
                "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
                "optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
                "lastHeartbeat" : ISODate("2017-04-26T22:10:04.105Z"),
                "lastHeartbeatRecv" : ISODate("2017-04-26T21:28:42.092Z"),
                "pingMs" : NumberLong(0),
                "lastHeartbeatMessage" : "Connection refused",
                "configVersion" : -1
            },
            {
                "_id" : 1,
                "name" : "AD-MAC10G.local:27018",
                "health" : 1,
                "state" : 1,
                "stateStr" : "PRIMARY",
                "uptime" : 2740,
                "optime" : { "ts" : Timestamp(1493244598, 1), "t" : NumberLong(3) },
                "optimeDate" : ISODate("2017-04-26T22:09:58Z"),
                "infoMessage" : "could not find member to sync from",
                "electionTime" : Timestamp(1493244557, 1),
                "electionDate" : ISODate("2017-04-26T22:09:17Z"),
                "configVersion" : 1,
                "self" : true
            },
            {
                "_id" : 2,
                "name" : "AD-MAC10G.local:27019",
                "health" : 1,
                "state" : 7,
                "stateStr" : "ARBITER",
                "uptime" : 57,
                "lastHeartbeat" : ISODate("2017-04-26T22:10:03.982Z"),
                "lastHeartbeatRecv" : ISODate("2017-04-26T22:10:02.801Z"),
                "pingMs" : NumberLong(0),
                "configVersion" : 1
            }
        ],
        "ok" : 1
    }
    replset:PRIMARY>

This makes it impossible to tell that the write concern was not satisfied.
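When the insert command is issued directly (rather than through the shell helpers), the server's reply can carry both a writeErrors array and a writeConcernError document, so a caller can in principle detect both failures. A minimal sketch of classifying such a reply follows; the classifyReply helper and the sample reply object are illustrative, not taken from the shell source:

```javascript
// Classify a raw insert-command reply. A reply may contain per-document
// write errors (e.g. E11000 duplicate key) and, independently, a
// writeConcernError when replication did not satisfy w within wtimeout.
function classifyReply(reply) {
  return {
    writeErrors: reply.writeErrors || [],
    // Command replies report a single writeConcernError document
    // (the shell bulk API aggregates these into writeConcernErrors).
    writeConcernError: reply.writeConcernError || null,
  };
}

// Hypothetical reply shaped like the errors in this ticket: the duplicate
// key error together with a wtimeout-flavored write concern error.
const reply = {
  ok: 1,
  n: 0,
  writeErrors: [
    { index: 0, code: 11000, errmsg: "E11000 duplicate key error" },
  ],
  writeConcernError: {
    code: 64,
    codeName: "WriteConcernFailed",
    errInfo: { wtimeout: true },
    errmsg: "waiting for replication timed out",
  },
};

const outcome = classifyReply(reply);
console.log(outcome.writeErrors.length, outcome.writeConcernError !== null);
```

A caller that only surfaces the first write error, as the shell did here, silently loses the second half of this information.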
spencer commented on Mon, 3 Jul 2017 21:48:03 +0000:
Looks like the issue was actually with the shell behavior, not the command behavior.

spencer commented on Wed, 17 May 2017 22:49:01 +0000:
Actually, I am unable to reproduce this outside of the shell bulk API. alex.komyagin, can you reproduce this using the command API directly? i.e.

    db.runCommand({insert:'test', documents: [{_id:2}], writeConcern:{w:3, wtimeout:3000}, ordered: true})

spencer commented on Wed, 17 May 2017 21:51:17 +0000:
So, hilariously, the difference between ordered and unordered is coming from the shell, not the server. I filed SERVER-29263 for that bug. There may still be a bug in the server where we don't return the write concern error in some places. I'm still investigating the server side of the issue.

alex.komyagin@10gen.com commented on Thu, 27 Apr 2017 20:45:13 +0000:
Just to add to this, ordered and unordered bulk operations are not consistent in reporting write concern failures after a timeout:

    replset:PRIMARY> db.test1.bulkWrite([{insertOne:{document:{_id:1}}}],{ordered:false,writeConcern:{w:2,wtimeout:1000}})
    2017-04-27T16:38:17.558-0400 E QUERY [thread1] BulkWriteError: write error at item 0 and problem enforcing write concern in bulk operation :
    BulkWriteError({
        "writeErrors" : [
            {
                "index" : 0,
                "code" : 11000,
                "errmsg" : "E11000 duplicate key error collection: test.test1 index: _id_ dup key: { : 1.0 }",
                "op" : { "_id" : 1 }
            }
        ],
        "writeConcernErrors" : [
            {
                "code" : 64,
                "codeName" : "WriteConcernFailed",
                "errInfo" : { "wtimeout" : true },
                "errmsg" : "waiting for replication timed out"
            }
        ],
        "nInserted" : 0,
        "nUpserted" : 0,
        "nMatched" : 0,
        "nModified" : 0,
        "nRemoved" : 0,
        "upserted" : [ ]
    })
    BulkWriteError@src/mongo/shell/bulk_api.js:372:48
    BulkWriteResult/this.toError@src/mongo/shell/bulk_api.js:336:24
    Bulk/this.execute@src/mongo/shell/bulk_api.js:1173:1
    DBCollection.prototype.bulkWrite@src/mongo/shell/crud_api.js:191:20
    @(shell):1:1
    replset:PRIMARY> db.test1.bulkWrite([{insertOne:{document:{_id:1}}}],{ordered:true,writeConcern:{w:2,wtimeout:1000}})
    2017-04-27T16:38:39.568-0400 E QUERY [thread1] BulkWriteError: write error at item 0 in bulk operation :
    BulkWriteError({
        "writeErrors" : [
            {
                "index" : 0,
                "code" : 11000,
                "errmsg" : "E11000 duplicate key error collection: test.test1 index: _id_ dup key: { : 1.0 }",
                "op" : { "_id" : 1 }
            }
        ],
        "writeConcernErrors" : [ ],
        "nInserted" : 0,
        "nUpserted" : 0,
        "nMatched" : 0,
        "nModified" : 0,
        "nRemoved" : 0,
        "upserted" : [ ]
    })

milkie commented on Wed, 26 Apr 2017 23:58:31 +0000:
Arguably, however, you should still be able to know if your timeout expired, because otherwise you would assume that your read could see the data even when using read concern majority. This seems like a bug that will be fixed if we merge the wtimeout with maxTimeMS.

milkie commented on Wed, 26 Apr 2017 23:54:10 +0000:
This behavior is by design. You were always destined to get the duplicate key error, regardless of your write concern. However, because you used w:majority, the server has to delay its response to you so that the preexisting data that caused your error can first replicate to a majority of nodes. This is to allow you to immediately issue a read concern level majority read and be able to see the duplicate data that caused the write error.
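Where the write concern errors are surfaced at all (as in the unordered bulkWrite output above), a caller can distinguish a replication timeout from an ordinary write error by checking errInfo.wtimeout on each write concern error. A small sketch under that assumption; the hasWtimeout helper name is my own:

```javascript
// Returns true when a bulk result's write concern errors include a
// replication timeout: code 64 (WriteConcernFailed) with
// errInfo.wtimeout set, as in the unordered example above.
function hasWtimeout(bulkResult) {
  const wcErrors = bulkResult.writeConcernErrors || [];
  return wcErrors.some((e) => e.errInfo && e.errInfo.wtimeout === true);
}

// Result shaped like the unordered bulkWrite error in this thread.
const unorderedResult = {
  writeErrors: [{ index: 0, code: 11000 }],
  writeConcernErrors: [
    {
      code: 64,
      codeName: "WriteConcernFailed",
      errInfo: { wtimeout: true },
      errmsg: "waiting for replication timed out",
    },
  ],
};

console.log(hasWtimeout(unorderedResult)); // → true
```

The ordered case in the thread returns an empty writeConcernErrors array for the same situation, so this check yields false there even though the timeout occurred, which is exactly the inconsistency being reported.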