...
BugZero found this defect 2650 days ago.
In a replica set, create a collection by inserting a document, restart the replica set, then drop the collection or database. On the primary the corresponding WT tables may not be dropped, and the .wt file will remain.

The following script reproduces the problem about half the time. The script ends with a loop that waits for the .wt file for the collection to disappear from all members; the problem is reproduced if the loop at the end never terminates and the offending file is associated with the member that was elected as primary.

Since the script also reliably reproduces SERVER-31101, you have to check whether the .wt file associated with the primary remains in order to determine whether this issue has been reproduced. The script prints replica set status before going into the wait loop to help with this.

Note also that the script uses killall -w, so as written it will work on Linux but not OSX.

function repro {

    db=/ssd/db # change this as required
    uri='mongodb://localhost:27017/test?replicaSet=rs'

    function clean {
        killall -9 -w mongod
        rm -rf $db
    }

    function start {
        for i in 0 1 2; do
            mkdir -p $db/r$i
            mongod --dbpath $db/r$i --logpath ./r$i.log --port 27${i}17 --replSet rs --fork
        done
    }

    function stop {
        killall -w mongod
    }

    function initiate {
        mongo --quiet --eval '
            rs.initiate({
                _id: "rs",
                members: [
                    {_id: 0, host: "localhost:27017"},
                    {_id: 1, host: "localhost:27117"},
                    {_id: 2, host: "localhost:27217"}
                ]
            })
        '
    }

    # get collection filename for port $1
    function fn {
        mongo --quiet --port $1 --eval '
            rs.slaveOk()
            print(db.runCommand({collStats: "c"}).wiredTiger.uri.substr(17))
        '
    }

    # start new replica set
    clean; start; initiate

    # create test.c, wait for replication
    mongo --quiet $uri --eval 'db.c.insert({})'
    sleep 5

    # note collection filenames on each member
    fn0=$db/r0/$(fn 27017).wt
    fn1=$db/r1/$(fn 27117).wt
    fn2=$db/r2/$(fn 27217).wt

    # restart
    stop; start

    # drop collection
    sleep 5
    mongo --quiet $uri --eval 'print("drop:", db.c.drop())'

    # print member status so we can tell which is offending member
    sleep 5
    mongo --quiet $uri --eval '
        members = rs.status().members
        for (var i in members)
            print(members[i].name, members[i].stateStr)
    '

    # wait for all files to disappear
    # problem is reproduced if this waits forever
    while [[ -e $fn0 || -e $fn1 || -e $fn2 ]]; do
        ls -l $fn0 $fn1 $fn2
        sleep 1
    done
}
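The manual check described above (does the file that remains belong to the primary?) could be automated along the following lines. This is only a sketch, not part of the original script: the helper names primary_port, member_index and primary_file_remains are hypothetical, and it assumes the status step prints one "host:port STATE" line per member and that ports follow the script's 27${i}17 scheme.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not in the original script) for deciding whether
# the leftover .wt file belongs to the member that is currently primary.

# Reads "host:port STATE" lines on stdin (the format assumed for the
# status step's output) and prints the port of the first PRIMARY member.
primary_port() {
    awk '$2 == "PRIMARY" { n = split($1, parts, ":"); print parts[n]; exit }'
}

# Maps a port to the member index used by the repro script's
# --port 27${i}17 scheme: 27017 -> 0, 27117 -> 1, 27217 -> 2.
member_index() {
    local port=$1
    echo "${port:2:1}"
}

# Exit status 0 if the primary's .wt file still exists, 1 if it is gone,
# 2 if no primary port was supplied.
# Args: $1 = primary's port, then the three filenames in member order.
primary_file_remains() {
    local port=$1
    shift
    local files=("$@")
    [[ -n $port ]] || return 2
    local idx
    idx=$(member_index "$port")
    [[ -e ${files[$idx]} ]]
}

# Possible use inside repro, after the drop (assumes $uri, $fn0..$fn2 are set):
#   port=$(mongo --quiet $uri --eval '
#       rs.status().members.forEach(function (m) { print(m.name, m.stateStr) })
#   ' | primary_port)
#   if primary_file_remains "$port" "$fn0" "$fn1" "$fn2"; then
#       echo "primary still has its .wt file"
#   fi
```

Because the drop is not instantaneous, a check like this would only be meaningful after waiting for a while, for example after the wait loop has been printing the same file for some time.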
bruce.lucas@10gen.com commented on Tue, 17 Oct 2017 13:27:22 +0000:
daniel.gottlieb has identified the cause of the repro above to be a long-running OperationContext created by the HMAC thread that holds a cached cursor that prevents the WT table from being dropped. This was identified on SERVER-31101 as one of the long-running OperationContexts that could potentially cause this problem, so I'll dup this ticket to that one.