...
Server died while using Genghis on primary server in two-node replication set. Both running on m1.large instances in EC2. Storage on EBS. Was testing scope of namespace for duplicate _id. Add was allowed and sync killed secondary. Fri Dec 27 01:10:33.835 [initandlisten] MongoDB starting : pid=19511 port=27017 dbpath=/usr/local/var/mongod/db 64-bit host=ip-10-238-205-161 Fri Dec 27 01:10:33.836 [initandlisten] db version v2.4.6 Fri Dec 27 01:10:33.836 [initandlisten] git version: b9925db5eac369d77a3a5f5d98a145eaaacd9673 Fri Dec 27 01:10:33.836 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49 Fri Dec 27 01:10:33.836 [initandlisten] allocator: tcmalloc Fri Dec 27 01:10:33.836 [initandlisten] options: { bind_ip: "0.0.0.0", config: "/etc/mongod.conf", dbpath: "/usr/local/var/mongod/db", fork: "true", logappend: "true", logpath: "/log/mongodb-2.log", replSet: "knrs" } Fri Dec 27 01:10:33.844 [initandlisten] journal dir=/usr/local/var/mongod/db/journal Fri Dec 27 01:10:33.844 [initandlisten] recover begin Fri Dec 27 01:10:33.844 [initandlisten] recover lsn: 0 Fri Dec 27 01:10:33.844 [initandlisten] recover /usr/local/var/mongod/db/journal/j._0 Fri Dec 27 01:10:33.852 [initandlisten] recover cleaning up Fri Dec 27 01:10:33.852 [initandlisten] removeJournalFiles Fri Dec 27 01:10:34.132 [initandlisten] recover done Fri Dec 27 01:10:34.132 [initandlisten] preallocating a journal file /usr/local/var/mongod/db/journal/prealloc.0 Fri Dec 27 01:11:49.551 [initandlisten] waiting for connections on port 27017 Fri Dec 27 01:11:49.552 [websvr] admin web console waiting for connections on port 28017 Fri Dec 27 01:11:49.562 [rsStart] replSet I am ec2-54-227-255-57.compute-1.amazonaws.com:27017 Fri Dec 27 01:11:49.562 [rsStart] replSet STARTUP2 Fri Dec 27 01:11:49.564 [rsMgr] replSet total number of votes is even - add arbiter or give one member an extra vote Fri Dec 27 01:11:49.565 [rsHealthPoll] replset info ec2-54-227-252-146.compute-1.amazonaws.com:27017 thinks that we are down Fri Dec 27 01:11:49.565 [rsHealthPoll] replSet member ec2-54-227-252-146.compute-1.amazonaws.com:27017 is up Fri Dec 27 01:11:49.565 [rsHealthPoll] replSet member ec2-54-227-252-146.compute-1.amazonaws.com:27017 is now in state SECONDARY Fri Dec 27 01:11:50.564 [rsSync] replSet still syncing, not yet to minValid optime 52bcad0e:1 Fri Dec 27 01:11:50.857 [initandlisten] connection accepted from 10.138.56.84:41413 #1 (1 connection now open) Fri Dec 27 01:11:50.857 [conn1] end connection 10.138.56.84:41413 (0 connections now open) Fri Dec 27 01:11:50.858 [initandlisten] connection accepted from 10.138.56.84:41414 #2 (1 connection now open) Fri Dec 27 01:11:56.563 [rsBackgroundSync] replSet syncing to: ec2-54-227-252-146.compute-1.amazonaws.com:27017 Fri Dec 27 01:11:56.565 [rsSync] replSet still syncing, not yet to minValid optime 52bcad0e:1 Fri Dec 27 01:11:56.579 [repl writer worker 1] ERROR: writer worker caught exception: invalid operator: $oid on: { ts: Timestamp 1388096782000|1, h: -6012491477194752701, v: 2, op: "i", ns: "kn1.accounts", o: { _id: { $oid: "5283fe3ea782c9aff9d96654" } , api_key: "604f5baa-fbc2-453a-ad9d-6e09a7d5e713", billing: {}, customEvents: {}, defaultDS: "ds.4444444.sample.internetTraffic", filters: [ { _id: { $oid: "5283fe3e9d46cd49718b457b" } , days_in_week: {}, desc: "Filter out non-business hours", hours_in_day: [ 17, 19, 20, 21, 22, 23, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ], name: "Business Hours", type: "hourDayMonth", visibility: "account" }, { _id: { $oid: "5283fe3e9d46cd49718b457c" } , days_in_week: [ 0, 6 ], desc: "Filter out any data from Saturday and Sunday.", hours_in_day: {}, name: "No Weekends", type: "hourDayMonth", visibility: "account" } ], knId: "4945496", kn_subscription: { subscriptionType: "", startDate: "", endDate: "", autoRenew: false } , name: "Know Normal, Inc.", nr_account_id: "279085", nr_api_key: "4cdafbfa8a636aefc1be23e0a21bbc027605ee8ed754d4a", nr_data_key: "52086bb8297a0bf139511477b0f77dbb4dee674cd754d4a", nr_event_feed_uri: "/account_feeds/52086bb8297a0bf139511477b0f77dbb4dee674cd754d4a/events.rss", nr_subscription: { annual-renewal-on: { @attributes: { nil: "true" } }, expires-on: { @attributes: { nil: "true" } }, number-of-hosts: "1", starts-on: "2013-03-22", state: "paid", product-name: "Pro" }, owner: "nr@knownormal.com", sources: [ { knColName: "ds.4444444.sample.wx", knDesc: "Raleigh-Durham, North Carolina Hourly Weather for 2012", knCustomName: "Weather Data", knSourceType: "sample", knFieldNames: [ "TmpF", "DPf", "RH", "CC", "Wind" ], knDateField: "Date", knTimeField: "Time", knDateTimeField: -1 } , { knColName: "ds.4444444.sample.internetTraffic", knDesc: "Internet traffic data (in bits) from United Kingdom academic network backbone (19 November 2004 thru 27 January 2005). Hourly data.", knCustomName: "ISP Network Traffic", knSourceType: "sample", knFieldNames: [ "bits" ], knDateField: null, knTimeField: null, knDateTimeField: "Time" } , { knSourceType: "NR", knSourceSubType: "summary", knColName: "ds.kn.279085.summary", knEventSource: "events.4945496.nr.279085", knCustomName: "KN App", knDesc: "", nrAppId: 1368487, nrAppName: "KN App", nrOverviewUrl: "https://rpm.newrelic.com/accounts/279085/applications/1368487", nrServerUrl: "https://rpm.newrelic.com/api/v1/accounts/279085/applications/1368487/servers" } , { knSourceType: "NR", knSourceSubType: "summary", knColName: "ds.4945496.nr.279085.1783546.summary.hour", knEventSource: "events.4945496.nr.279085", knCustomName: "KN App Dev Maury", knDesc: "", nrAppId: 1783546, nrAppName: "KN App Dev Maury", nrOverviewUrl: "https://rpm.newrelic.com/accounts/279085/applications/1783546", nrServerUrl: "https://rpm.newrelic.com/api/v1/accounts/279085/applications/1783546/servers" } , { knSourceType: "NR", knSourceSubType: "summary", knColName: "ds.4945496.nr.279085.1942741.summary.hour", knEventSource: "events.4945496.nr.279085", knCustomName: "My Application", knDesc: "", nrAppId: 1942741, nrAppName: "My Application", nrOverviewUrl: "https://rpm.newrelic.com/accounts/279085/applications/1942741", nrServerUrl: "https://rpm.newrelic.com/api/v1/accounts/279085/applications/1942741/servers" } ], users: [ { knId: "4819339", role: "owner" } ] } } Fri Dec 27 01:11:56.579 [repl writer worker 1] Fatal Assertion 16360 0xdddd81 0xd9dc13 0xc26bfc 0xdab721 0xe26609 0x7fb086d29c6b 0x7fb0860cf5ed /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdddd81] /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xd9dc13] /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc26bfc] /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdab721] /usr/bin/mongod() [0xe26609] /lib64/libpthread.so.0(+0x7c6b) [0x7fb086d29c6b] /lib64/libc.so.6(clone+0x6d) [0x7fb0860cf5ed] Fri Dec 27 01:11:56.583 [repl writer worker 1] ***aborting after fassert() failure Fri Dec 27 01:11:56.583 Got signal: 6 (Aborted). Fri Dec 27 01:11:56.586 Backtrace: 0xdddd81 0x6d0d29 0x7fb0860219c0 0x7fb086021945 0x7fb08602325b 0xd9dc4e 0xc26bfc 0xdab721 0xe26609 0x7fb086d29c6b 0x7fb0860cf5ed /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdddd81] /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d0d29] /lib64/libc.so.6(+0x349c0) [0x7fb0860219c0] /lib64/libc.so.6(gsignal+0x35) [0x7fb086021945] /lib64/libc.so.6(abort+0x17b) [0x7fb08602325b] /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xd9dc4e] /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x12c) [0xc26bfc] /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdab721] /usr/bin/mongod() [0xe26609] /lib64/libpthread.so.0(+0x7c6b) [0x7fb086d29c6b] /lib64/libc.so.6(clone+0x6d) [0x7fb0860cf5ed]
xgen-internal-githook commented on Thu, 6 Feb 2014 15:01:37 +0000: Author: {u'username': u'erh', u'name': u'Eliot Horowitz', u'email': u'eliot@10gen.com'} Message: SERVER-12209: check _id for $fields at insert time Branch: master https://github.com/mongodb/mongo/commit/34d899304d971942e6ce4743489495d11a4d48c1 thomasr commented on Mon, 3 Feb 2014 12:00:46 +0000: Eliot, this causes the bug in 2.5.5. Prior to that, _skipValidation wasn't available. // start replica set var rs = new ReplSetTest({name: "insert_dollar_oid", nodes: 3}) rs.startSet() rs.initiate() rs.awaitReplication() var conn = rs.getMaster() var db = conn.getDB('test') // disable shell validation db._mongo._skipValidation = true // insert bad document db['coll'].insert({_id: {$oid: 1}}) // throws Fatal Assertion 16360 on secondaries rs.awaitReplication() eliot commented on Mon, 3 Feb 2014 11:53:24 +0000: I figured one out. eliot commented on Mon, 3 Feb 2014 11:02:50 +0000: Is there a repro without godinsert? scotthernandez commented on Mon, 27 Jan 2014 17:45:10 +0000: // This insert works, which causes the problem below > db.runCommand({godinsert:"u", obj:{_id:{$oid:1}}}) { "ok" : 1 } // Which turns into an update with replication > db.u.update({_id:{$oid:1}}, {_id:{$oid:1}}, true) SingleWriteResult({ "writeErrors" : [ ... "errmsg" : "could not canonicalize query { _id: { $oid: 1.0 } }; :: caused by :: unknown operator: $oid", "op" : { "q" : { "_id" : { "$oid" : 1 } }, "u" : { "a" : 1 }, "multi" : false, "upsert" : false } ... // 2.4 has this error msg { ... "err" : "invalid operator: $oid", ... } shaneo@knownormal.com commented on Mon, 13 Jan 2014 21:53:22 +0000: http://pecl.php.net/package/mongo – Shane O. ======================== Shane O'Donnell Know Normal, Inc. cell: +1.919.323.5380 skype: shaneodonnell email: shaneo@knownormal.com ======================== :wq! ger.hartnett commented on Mon, 13 Jan 2014 14:02:14 +0000: AFAIKS the released version of the MongoDB PHP driver is 1.3.0 http://ie2.php.net/manual/en/changelog.mongo.php shaneo@knownormal.com commented on Sun, 12 Jan 2014 00:04:27 +0000: RESPONSES INLINE WITH >>> *** I appears that secondaries are failing when trying to replicate documents whose _id field has a key starting with a $. >>> The primary crashed as well on the initial attempt to remove the document (via the mongo shell) The format { "_id" : { "$oid" : "52d0c341330e7982e125ad02" } } is a representation in our extended JSON format for ObjectIds in "strict" and "json" mode (mongoexport uses this for example to write valid JSON), but it is not a valid ObjectId in the server. Keys should not start with a $ sign, those are reserved operators in MongoDB. We'd like to find out how you ended up with such _id values. >>> Me too! What driver and version are you using? >>> In our current environment, we are accessing data in MongoDB via 4 different mechanisms: >>> 1) PHP driver (currently version 1.4.5 (dev), 1.3.4 & 1.4.2 (prod – now upgraded to 1.4.5), and records in MongoDB could have been installed by almost anything in between) >>> 2) Java driver 2.11.0 (now, but previous versions were definitely used). >>> 3) mongo shell >>> 4) Genghis Did you create them yourself through a driver that does not validate for this format? >>> See above. Did you use mongoexport to export your data previously and insert the documents manually again? (mongoimport would convert this format back to a proper ObjectId before inserting). >>> Not that I'm aware of. How many documents with $oid do you have in your collection? If all your other documents in the collection have a proper ObjectId, then you can query for the $oid ones by their type, with use kn1 db.accounts.find({_id: {$type: 3}}).count() This query finds all documents with an ObjectId of type 3 (= sub-document), which would only target the problematic ones if the rest have a true ObjectId. If there are many of them and you want to retain the data, we may be able to come up with a script that converts these back to normal ObjectIds (or alternatively just remove them all, if you don't want to keep them). It's likely though that you would have to take your primary down, convert/remove the documents in standalone mode, then bring it back up and re-sync your secondaries afterwards. >>> I can't run a query against it because it's still in ROLLBACK state. I don't know how to clear that. No rollback files were generated. That said, this collection only had six or so documents in it, and I'd looked at it numerous times and only noted the one document with the {$oid} key. In my tests, a repair did not fix or remove the documents. >>> In my test, a repair shrunk the size on disk so markedly (from 168GB to less than 1GB) that I was suspicious that I didn't lose production data, so I restored a backup (taken just prior to the --repair). thomasr commented on Sat, 11 Jan 2014 05:00:11 +0000: Hi Shane, I appears that secondaries are failing when trying to replicate documents whose _id field has a key starting with a $. The format { "_id" : { "$oid" : "52d0c341330e7982e125ad02" } } is a representation in our extended JSON format for ObjectIds in "strict" and "json" mode (mongoexport uses this for example to write valid JSON), but it is not a valid ObjectId in the server. Keys should not start with a $ sign, those are reserved operators in MongoDB. We'd like to find out how you ended up with such _id values. What driver and version are you using? Did you create them yourself through a driver that does not validate for this format? Did you use mongoexport to export your data previously and insert the documents manually again? (mongoimport would convert this format back to a proper ObjectId before inserting). How many documents with $oid do you have in your collection? If all your other documents in the collection have a proper ObjectId, then you can query for the $oid ones by their type, with use kn1 db.accounts.find({_id: {$type: 3}}).count() This query finds all documents with an ObjectId of type 3 (= sub-document), which would only target the problematic ones if the rest have a true ObjectId. If there are many of them and you want to retain the data, we may be able to come up with a script that converts these back to normal ObjectIds (or alternatively just remove them all, if you don't want to keep them). It's likely though that you would have to take your primary down, convert/remove the documents in standalone mode, then bring it back up and re-sync your secondaries afterwards. In my tests, a repair did not fix or remove the documents. Regards, Thomas shaneo@knownormal.com commented on Fri, 10 Jan 2014 16:13:08 +0000: If I can edit the original ticket, I can't figure out how. Versions should be: PRIMARY: 2.4.6 SECONDARY: 2.4.8 SECONDARY: 2.4.8 Note that I have since upgraded PRIMARY to 2.4.8 as well and have a related-if-not-duplicate bug to file, where bad $oid reference kills PRIMARY and replication then replicates the reference to SECONDARYs and kills them too.