...
Hi, this was observed in mongocxx 3.7.0 against MongoDB 6.0.15. When a client is created with, for example, timeoutMs=5000 and a change-stream watch() is started with it, the operation times out internally without throwing any exception and another COLLSCAN operation is started on the same stream. The result is effectively a CPU leak: COLLSCANs pile up until the server reaches 100% CPU, with hundreds of COLLSCANs created in a short time. The issue was worked around by using the default timeoutMs=0 and letting each COLLSCAN take whatever time it needs. Is this a known issue? Or, if this is the expected behavior, why is that? Thanks.
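For reference, a minimal mongocxx sketch of the setup described above, assuming the timeout is supplied through the connection string as the report implies; the host, database name, and the placement of application logic are placeholders, and dropping the timeoutMS option (the default of 0) is the workaround the reporter describes:

    #include <mongocxx/change_stream.hpp>
    #include <mongocxx/client.hpp>
    #include <mongocxx/instance.hpp>
    #include <mongocxx/options/change_stream.hpp>
    #include <mongocxx/pipeline.hpp>
    #include <mongocxx/uri.hpp>

    int main() {
        mongocxx::instance inst{};  // one driver instance per process

        // timeoutMS=5000 is the configuration the report says triggers the pile-up;
        // omitting it (the default, 0) is the reported workaround.
        mongocxx::client client{mongocxx::uri{"mongodb://mongos:27017/?timeoutMS=5000"}};
        auto db = client["yourDatabase"];

        mongocxx::pipeline pipeline;               // empty pipeline: watch everything
        mongocxx::options::change_stream options;

        auto stream = db.watch(pipeline, options);
        for (const auto& event : stream) {
            (void)event;                           // application logic would go here
        }
    }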
JIRAUSER1265262 commented on Mon, 12 Aug 2024 22:30:10 +0000:

Thanks for the extra data oraiches@zadarastorage.com! I spent some time today working on this and was unable to replicate this behavior. It's possible there may be something additional going on, or another circumstance required to make this occur on 6.0.15. I did the following to initialize a 2-shard cluster with the balancer disabled:

    m 6.0.15
    mlaunch init --replicaset --sharded 2
    mongosh
    use yourDatabase
    sh.stopBalancer()                  // force all the docs onto one shard
    mongosh < insert_docs.js           // insert 50 million docs
    mongosh
    use yourDatabase
    sh.enableSharding("yourDatabase");
    db.yourCollection.createIndex({ id: "hashed" });
    sh.shardCollection("yourDatabase.yourCollection", { id: "hashed" });

I then opened a change stream against yourDatabase.yourCollection, and re-enabled the balancer:

    sh.startBalancer()
    sh.status()

I observed at most 3 COLLSCANs being reported by your script, and it never seemed to increment beyond that (and CPU was not excessive):

    [direct: mongos] yourDatabase> print("Number of operations of type 'op' with planSummary 'COLLSCAN': " + collScanOpCount);
    Number of operations of type 'op' with planSummary 'COLLSCAN': 3

Notes

Change streams do not do a traditional COLLSCAN, because they are not querying any of your collections. They are watching/querying the oplog. This could be more or less expensive depending on the size of the oplog and how frequently it is being updated.

To set expectations: given that I can't replicate this behavior, we likely cannot help with an immediate solution for you. I suspect you may be encountering some known limitations covered in our change stream production recommendations, but I cannot verify that without diagnostic data.

Questions

Instead, I will pass this to the relevant team to answer a few targeted questions pertaining to the issue presented:

1. In a sharded cluster, will change streams cancel their oplog scan if the requesting client (that opened the change stream) times out due to hitting timeoutMs?
2. Do change streams from timed-out clients remain open? (I wasn't able to observe this behavior in my testing.)
3. Would COLLSCAN operations be expected to build up due to a connection storm from the user's app reconnecting and rescanning the oplog? The only circumstance in which I can see this happening is if a new change stream is being opened before the last COLLSCAN completes, implying it is taking over 5 seconds to search the oplog on one or more of the shards. (A sketch of this scenario follows this comment.)

This suggests to me we may be hitting a limitation outlined in the change stream production recommendations:

To guarantee total ordering of changes, for each change notification the mongos checks with each shard to see if the shard has seen more recent changes. Sharded clusters with one or more shards that have little or no activity for the collection, or are "cold", can negatively affect the response time of the change stream as the mongos must still check with those cold shards to guarantee total ordering of changes. This effect may be more apparent with geographically distributed shards, or workloads where the majority of operations occur on a subset of shards in the cluster. To minimize latency for cold shards, you can specify a lower periodicNoopIntervalSecs value.
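To make the third question above concrete, here is a hedged sketch (a reconstruction of the hypothesized client behavior, not the reporter's actual application) of a driver-side loop that would start a fresh change-stream aggregation, and therefore a fresh oplog scan, every time timeoutMS elapses before the server responds:

    #include <mongocxx/database.hpp>
    #include <mongocxx/exception/operation_exception.hpp>
    #include <mongocxx/options/change_stream.hpp>
    #include <mongocxx/pipeline.hpp>

    // Assumes `db` comes from a client configured with timeoutMS=5000, as in the
    // report. If each server-side oplog scan takes longer than 5 seconds, every
    // pass of the outer loop opens a new stream while previous scans may still be
    // running on the shards, so the number of concurrent scans grows.
    void watch_forever(mongocxx::database& db) {
        mongocxx::pipeline pipeline;
        mongocxx::options::change_stream options;

        for (;;) {
            try {
                auto stream = db.watch(pipeline, options);  // new aggregate, new oplog scan
                for (const auto& event : stream) {
                    (void)event;                            // application logic would go here
                }
                // Per the report, iteration can also stop here silently when the
                // timeout is hit; the outer loop then re-opens the stream at once.
            } catch (const mongocxx::operation_exception&) {
                // A surfaced timeout would land here instead; either way the loop
                // comes straight back around and starts another scan.
            }
        }
    }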
JIRAUSER1264630 commented on Thu, 8 Aug 2024 09:55:37 +0000:

Hi chris.kelly@mongodb.com, unfortunately I do not have these diagnostics anymore, and I already fixed the issue by setting timeoutMs=0. I can give you these simple steps to recreate it:

1. Create a cluster with a single shard.
2. Insert 50M documents.
3. Add a 2nd shard and let the rebalance run.
4. While it runs, read from the change stream using an old ClusterTime as the start_at_operation_time option for the watch() call. Here, use a client that is set with a low timeoutMs (5000 ms like I did, or even lower).
5. Use this read methodology:

    change_stream stream = db.watch(pipeline, options);
    while (true) {
        for (const auto& event : stream) {
            doStuff(event);
        }
    }

Since the 2nd shard is being added and the migration is going on, the server will be working extra hard and take more than timeoutMs to produce a response in the change-stream loop. You can see the number of running COLLSCANs increasing, and the CPU of the PRIMARY host increasing as well. You can check it like this in the mongo shell:

    // Get the current operations
    var currentOps = db.currentOp().inprog;

    // Function to count operations of type 'op' with planSummary 'COLLSCAN'
    function countCollScanOps(ops) {
        var count = 0;
        for (var i = 0; i < ops.length; i++) {
            var op = ops[i];
            if (op.type === 'op' && op.planSummary === 'COLLSCAN') {
                count++;
            }
        }
        return count;
    }

    // Get the count of matching operations
    var collScanOpCount = countCollScanOps(currentOps);

    // Print the result
    print("Number of operations of type 'op' with planSummary 'COLLSCAN': " + collScanOpCount);

On my setup, every replica runs on a different host. Running the mongo shell code above, you will see the COLLSCAN count increase every few seconds (every timeoutMs, actually). These steps recreated the issue every time. Thanks for the help.

JIRAUSER1265262 commented on Wed, 7 Aug 2024 16:34:17 +0000:

Hi oraiches@zadarastorage.com, thanks for your report. Can you share an example of the change stream, and some diagnostic data covering the time you try it? Providing timestamps to indicate when you use the timeout or not would be helpful. Please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.
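For completeness, a hedged sketch of the read methodology from the Aug 8 reproduction steps above with the start_at_operation_time option filled in. This is a reconstruction, not the reporter's actual code: `db` is assumed to come from a client configured with timeoutMs=5000, the cluster-time values are placeholders for "an old ClusterTime", and the event handling is elided:

    #include <bsoncxx/types.hpp>
    #include <mongocxx/change_stream.hpp>
    #include <mongocxx/database.hpp>
    #include <mongocxx/options/change_stream.hpp>
    #include <mongocxx/pipeline.hpp>

    void read_from_old_cluster_time(mongocxx::database& db) {
        mongocxx::pipeline pipeline;
        mongocxx::options::change_stream options;

        // Start from an old cluster time so the server has to scan a large
        // stretch of the oplog; these values are placeholders.
        bsoncxx::types::b_timestamp start_time{};
        start_time.timestamp = 1700000000;  // seconds since the epoch (placeholder)
        start_time.increment = 1;
        options.start_at_operation_time(start_time);

        mongocxx::change_stream stream = db.watch(pipeline, options);
        while (true) {
            for (const auto& event : stream) {
                (void)event;  // doStuff(event) in the reporter's loop
            }
        }
    }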