...
Hi, this was observed in mongocxx 3.7.0 against MongoDB 6.0.15. When a client is created with, for example, timeoutMs=5000 and a change-stream watch() is started with it, the operation times out internally without throwing any exception and another COLLSCAN operation is started on the same stream. The result is effectively a CPU leak: COLLSCANs pile up until the server reaches 100% CPU, with hundreds of COLLSCANs created in a short time. The issue was worked around by using the default timeoutMs=0 and letting each COLLSCAN take whatever time it needs. Is this a known issue? Or, if this is the expected behavior, why is that? Thanks.
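For reference, a minimal mongocxx sketch of the setup described above, assuming the timeout is supplied through the connection string as the report implies; the host, database name, and the placement of application logic are placeholders, and dropping the timeoutMS option (the default of 0) is the workaround the reporter describes:

    #include <mongocxx/change_stream.hpp>
    #include <mongocxx/client.hpp>
    #include <mongocxx/instance.hpp>
    #include <mongocxx/options/change_stream.hpp>
    #include <mongocxx/pipeline.hpp>
    #include <mongocxx/uri.hpp>

    int main() {
        mongocxx::instance inst{};  // one driver instance per process

        // timeoutMS=5000 is the configuration the report says triggers the pile-up;
        // omitting it (the default, 0) is the reported workaround.
        mongocxx::client client{mongocxx::uri{"mongodb://mongos:27017/?timeoutMS=5000"}};
        auto db = client["yourDatabase"];

        mongocxx::pipeline pipeline;               // empty pipeline: watch everything
        mongocxx::options::change_stream options;

        auto stream = db.watch(pipeline, options);
        for (const auto& event : stream) {
            (void)event;                           // application logic would go here
        }
    }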
JIRAUSER1265262 commented on Mon, 12 Aug 2024 22:30:10 +0000:

Thanks for the extra data oraiches@zadarastorage.com! I spent some time today working on this and was unable to replicate this behavior. It's possible there may be something additional going on, or another circumstance required to make this occur on 6.0.15. I did the following to initialize a 2-shard cluster with the balancer disabled:

    m 6.0.15
    mlaunch init --replicaset --sharded 2
    mongosh
    use yourDatabase
    sh.stopBalancer()                  // force all the docs onto one shard
    mongosh < insert_docs.js           // insert 50 million docs
    mongosh
    use yourDatabase
    sh.enableSharding("yourDatabase");
    db.yourCollection.createIndex({ id: "hashed" });
    sh.shardCollection("yourDatabase.yourCollection", { id: "hashed" });

I then opened a change stream against yourDatabase.yourCollection, and re-enabled the balancer:

    sh.startBalancer()
    sh.status()

I observed at most 3 COLLSCANs being reported by your script, and it never seemed to increment beyond that (and CPU was not excessive):

    [direct: mongos] yourDatabase> print("Number of operations of type 'op' with planSummary 'COLLSCAN': " + collScanOpCount);
    Number of operations of type 'op' with planSummary 'COLLSCAN': 3

Notes

Change streams do not do a traditional COLLSCAN, because they are not querying any of your collections. They are watching/querying the oplog. This could be more or less expensive depending on the size of the oplog and how frequently it is being updated.

To set expectations: given that I can't replicate this behavior, we likely cannot help with an immediate solution for you. I suspect you may be encountering some known limitations covered in our change stream production recommendations, but I cannot verify that without diagnostic data.

Questions

Instead, I will pass this to the relevant team to answer a few targeted questions pertaining to the issue presented:

1. In a sharded cluster, will change streams cancel their oplog scan if the requesting client (that opened the change stream) times out due to hitting timeoutMs?
2. Do change streams from timed-out clients remain open? (I wasn't able to observe this behavior in my testing.)
3. Would COLLSCAN operations be expected to build up due to a connection storm from the user's app reconnecting and rescanning the oplog? The only circumstance in which I can see this happening is if a new change stream is being opened before the last COLLSCAN completes, implying it is taking over 5 seconds to search the oplog on one or more of the shards. (A sketch of this scenario follows this comment.)

This suggests to me we may be hitting a limitation outlined in the change stream production recommendations:

To guarantee total ordering of changes, for each change notification the mongos checks with each shard to see if the shard has seen more recent changes. Sharded clusters with one or more shards that have little or no activity for the collection, or are "cold", can negatively affect the response time of the change stream as the mongos must still check with those cold shards to guarantee total ordering of changes. This effect may be more apparent with geographically distributed shards, or workloads where the majority of operations occur on a subset of shards in the cluster. To minimize latency for cold shards, you can specify a lower periodicNoopIntervalSecs value.
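To make the third question above concrete, here is a hedged sketch (a reconstruction of the hypothesized client behavior, not the reporter's actual application) of a driver-side loop that would start a fresh change-stream aggregation, and therefore a fresh oplog scan, every time timeoutMS elapses before the server responds:

    #include <mongocxx/database.hpp>
    #include <mongocxx/exception/operation_exception.hpp>
    #include <mongocxx/options/change_stream.hpp>
    #include <mongocxx/pipeline.hpp>

    // Assumes `db` comes from a client configured with timeoutMS=5000, as in the
    // report. If each server-side oplog scan takes longer than 5 seconds, every
    // pass of the outer loop opens a new stream while previous scans may still be
    // running on the shards, so the number of concurrent scans grows.
    void watch_forever(mongocxx::database& db) {
        mongocxx::pipeline pipeline;
        mongocxx::options::change_stream options;

        for (;;) {
            try {
                auto stream = db.watch(pipeline, options);  // new aggregate, new oplog scan
                for (const auto& event : stream) {
                    (void)event;                            // application logic would go here
                }
                // Per the report, iteration can also stop here silently when the
                // timeout is hit; the outer loop then re-opens the stream at once.
            } catch (const mongocxx::operation_exception&) {
                // A surfaced timeout would land here instead; either way the loop
                // comes straight back around and starts another scan.
            }
        }
    }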
JIRAUSER1264630 commented on Thu, 8 Aug 2024 09:55:37 +0000:

Hi chris.kelly@mongodb.com, unfortunately I do not have these diagnostics anymore, and I already fixed the issue by setting timeoutMs=0. I can give you these simple steps to recreate it:

1. Create a cluster with a single shard.
2. Insert 50M documents.
3. Add a 2nd shard and let the rebalance run.
4. While it runs, read from the change stream using an old ClusterTime as the start_at_operation_time option for the watch() call. Here, use a client that is set with a low timeoutMs (5000 ms like I did, or even lower).
5. Use this read methodology:

    change_stream stream = db.watch(pipeline, options);
    while (true) {
        for (const auto& event : stream) {
            doStuff(event);
        }
    }

Since the 2nd shard is being added and the migration is going on, the server will be working extra hard and take more than timeoutMs to produce a response in the change-stream loop. You can see the number of running COLLSCANs increasing, and the CPU of the PRIMARY host increasing as well. You can check it like this in the mongo shell:

    // Get the current operations
    var currentOps = db.currentOp().inprog;

    // Function to count operations of type 'op' with planSummary 'COLLSCAN'
    function countCollScanOps(ops) {
        var count = 0;
        for (var i = 0; i < ops.length; i++) {
            var op = ops[i];
            if (op.type === 'op' && op.planSummary === 'COLLSCAN') {
                count++;
            }
        }
        return count;
    }

    // Get the count of matching operations
    var collScanOpCount = countCollScanOps(currentOps);

    // Print the result
    print("Number of operations of type 'op' with planSummary 'COLLSCAN': " + collScanOpCount);

On my setup, every replica runs on a different host. Running the mongo shell code above, you will see the COLLSCAN count increase every few seconds (every timeoutMs, actually). These steps recreated the issue every time. Thanks for the help.

JIRAUSER1265262 commented on Wed, 7 Aug 2024 16:34:17 +0000:

Hi oraiches@zadarastorage.com, thanks for your report. Can you share an example of the change stream, and some diagnostic data covering the time you try it? Providing timestamps to indicate when you use the timeout or not would be helpful. Please archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) and upload them to this support uploader location. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.
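For completeness, a hedged sketch of the read methodology from the Aug 8 reproduction steps above with the start_at_operation_time option filled in. This is a reconstruction, not the reporter's actual code: `db` is assumed to come from a client configured with timeoutMs=5000, the cluster-time values are placeholders for "an old ClusterTime", and the event handling is elided:

    #include <bsoncxx/types.hpp>
    #include <mongocxx/change_stream.hpp>
    #include <mongocxx/database.hpp>
    #include <mongocxx/options/change_stream.hpp>
    #include <mongocxx/pipeline.hpp>

    void read_from_old_cluster_time(mongocxx::database& db) {
        mongocxx::pipeline pipeline;
        mongocxx::options::change_stream options;

        // Start from an old cluster time so the server has to scan a large
        // stretch of the oplog; these values are placeholders.
        bsoncxx::types::b_timestamp start_time{};
        start_time.timestamp = 1700000000;  // seconds since the epoch (placeholder)
        start_time.increment = 1;
        options.start_at_operation_time(start_time);

        mongocxx::change_stream stream = db.watch(pipeline, options);
        while (true) {
            for (const auto& event : stream) {
                (void)event;  // doStuff(event) in the reporter's loop
            }
        }
    }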