...
Scenario - We ran tests with the same load on two MongoDB versions (4.2.20, 4.4.20). We observed high query response times (in the range of 3-4 seconds) and very high CPU (100%) on the primary for the 4.2.20 version, but no issue was found on 4.4.18. As per the suggestion in SERVER-67217, we also collected disk usage, but we did not find any high disk read/write at the time of the issue. In SERVER-67217, we ran the same test between 4.2.20 and 4.0.27, and the issue was still found on the 4.2.20 primary members only. Kindly refer to SERVER-67217 for more details about the issue. Note: Please provide a cloud link to upload the respective logs.
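(For illustration, a minimal sketch of how disk read/write activity could be sampled during the test window; the ticket does not say which tool was used, so the iostat invocation, sample count, and output file name below are assumptions.)

# Hypothetical: capture extended per-device I/O statistics once per second
# for an hour-long test window (requires the sysstat package).
iostat -x 1 3600 > disk_usage_during_test.txt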
JIRAUSER1265262 commented on Mon, 12 Aug 2024 23:23:38 +0000: Closing this issue as we didn't get to an RCA and our latest supported version is now MongoDB 5.0. If the issue recurs we can reprioritize this issue in the future.

JIRAUSER1264730 commented on Fri, 17 Mar 2023 05:02:10 +0000: Gentle reminder 2! Regards, Kapil

JIRAUSER1264730 commented on Fri, 3 Mar 2023 04:35:44 +0000: Hi Chris, gentle reminder! Did you get anything on this? Regards, Kapil

JIRAUSER1264730 commented on Thu, 23 Feb 2023 05:41:49 +0000: Hi Chris, I have uploaded the logs. Please find the details of the log files below.

Parent file name: Logs.zip

Child files:

cps@vpas-B-persistence-db-9:~$ tar -tf PrimaryLogs.tar.gz
diagnostic.data/
diagnostic.data/metrics.interim
diagnostic.data/metrics.2023-02-22T11-44-24Z-00000
journalctlPrimary.logs
kernalDmesgLogs
mongo-27040.log
mongo-27040.log.1
mongo-27040.log_verbose

NOTE: High CPU is seen multiple times (almost every 2 minutes) from 22.02.23 12:45:00 to 22.02.23 13:45:00 (mentioned in mongo-27040.log and mongo-27040.log.1). I also enabled log verbosity from 2023-02-22T13:09:21.295+0000 (mentioned in mongo-27040.log_verbose) to get more logs.

cps@vpas-B-persistence-db-10:~$ tar -tf Secondary4.2Logs.tar.gz
diagnostic.data/
diagnostic.data/metrics.interim
diagnostic.data/metrics.2023-02-22T13-09-06Z-00000
diagnostic.data/metrics.2023-02-22T04-56-59Z-00000
mongo-27040.log

cps@vpas-A-persistence-db-10:~$ sudo tar -tf Secondary4.4Logs.tar.gz
diagnostic.data/
diagnostic.data/metrics.interim
diagnostic.data/metrics.2023-02-22T09-59-46Z-00000
diagnostic.data/metrics.2023-02-22T00-13-58Z-00000
mongo-27040.log

cps@vpas-A-persistence-db-9:~$ sudo tar -tf Secondary_2_4.4Logs.tar.gz
diagnostic.data/
diagnostic.data/metrics.interim
diagnostic.data/metrics.2023-02-22T00-14-27Z-00000
diagnostic.data/metrics.2023-02-22T09-59-46Z-00000
mongo-27040.log

Thanks, Kapil

JIRAUSER1265262 commented on Wed, 22 Feb 2023 17:13:48 +0000: Hi Kapil, I've created a new secure upload portal link for you. Christopher

JIRAUSER1264730 commented on Wed, 22 Feb 2023 13:59:34 +0000: Hi Chris, sorry for the delay, as we lost our last logs. We have now recreated the issue, but unfortunately the upload link has expired. As per your suggestion, we checked that feasibility (having lower-version secondaries) as well, but here the scenario is [1 primary and 1 secondary on 4.2 and 2 secondaries on 4.4]; we also tested the same in SERVER-67217 [1 primary and 1 secondary on 4.0 and 2 secondaries on 4.2]. Each time we got the high CPU issue on the 4.2 primary member only. Kindly provide a new upload link so we can upload the logs. Thanks, Kapil.

JIRAUSER1265262 commented on Wed, 8 Feb 2023 19:28:31 +0000: We still need additional information to diagnose the problem. If this is still an issue for you, would you please supply the requested information if possible? Also, just as an FYI, it seems like you are pointing out that you are observing latency on a higher-version primary node when you have lower-version secondaries. Per the upgrade steps, you should be upgrading these secondaries first to the higher version, and then finishing your upgrade by stepping down the primary (which should be the last one to be upgraded). Effectively, this sounds like it should mitigate your problem. If it does not, please provide further detail regarding this process (and the requested information above). Thanks!

JIRAUSER1265262 commented on Thu, 19 Jan 2023 16:51:41 +0000: Hi Kapil, I've created a secure upload portal for you.
Files uploaded to this portal are hosted on Box, are visible only to MongoDB employees, and are routinely deleted after some time. For each node in the replica set, spanning a time period that includes the incident, would you please archive (tar or zip) and upload to that link:
- the mongod logs
- the $dbpath/diagnostic.data directory (the contents are described here)
Additionally, if you have a test driver that reproduces the workload you're having issues with, that would be significantly helpful in pinning down what may be occurring here. Christopher
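(For illustration, a minimal sketch of how one node's mongod logs and diagnostic.data directory could be packaged before uploading; the dbpath and log locations below are assumptions based on the file listings in the comments above, not values specified in the ticket.)

# Hypothetical packaging steps for one node; adjust paths to your deployment.
cd /var/lib/mongodb                      # assumed dbpath containing diagnostic.data/
cp /var/log/mongodb/mongo-27040.log* .   # assumed location of the mongod log files
tar -czvf PrimaryLogs.tar.gz diagnostic.data/ mongo-27040.log*
tar -tf PrimaryLogs.tar.gz               # verify contents before uploading to the portal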
1- Run 2 data members with 4.2.20 and 2 data members with 4.4.18.
2- Put medium to high load (reads/writes) on the replica set and observe (a sketch of this setup follows below).
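(For illustration, a minimal sketch of the mixed-version setup and load described in the steps above; the binary locations, ports, dbpaths, and the load loop are illustrative assumptions, since the original workload generator is not described in the ticket.)

# Step 1 (sketch): start two members from 4.2.20 binaries and two from 4.4.18
# binaries in the same replica set (paths and ports are assumed).
mkdir -p /data/n1 /data/n2 /data/n3 /data/n4
/opt/mongodb-4.2.20/bin/mongod --replSet rs0 --port 27040 --dbpath /data/n1 --fork --logpath /data/n1/mongod.log
/opt/mongodb-4.2.20/bin/mongod --replSet rs0 --port 27041 --dbpath /data/n2 --fork --logpath /data/n2/mongod.log
/opt/mongodb-4.4.18/bin/mongod --replSet rs0 --port 27042 --dbpath /data/n3 --fork --logpath /data/n3/mongod.log
/opt/mongodb-4.4.18/bin/mongod --replSet rs0 --port 27043 --dbpath /data/n4 --fork --logpath /data/n4/mongod.log

# Initiate the replica set from any member.
mongo --port 27040 --eval 'rs.initiate({_id: "rs0", members: [
  {_id: 0, host: "localhost:27040"},
  {_id: 1, host: "localhost:27041"},
  {_id: 2, host: "localhost:27042"},
  {_id: 3, host: "localhost:27043"}]})'

# Step 2 (sketch): drive read/write load against the primary while watching CPU
# on each member; this simple shell loop is illustrative only.
mongo --port 27040 --eval 'for (let i = 0; i < 1000000; i++) {
  db.loadtest.insertOne({i: i, payload: "x".repeat(256)});
  db.loadtest.findOne({i: Math.floor(Math.random() * (i + 1))});
}'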