...
While updating tests to run on Windows Server 2022 for MongoDB 8.0 platform support, several issues were discovered in the noPassthrough suite:

https://spruce.mongodb.com/task/mongodb_mongo_master_enterprise_windows_all_feature_flags_required_noPassthrough_1_windows_enterprise_patch_d60231163ae986719f5b012c47fb065331fabdab_6669f1b564e1ae0007c8514b_24_06_12_19_07_21?execution=2&sortBy=STATUS&sortDir=ASC
https://spruce.mongodb.com/task/mongodb_mongo_master_enterprise_windows_all_feature_flags_required_noPassthrough_1_windows_enterprise_patch_d60231163ae986719f5b012c47fb065331fabdab_6669f1b564e1ae0007c8514b_24_06_12_19_07_21/tests?execution=1&sortBy=STATUS&sortDir=ASC
https://spruce.mongodb.com/task/mongodb_mongo_master_enterprise_windows_all_feature_flags_required_noPassthrough_1_windows_enterprise_patch_d60231163ae986719f5b012c47fb065331fabdab_6669f1b564e1ae0007c8514b_24_06_12_19_07_21/tests?execution=0&sortBy=STATUS&sortDir=ASC

The commit this branch is based on does not have this issue, and the only change is switching the Evergreen host distro from "windows-vsCurrent-large" (Windows Server 2019) to "windows-2022-large" (Windows Server 2022).

The version upgrade will use a workaround that decreases resmoke concurrency to avoid exhausting the system's memory (sketched after the comments at the end of this ticket), but it is still unclear why the upgrade caused memory usage to increase.

max.hirschhorn@mongodb.com's analysis:

The Evergreen timeout in execution #3 appears to be caused by slow resmoke logging, which led to the primary of the replica set stepping down and hitting fassert(7152000) because it was unable to step down quickly enough while the mongod was fsyncLocked (a sketch of this interaction follows the analysis below).

[js_test:sharded_pit_backup_restore_simple] d20846| 2024-06-13T01:49:41.751+01:00 I REPL 21809 [S] [ReplCoord-0] "Can't see a majority of the set, relinquishing primary"
...
[js_test:sharded_pit_backup_restore_simple] d20846| 2024-06-13T01:50:11.832+01:00 F REPL 5675600 [S] [ReplCoord-0] "Time out exceeded waiting for RSTL, stepUp/stepDown is not possible thus calling abort() to allow cluster to progress","attr":{"lockRep":{"ReplicationStateTransition":{"acquireCount":{"W":1},"acquireWaitCount":{"W":1},"timeAcquiringMicros":{"W":30079690}}}}
[js_test:sharded_pit_backup_restore_simple] d20846| 2024-06-13T01:50:11.832+01:00 F ASSERT 23089 [S] [ReplCoord-0] "Fatal assertion","attr":{"msgid":7152000,"file":"src\\mongo\\db\\repl\\replication_coordinator_impl.cpp","line":2964}

https://parsley.mongodb.com/test/mongodb_mongo_master_enterprise_windows_all_feature_flags_required_noPassthrough_1_windows_enterprise_patch_d60231163ae986719f5b012c47fb065331fabdab_6669f1b564e1ae0007c8514b_24_06_12_19_07_21/2/af21249a209a8a57122acbfa50b9bb32?bookmarks=0,118966,137712,239798,242772&filters=10020846%255C%257C.%2A%255C%255BReplCoord-0%255C%255D&shareLine=0

The Evergreen timeout in execution #2 appears to be caused by out_timeseries_cleans_up_bucket_collections.js, though I couldn't say why. The logs are incomplete for the other tests because the flush thread hit a MemoryError exception. Memory usage hits ~100% at 22:36 UTC, but neither the system logs nor system_resource_info.json identify what is consuming the excessive memory. Notably, the memory of the listed processes sums to only 10-13GB of the 33GB available.

The Evergreen failure in execution #1 has 7 of the 8 tests failing with "out of memory".
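On the execution #3 analysis above: the sketch below is a minimal illustration of the fsyncLock/step-down interaction described there, assuming PyMongo and a hypothetical node on port 20846. It is not the code path the backup/restore test actually exercises; it only shows why a step-down has to wait once the node is fsyncLocked.

    # Illustrative only (assumes PyMongo and a hypothetical mongod on port 20846).
    # While a node is fsyncLocked, a step-down must wait on the RSTL, which is the
    # contention the "Time out exceeded waiting for RSTL" / fassert(7152000) log
    # lines above point at.
    from pymongo import MongoClient

    primary = MongoClient("localhost", 20846, directConnection=True)

    primary.admin.command("fsync", lock=True)  # take the fsync lock
    try:
        # With the fsync lock held, this step-down stalls waiting for the RSTL.
        # In the failing run the implicit "can't see a majority" step-down waited
        # ~30s (timeAcquiringMicros above) and the node fasserted instead.
        primary.admin.command("replSetStepDown", 60, force=True)
    finally:
        primary.admin.command("fsyncUnlock")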
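On the unaccounted-for memory in execution #2: the snippet below is a minimal sketch of the kind of per-process RSS accounting referenced above (it assumes the psutil package and is not the tooling that produces system_resource_info.json). Summing per-process RSS this way is what only explained 10-13GB of the 33GB in use.

    # Illustrative sketch (assumes psutil): sum per-process RSS and compare it
    # against the system memory actually in use.
    import psutil

    total_rss = 0
    for proc in psutil.process_iter(["pid", "name", "memory_info"]):
        mem = proc.info["memory_info"]
        if mem is None:  # access denied for some system processes
            continue
        total_rss += mem.rss
        name = proc.info["name"] or "?"
        print(f"{proc.info['pid']:>8}  {name:<32} {mem.rss / 2**30:6.2f} GiB")

    vm = psutil.virtual_memory()
    print(f"sum of per-process RSS: {total_rss / 2**30:.2f} GiB")
    print(f"system memory in use:   {(vm.total - vm.available) / 2**30:.2f} GiB of {vm.total / 2**30:.2f} GiB")

A large gap between the two totals, as seen here, suggests the consumer is something this per-process view does not capture (e.g. kernel or cache usage), which matches the observation that no single listed process explains the exhaustion.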
xgen-internal-githook commented on Mon, 8 Jul 2024 22:57:27 +0000:
Author: Louis Williams <louiswilliams@users.noreply.github.com> (louiswilliams)
Message: SERVER-91824 Remove TODO for SERVER-91466 (#24430)
GitOrigin-RevId: a40e69bc20b36dfe7ffc3e241a7f77a2930cbfb3
Branch: master
https://github.com/mongodb/mongo/commit/691442bb1ec633ef090fdbda3a7457dc7fd6df8b

gregory.noma commented on Tue, 25 Jun 2024 15:32:52 +0000:
The sharded backup test spawns 9 mongods and one CSRS, so it's a pretty resource-intensive test. We also won't be supporting Windows as a production platform going forward, and the failures here have already been fixed, so closing out this ticket.

dbeng-pm-bot commented on Thu, 13 Jun 2024 19:41:51 +0000:
This issue has been flagged for rapid response! Assignees of rapid response tickets are responsible for providing a daily update on this issue using the 'Server Rapid Response' canned comment template. Any questions about this ticket can be directed to the #server-rapid-response Slack channel, and more information on the Server Rapid Response process can be found on the Wiki.

louis.williams commented on Thu, 13 Jun 2024 13:22:02 +0000:
Thanks max.hirschhorn@mongodb.com. I agree that we are likely running too many tasks on these hosts. I'm going to re-assign to Dev Prod Build to investigate whether reducing the number of tasks solves the problem.
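For reference on the concurrency workaround mentioned in the description (and echoed in the last comment): the sketch below lowers resmoke's parallelism so fewer mongod/mongos fixtures are alive at once on the host. The suite name and job count are placeholders, and it assumes resmoke's --jobs flag and a checkout of the mongo repository; it is not the actual Evergreen change.

    # Illustrative only: run a suite with a reduced resmoke job count to cap
    # peak memory. Run from the root of a mongo repository checkout.
    import subprocess
    import sys

    subprocess.check_call([
        sys.executable, "buildscripts/resmoke.py", "run",
        "--suites=no_passthrough",  # placeholder suite name
        "--jobs=1",                 # lower parallelism to limit concurrent fixtures
    ])

In Evergreen the same effect is normally achieved through a per-task resmoke jobs expansion rather than a hand-written invocation; the exact knob used for this workaround isn't shown in the ticket.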