...
MDM events logs - LDAP logins sent to the MDM endlessly.

2021-06-03 16:19:45.981000:0075477:MDM_BECOMING_MASTER WARNING This MDM is switching to Master mode. MDM will start running.
2021-06-03 16:19:46.731000:0075480:MDM_CLUSTER_NORMAL INFO MDM cluster is now in NORMAL mode.
...
2021-06-03 16:20:56.042000:0075613:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'F690420@somecustomer.com'. [1829] Login banner read and approved manually. Originating source IP: 1.1.1.1
2021-06-03 16:20:56.180000:0075614:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [1829]
2021-06-03 16:25:56.023000:0075615:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'F690420@somecustomer.com'. [11066] Login banner read and approved manually. Originating source IP: 1.1.1.1
2021-06-03 16:25:56.043000:0075616:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [11066]
2021-06-03 16:30:55.989000:0075617:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'F690420@somecustomer.com'. [20301] Login banner read and approved manually. Originating source IP: 1.1.1.1
2021-06-03 16:30:56.048000:0075618:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [20301]
2021-06-03 16:32:31.807000:0075619:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'W577189@somecustomer.com'. [23256] Login banner read and approved manually. Originating source IP: 1.1.1.4
2021-06-03 16:32:31.827000:0075620:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [23256]
...
2021-06-04 07:53:21.848000:0076521:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'F690420@somecustomer.com'. [1725334] Login banner read and approved manually. Originating source IP: 1.1.1.1
2021-06-04 07:53:21.868000:0076522:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [1725334]
2021-06-04 07:55:56.309000:0076523:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'F690420@somecustomer.com'. [1730092] Login banner read and approved manually. Originating source IP: 1.1.1.1
2021-06-04 07:55:56.331000:0076524:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [1730092]
2021-06-04 08:00:56.591000:0076525:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'F690420@somecustomer.com'. [1739335] Login banner read and approved manually. Originating source IP: 1.1.1.1
2021-06-04 08:00:56.650000:0076526:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [1739335]
2021-06-04 08:05:56.300000:0076527:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'F690420@somecustomer.com'. [1748563] Login banner read and approved manually. Originating source IP: 1.1.1.1
2021-06-04 08:05:56.319000:0076528:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [1748563]
...
2021-06-04 17:50:52.581000:0077903:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'F690420@somecustomer.com'. [2831963] Login banner read and approved manually. Originating source IP: 1.1.1.1
2021-06-04 17:50:52.602000:0077904:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [2831963]
2021-06-04 17:55:52.586000:0077905:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'F690420@somecustomer.com'. [2841214] Login banner read and approved manually. Originating source IP: 1.1.1.1
2021-06-04 17:55:52.604000:0077906:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [2841214]
2021-06-04 17:59:09.640000:0077907:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'admin'. [2847297] Login banner read and approved manually. Originating source IP: 1.1.1.3
2021-06-04 17:59:09.665000:0077908:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [2847297]
2021-06-04 17:59:10.716000:0077909:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'admin'. [2847334] Login banner read and approved manually. Originating source IP: 1.1.1.3
2021-06-04 17:59:10.719000:0077910:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [2847334]
...
2021-06-04 20:21:21.359000:0078091:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'W577189@somecustomer.com'. [3110475] Login banner read and approved manually. Originating source IP: 1.1.1.4
2021-06-04 20:21:21.378000:0078092:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [3110475]
2021-06-04 20:22:02.269000:0078093:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'W577189@somecustomer.com'. [3111783] Login banner read and approved manually. Originating source IP: 1.1.1.4
2021-06-04 20:22:02.289000:0078094:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [3111783]
2021-06-04 20:22:08.479000:0078097:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'E828695@somecustomer.com'. [3111983] . Originating source IP: 1.1.1.2
2021-06-04 20:22:08.500000:0078098:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [3111983]
2021-06-04 20:22:34.029000:0078101:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'W577189@somecustomer.com'. [3112774] Login banner read and approved manually. Originating source IP: 1.1.1.4
2021-06-04 20:22:34.048000:0078102:CLI_COMMAND_SUCCEEDED INFO Command login succeeded. [3112774]
...
2021-06-04 20:30:12.557000:0041654:MDM_CLUSTER_LOST_CONNECTION WARNING The MDM, original_PRI (ID 5ab3891e52fa81d0), has lost connection to the cluster.
2021-06-04 20:30:12.557000:0143199:MDM_CLUSTER_LOST_CONNECTION WARNING The MDM, original_PRI (ID 5ab3891e52fa81d0), has lost connection to the cluster.
2021-06-04 20:30:12.820000:0143200:MDM_CLUSTER_BECOMING_MASTER WARNING This MDM, new_PRI (ID 3eb92c804fa11cd2), took control of the cluster and is now the Master MDM.
2021-06-04 20:30:13.036000:0143201:MDM_CLUSTER_NODE_DEGRADED ERROR MDM cluster node is now DEGRADED and is in offline node original_PRI (ID 5ab3891e52fa81d0); IPs: [1.2.3.1,1.2.3.2,1.2.3.3,1.2.3.4], Port: 9011 .
2021-06-04 20:30:13.036000:0143202:MDM_CLUSTER_NODE_NORMAL INFO MDM cluster node SEC (ID 38cdcf821f4f2881); IPs: [1.2.3.1,1.2.3.2,1.2.3.3,1.2.3.4], Port: 9011 is now in NORMAL state.
2021-06-04 20:30:13.038000:0143203:MDM_BECOMING_MASTER WARNING This MDM is switching to Master mode. MDM will start running.

MDM trc.x - the following might appear in the logs before MDM crash:
- Listen socket is closed
- Running UMT (nil), found to be stuck
- Marking as down
- Long UMT wait time
- Socket receive failed, Status ABORTED
- Aborted zero inflight RPC messages due to disconnect
- MDM Busy

04/06 20:30:12.265143 7f69b57a5db0:mosCompressedTrace_DumpNow:00802: Compressed Trace dump starting, milis 50
04/06 20:30:12.265432 7f69baae5db0:netListen_Mit:00164: Listen socket is closed
04/06 20:30:12.265759 7f69bab12db0:netListen_Mit:00164: Listen socket is closed
...
04/06 20:30:12.408716 0:schedThrdGuard_SampleLivnes:02005: WARNING: pSchedThread 0x445d0a0, pOsThrd 0x445d0c0, #1 in scheduler 0x21d42c0, running UMT (nil), found to be stuck.
04/06 20:30:12.408784 0:schedThrdGuard_SampleLivnes:02005: WARNING: pSchedThread 0x445d560, pOsThrd 0x445d580, #2 in scheduler 0x21d42c0, running UMT (nil), found to be stuck.
04/06 20:30:12.408814 0:mosUmtSchedThrd_CheckAndReportLongWait:03178: WARNING: Long UMT wait time on umt sched thread 0x7f69c4005710, os thread 0x7f69c4005730, running UMT 0x7f69b57a5db0, scheduler 0x214b540. Max wait time 130 millis, Waiting UMT 0x7f69b578adb0, Number of waiters 1.
...
04/06 20:30:12.513227 0:schedThrdGuard_SampleLivnes:02005: WARNING: pSchedThread 0x445da20, pOsThrd 0x445da40, #3 in scheduler 0x21d42c0, running UMT (nil), found to be stuck.
04/06 20:30:12.513274 0:schedThrdGuard_SampleLivnes:02005: WARNING: pSchedThread 0x7f69c4005710, pOsThrd 0x7f69c4005730, #0 in scheduler 0x214b540, running UMT 0x7f69b57a5db0, found to be stuck.
...
04/06 20:30:12.826753 7f69b5793db0:netPath_IsKaNeeded:02298: :: Connected Live CLIENT path 0x7f69c4119df0 of portal 0x7f69c4119af0 net 0x7f69c4010240 socket 32 inflights 0 HS:0 didn't receive message for 6 iterations from 1.2.3.4:9011. Marking as down
04/06 20:30:12.826847 7f69b5793db0:netPath_IsKaNeeded:02298: :: Connected Live CLIENT path 0x7f69c411e1d0 of portal 0x7f69c4119c70 net 0x7f69c4010240 socket 34 inflights 0 HS:0 didn't receive message for 6 iterations from 1.2.3.4:9011. Marking as down
...
04/06 20:30:12.827076 7f69b5793db0:netPath_IsKaNeeded:02298: :: Connected Live CLIENT path 0x7f69c4117390 of portal 0x7f69c4110d00 net 0x7f69c4010240 socket 26 inflights 0 HS:0 didn't receive message for 6 iterations from 1.2.3.4:9011. Marking as down
04/06 20:30:12.827090 7f69b5793db0:netPath_IsKaNeeded:02298: :: Connected Live CLIENT path 0x7f69c4110dc0 of portal 0x7f69c4110ac0 net 0x7f69c4010240 socket 54 inflights 0 HS:0 didn't receive message for 6 iterations from 1.2.3.4:9011. Marking as down
01/02 15:34:21.878915 0x7f444dd57db0:mosEventLog_PostInternal:00609: New event added. Message: "Command login received, User: 'D569206@test'. [44428542]". Additional info: "Login banner read and approved manually. Originating source IP: 1.2.3.4" Severity: Info
01/02 15:34:21.879025 (nil):mosOsThrd_StartFunc:00576: Starting thread () tid 7280
01/02 15:34:21.879119 (nil):mosLdap_InitConnection:00205: (7280) Connection was successfully established to ldaps://ldap.test.net ldap service
01/02 15:34:22.315417 (nil):mosUmtSchedThrd_CheckAndReportLongWait:03178: WARNING: Long UMT wait time on umt sched thread 0x3970be0, os thread 0x3970c00, running UMT 0x7f4452aeedb0, scheduler 0x21d42c0. Max wait time 410 millis, Waiting UMT 0x7f4452ae5db0, Number of waiters 2.
...
01/02 15:34:22.316855 0x7f444d78adb0:netPath_RcvLoop_CK:01611: WARNING: :: Connected Live SERVER path 0x7f43f4019fa0 of portal 0x7f43f402a3a0 net 0x7f445c010240 socket 31 inflights 0 HS:0 socket receive failed, Status ABORTED
01/02 15:34:22.316872 0x7f444d78adb0:netPath_RcvLoop_CK:01611: WARNING: :: Connected Live SERVER path 0x7f43f4017db0 of portal 0x7f43f4014580 net 0x7f445c010240 socket 30 inflights 0 HS:0 socket receive failed, Status ABORTED
01/02 15:34:22.316883 0x7f444d78adb0:netCon_AbortInflightRpcsUptoDummyNode:05869: Aborted 0 inflight RPC messages due to disconnect.
...
01/02 15:34:22.318008 0x7f444d78adb0:netPath_RcvLoop_CK:01611: WARNING: :: Connected Live SERVER path 0x7f43f40085d0 of portal 0x7f43f4014d80 net 0x7f445c010240 socket 43 inflights 0 HS:0 socket receive failed, Status ABORTED
01/02 15:34:22.318024 0x7f444d78adb0:netCon_AbortInflightRpcsUptoDummyNode:05869: Aborted 0 inflight RPC messages due to disconnect.
...
01/02 15:34:22.321016 0x7f444d7a5db0:mdmNet_AbnormalExitCK:00334: Will pause network
01/02 15:34:22.321039 0x7f444d7a5db0:net_Pause:02235: Net paused 1 (reversible 0)
01/02 15:34:22.321113 0x7f444d7a5db0:net_Pause:02235: Net paused 1 (reversible 0)
01/02 15:34:22.705208 (nil):mosTrcLayer_Create:00239: ---------- Process started. Version private PowerFlex R3_5.1200.104_Release, CodeBase , Nov 30 2020. PID 7312 ----------
12/01 13:43:24.112344 0x7f1ae9d33db0:mosEventLog_PostInternal:00609: New event added. Message: "Command login received, User: 'D569206@somecustomer.com'. [21793543]". Additional info: "Login banner read and approved manually. Originating source IP: 1.2.3.4" Severity: Info
12/01 13:43:24.112438 (nil):mosOsThrd_StartFunc:00576: Starting thread () tid 99811
12/01 13:43:24.112499 (nil):mosLdap_InitConnection:00205: (99811) Connection was successfully established to ldaps://ldap.somecustomer.com ldap service
12/01 13:43:24.191585 (nil):mosLdap_Bind:00101: (99811) After ldap_sasl_bind - rc: 0, msg-id: 1
12/01 13:43:24.215639 (nil):mosLdap_Bind:00143: (99811) User D569206@somecustomer.com was successfully binded to LDAP service
12/01 13:43:24.238744 (nil):mosLdap_CheckUserInGroup:00291: (99811) LDAP search did not return any data
12/01 13:43:24.261895 (nil):mosLdap_CheckUserInGroup:00291: (99811) LDAP search did not return any data
12/01 13:43:24.285003 (nil):mosLdap_CheckUserInGroup:00291: (99811) LDAP search did not return any data
12/01 13:43:24.331252 (nil):mosLdap_CheckUserInGroup:00291: (99811) LDAP search did not return any data
12/01 13:43:24.331469 (nil):mosLdap_SearchUserInDomainGroupsIntr:00579: (99811) User "D569206@somecustomer.com" was found in 1 groups - search-user: D569206, bind-user: D569206@somecustomer.com, rc: SUCCESS, try: 1
12/01 13:43:24.331545 0x7f1ae9d33db0:repType_AllocObjects:04767: Allocating 1 objects (first-index: 10) of type SESSION
12/01 13:43:24.331571 0x7f1ae9715db0:repExtent_IO:03750: Writing to the repository - Extent: 7092, Page: 113778, Size: 8192, Batch size: 1, Last type: SESSION
12/01 13:43:24.333049 0x7f1ae9730db0:repExtent_IO:03750: Writing to the repository - Extent: 7092, Page: 113778, Size: 8192, Batch size: 1, Last type: SESSION
12/01 13:43:24.333775 0x7f1ae9d33db0:mosEventLog_PostInternal:00609: New event added. Message: "Command login succeeded. [21793543]". Additional info: "" Severity: Info
12/01 13:43:24.972268 0x7f1ae9cb5db0:actor_FillQueryClusterInfoTrylock:08661: MDM busy

/var/log/messages

Feb 1 15:34:22 hostname1 init: mdm main process (42940) terminated with status 254
Feb 1 15:34:22 hostname1 init: mdm main process ended, respawning

Impact
- Failed LDAP login attempts.
- Login-based automation fails or misbehaves.
- Unstable MDM cluster.
PowerFlex uses the OpenLDAP library for LDAP authentication and connections. In environments with a very high rate of LDAP login requests combined with network latency, this can leave a large number of threads stuck within the MDM. The PowerFlex code saves an LDAP login request approval for up to 8 hours. When login requests arrive constantly, the OpenLDAP code queues all of them and releases them frequently. This overwhelms the MDM with login requests from the same user. As a result, the system responds with "MDM busy" and declines any other incoming requests and API calls. Eventually, the MDM crashes and the cluster switches over to a Secondary MDM.
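To check whether a single user is logging in at a constant rate, the MDM event log can be summarized per user. The following is a minimal Python sketch, not a PowerFlex tool: the function name `summarize_logins` and the regular expression are assumptions based only on the event log format shown in this article, and the sample text is copied from those excerpts.

```python
import re
from collections import defaultdict
from datetime import datetime
from statistics import median

# Matches the timestamp and user of each "Command login received" event.
# The pattern is an assumption derived from the log excerpts in this article.
LOGIN_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{1,6}):\d+:MDM_CLI_COMMAND_RECEIVED "
    r"INFO Command login received, User: '([^']+)'"
)


def summarize_logins(text):
    """Return {user: (login_count, median_gap_seconds or None)}."""
    times = defaultdict(list)
    for stamp, user in LOGIN_RE.findall(text):
        times[user].append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"))
    summary = {}
    for user, stamps in times.items():
        stamps.sort()
        # Gap in seconds between each login and the next one by the same user.
        gaps = [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
        summary[user] = (len(stamps), median(gaps) if gaps else None)
    return summary


SAMPLE = (
    "2021-06-03 16:20:56.042000:0075613:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'F690420@somecustomer.com'. [1829]\n"
    "2021-06-03 16:25:56.023000:0075615:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'F690420@somecustomer.com'. [11066]\n"
    "2021-06-03 16:30:55.989000:0075617:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'F690420@somecustomer.com'. [20301]\n"
    "2021-06-03 16:32:31.807000:0075619:MDM_CLI_COMMAND_RECEIVED INFO Command login received, User: 'W577189@somecustomer.com'. [23256]\n"
)

if __name__ == "__main__":
    for user, (count, gap) in summarize_logins(SAMPLE).items():
        print(user, count, gap)
```

A user with many logins and a steady gap of about 300 seconds, such as F690420 in the excerpts, points to an automated client rather than interactive use.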
One (or all) of the following actions can be implemented:

1. Reduce the login attempts done by LDAP.
2. Increase the time interval between login attempts done by LDAP.
3. Configure the following allowed values for LDAP timeout in the MDM configuration file and restart the service to apply the changes:
   a. Edit the .../mdm/cfg/conf.txt file.
   b. Add the following parameters and values, depending on the customer's environment, preferences, and network state:
      ldap_timeout_sec:8-10
      OR
      ldap_timeout_sec:13-15
      OR
      ldap_timeout_sec:23-30
   c. Verify that there is no rebuild or rebalance before moving to the next step.
   d. When applicable, place the component into Maintenance Mode.
   e. pkill the process/restart the service.

NOTE: Option number 3 might reduce the MDM switchovers drastically, but not prevent them. The MDM login request mechanism saves each login request approval for up to 8 hours.
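For login-based automation, reducing login attempts and increasing the interval between them can be done on the client side by reusing the result of the previous login instead of logging in on every poll. The following Python sketch illustrates the idea only. `LoginThrottle` and `login_fn` are hypothetical names: `login_fn` stands for whatever call the automation already uses to log in to the MDM, and the minimum interval is an assumption that must be validated against how long a login stays usable in the given environment.

```python
import time


class LoginThrottle:
    """Reuse the last login result until min_interval_sec has elapsed.

    login_fn is a hypothetical callable supplied by the automation;
    it is not part of any PowerFlex API.
    """

    def __init__(self, login_fn, min_interval_sec, clock=time.monotonic):
        self._login_fn = login_fn
        self._min_interval = min_interval_sec
        self._clock = clock
        self._last_time = None
        self._last_result = None

    def login(self):
        now = self._clock()
        due = self._last_time is None or now - self._last_time >= self._min_interval
        if due:
            # Only a successful call is cached; an exception propagates
            # and the next call tries again.
            self._last_result = self._login_fn()
            self._last_time = now
        return self._last_result
```

With this wrapper, a job that polls every 5 minutes but is configured with `min_interval_sec=1800` sends one login request per 30 minutes instead of six.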