...
When RDMs are used as shared disk resources for a clustering solution such as WSFC or Red Hat High Availability Cluster, you experience these symptoms:

- ESXi hosts hosting secondary nodes may take a long time to start. The delay depends on the number of RDMs attached to the ESXi host. For example, in a system with 10 RDMs used in a two-node WSFC or Red Hat High Availability Cluster, restart of the ESXi host with the secondary node may take up to 30 minutes. With fewer RDMs, the restart time is shorter; for example, with only three RDMs the restart takes approximately 10 minutes.
- The ESXi host intermittently displays an error message on the Summary tab, and the vSphere Client may not be able to connect: Cannot synchronize host hostname. Operation timed out.
- During startup, the host console appears to stall after a message similar to: Loading module multiextent.
This issue occurs when virtual machines participating in a clustering solution such as WSFC or Red Hat High Availability Cluster use shared RDMs with SCSI reservations across hosts, and a virtual machine on another host is the active cluster node holding a SCSI reservation. During boot, each reserved RDM forces the restarting host to wait for SCSI commands to time out. The delay occurs at these steps:

Starting path claiming and SCSI device discovery. In the /var/log/vmkernel.log file of the restarting ESXi host, you see entries similar to:

vmkernel: 0:00:01:57.828 cpu0:4096)WARNING: ScsiCore: 1353: Power-on Reset occurred on naa.6006016045502500176a24d34fbbdf11
vmkernel: 0:00:01:57.830 cpu0:4096)VMNIX: VmkDev: 2122: Added SCSI device vml0:3:0 (naa.6006016045502500166a24d34fbbdf11)
vmkernel: 0:00:02:37.842 cpu3:4099)ScsiDeviceIO: 1672: Command 0x1a to device "naa.6006016045502500176a24d34fbbdf11" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0

Mounting the partitions of the RDM LUNs. In the /var/log/vmkernel.log file of the restarting ESXi host, you see entries similar to:

vmkernel: 0:00:08:58.811 cpu2:4098)WARNING: ScsiCore: 1353: Power-on Reset occurred on naa.600601604550250083489d914fbbdf11
vmkernel: 0:00:08:58.814 cpu0:4096)VMNIX: VmkDev: 2122: Added SCSI device vml0:9:0 (naa.600601604550250082489d914fbbdf11)
vmkernel: 0:00:09:38.855 cpu2:4098)ScsiDeviceIO: 1672: Command 0x1a to device "naa.600601604550250083489d914fbbdf11" failed H:0x5 D:0x0 P:0x0 Possible sense data: 0x0 0x0 0x0
vmkernel: 0:00:09:38.855 cpu1:4111)ScsiDeviceIO: 4494: Could not detect setting of QErr for device naa.600601604550250083489d914fbbdf11. Error Failure.
vmkernel: 0:00:10:08.945 cpu1:4111)WARNING: Partition: 801: Partition table read from device naa.600601604550250083489d914fbbdf11 failed: I/O error
vmkernel: 0:00:10:08.945 cpu1:4111)ScsiDevice: 2200: Successfully registered device "naa.600601604550250083489d914fbbdf11" from plugin "NMP" of type 0
vmkernel: 47:02:52:19.382 cpu17:9624)WARNING: NMP: nmp_IsSupportedPResvCommand: Unsupported Persistent Reservation Command, service action 0 type 4
vmkernel: 47:02:52:19.383 cpu12:4108)WARNING: NMP: nmpUpdatePResvStateSuccess: Parameter List Length 54310000 for service action 0 is beyond the supported value 18
vmkernel: 47:02:52:21.383 cpu23:9621)WARNING: NMP: nmp_IsSupportedPResvCommand: Unsupported Persistent Reservation Command, service action 0 type 4

If you configure the setting on an existing VMFS LUN, you may see these entries in the /var/log/vmkernel.log file:

cpu4:10169)WARNING: Partition: 1273: Device "naa.XXXXXXXXXXXXXXXXXXXxxxxxxxxxxxxx" with a VMFS partition is marked perennially reserved. This is not supported and may lead to data loss.

You can safely ignore this warning for Clustered VMDK datastores. VMware Engineering is working on suppressing the message in a future release.
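The boot delay shows up in the log as repeated gaps of roughly 40 seconds between a command failing with host status H:0x5 and the next device being registered. As a rough sketch of how to gauge how many devices are affected, the sample lines below stand in for a real /var/log/vmkernel.log (on an actual ESXi host you would point grep at that file instead):

```shell
# Create a stand-in for /var/log/vmkernel.log with two of the failure
# lines shown above (placeholder for the real log on an ESXi host).
cat > /tmp/vmkernel.sample.log <<'EOF'
vmkernel: 0:00:02:37.842 cpu3:4099)ScsiDeviceIO: 1672: Command 0x1a to device "naa.6006016045502500176a24d34fbbdf11" failed H:0x5 D:0x0 P:0x0
vmkernel: 0:00:09:38.855 cpu2:4098)ScsiDeviceIO: 1672: Command 0x1a to device "naa.600601604550250083489d914fbbdf11" failed H:0x5 D:0x0 P:0x0
EOF
# Count the timed-out commands; each corresponds to a reserved device
# the host waited on during boot.
grep -c 'failed H:0x5' /tmp/vmkernel.sample.log
```

A high count here, roughly matching the number of shared RDMs, is a strong hint that the delay is reservation-related rather than a storage fabric problem.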
ESXi 6.x and ESXi 7.x Hosts

For all ESXi 6.x and 7.x hosts, the command-line, vSphere Client, and PowerCLI methods of setting the RDMs to perennially reserved are covered in the sections below.

To mark the LUNs as perennially reserved:

1. Determine which RDM LUNs are part of the WSFC, Red Hat High Availability Cluster, or other clustering solution. From the vSphere Client, select a virtual machine that has a mapping to the cluster RDM devices. Edit the virtual machine settings and navigate to the Mapped Raw LUNs (in this example, Hard disk 2). The Physical disk field specifies the device in use as an RDM (that is, the VML ID). Take note of the VML ID, which is a globally unique identifier for your shared device.

2. Identify the naa.id for this VML ID using this command:

   esxcli storage core device list

   For example:

   esxcli storage core device list
   naa.6589cfc000000a17ac02aae02067e747
      Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000a17ac02aae02067e747)
      Has Settable Display Name: true
      Size: 40960
      Device Type: Direct-Access
      Multipath Plugin: NMP
      Devfs Path: /vmfs/devices/disks/naa.6589cfc000000a17ac02aae02067e747
      Vendor: FreeNAS
      Model: iSCSI Disk
      Revision: 0123
      SCSI Level: 6
      Is Pseudo: false
      Status: degraded
      Is RDM Capable: true
      Is Local: false
      Is Removable: false
      Is SSD: false
      Is VVOL PE: false
      Is Offline: false
      Is Perennially Reserved: false
      Queue Full Sample Size: 0
      Queue Full Threshold: 0
      Thin Provisioning Status: unknown
      Attached Filters:
      VAAI Status: supported
      Other UIDs: vml.010001000030303530353630313031303830310000695343534920
      Is Shared Clusterwide: true
      Is SAS: false
      Is USB: false
      Is Boot Device: false
      Device Max Queue Depth: 128
      No of outstanding IOs with competing worlds: 32
      Drive Type: unknown
      RAID Level: unknown
      Number of Physical Drives: unknown
      Protection Enabled: false
      PI Activated: false
      PI Type: 0
      PI Protection Mask: NO PROTECTION
      Supported Guard Types: NO GUARD SUPPORT
      DIX Enabled: false
      DIX Guard Type: NO GUARD SUPPORT
      Emulated DIX/DIF Enabled: false

3. Use the esxcli command to mark the device as perennially reserved:

   esxcli storage core device setconfig -d naa.id --perennially-reserved=true

   For example:

   esxcli storage core device setconfig -d naa.6589cfc000000a17ac02aae02067e747 --perennially-reserved=true

   Note: For vSphere 7.x, see the Change Perennial Reservation Settings section of the vSphere Storage Guide.

4. To verify that the device is perennially reserved, run this command:

   esxcli storage core device list -d naa.id

   In the output of the esxcli command, search for the entry Is Perennially Reserved: true. This shows that the device is marked as perennially reserved. For example:

   esxcli storage core device list -d naa.6589cfc000000a17ac02aae02067e747
   naa.6589cfc000000a17ac02aae02067e747
      Display Name: FreeNAS iSCSI Disk (naa.6589cfc000000a17ac02aae02067e747)
      Has Settable Display Name: true
      Size: 40960
      Device Type: Direct-Access
      Multipath Plugin: NMP
      Devfs Path: /vmfs/devices/disks/naa.6589cfc000000a17ac02aae02067e747
      Vendor: FreeNAS
      Model: iSCSI Disk
      Revision: 0123
      SCSI Level: 6
      Is Pseudo: false
      Status: degraded
      Is RDM Capable: true
      Is Local: false
      Is Removable: false
      Is SSD: false
      Is VVOL PE: false
      Is Offline: false
      Is Perennially Reserved: true
      Queue Full Sample Size: 0
      Queue Full Threshold: 0
      Thin Provisioning Status: unknown
      Attached Filters:
      VAAI Status: supported
      Other UIDs: vml.010001000030303530353630313031303830310000695343534920
      Is Shared Clusterwide: true
      Is SAS: false
      Is USB: false
      Is Boot Device: false
      Device Max Queue Depth: 128
      No of outstanding IOs with competing worlds: 32
      Drive Type: unknown
      RAID Level: unknown
      Number of Physical Drives: unknown
      Protection Enabled: false
      PI Activated: false
      PI Type: 0
      PI Protection Mask: NO PROTECTION
      Supported Guard Types: NO GUARD SUPPORT
      DIX Enabled: false
      DIX Guard Type: NO GUARD SUPPORT
      Emulated DIX/DIF Enabled: false

5. Repeat the procedure for each Mapped Raw LUN that is participating in the clustering solution (WSFC, Red Hat High Availability Cluster, and so on).
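Because the setconfig command has to be repeated for every shared RDM on every host that can see it, it can help to generate the commands into a file for review before running them in the ESXi shell. A minimal sketch; the naa IDs below are placeholders for your own devices, and note that device IDs are case sensitive:

```shell
# Hypothetical helper: build a reviewable script of setconfig commands,
# one per shared RDM. Replace the placeholder IDs with your own LUNs.
: > /tmp/mark_rdms.sh
for id in naa.6589cfc000000a17ac02aae02067e747 naa.600601604550250083489d914fbbdf11; do
  echo "esxcli storage core device setconfig -d $id --perennially-reserved=true" >> /tmp/mark_rdms.sh
done
# Inspect before executing on the host.
cat /tmp/mark_rdms.sh
```

Reviewing the generated file before execution also gives you a record of which devices were flagged on which host.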
Note: The configuration is permanently stored with the ESXi host and persists across restarts. To remove the perennially reserved flag, run this command:

esxcli storage core device setconfig -d naa.id --perennially-reserved=false
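To audit a host after making changes, the saved output of esxcli storage core device list can be scanned for devices whose flag is still false. A sketch, assuming the field names shown in the listings above; the sample file here stands in for real esxcli output captured on a host:

```shell
# Stand-in for captured `esxcli storage core device list` output
# (abbreviated to the fields the scan needs).
cat > /tmp/devlist.sample.txt <<'EOF'
naa.6589cfc000000a17ac02aae02067e747
   Is Perennially Reserved: true
naa.600601604550250083489d914fbbdf11
   Is Perennially Reserved: false
EOF
# Remember the most recent device header line, and print it whenever the
# reservation flag underneath it is still false.
awk '/^naa\./ {dev=$1} /Is Perennially Reserved: false/ {print dev}' /tmp/devlist.sample.txt
```

Any device printed by the scan still needs the setconfig command run against it on that host.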
Note: The PowerCLI and esxcli commands are case sensitive. If the naa.id is specified in uppercase letters when issuing the command, a new device is added on the ESXi host.

The resolution steps in this article are also known to resolve storage devices reporting NMP errors similar to:

WARNING: NMP: nmp_DeviceRetryCommand:133: Device "naa.600601604ec0360065efeed9d265e411": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.

If you experience the symptoms described above with Clustered VMDK, follow the same steps to resolve the issue.

For more information, see:
- VMware Skyline Health Diagnostics for vSphere - FAQ
- How to detach a LUN device from ESXi hosts
- Obtaining LUN pathing information for ESX or ESXi hosts (1003973)
- Identifying virtual machines with Raw Device Mappings (RDMs) using PowerCLI (2001823)
- Identifying Raw Device Mappings (RDMs) using the vSphere Client (1004814)
- Identifying virtual disks pointing to Raw Device Mappings (RDMs) (1005937)
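Because an uppercase naa.id silently adds a new device entry instead of failing, a small guard before issuing any esxcli command can catch the mistake early. A hypothetical sketch; the ID below is a deliberately wrong-cased placeholder:

```shell
# Hypothetical guard: refuse device IDs containing uppercase letters
# before they are ever passed to esxcli (which treats them as new devices).
id="naa.6589CFC000000A17AC02AAE02067E747"   # placeholder with wrong case
case "$id" in
  *[A-Z]*) echo "refusing: $id contains uppercase; esxcli device IDs are case sensitive" ;;
  *)       echo "ok: $id" ;;
esac
```

Wrapping the setconfig call behind a check like this is cheap insurance when IDs are copied from other tools or documents that alter letter case.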