...
An issue exists where any SATA, SAS, or NVMe SSD configured in a VMware all-flash vSAN disk group may be mistakenly reported as failed and marked by vSAN as having a permanent error. This happens when Medium Errors continue to be reported after the ESXi operating system has made multiple attempts to remap the bad area. SMART data retrieved directly from the SSD shows that the drive still has spare space available to remap bad areas. vSAN 6.7 or 7.x may not allow recovery from a single Unrecoverable Read Error (URE) that occurs in the metadata regions of an all-flash vSAN disk group unless the disk group is first removed from vSAN. Depending on the ESXi version and the features enabled, the host may perform an "autoDG" creation operation on the failed disk group in an attempt to repair the bad area. A drive may then be reported as failed after multiple attempts to repair it using the "autoDG" operation. This may happen because of how vSAN interacts with various vendor drives when handling the 5-10% area used for metadata operations. According to VMware KB 81121, the autoDG creation feature runs a TRIM utility, and by default TRIM runs only on the first 5-10% of the metadata region. If the bad area lies beyond that 5-10% of the drive, it will not be remapped, which leads to premature replacement of the drive.
Note: Refer to VMware KB 81121 for more information about vSAN Medium Error handling.
Any HPE SATA/SAS/NVMe SSD used in a VMware all-flash vSAN 6.7 or 7.x environment.
Note: Ensure the vSAN disk group has been removed from the cluster before performing any of the operations below.

Several methods for recovering the drive are described below. The first step is to verify that the drive has spare space available to remap the bad area. Retrieve the SMART data directly from the drive as follows:

CPU Direct Connect PCIe/NVMe

Open the ESXi shell and execute the following command to retrieve the SMART data, where vmhbaX represents the drive experiencing the failure (for example, vmhba2):

    esxcli nvme device log smart get -A vmhbaX

HPE Smart Array Connected SATA/SAS

1. Open the ESXi shell and execute the following commands to retrieve the SMART data.
2. Generate the diagnostic report files:
    esxcli ssacli cmd -q "ctrl all diag file=/tmp/adureport.zip zip=on ris=on xml=on"
   Generate the SmartSSD Wear Gauge report files:
    esxcli ssacli cmd -q "ctrl all diag file=/tmp/adureport.zip zip=on ris=on xml=on ssdrpt=on"
3. After determining that the drive has spare space available, use the following procedure to remap the bad area on the drive.
4. Determine the device name of the drive in need of remapping:
    esxcfg-devs -A
5. Use the dd command with the device name to repair the bad area on the disk:
    dd if=/dev/zero of=/vmfs/devices/disks/t10.NVMe____KCD6XLUL3T84____________________________01495301E28EE bs=1M count=1000 conv=notrunc
6. Verify with the od command, using the device name of the drive that was written in step 5 (the example shows a SATA device name):
    od -b /vmfs/devices/disks/t10.ATA_____VR0340GEJXN_____________________________BTWM63120AW6340C____ -N 10000 -v

Smart Array Controller attached drives:

The example below uses controller slot=1 and physical drive port 1, box 1, bay 6.

1. Log in to the ESXi shell.
2. Execute the following commands to remap the bad area on the drive:
    esxcli ssacli cmd -q "ctrl slot=1 pd 1i:1:6 show detail"
    esxcli ssacli cmd -q "ctrl slot=1 pd 1i:1:6 modify erase erasepattern=block unrestricted=off forced"
    esxcli ssacli cmd -q "ctrl all show config" | grep "1I:1:6"
    esxcli ssacli cmd -q "ctrl slot=1 pd 1i:1:6 modify enableeraseddrive"

NOTE: An alternate method to erase bad areas on a drive is to launch HPE Intelligent Provisioning (F10) during POST and select the Erase utility. This option erases ALL data on all drives installed in the system. Ensure that only the drives needing to be erased are connected to the system when using the Erase utility.
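The dd and od steps in the procedure above can be wrapped in two small shell helpers so that the write and the check always target the same path. This is a minimal sketch, not an HPE or VMware tool: the function names zero_fill and verify_zeros are hypothetical, and it assumes a shell whose dd accepts bs=1M and whose od accepts -b, -v and -N, as in the commands above. Writing zeros destroys the data in the overwritten region, so the sketch is best tried on a scratch file before being pointed at a device under /vmfs/devices/disks/.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the dd and od steps described above.

# zero_fill TARGET COUNT_MIB
# Overwrite the first COUNT_MIB MiB of TARGET with zeros.
# conv=notrunc keeps the rest of TARGET in place instead of truncating it.
zero_fill() {
    dd if=/dev/zero of="$1" bs=1M count="$2" conv=notrunc 2>/dev/null
}

# verify_zeros TARGET NBYTES
# Return 0 only if the first NBYTES bytes of TARGET all read back as zero.
# od -b prints each byte as three octal digits; -v disables line folding.
# The leading offset column is stripped before searching for a non-zero digit.
verify_zeros() {
    if od -b -v -N "$2" "$1" | sed 's/^[0-9]* *//' | grep -q '[1-9]'; then
        return 1
    fi
    return 0
}

# Example use (the device path is a placeholder, not a real device name):
#   DEV=/vmfs/devices/disks/<device name from step 4>
#   zero_fill "$DEV" 1000 && verify_zeros "$DEV" 10000 && echo "region reads as zeros"
```

With the values from the procedure, the write covers the first 1000 MiB of the device and the check reads back only the first 10000 bytes, so a successful check confirms the start of the written region rather than all of it.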