...
Scenario

An SDS device fails or reports errors. When you attempt to clear the error with the SDS "clear device error" command, the device remains in an error or failed state.

Symptoms

ScaleIO system events report disk device errors or failures:

    799 2016-01-22 17:28:39.818 SDS_DEV_ERROR_REPORT ERROR Device error reported on SDS: 10.3.1.21, Device: /dev/sdb. State: NORMAL upDownState: UP processState: DEV_ERR_INPROGRESS devErrState: REPORT

Querying the SDS reports disk device errors or failures:

    ScaleIO-10-1-1-202:~ # scli --query_sds --sds_id 9780122600000003
    Device information (total 8 devices):
     1: Name: ScaleIO-6a0a6209  Path: /dev/sdb  Original-path: /dev/sdb  ID: 851f01c100030000
        Storage Pool: SASPOOL, Capacity: 1675 GB
        Error-fixes: 0  scanned 0 MB, Compare errors: 0  State: Normal
     2: Name: ScaleIO-6a0a620a  Path: /dev/sdc  Original-path: /dev/sdc  ID: 851f01c200030001
        Storage Pool: SASPOOL, Capacity: 1675 GB
        Error-fixes: 0  scanned 0 MB, Compare errors: 0  State: Normal
     3: Name: ScaleIO-6a0a620b  Path: /dev/sdd  Original-path: /dev/sdd  ID: 851f01c300030002
        Storage Pool: SASPOOL, Capacity: 1675 GB
        Error-fixes: 0  scanned 0 MB, Compare errors: 0  State: Normal
     4: Name: ScaleIO-6a0a620c  Path: /dev/sde  Original-path: /dev/sde  ID: 851f01c400030003
        Storage Pool: SSDPOOL, Capacity: 1489 GB
        Error-fixes: 0  scanned 0 MB, Compare errors: 0  State: Normal
     5: Name: ScaleIO-6a0a620d  Path: /dev/sdf  Original-path: /dev/sdf  ID: 851f01c500030004
        Storage Pool: SASPOOL, Capacity: 1675 GB
        Error-fixes: 0  scanned 0 MB, Compare errors: 0  State: Error
     6: Name: ScaleIO-6a0a620e  Path: /dev/sdg  Original-path: /dev/sdg  ID: 851f01c600030005
        Storage Pool: SSDPOOL, Capacity: 1489 GB
        Error-fixes: 0  scanned 0 MB, Compare errors: 0  State: Normal
     7: Name: ScaleIO-6a0a620f  Path: /dev/sdh  Original-path: /dev/sdh  ID: 851f01c700030006
        Storage Pool: SASPOOL, Capacity: 1675 GB
        Error-fixes: 0  scanned 0 MB, Compare errors: 0  State: Normal
     8: Name: ScaleIO-6a0a6210  Path: /dev/sdi  Original-path: /dev/sdi  ID: 851f01c800030007
        Storage Pool: SASPOOL, Capacity: 1675 GB
        Error-fixes: 0  scanned 0 MB, Compare errors: 0  State: Normal

The SVM/Linux messages file reports an offline device:

    Jan 22 17:28:35 ScaleIO-10-1-1-201 kernel: [45678.865605] end_request: I/O error, dev sdg, sector 1138313984
    Jan 22 17:28:37 ScaleIO-10-1-1-201 kernel: [45681.452800] sd 2:0:6:0: [sdg] task abort on host 2, ffff8800b83f6e80
    Jan 22 17:28:37 ScaleIO-10-1-1-201 kernel: [45681.452877] sd 2:0:1:0: [sdb] task abort on host 2, ffff8801b7476d80
    Jan 22 17:28:37 ScaleIO-10-1-1-201 kernel: [45681.453086] sd 2:0:8:0: [sdh] task abort on host 2, ffff8800b83f6280
    Jan 22 17:28:37 ScaleIO-10-1-1-201 kernel: [45681.453109] sd 2:0:8:0: [sdh] task abort on host 2, ffff8800a37a6c80
    Jan 22 17:28:37 ScaleIO-10-1-1-201 kernel: [45681.453133] sd 2:0:9:0: [sdi] task abort on host 2, ffff8800b83f6b80
    Jan 22 17:28:47 ScaleIO-10-1-1-201 kernel: [45691.537180] sd 2:0:5:0: rejecting I/O to offline device

The ESXi VMkernel log reports errors on the disk device:

    2016-01-22T09:40:21.801Z cpu1:33420)ScsiDeviceIO: 7024: Could not detect setting of QErr for device naa.614187704f3b47001e34b585468abf85. Error Not supported.
    2016-01-22T09:40:21.801Z cpu1:33420)ScsiDeviceIO: 7538: Could not detect setting of sitpua for device naa.614187704f3b47001e34b585468abf85. Error Not supported.
    2016-01-22T09:40:21.801Z cpu5:33593)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x28 (0x439e1a830cc0, 0) to dev "naa.614187704f3b47001e34b585468abf85" on path "vmhba1:C2:T1:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0. Act:NONE
    2016-01-22T09:40:21.801Z cpu5:33593)ScsiDeviceIO: 2607: Cmd(0x439e1a830cc0) 0x28, CmdSN 0xd62 from world 0 to dev "naa.614187704f3b47001e34b585468abf85" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.
    2016-01-22T09:40:21.801Z cpu5:33593)ScsiCore: 1609: Power-on Reset occurred on naa.614187704f3b47001e34b585468abf85
    2016-01-22T09:40:21.844Z cpu5:33593)NMP: nmp_ThrottleLogForDevice:3178: Cmd 0x1a (0x439e1a830cc0, 0) to dev "naa.614187704f3b47001e34b585468abf85" on path "vmhba1:C2:T1:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0. Act:NONE
    2016-01-22T09:40:21.844Z cpu5:33593)ScsiDeviceIO: 2645: Cmd(0x439e1a830cc0) 0x1a, CmdSN 0xd66 from world 0 to dev "naa.614187704f3b47001e34b585468abf85" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
    2016-01-22T09:40:21.844Z cpu1:33420)ScsiDevice: 3835: Successfully registered device "naa.614187704f3b47001e34b585468abf85" from plugin "NMP" of type 0
    2016-01-22T09:40:21.844Z cpu1:33420)NMP: nmp_DeviceUpdateProtectionInfo:569: Set protection info for device 'naa.614187704f3b47001e34b585468abf85', Enabled: 0 ProtType: 0x0 Guard: 0x0 ProtMask: 0x0
    2016-01-22T22:27:49.085Z cpu19:33115)WARNING: NMP: nmpDeviceTaskMgmt:2284: Attempt to issue lun reset on device naa.614187704f3b47001e34b585468abf85. This will clear any SCSI-2 reservations on the device.

Impact

The disk device remains in a failed state. You cannot clear the SDS device error.
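On an SDS with many disks, the per-device listing from scli --query_sds can be filtered down to only the errored devices. A minimal sketch, assuming the field layout shown in the Symptoms above (the exact layout may differ between ScaleIO versions, so verify against your own output):

```shell
# print_error_devices: read `scli --query_sds` output on stdin and print
# the Name and Path of every device whose State is Error.
# Assumes "Name:"/"Path:" appear on one line and "State:" on a later line,
# as in the listing above.
print_error_devices() {
    awk '
        /Name:/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "Name:") name = $(i + 1)
                if ($i == "Path:") path = $(i + 1)
            }
        }
        /State: Error/ { print name, path }
    '
}
```

Usage: `scli --query_sds --sds_id <id> | print_error_devices` prints one "Name Path" pair per errored device, e.g. `ScaleIO-6a0a620d /dev/sdf` for the listing above.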
Cause

When a disk device stops responding for any reason, the operating system takes the device offline.

Note: If the disk device is malfunctioning, it does not come back online, and a disk device replacement may be required.
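You can confirm that the kernel has offlined a device by searching the system log for the "rejecting I/O to offline device" message shown in the Symptoms above. A minimal sketch; the log path varies by distribution (for example /var/log/messages on SLES/RHEL, /var/log/syslog on Debian/Ubuntu):

```shell
# offline_device_events LOGFILE: print kernel log lines indicating the OS
# rejected I/O because a block device was offline.
offline_device_events() {
    grep "rejecting I/O to offline device" "$1"
}
```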
Workaround

SVM / Linux environment:

1. Verify the current state of the disk device:

       [root@ssltest ~]# cat /sys/block/sdx/device/state
       offline

2. If the disk device is marked "offline," bring it back online:

       echo "running" > /sys/block/sdx/device/state

3. Using SCLI or the UI, clear the SDS device error.

Windows environment:

1. Verify the current state of the disk device using Logical Disk Manager or DiskPart:

       C:\>diskpart
       Microsoft DiskPart version 6.1.7601
       Copyright (C) 1999-2008 Microsoft Corporation.
       On computer: ISENABLOVSL1C

       DISKPART> list disk

         Disk ###  Status         Size     Free     Dyn  Gpt
         --------  -------------  -------  -------  ---  ---
         Disk 0    Online          238 GB      0 B

       DISKPART>

2. If the disk device is marked "offline," bring it back online with DiskPart or Logical Disk Manager:

       DISKPART> online disk

Alternative option to online a disk device on any OS:

Note: This option requires downtime and triggers a rebuild/rebalance on the ScaleIO system.

1. If possible, place the SDS into Maintenance Mode.
2. Reboot the SDS server.
3. Exit Maintenance Mode (if entered in step 1).
4. Clear the SDS device error (using the UI or CLI).
5. Verify the device state in ScaleIO.
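The Linux verification step above can be scripted to check every block device in one pass. A minimal sketch: the sysfs root is parameterized only so the function can be exercised against a mock tree (on a real system, call it with no argument), and the re-online write, shown commented out, must be run as root:

```shell
# find_offline_disks [SYSFS_ROOT]: print the name of each block device
# whose kernel state file reads "offline". Defaults to the real /sys tree.
find_offline_disks() {
    root="${1:-/sys}"
    for state_file in "$root"/block/*/device/state; do
        [ -e "$state_file" ] || continue
        # state_file is .../block/<dev>/device/state; extract <dev>
        dev=$(basename "$(dirname "$(dirname "$state_file")")")
        if [ "$(cat "$state_file")" = "offline" ]; then
            echo "$dev"
            # To bring the device back online (as root), uncomment:
            # echo running > "$state_file"
        fi
    done
}
```

After re-onlining any devices the function reports, clear the SDS device error with SCLI or the UI as described above.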