Symptoms
Instant Clone provisioning intermittently fails because the customization of a replica VM times out.
The error "Error during provisioning: Initial publish failed: Fault type is UNKNOWN_FAULT_FATAL After waiting for 600 seconds Replica vm-XXXX still has not finished customization. Giving up!" appears in the Horizon administration dashboard for the pool and in the Connection Server logs.
DRS is enabled.
The Horizon View Connection Server logs (reference: Location of Horizon View log files (1027744)) contain messages similar to the following:
2023-04-01T09:24:06.101+09:00 ERROR (14C0-0C64) <WFE-20> [Task] ExecuteMultipleTasksAction of Task=[PrimeResourcesTask:0853a98f-12ed-460c-ba63-e503d4185020:29974], requestId=e437465d-47ce-4852-8b1d-97d56884cccd failed at index=1 with com.vmware.daas.cloneprep.common.CPException: After waiting for 600 seconds Replica vm-XXXX still has not finished customization. Giving up!
2023-04-01T09:24:06.105+09:00 DEBUG (14C0-25C8) <PendingOperation-vm-11923:snapshot-12512-SchedulePushImage> [NgvcHelper] Priming of pool XXXX has failed. - UNKNOWN_FAULT_FATAL: After waiting for 600 seconds Replica vm-XXXX still has not finished customization. Giving up! : com.vmware.daas.cloneprep.common.CPException: After waiting for 600 seconds Replica vm-XXXX still has not finished customization. Giving up!
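To check how often this timeout occurs and which replicas are affected, the Connection Server log can be scanned for the message shown above. The following is a minimal sketch, not an official tool. The function name and the regular expression are illustrative and assume the log lines look like the samples in this article (timestamp as the first space-delimited field).

```python
import re

# Pattern derived from the message text quoted in this article.
# Assumption: other Horizon versions may word the message differently.
TIMEOUT_RE = re.compile(
    r"After waiting for (?P<seconds>\d+) seconds Replica (?P<replica>\S+) "
    r"still has not finished customization"
)


def find_customization_timeouts(lines):
    """Return (timestamp, replica, seconds) for each line reporting the timeout.

    A line that repeats the message (as the DEBUG sample does) is counted once.
    """
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            # Assumption: the timestamp is the first space-delimited field.
            timestamp = line.split(" ", 1)[0]
            hits.append(
                (timestamp, match.group("replica"), int(match.group("seconds")))
            )
    return hits
```

The function accepts any iterable of lines, so an open log file can be passed directly. Both the ERROR and the DEBUG entry for one failure will be reported, so expect two hits per failed publish.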
If you examine the vmware.log of the failed replica (reference: ESXi Log File Locations), log messages similar to the following are seen:
2023-04-01T00:15:02.224Z| vmx| A100: ConfigDB: Setting config.readOnly = "TRUE"
2023-04-01T00:15:33.257Z| vcpu-1| W115: ConfigDB: Ignoring request to write config file
2023-04-01T00:15:59.938Z| vmx| A100: ConfigDB: Setting config.readOnly = "FALSE"
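The pattern in these lines is a config write that is ignored while config.readOnly is set to "TRUE". The following is a minimal sketch of a check for that pattern in a replica's vmware.log. The function name is illustrative, the matching strings are taken from the sample lines above, and a hit is only an indication of this issue, not proof of it.

```python
import re

# Strings taken from the sample vmware.log lines in this article.
READONLY_RE = re.compile(r'ConfigDB: Setting config\.readOnly = "(TRUE|FALSE)"')
IGNORED_WRITE = "ConfigDB: Ignoring request to write config file"


def ignored_writes_while_read_only(lines):
    """Return timestamps of config writes ignored while config.readOnly was TRUE."""
    read_only = False
    hits = []
    for line in lines:
        match = READONLY_RE.search(line)
        if match:
            # Track the most recent readOnly state seen in the log.
            read_only = match.group(1) == "TRUE"
        elif read_only and IGNORED_WRITE in line:
            # Assumption: the timestamp is the field before the first "|".
            hits.append(line.split("|", 1)[0])
    return hits
```

Passing an open vmware.log file to the function returns the timestamps of the ignored writes. These can then be compared with the time of the customization timeout in the Connection Server log, keeping in mind that the two logs may use different time zones, as in the samples in this article.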
Cause
Normally, the CustomizationState in the VMX file is set to Success upon successful customization. However, when a DRS vMotion runs at the same time, the VMX file becomes read-only or locked, which prevents the CustomizationState from being updated.
Resolution
There is no resolution at the moment. Subscribe to this KB to be notified when this edge case is fixed in code.
Workaround
Options:
As this is an intermittent issue, a second attempt at an image push is often successful.
Another option is to temporarily disable DRS during the image push, which reduces the risk of the replica VM customization failing intermittently.
Related Information
Horizon View Best Practices: Instant Clone Provisioning and Troubleshooting (91265)
This is a child article of UNKNOWN_FAULT_FATAL: An Index of Instant Clone Creation Errors caused by an Unknown Fault (91729)