...
The HCX NE Appliance may get stuck in an unstable state after a network unextension workflow. As a result, service mesh operations for the NE Appliance, such as "Resync" or "Redeploy", are no longer serviced.
The following error can be seen in the cloud HCX logs:
2023-03-12 00:21:22.210 UTC [InterconnectService_SvcThread-4705, J:d02ceb0d, , TxId: a8d54855-d3ae-45b6-9157-589de981836e] ERROR c.v.v.h.s.i.ProcessServiceMesh- checkResources failed, errorCode:null. stacktrace:null, errorMessage:Interconnect Service Workflow AllocateResourcesForInterconnectAppliance failed. Error: Error while configuring appliance networks.. Cause: Could not resolve segment /infra/tier-1s/cgw/segments/hcx-ne-8b83b725-de2c-43ca-bf92-969ccc0209d7 to opaque network. Failed to get realized state. Result: {"status":"failure","statusCode":404,"details":"","result":{"httpStatus":"NOT_FOUND","error_code":500090,"module_name":"Policy","error_message":"Policy object path=[\/infra\/tier-1s\/cgw\/segments\/hcx-ne-8b83b725-de2c-43ca-bf92-969ccc0209d7] does not exist."}}
Location of the App Engine log on HCX Manager: /common/log/admin/app.log
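To confirm that an appliance is hitting this symptom, the log can be searched for the "Could not resolve segment" message. The following is a minimal Python sketch, not a VMware-provided tool. It assumes the message format shown in the log excerpt above, and the function name find_missing_segments is hypothetical.

```python
import json
import re

# Matches the NSX-T policy path quoted in the "Could not resolve segment ..." message.
SEGMENT_RE = re.compile(r"Could not resolve segment (\S+) to opaque network")
# Matches the JSON payload that follows "Result: " in the same log entry.
RESULT_RE = re.compile(r"Result: (\{.*\})")


def find_missing_segments(log_lines):
    """Return a (segment_path, status_code) tuple for each matching log entry.

    Assumption: each log entry sits on a single line, as in the excerpt above.
    status_code is None when the Result payload is absent or cannot be parsed.
    """
    findings = []
    for line in log_lines:
        segment = SEGMENT_RE.search(line)
        if not segment:
            continue
        status = None
        result = RESULT_RE.search(line)
        if result:
            try:
                status = json.loads(result.group(1)).get("statusCode")
            except json.JSONDecodeError:
                status = None  # payload truncated or not valid JSON
        findings.append((segment.group(1), status))
    return findings
```

On HCX Manager, the function could be fed the lines of /common/log/admin/app.log, for example via open("/common/log/admin/app.log", errors="replace"). A status code of 404 next to a segment path indicates that NSX-T no longer has that segment.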
This document serves as a reference for the HCX NE Appliance redeployment failure that can occur after a network unextension workflow, and for how to recover from it.
When a segment is extended using a given NE Appliance, HCX Cloud Manager makes an API call to create an HCX-defined segment, such as "L2E-XXXX", in the cloud (destination) NSX-T Policy UI.
If the unstretch workflow fails because of an issue in the backend system or in the infrastructure, NSX-T does not remove or clean up the extended segment "L2E-XXXX" from its Policy UI.
IMPORTANT: One potential cause of a failed network unextension workflow is the high memory usage observed on an NE Appliance when a user extends more than 5 segments per NE Appliance. This is documented in VMware Knowledge Base Article 91086 and is fixed in HCX 4.6.1 and later.
If the user then cleans up or deletes the "L2E-XXXX" segment manually using the NSX-T Policy UI or API, HCX Manager is not notified by the NSX management layer, so HCX keeps the backing record of the stretched segment in its backend system.
Note: Users are allowed to delete any segment from the NSX-T Policy UI or API if no VMs are attached to that segment.
As a result, when the user runs a "Resync" or "Redeploy" operation for that service mesh (SM) or NE Appliance, HCX tries to validate the extended segment backing record with NSX-T, and NSX-T returns error code 404 (NOT FOUND) because the segment has already been deleted.
RECOMMENDATIONS: DO NOT delete or remove any L2E segments from the cloud NSX-T until HCX and its associated components are undeployed in the given cloud environment.
Note: L2E segments created on the cloud NSX-T as part of a failed extension or unextension workflow can be used again for a re-extension workflow using HCX.
Note: L2E segments are essentially the same as native NSX-T segments and can be used as regular segments to connect workload VMs.
Also, DO NOT delete or remove from NSX-T any rules, such as "HCX-L2CMacDiscoveryProfile", that relate to the HCX extension workflow. Removing them may end up requiring a redeployment of the service mesh and NE Appliance.
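The validation that returns 404 can be reproduced as a read-only check of whether a segment still exists in NSX-T before running "Resync" or "Redeploy". The following is a rough Python sketch under stated assumptions, not the call HCX itself makes. It assumes the NSX-T Policy API is reachable at https://<nsx-host>/policy/api/v1 with basic authentication, which may not hold in every cloud environment (some expose NSX-T only through a proxy URL or token-based authentication). The helper names segment_url, http_status and segment_exists are hypothetical.

```python
import base64
import urllib.error
import urllib.request


def segment_url(nsx_host, policy_path):
    """Build the Policy API URL for an intent path such as
    /infra/tier-1s/cgw/segments/<segment-id>.

    Assumption: the Policy API base is https://<nsx_host>/policy/api/v1.
    """
    return f"https://{nsx_host}/policy/api/v1/{policy_path.lstrip('/')}"


def http_status(url, username, password):
    """Send a GET with basic authentication and return the HTTP status code."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # urllib raises HTTPError for 4xx/5xx responses


def segment_exists(nsx_host, policy_path, get_status):
    """Return True if NSX-T knows the segment, False if it answers 404.

    get_status is any callable that maps a URL to an HTTP status code, which
    keeps the decision logic separate from the network call.
    """
    status = get_status(segment_url(nsx_host, policy_path))
    if status == 200:
        return True
    if status == 404:
        return False
    raise RuntimeError(f"unexpected HTTP status {status} for {policy_path}")
```

A call could look like segment_exists(host, path, lambda url: http_status(url, user, password)), where path is the segment path reported in the HCX log. The check only reads from NSX-T. Note that urllib verifies TLS certificates by default, so the NSX-T certificate must be trusted by the machine running the check.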
This issue affects the NE Appliances that were used during the unextension workflow.
The issue only surfaces when a user removes the extended segment (L2E-XXXX) from the cloud NSX-T Policy UI or API.
NE Appliances running in HA mode are also affected.
The existing extended datapath is not affected.
There is NO impact to Interconnect (IX) Appliances.
None at the moment.
Open a Service Request with VMware Global Support Services and include the required information to work on remediation.
Refer to the article "HCX - NE appliance state becomes critical due to Memory component".