...
An ESXi host may show a "network redundancy lost" or "network redundancy degraded" alarm.
The ESXi host may currently show some vmnics as "DOWN".
If no vmnics are down unexpectedly, yet you see redundancy lost messages, there may be NIC flapping.
You may see an error similar to:

Lost network connectivity on virtual switch "vSwitch0". Physical NIC vmnic1 is down. Affected portgroups: "VM Network".

When a network adapter in a NIC team fails with a failed criteria code, you may see an entry in the vobd log similar to:

Nov 28 22:25:19 10.2.0.103 Nov 29 03:25:19 vobd: Nov 29 03:25:19.927: 8596737035540us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic2 is down. Affected dvPort: 132/5d 0d 2c 50 4a 71 82 e7-54 c7 69 ed b2 01 6f bc. 1 uplinks up. Failed criteria: 130.

Note: The preceding log excerpts are only examples. Date, time, and environmental variables may vary depending on your environment.

A physical NIC may be brought down due to frequent link status flapping

When a physical NIC experiences frequent link status flapping, ESXi may turn off the physical NIC to avoid unnecessary link failover and network instability. For example, a physical NIC driver may reset the physical NIC when it detects "tx Hang". A reset generates a link down/up event. If the physical NIC itself is bad, this causes periodic resets and link status flapping. The vmkernel.log file on the ESXi host shows that one or more network cards are constantly changing from a Down to an Up state and vice versa. During link status flapping, traffic may migrate between different physical NICs. If the flapping is too frequent, you experience network issues, such as reduced bandwidth or an unstable network.
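As a rough aid for spotting flapping, the sketch below counts uplink transition entries per vmnic in lines taken from the vobd log. It is an illustration only, not a VMware tool. The pattern is based on the "transition.down" entries shown above, the matching "transition.up" form is an assumption, and the function names and the threshold of 3 are hypothetical.

```python
import re
from collections import Counter

# Based on the uplink transition entries shown in the vobd excerpts, e.g.
# "[vob.net.dvport.uplink.transition.down] Uplink: vmnic2 is down."
# Assumption: "up" entries use the same layout with "transition.up".
TRANSITION = re.compile(
    r"\[vob\.net\.\w+\.uplink\.transition\.(?P<state>up|down)\]\s+"
    r"Uplink:\s+(?P<nic>vmnic\d+)"
)


def count_transitions(lines):
    """Return a Counter keyed by (nic, state) for uplink transition entries."""
    counts = Counter()
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            counts[(match.group("nic"), match.group("state"))] += 1
    return counts


def flapping_candidates(counts, threshold=3):
    """Return NICs whose total transitions reach a hypothetical threshold."""
    totals = Counter()
    for (nic, _state), n in counts.items():
        totals[nic] += n
    return sorted(nic for nic, n in totals.items() if n >= threshold)
```

To use it, pass the lines of a copy of the vobd log to count_transitions and review any vmnic that flapping_candidates returns. A high count over a short period is only a hint, so confirm against the timestamps in the log.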
This article provides information on troubleshooting a failed network adapter. The ESXi/vCenter UI and the ESXi logs show NIC adapter alerts and messages. This KB covers the typical checks to perform when troubleshooting. For further adapter troubleshooting steps, open a support ticket with the hardware vendor.
Services that use the affected portgroups are disconnected from the associated physical networks. This includes virtual machines, the service console, vMotion, NFS, iSCSI, and Management Services.
To determine the cause of the failure or eliminate common NIC issues:

1. Check the current status of the vmnic from either the VMware vSphere Client or the ESXi service console.

To check the status from the vSphere Client:
Select the ESXi host and click the Configuration tab.
Click Networking.
The vmnics currently assigned to virtual switches are displayed in the diagrams. If a vmnic displays a red X, that link is currently down.

To check the status from the ESXi service console, run this command:

esxcli network nic list

The output appears similar to this:

Name     PCI Device    Driver  Admin Status  Link Status  Speed  Duplex  MAC Address        MTU   Description
-------  ------------  ------  ------------  -----------  -----  ------  -----------------  ----  -----------
vmnic0   0000:01:00.0  ixgben  Up            Up           1000   Full    ec:f4:bb:da:b6:c8  1500  Intel(R) Ethernet Controller X540-AT2
vmnic1   0000:01:00.1  ixgben  Up            Up           1000   Full    ec:f4:bb:da:b6:ca  1500  Intel(R) Ethernet Controller X540-AT2

Note: The Admin Status is the only portion of the output that ESXi controls. It is set by using these commands:

esxcli network nic down -n vmnicX
esxcli network nic up -n vmnicX

The Link Status column specifies the status of the link between the physical network adapter and the physical switch. The status can be either Up or Down. If there are several network adapters, with some up and some down, you might have to verify that they are connected to the intended physical switch ports. To do this, bring down each of the ESXi host's ports on the physical switch in turn and run the command to observe which vmnic is affected.

2. Check that the vmnic referred to in the event message is still connected to the switch and configured properly:

Make sure that the network cable is still connected to the switch and to the host.
Check that the switch connected to the system is still functioning properly and has not been misconfigured.
Refer to the switch documentation for details.
Check for activity between the physical switch and the vmnic. This might be indicated either by a network trace or by activity LEDs.
Check that the NIC driver is up to date. See "Determining Network/Storage firmware and driver version in ESXi".

3. Search for the word "vmnic" in the vobd log file. If you see "vmnic down" or "vmnic up" messages, the NIC may be flapping.

Note: Some NICs report only the link up state, not the down state. If the NIC is reported as "up" and the host was not rebooting, this indicates that the NIC is flapping and not reporting the down state to ESXi.

Check for a failed criteria code in the vmnic messages.
If a failed criteria code is listed, see step 4 below.
If there is no failed criteria code, and everything in step 2 above was checked, call the hardware vendor about the flapping.

4. In the vobd log file, the vmnic failure may be classified with a Failed criteria code. This code explains the reason for the vmnic failure.

Example:

2020-11-17T15:37:00.330Z: [netCorrelator] 4836107000843us: [vob.net.dvport.uplink.transition.down] Uplink: vmnic4 is down. Affected dvPort: 28/50 24 e2 d9 41 e2 48 58-7d da b4 fd 4a bd 37 92. 3 uplinks up. Failed criteria: 128

The entry is laid out as:

Time - Event - Uplink# - State - Port - vSwitch - # Active Uplinks left - Failed Criteria

Note: # Active Uplinks left is an indication of a failover. It identifies the number of active uplinks left in the teaming policy of the virtual switch.

The following are the failed criteria codes:

1 – Link speed reported by the driver (exact match for compliance)
2 – Link speed reported by the driver (equal or greater for compliance)
4 – Link duplex reported by the driver
8 – Link LACP state down
32 – Beacon probing
64 – Errors reported by the driver or hardware
128 – Link state reported by the driver
256 – The port is blocked
512 – The driver has registered the device

Note: Failed criteria 128 means the driver is reporting the link state as down.
This can be caused by unplugging the network cable or administratively shutting down the physical switch port. If the link outage was not intended, the likely cause is an issue with the driver, firmware, SFP+ module, cable, and/or switch port of the physical switch. When Failed criteria 128 entries appear in the vobd log, check the driver and call the host hardware vendor for further troubleshooting. For more information, see "Determining Network/Storage firmware and driver version in ESXi".

Note: The failure codes are cumulative, so they are added together when multiple criteria are met.

When there are multiple failures, you see entries similar to this in the vobd.log file:

2012-04-05T11:22:10.449Z: [netCorrelator] 1123644995238us: [vob.net.pg.uplink.transition.down] Uplink: vmnic3 is down. Affected portgroup: iSCSI-Netz2. 0 uplinks up. Failed criteria: 130

The failed criteria here is 130, which is 2 + 128. This is a combination of these two failure codes:

2 – Link speed reported by the driver (equal or greater for compliance)
128 – Link state reported by the driver
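Because the codes are cumulative, a combined value can be split back into its parts by testing each listed code as a bit flag. The following is a minimal sketch of that arithmetic, not a VMware utility. The table is copied from the failed criteria list in this article, and the function names are hypothetical.

```python
import re

# Code-to-meaning table copied from the failed criteria list in this article.
FAILED_CRITERIA = {
    1: "Link speed reported by the driver (exact match for compliance)",
    2: "Link speed reported by the driver (equal or greater for compliance)",
    4: "Link duplex reported by the driver",
    8: "Link LACP state down",
    32: "Beacon probing",
    64: "Errors reported by the driver or hardware",
    128: "Link state reported by the driver",
    256: "The port is blocked",
    512: "The driver has registered the device",
}


def decode_failed_criteria(code):
    """Split a cumulative failed criteria code into (value, meaning) pairs."""
    parts = [(bit, text) for bit, text in FAILED_CRITERIA.items() if code & bit]
    known = sum(bit for bit, _text in parts)
    if code != known:
        # Part of the code does not correspond to any value listed above.
        parts.append((code - known, "not listed in this article"))
    return parts


def failed_criteria_from_line(line):
    """Extract the numeric code from a vobd entry, or None if absent."""
    match = re.search(r"Failed criteria:\s*(\d+)", line)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    entry = ("Uplink: vmnic3 is down. Affected portgroup: iSCSI-Netz2. "
             "0 uplinks up. Failed criteria: 130")
    for value, meaning in decode_failed_criteria(failed_criteria_from_line(entry)):
        print(value, "-", meaning)
```

For the example entry, the code 130 decodes to the same two criteria described above, 2 and 128.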
VMware Skyline Health Diagnostics for vSphere - FAQ

For more information, see the vSphere API/SDK Documentation.

The criteria used to determine whether a network adapter in a network adapter team has failed include:

checkBeacon – By default, this check is disabled. This check becomes active when Beacon Probing is enabled on a virtual switch.

checkDuplex – By default, this check is disabled.
If checkDuplex is true, fullDuplex is the configured duplex mode, and the link is considered bad if the link duplex reported by the driver is not the same as fullDuplex.
If checkDuplex is false, fullDuplex is unused and the link duplex is not used as a detection method.

checkErrorPercent – By default, this check is disabled.
If checkErrorPercent is true, percentage is the configured error percentage that is tolerated. The link is considered bad if the error rate exceeds percentage.
If checkErrorPercent is false, percentage is unused and the error percentage is not used as a detection method.

checkSpeed – The default setting is minimum, with a default speed of 10 Mbps. To use link speed as a criterion, checkSpeed must be one of these values:
exact – Use exact speed to detect link failure. Speed is the configured exact speed in megabits per second.
minimum – Use minimum speed to detect failure. Speed is the configured minimum speed in megabits per second.
empty string – Do not use link speed to detect failure. Speed is unused in this case.

A Failed criteria code of 32 indicates that the link failed because Beacon Probing detected a problem. Beacon Probing sends beacons per VLAN between the physical NICs in a team. If the other NICs do not receive these beacons, there is a problem in the physical network.
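To make the interaction of these settings concrete, the sketch below models the criteria as a small data structure and evaluates them against a reported link state. It is a simplified illustration of the rules as described in this article, not ESXi's implementation and not the vSphere API. The class, the function, and the snake_case field names are hypothetical stand-ins for the criteria named above.

```python
from dataclasses import dataclass


@dataclass
class FailureCriteria:
    """Hypothetical model of the teaming failure criteria described above."""
    check_speed: str = "minimum"   # "exact", "minimum", or "" (speed unused)
    speed: int = 10                # configured speed in Mbps
    check_duplex: bool = False     # disabled by default
    full_duplex: bool = False      # configured duplex mode, used if check_duplex
    check_error_percent: bool = False  # disabled by default
    percentage: int = 0            # tolerated error percentage
    check_beacon: bool = False     # active only with Beacon Probing enabled


def failed_checks(criteria, link_speed, link_full_duplex, error_percent, beacon_ok):
    """Return the names of the checks that consider the link bad."""
    failed = []
    if criteria.check_speed == "exact" and link_speed != criteria.speed:
        failed.append("checkSpeed")
    elif criteria.check_speed == "minimum" and link_speed < criteria.speed:
        failed.append("checkSpeed")
    if criteria.check_duplex and link_full_duplex != criteria.full_duplex:
        failed.append("checkDuplex")
    if criteria.check_error_percent and error_percent > criteria.percentage:
        failed.append("checkErrorPercent")
    if criteria.check_beacon and not beacon_ok:
        failed.append("checkBeacon")
    return failed
```

With the defaults, only the minimum speed check is active, so a 1000 Mbps link passes regardless of duplex, error rate, or beacon results, while a link reporting less than 10 Mbps fails checkSpeed.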