Operational Defect Database

BugZero updated this defect 30 days ago.

VMware | 97689

ESXi host lost management connectivity during maintenance mode VMware NSX upgrade

Last update date:

4/19/2024

Affected products:

NSX-T

Affected releases:

4.x

Fixed releases:

No fixed releases provided.

Description:

Symptoms

You are upgrading NSX to version 4.x.

The upgrade uses maintenance mode.

The host management vmkernel interfaces use LACP (a LAG).

During the upgrade, the host enters maintenance mode and then loses network connectivity.

If the management vmkernel does not use LACP but other vmkernels do, ESXi management connectivity may continue to work, but other services such as vMotion or storage may be impacted.

The NSX-T upgrade UI may display the following log entry:

Unexpected error while upgrading upgrade unit: Install of offline bundle failed on host <host-uuid> with error : VI SDK invoke exception:java.rmi.RemoteException: VI SDK invoke exception:org.dom4j.DocumentException. Please refer 'https://kb.vmware.com/s/article/91383' article for troubleshooting steps.

Note: The KB referenced in the log entry is unrelated, as this is not an in-place upgrade.

In the NSX-T Manager log /var/log/syslog, you see alerts such as:

NSX 73274 - [nsx@6876 comp="nsx-manager" level="INFO" subcomp="ccp"] Connection closed received NettyConnection(NettyChannel(local=<ManagerNodeIP>:1235, remote=<TransportNodeIP>:45998), active=false)

In the ESXi log /var/run/log/vmkernel.log, you see entries similar to:

cpu33:223985481)Team.vswitch: TeamVSLACPLAGEventCB:9087: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Received event LAG DESTROY, LAG /0, link UNKNOWN, uplink /0x0, link UNKNOWN
cpu34:223985789)kcp: KCPSHARegisterEvent:545: [nsx@6876 comp="nsx-esx" subcomp="kcp"]KCP_SHA register VMK_PORTSET_EVENT_LACP_LAG event success
cpu6:223985481)Net: 2184: connected LACP_MgmtPort to null config, portID 0x4000032
cpu6:223985481)Team.vswitch: TeamVSLACPLAGEventCB:9119: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Received event LAG CREATE, LAG /0, link UNKNOWN, uplink /0x0, link UNKNOWN
...
cpu6:223985481)Team.vswitch: TeamVSPolicySet:8123: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Invalid Uplink : <LAG-NAME>, ignore it
cpu6:223985481)Team.vswitch: TeamVSPolicySet:8123: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Invalid Uplink : <LAG-NAME>, ignore it
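As a rough aid for spotting these symptoms, the following Python sketch scans vmkernel.log lines for the two message fragments quoted above ("Received event LAG DESTROY" and "Invalid Uplink"). The function names and the rule that both fragments must be present are assumptions made for illustration; they are not part of any VMware tooling, and a match is only a hint, not a diagnosis.

```python
import re

# Fragments taken from the vmkernel.log excerpts in this article.
# The source line numbers after the function name (e.g. 9087) are matched
# loosely with \d+ on the assumption that they can differ between builds.
LAG_DESTROY = re.compile(r"TeamVSLACPLAGEventCB:\d+:.*Received event LAG DESTROY")
INVALID_UPLINK = re.compile(r"TeamVSPolicySet:\d+:.*Invalid Uplink\s*:")


def scan_vmkernel_log(lines):
    """Count log lines that match each of the two signatures."""
    counts = {"lag_destroy": 0, "invalid_uplink": 0}
    for line in lines:
        if LAG_DESTROY.search(line):
            counts["lag_destroy"] += 1
        if INVALID_UPLINK.search(line):
            counts["invalid_uplink"] += 1
    return counts


def looks_like_lag_race(counts):
    """Hypothetical heuristic: flag the log only if both signatures appear."""
    return counts["lag_destroy"] > 0 and counts["invalid_uplink"] > 0
```

On a host that is still reachable (for example through its console), the log could be read with open("/var/run/log/vmkernel.log") and passed to scan_vmkernel_log line by line.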

Cause

Due to a race condition, a LAG may break during the NSX upgrade. This can occur under the following conditions:

A maintenance mode upgrade, which is the default upgrade mode, is carried out on an ESXi host that uses a LAG.

VDS/CVDS is used and the vmknic is on an ESXi-owned DVPG (as opposed to an NSX segment).

LAG/LACP is configured as an uplink for the DVPG.
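The conditions above can be expressed as a simple pre-upgrade check. This is a minimal sketch that assumes all of the listed conditions must hold together for a host to be exposed; the class, field, and function names are hypothetical, and the facts themselves would have to be gathered from your own inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostNetworkFacts:
    """Hypothetical per-host facts, collected by whatever inventory method you use."""
    maintenance_mode_upgrade: bool   # upgrade is run in maintenance mode (the default)
    vmknic_on_esxi_owned_dvpg: bool  # vmknic sits on a VDS/CVDS DVPG, not an NSX segment
    dvpg_uplink_is_lag: bool         # LAG/LACP is configured as an uplink for that DVPG


def lag_race_possible(facts: HostNetworkFacts) -> bool:
    """Assumption: the host is exposed only when every condition is true."""
    return (
        facts.maintenance_mode_upgrade
        and facts.vmknic_on_esxi_owned_dvpg
        and facts.dvpg_uplink_is_lag
    )
```

A host whose vmknics all sit on NSX segments, or whose DVPG uplinks are not LAGs, would not be flagged by this check.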

Resolution

This is a known issue impacting VMware NSX.

Workaround

To prevent this issue, set the ESXi host to upgrade in In-place mode. In this mode the host does not enter maintenance mode and does not trigger the script that leads to the race condition.

If you have already upgraded the host and encountered this issue, reboot the ESXi host to regain management connectivity.

Note: In-place mode is not a supported upgrade mode for vSphere Lifecycle Manager (vLCM) enabled clusters.
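The choice of upgrade mode described in this workaround can be sketched as a small decision function. The names below are hypothetical, and the fallback to maintenance mode on vLCM-enabled clusters is an assumption that follows from In-place mode not being supported there; it does not remove the risk on those clusters.

```python
from enum import Enum


class UpgradeMode(Enum):
    MAINTENANCE = "maintenance"
    IN_PLACE = "in-place"


def pick_upgrade_mode(lag_at_risk: bool, vlcm_enabled: bool) -> UpgradeMode:
    """Prefer In-place mode for at-risk hosts, unless the cluster is vLCM-enabled."""
    if lag_at_risk and not vlcm_enabled:
        return UpgradeMode.IN_PLACE
    # Either the host is not exposed, or In-place mode is not supported (vLCM).
    return UpgradeMode.MAINTENANCE
```

For an at-risk host in a vLCM-enabled cluster the function still returns maintenance mode, so a reboot may be needed to recover the host if the issue is hit.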
