Operational Defect Database

BugZero updated this defect 26 days ago.

VMware | 97866

After vCenter certificate replacement, reconciliation of TKGS cluster is failing

Last update date:

4/23/2024

Affected products:

vCenter Server

Affected releases:

7.0

8.0

Fixed releases:

No fixed releases provided.

Description:

Symptoms

Certificates of the vCenter Server were changed (e.g. machine certificate or STS certificate).

Reconciliation of the TKGS cluster is failing.

The nsx-ncp pods remain stuck in "Init" state:

    # kubectl get pods -n vmware-system-nsx
    output:
    NAME                       READY   STATUS     RESTARTS         AGE
    nsx-ncp-6b975548cb-jwdxv   0/2     Init:0/1   14 (8m58s ago)   3h51m

NSX-NCP pods might also crash and restart constantly and show the CrashLoopBackOff state. This can be verified via kubectl, e.g.:

    kubectl describe pod/nsx-ncp-[UNIQUE-ID] -n vmware-system-nsx

Logs on the Supervisor Cluster in /var/log/pods/ indicate that a certificate is not trusted. The logs can also be checked via kubectl, e.g.:

    kubectl logs pod/nsx-ncp-[UNIQUE-ID] -n vmware-system-nsx

Log entries might be similar to:

    [wcp-migrator MainThread I] nsx_ujo.ncp.vc.session Refreshing token and re-instantiating TESSession
    [wcp-migrator MainThread I] nsx_ujo.ncp.vc.session VC credentials were not changed
    [wcp-migrator MainThread I] nsx_ujo.ncp.vc.session Successfully retrieved JWT token: eyJraWQi[...]w1nO
    [wcp-migrator MainThread W] vmware_nsxlib.v3.utils Finished retry of vmware_nsxlib.v3.cluster.ClusteredAPI._proxy.<locals>._proxy_internal for the 10th time after 31.602 (s) with args: Unknown
    [wcp-migrator MainThread E] vmware_nsxlib.v3.lib Unable to read maximum tags. Reason: Certificate not trusted

    ...OR...

    [wcp-migrator MainThread W] vmware_nsxlib.v3.cluster [7f0bbb44af50] Request failed due to: Certificate not trusted
    [wcp-migrator MainThread W] vmware_nsxlib.v3.cluster [7f0bbb44af50] Request failed due to an exception that calls for regeneration. Re-generating pool.
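The two symptom checks above (pod status and log signature) could be scripted roughly as follows. This is a minimal sketch, not a VMware tool: the function names are hypothetical, and it only inspects text that has already been captured from `kubectl get pods -n vmware-system-nsx` and `kubectl logs`. It assumes the default column layout (NAME, READY, STATUS, ...) shown in the example output.

```python
from typing import List

# Log signature quoted in the symptoms above.
UNTRUSTED_MARKER = "Certificate not trusted"


def stuck_ncp_pods(get_pods_output: str) -> List[str]:
    """Return names of nsx-ncp pods whose STATUS is Init:* or CrashLoopBackOff.

    Expects the plain-text output of `kubectl get pods` with the default
    columns, where STATUS is the third whitespace-separated field.
    """
    stuck = []
    for line in get_pods_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything too short to parse.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[2]
        if not name.startswith("nsx-ncp-"):
            continue
        if status.startswith("Init:") or status == "CrashLoopBackOff":
            stuck.append(name)
    return stuck


def untrusted_cert_lines(log_text: str) -> List[str]:
    """Return the log lines that contain the 'Certificate not trusted' marker."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if UNTRUSTED_MARKER in line
    ]
```

If `stuck_ncp_pods` returns at least one pod and `untrusted_cert_lines` finds matches in that pod's logs, the situation is consistent with the symptoms described here; other causes for the same pod states are of course possible.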

Cause

After vCenter certificates are replaced (especially the machine certificate and the STS certificate), NSX Manager loses trust with vCenter Server, which is expected behavior. Because the certificate has changed, NSX Manager cannot distinguish an expected certificate change from a malicious attempt (e.g. a man-in-the-middle attack) and refuses further communication with the vCenter API for security reasons.

Trust must be re-established manually, including validation of the certificate thumbprint, to restore the trust relationship and connectivity between both components.
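To illustrate what validating a certificate thumbprint involves, the sketch below computes a SHA-256 fingerprint of a PEM-encoded certificate using only the Python standard library. The function names, the colon-separated uppercase format, and the choice of SHA-256 are assumptions for illustration; the exact thumbprint format NSX Manager expects is described in the referenced KB article, not here.

```python
import hashlib
import ssl


def sha256_thumbprint(pem_cert: str) -> str:
    """Return the SHA-256 fingerprint of a PEM certificate as AA:BB:... text.

    The fingerprint is computed over the DER (binary) encoding of the
    certificate, which is the usual convention for certificate thumbprints.
    """
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    digest = hashlib.sha256(der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fetch_thumbprint(host: str, port: int = 443) -> str:
    """Fetch the certificate a server currently presents and fingerprint it.

    ssl.get_server_certificate does not verify the certificate when no CA
    file is given, so the result only says what the server presents; it
    must still be compared against a value obtained from a trusted source.
    """
    pem = ssl.get_server_certificate((host, port))
    return sha256_thumbprint(pem)
```

A call such as `fetch_thumbprint("vcenter.example.com")` (a hypothetical hostname) would return the fingerprint of whatever certificate that endpoint presents. Comparing it with the fingerprint of the newly issued certificate, obtained out of band, is what separates an expected change from an interception.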

Impact / Risks

Reconciliation of the TKGS cluster fails. Because this is usually caused by a broken trust relationship between NSX Manager and vCenter Server, the impact can extend to other operations that involve both components, such as pod creation or network policy changes.

Resolution

To re-establish trust between NSX Manager and its Compute Manager (vCenter), follow the 'Resolution' section of this KB article: https://kb.vmware.com/s/article/90086

Once this is done, the NSX-NCP pods on the Supervisor Cluster should re-establish connectivity automatically within a few minutes. If they do not, contact VMware Support and reference this KB article.
