BugZero updated this defect 55 days ago.

VMware | 96890

Remediate NSX Application Platform Health Alarm Due to Service Status Being Down

Last update date:

3/25/2024

Affected products:

NSX

Affected releases:

4.1

Fixed releases:

No fixed releases provided.

Description:

Symptoms

The NSX Application Platform Health alarm triggers when any pod enters a CrashLoopBackOff state. The alarm message indicates "service status down" without specifying which pod is affected or providing actionable insights. As a result, the intelligence of the system is degraded.

Impact / Risks

The lack of detailed information in the alarm message hinders troubleshooting and impacts the overall intelligence of the system. This KB article is intended to make the issue easier to debug.

Resolution

This issue is addressed in version 4.2.0 of the NSX Application Platform.

Workaround

To mitigate the issue, follow these steps:

1. Identify pods not in the Running or Succeeded (completed) state:

   napp-k get pods --field-selector status.phase!=Running,status.phase!=Succeeded,metadata.namespace!=kube-system,metadata.namespace!=vmware-system-csi,metadata.namespace!=vmware-system-auth,metadata.namespace!=vmware-system-cloud-provider --all-namespaces

2. Delete the affected pod:

   napp-k delete pod <pod-name> -n <namespace>

3. If the issue persists, perform a rollout restart of the StatefulSet or Deployment that owns the pod:

   napp-k rollout restart <statefulset|deployment> <service_name> -n <namespace>

This workaround should help resolve the NSX Application Platform Health alarm triggered by the service status being down until the system is updated to version 4.2.0.
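The field selector in the identification step is long and easy to mistype. As a minimal sketch, the helper below assembles it from a list of excluded namespaces. The function name build_selector and the variable EXCLUDED_NAMESPACES are hypothetical names introduced here for illustration; the namespaces and phase filters are the ones given in the workaround.

```shell
# Sketch only: assembles the field selector used in the workaround.
# EXCLUDED_NAMESPACES and build_selector are illustrative names, not part of NSX.

# System namespaces that the workaround's selector excludes.
EXCLUDED_NAMESPACES="kube-system vmware-system-csi vmware-system-auth vmware-system-cloud-provider"

# Prints a comma-separated field selector that matches pods whose phase is
# neither Running nor Succeeded and that live outside the excluded namespaces.
# The function body runs in a subshell so its variables stay local.
build_selector() (
  selector="status.phase!=Running,status.phase!=Succeeded"
  for ns in $EXCLUDED_NAMESPACES; do
    selector="${selector},metadata.namespace!=${ns}"
  done
  printf '%s\n' "$selector"
)
```

After sourcing the helper in an interactive session where napp-k is available, the listing could be run as: napp-k get pods --all-namespaces --field-selector "$(build_selector)". Because a pod whose containers keep restarting can still report the Running phase, it may also be worth checking the STATUS and RESTARTS columns of an unfiltered napp-k get pods --all-namespaces listing.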

Additional Resources / Links

Original Vendor Announcement
