BugZero updated this defect 55 days ago.
3/25/2024
NSX
4.1
No fixed releases provided.
The NSX Application Platform Health alarm triggers when any pod enters a CrashLoopBackOff state. The alarm message indicates "service status down" without specifying which pod is affected or providing actionable insights. The intelligence of the system is degraded as a result of the alarm.
The lack of detailed information in the alarm message hinders troubleshooting and impacts the overall intelligence of the system. This KB article provides steps that make the issue easier to debug.
This issue is addressed in version 4.2.0 of the NSX Application Platform.
To mitigate the issue, follow these steps:

1. Identify pods that are not in the Running or Succeeded (completed) state:

   napp-k get pods --field-selector status.phase!=Running,status.phase!=Succeeded,metadata.namespace!=kube-system,metadata.namespace!=vmware-system-csi,metadata.namespace!=vmware-system-auth,metadata.namespace!=vmware-system-cloud-provider --all-namespaces

2. Delete the affected pod:

   napp-k delete pod <pod-name> -n <namespace>

3. If the issue persists, perform a rollout restart of the deployment or statefulset that owns the pod:

   napp-k rollout restart <statefulset|deployment> <service_name> -n <namespace>

This workaround should help resolve the NSX Application Platform Health alarm triggered by a service status of down until the system is updated to version 4.2.0.
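
The identification step above filters on status.phase. A pod in CrashLoopBackOff can still report the phase Running while its container restarts, so it may not appear in that output. As a minimal sketch, assuming napp-k behaves as a kubectl alias on the NSX Manager, the STATUS column of the plain pod listing can be filtered instead. The helper name list_unhealthy_pods is hypothetical and is not part of NSX or the original KB.

```shell
#!/bin/sh
# Hypothetical helper (an illustration, not an NSX-provided tool).
# Reads the output of
#   napp-k get pods --all-namespaces --no-headers
# on stdin. With --all-namespaces the columns are:
#   NAMESPACE NAME READY STATUS RESTARTS AGE
# Prints namespace, pod name, and STATUS for every pod whose STATUS
# column is neither Running nor Completed (for example CrashLoopBackOff).
list_unhealthy_pods() {
  awk '$4 != "Running" && $4 != "Completed" { printf "%s %s %s\n", $1, $2, $4 }'
}

# Usage (assumes the napp-k alias is available in the current shell):
#   napp-k get pods --all-namespaces --no-headers | list_unhealthy_pods
```

Each line printed gives the namespace and pod name to pass to the delete step. The sketch does not exclude the system namespaces listed in the field selector, so entries from those namespaces may need to be ignored by hand.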