Operational Defect Database

VMware | 85678

NSX-T Bare Metal Edge Node CPU core 0 or 1 dropping traffic

Last update date:

3/25/2024

Affected products:

NSX-T

Affected releases:

No affected releases provided.

Fixed releases:

No fixed releases provided.

Description:

Symptoms

The NSX-T Bare Metal Edge node is on version 3.1.x or lower.

On the NSX-T Bare Metal Edge node, the admin CLI command get datapath cpu stats shows high CPU utilization on core 0 or 1.

On the NSX-T Bare Metal Edge node, the file var/run/vmware/edge/cpu_usage.json contains entries similar to the following:

    "highest_cpu_core_usage_dpdk": 77.88,
    "dpdk_cpu_per_core": {
        "0": 77.88,    > This is high CPU core usage
        "1": 0.03,
        "2": 0.01,
        "3": 0.04,
        "4": 0.05,
        "5": 0.01,
        "6": 0.01,
        "7": 0.01,
        "8": 0.01,
        "9": 0.02,
        "10": 0.01,
        "11": 0.02
    },

Packet drops are seen for traffic processed by CPU core 0 or 1, while other CPU cores handle a similar amount of traffic without any drops.

Other services running on the affected CPU core (core 0 in the example above), such as BGP or LACP traffic, may also be impacted.
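As a rough illustration of how the per-core figures above could be checked, the following Python sketch flags cores whose DPDK usage meets a threshold. It is a minimal sketch under stated assumptions: the enclosing braces around the two keys, the function name hot_cores, and the 70% threshold are all illustrative and do not come from VMware.

```python
import json

# Fragment modeled on the values quoted above. The enclosing braces are an
# assumption, since the article shows only these two keys of the file.
SAMPLE = """
{
  "highest_cpu_core_usage_dpdk": 77.88,
  "dpdk_cpu_per_core": {
    "0": 77.88, "1": 0.03, "2": 0.01, "3": 0.04,
    "4": 0.05, "5": 0.01, "6": 0.01, "7": 0.01,
    "8": 0.01, "9": 0.02, "10": 0.01, "11": 0.02
  }
}
"""


def hot_cores(stats, threshold=70.0):
    """Return (core, usage) pairs at or above threshold, busiest first.

    The threshold is a hypothetical cut-off chosen for illustration only.
    """
    per_core = stats.get("dpdk_cpu_per_core", {})
    hits = [(int(core), usage) for core, usage in per_core.items()
            if usage >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for core, usage in hot_cores(json.loads(SAMPLE)):
        print(f"core {core}: {usage:.2f}% DPDK usage")
```

With the sample values, only core 0 is reported, which matches the symptom described above.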

Cause

A kernel thread named kni_single handles communication between user space and kernel space. This thread runs on a datapath fastpath core (that is, on core 0 or 1). If workload network traffic is hashed to that same core, the traffic may be dropped because of the high CPU utilization.

Resolution

This is a known issue affecting NSX-T Data Center.

Workaround

The workaround is to move the kni_single kernel thread to a non-datapath CPU core.

SSH as root to the NSX-T Bare Metal Edge Node.

1. List the available CPU cores (run as the root user):

    root@Edge-1:~# lscpu
    Architecture: x86_64
    CPU op-mode(s): 32-bit, 64-bit
    Byte Order: Little Endian
    CPU(s): 8
    On-line CPU(s) list: 0-7 <<<<<<
    Thread(s) per core: 1
    Core(s) per socket: 1
    Socket(s): 8
    NUMA node(s): 1
    Vendor ID: GenuineIntel
    CPU family: 6
    Model: 85
    Model name: Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
    Stepping: 4
    CPU MHz: 1995.312
    BogoMIPS: 3990.62
    Virtualization: VT-x
    Hypervisor vendor: VMware
    Virtualization type: full
    L1d cache: 32K
    L1i cache: 32K
    L2 cache: 1024K
    L3 cache: 28160K
    NUMA node0 CPU(s): 0-7

2. Check which CPU cores are used by the dataplane (log in as the admin user):

    Edge-1> get dataplane
    Accept_ra : False
    Bfd_ring_size : 512
    Bitw_mode : False
    Corelist : 0,1,2,3,4,5 <<<<< cores used for dataplane

In this example, cores 6 and 7 are not used for the dataplane.

3. Get the PID of the kni_single kernel thread (log in as root):

    root@edge02:~# ps -aux|grep -i [k]ni_single
    root 7128 4.1 0.0 0 0 ? S Aug12 243:18 [kni_single]

4. Use the taskset command to list the current CPU affinity of the kni_single process:

    root@edge02:~# taskset -pc 7128
    7128's current affinity list: 0-3

Note: The PID (7128 in this example) will be different in your environment.

5. Use the taskset command to set the CPU affinity of kni_single to non-datapath cores:

    root@edge02:~# taskset -pc 6-7 7128
    7128's current affinity list: 0-3
    7128's new affinity list: 6,7

Note: The PID (7128 in this example) and the core numbers may be different in your environment.

6. Verify that the change was made:

    root@edge02:~# taskset -pc 7128
    7128's current affinity list: 6,7

Note: The command outputs and values above are only examples and may vary depending on your environment.

This workaround does not persist across reboots.

If you have issues with this workaround, please contact VMware Support and note this Article ID (85678) in the problem description. For more information, see How to Submit a Support Request.
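The core selection in steps 1, 2, and 5 can be sketched in Python as follows. This is an illustrative helper, not part of the VMware procedure: the function names are hypothetical, the inputs are the example values from this article, and the script only builds the taskset command line rather than running it.

```python
import re


def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0-7' or '0,2,4-5' into a set of ints."""
    cores = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cores.update(range(int(low), int(high) + 1))
        else:
            cores.add(int(part))
    return cores


def free_cores(online, corelist):
    """Cores that are online (lscpu) but absent from the dataplane Corelist."""
    return sorted(parse_cpu_list(online) - parse_cpu_list(corelist))


def taskset_argv(pid, cores):
    """Build the taskset invocation from step 5. Nothing is executed here."""
    if not cores:
        raise ValueError("no non-datapath cores available")
    return ["taskset", "-pc", ",".join(str(c) for c in cores), str(pid)]


def affinity_from_output(line):
    """Extract the CPU set from a taskset 'affinity list' output line."""
    match = re.search(r"affinity list:\s*([\d,\-]+)", line)
    if match is None:
        raise ValueError(f"unrecognized taskset output: {line!r}")
    return parse_cpu_list(match.group(1))


if __name__ == "__main__":
    # Example values from this article; the PID differs on every system.
    cores = free_cores("0-7", "0,1,2,3,4,5")
    print(" ".join(taskset_argv(7128, cores)))
```

For the example values, the candidate cores are 6 and 7, which is equivalent to the 6-7 range used in step 5. The affinity_from_output helper could be used to confirm the result in step 6 by comparing its return value with the chosen cores.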
