Operational Defect Database

BugZero updated this defect 34 days ago.

VMware | 91714

Virtual machines might become unresponsive due to a rare deadlock issue in a VMFS6 volume

Last update date:

4/15/2024

Affected products:

vSphere ESXi

vSphere

Affected releases:

7.x

7.0

Fixed releases:

No fixed releases provided.

Description:

Symptoms

VM(s) randomly become unresponsive when they are using thin VMDK files on VMFS6.

The /var/log/vmkernel.log is flooded with resetting handle messages that go on indefinitely:

2023-04-05T05:01:26.653Z cpu57:8916482)VSCSI: 2973: handle 38295998585421404(GID:48732)(vscsi0:0):Added handle (refCnt = 3) to vscsiResetHandleList vscsiResetHandleCount = 1
2023-04-05T05:01:26.653Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192707
2023-04-05T05:01:26.653Z cpu14:2097732)VSCSI: 3335: handle 38295998585421404(GID:48732)(vscsi0:0):Reset [Retries: 0/0] from (vmm0:SQLVM1)
2023-04-05T05:01:27.157Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:01:27.659Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:01:28.161Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:01:28.663Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:01:29.165Z cpu14:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:01:29.655Z cpu57:8916482)WARNING: VSCSI: 3967: handle 38295998585421404(GID:48732)(vscsi0:0):WaitForCIF: Issuing reset; number of CIF:4
2023-04-05T05:01:29.655Z cpu57:8916482)WARNING: VSCSI: 2986: handle 38295998585421404(GID:48732)(vscsi0:0):Ignoring double reset
<snip>
2023-04-05T05:08:56.864Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:08:57.367Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:08:57.840Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:08:57.840Z cpu3:2097732)VSCSI: 3335: handle 38295998585421404(GID:48732)(vscsi0:0):Reset [Retries: 15/0] from (vmm0:SQLVM1)
2023-04-05T05:08:58.343Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:08:58.845Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:08:59.347Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
2023-04-05T05:08:59.847Z cpu3:2097732)VSCSI: 3226: handle 38295998585421404(GID:48732)(vscsi0:0):processing reset for handle ... state 1381192706
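
To check whether a host shows this symptom, the repeated "processing reset for handle" entries can be counted per handle. The following Python sketch is one possible way to do that; it is not a VMware tool. The function names and the threshold of 100 entries are assumptions chosen for illustration, and the pattern is based only on the log excerpt above.

```python
import re
from collections import Counter

# Matches the "processing reset for handle" entries from the symptom log.
# The pattern is derived from the excerpt in this article only.
RESET_RE = re.compile(
    r"VSCSI: \d+: handle (?P<handle>\d+)\(GID:\d+\)\((?P<disk>vscsi\d+:\d+)\):"
    r"processing reset for handle"
)


def count_reset_messages(lines):
    """Return a Counter mapping (handle, disk) to the number of reset entries."""
    counts = Counter()
    for line in lines:
        # finditer also copes with several entries fused onto one line.
        for match in RESET_RE.finditer(line):
            counts[(match["handle"], match["disk"])] += 1
    return counts


def suspect_handles(lines, threshold=100):
    """Return (handle, disk) pairs with at least `threshold` reset entries.

    The default threshold is an arbitrary assumption, not a vendor value.
    """
    counts = count_reset_messages(lines)
    return [key for key, seen in counts.items() if seen >= threshold]
```

A typical use would be to open /var/log/vmkernel.log, pass the file object to suspect_handles, and review any handle it returns. A high count alone does not prove this defect; it only flags a handle worth investigating.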

Cause

In rare cases, if a write I/O request runs in parallel with an unmap operation triggered by the guest OS on a thin-provisioned VM, a deadlock might occur in a VMFS6 volume. As a result, the virtual machine may become unresponsive.

Resolution

This is a known issue that is resolved in ESXi 7.0 U3f. Please see the release notes: https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-70u3f-release-notes.html
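
When auditing several hosts, it can help to compare each host's release label against the fixed release. The Python sketch below is a minimal illustration, assuming labels of the form "7.0 U3f" and assuming that patch letters within an update sort alphabetically in release order. The function names are hypothetical, and other label formats (for example GA builds or 8.x releases) are rejected rather than guessed at.

```python
import re

# Accepts labels such as "7.0 U2a" or "7.0 U3f": an update number plus an
# optional patch letter. Other label formats are deliberately unsupported.
RELEASE_RE = re.compile(r"^7\.0 U(?P<update>\d+)(?P<patch>[a-z]?)$")

FIXED_IN = "7.0 U3f"  # fixed release named in the Resolution section


def release_key(label):
    """Turn a label like '7.0 U3f' into a sortable (update, patch) tuple."""
    match = RELEASE_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unsupported release label: {label!r}")
    return (int(match["update"]), match["patch"])


def includes_fix(label):
    """Return True if the labelled 7.0 release is at or after 7.0 U3f.

    Assumes patch letters sort alphabetically in release order.
    """
    return release_key(label) >= release_key(FIXED_IN)
```

For example, includes_fix("7.0 U3c") evaluates to False and includes_fix("7.0 U3g") to True. Build numbers from the release notes remain the authoritative check.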

Workaround

To work around this issue, inflate or convert the VM's thin disks to thick. This prevents the guest OS from issuing UNMAP commands, which removes the race condition between write I/Os and UNMAP operations.
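
Before applying the workaround, it is useful to know which disks are thin. The Python sketch below checks the text of a VMDK descriptor file for the thin-provisioning flag. It assumes that thin disks carry the entry ddb.thinProvisioned = "1" in their descriptor, which should be verified in your environment. The function name is hypothetical, and the sketch only reads text; it does not convert anything.

```python
import re

# Assumption: a thin-provisioned disk's descriptor contains this entry.
THIN_RE = re.compile(r'^\s*ddb\.thinProvisioned\s*=\s*"1"\s*$', re.MULTILINE)


def is_thin_descriptor(descriptor_text):
    """Return True if the descriptor text marks the disk as thin-provisioned."""
    return THIN_RE.search(descriptor_text) is not None
```

The function takes the descriptor contents as a string, so it can be run against descriptor files copied off the datastore without touching the running VM.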

Additional Resources / Links

Original Vendor Announcement


Status

Unavailable
