Operational Defect Database

BugZero updated this defect 44 days ago.

VMware | 67626

Investigating virtual disk file locks on vSAN

Last update date:

4/5/2024

Affected products:

vSAN

Affected releases:

6.77.0

Fixed releases:

No fixed releases provided.

Description:

Symptoms

File lock issues can cause various problems, including but not limited to the inability to power on VMs or consolidate snapshots. You will need to identify which hosts may be holding locks on a virtual disk file residing on vSAN.

Purpose

On a VMFS datastore, we usually check for locks on the -flat or -delta virtual disk file. However, these files don't exist on vSAN because it is an object-based storage system. This article explains how to check for locks on the virtual disk objects instead.

Cause

vSAN uses a dedicated object type, vdisk, for virtual disks. These objects are not stored alongside the VM's configuration files in the namespace directory.

Impact / Risks

VMs fail to power on.
Snapshots fail to delete or consolidate.
VM fails to clone or vMotion.

Resolution

First, check whether any backup proxy servers are in use. If there are, check whether the affected disk is still attached to a proxy server. If it is, remove the disk from the proxy server, making sure "Delete from disk" is NOT selected.

Note: There may be more than one proxy server in use. Make sure to check all proxy servers.

vSAN uses .lck files. Each .lck file is named after the UUID of the vSAN object it represents.

To find the object UUID from the descriptor, change into the VM namespace directory, for example:

# cd /vmfs/volumes/vsanDatastore/<VM_Namespace>

Then run:

# grep RW VMDiskName.vmdk

You'll see output similar to this:

# Extent description
RW 209715200 VMFS "vsan://e7c66759-680f-e86b-798d-a0369fa131f0"

The UUID "e7c66759-680f-e86b-798d-a0369fa131f0" is the vSAN object that represents the vdisk for that descriptor.

Note: If you get a "device or resource busy" error, SSH to the host the VM is registered to and work from that host.

The following command shows all .<uuid>.lck files in the vSAN namespace directory:

# ls -lah .*.lck

You'll see something similar to this:

-rw------- 1 root root 0 Jul 13 2017 .e7c66759-680f-e86b-798d-a0369fa131f0.lck

There may also be non-hidden lock files, which you can diagnose the same way. List them with:

# ls -lah *.lck

Run the following command to show the lock details for this vSAN object:

# vmfsfilelockinfo -p .e7c66759-680f-e86b-798d-a0369fa131f0.lck

vmfsfilelockinfo Version 2.0
Looking for lock owners on ".e7c66759-680f-e86b-798d-a0369fa131f0.lck"
"<VMname>.vswp.lck" is locked in Exclusive mode by host having mac address ['xx:xx:xx:xx:xx:xx']
Trying to make use of Fault Domain Manager
----------------------------------------------------------------------
Found 6 ESX hosts using Fault Domain Manager.
----------------------------------------------------------------------
Searching on Host esxi1
Searching on Host esxi3
Searching on Host esxi4
Searching on Host esxi2
Searching on Host esxi6
Searching on Host esxi5
MAC Address : xx:xx:xx:xx:xx:xx
Host owning the lock on file is esxi5, lockMode : Exclusive
Total time taken : 0.11339905299246311 seconds.

If no lock is found, the output looks like this:

vmfsfilelockinfo Version 2.0
Looking for lock owners on ".e7c66759-680f-e86b-798d-a0369fa131f0.lck"
".e7c66759-680f-e86b-798d-a0369fa131f0.lck" is not locked by any ESX host and is Free
Total time taken : 0.037906300276517868 seconds.

Alternatively, you can run vmkfstools -D against this file, which also shows the lock details for the vSAN object. For example:

# vmkfstools -D .e7c66759-680f-e86b-798d-a0369fa131f0.lck

You should see output similar to this:

Lock [type 10c00001 offset 152799232 v 830, hb offset 3969024
gen 215, mode 1, owner 5c576ea9-e19f62dc-07eb-a0369fa12052 mtime 1107249
num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr <4, 354, 1>, gen 3, links 1, type reg, flags 0, uid 0, gid 0, mode 600
len 0, nb 0 tbz 0, cow 0, newSinceEpoch 0, zla 4305, bs 8192

The last segment of the owner field (a0369fa12052 in this example) is the MAC address of the management VMkernel port. It should correspond to a host in the vSAN cluster.

Note: During the life cycle of a powered-on virtual machine, several of its files transition between various legitimate lock states. The lock mode indicates the type of lock held on the file.
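When you have to repeat these lookups for several disks, the descriptor and vmkfstools -D steps can be scripted. The following is a minimal POSIX sh sketch, assuming the output formats shown above. The function names vsan_uuid_from_descriptor, vsan_lock_file and owner_mac_from_vmkfstools are hypothetical helpers, not VMware tools, and the MAC extraction assumes the owner field always ends in a 12-digit hex segment, as in the example.

```shell
#!/bin/sh
# Hypothetical helpers (not VMware tools) for the lookups described above.

# Print the vSAN object UUID from a descriptor file ($1), assuming an
# extent line of the form: RW 209715200 VMFS "vsan://<uuid>"
vsan_uuid_from_descriptor() {
    sed -n 's|^RW .*"vsan://\([^"]*\)".*|\1|p' "$1"
}

# Print the hidden lock file name for a vSAN object UUID ($1).
vsan_lock_file() {
    printf '.%s.lck\n' "$1"
}

# Read `vmkfstools -D` output on stdin and print the lock owner's MAC address,
# assuming it is the last (12-hex-digit) segment of the owner field.
owner_mac_from_vmkfstools() {
    sed -n 's/.*owner [0-9a-fA-F-]*-\([0-9a-fA-F]\{12\}\).*/\1/p' |
        head -n 1 |
        sed 's/\(..\)/\1:/g; s/:$//'
}

# Example use on an ESXi host, from the VM namespace directory:
#   uuid=$(vsan_uuid_from_descriptor VMDiskName.vmdk)
#   vmkfstools -D "$(vsan_lock_file "$uuid")" | owner_mac_from_vmkfstools
```

Keeping each step in its own function makes it easy to check the intermediate value (the UUID, then the lock file name) before trusting the MAC address it leads to.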
The lock modes reported in the mode field of vmkfstools -D output are:

mode 0 = no lock
mode 1 = exclusive lock (for example, the .vmx file of a powered-on virtual machine, the disk currently in use (flat or delta), the .vswp file, and so on)
mode 2 = read-only lock (for example, the -flat.vmdk file of a running virtual machine with snapshots)
mode 3 = multi-writer lock (for example, disks used by MSCS clusters or FT VMs)

Once you have the name of the host that owns the lock, SSH into that host and try restarting the management services hostd and vpxa with the following command:

# /etc/init.d/hostd restart && /etc/init.d/vpxa restart

If the lock is still present, run the following (the ; ensures ps runs even if lsof finds nothing):

# lsof | grep <vmname>; ps | grep <vmname>

For example:

[root@esxi4:~] lsof | grep cent7_2; ps | grep cent7_2
7565528 vmx FILE 43 /vmfs/volumes/vsan:52bea6daf62777db-6515bb0268f25523/18db7d62-56b6-8186-64ba-0050560181e8/cent7_2.vmx.lck
7565528 vmx FILE 44 /vmfs/volumes/vsan:52bea6daf62777db-6515bb0268f25523/18db7d62-56b6-8186-64ba-0050560181e8/cent7_2.vmx
7565528 vmx FILE 45 /vmfs/volumes/vsan:52bea6daf62777db-6515bb0268f25523/18db7d62-56b6-8186-64ba-0050560181e8/cent7_2.vmx~
7565528 vmx FILE 82 /vmfs/volumes/vsan:52bea6daf62777db-6515bb0268f25523/18db7d62-56b6-8186-64ba-0050560181e8/cent7_2.nvram
7565529 0 vmm0:cent7_2
7565533 0 vmm1:cent7_2
7565535 7565528 vmx-filtPoll:cent7_2
7565536 7565528 vmx-mks:cent7_2
7565537 7565528 vmx-svga:cent7_2
7565538 7565528 vmx-vcpu-0:cent7_2
7565540 7565528 vmx-vcpu-1:cent7_2

The number in the first column of the lsof output (7565528 in this example, the vmx process; it also appears in the second column of the ps output for the vmx worlds) is the world ID. You can kill this process by running kill <PID>. Make sure you run this command only on the host or hosts the VM is NOT registered to.

Note: If the VM is powered off, there should be no open files (lsof) or active processes (ps) for the VM.
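The lock-mode list and the lsof check above can be wrapped in two small shell functions when you are reviewing output from several hosts. This is a sketch only: lock_mode_desc and vmx_world_ids are hypothetical names, the mode descriptions are taken from the list above, and vmx_world_ids assumes the lsof column layout shown in the example (world ID first, process name second). Like the grep in the article, it matches the VM name as a plain substring, so a VM named cent7_2 would also match cent7_20.

```shell
#!/bin/sh
# Hypothetical helpers for the lock-mode list and lsof check above.

# Print a description of a vmkfstools -D lock mode number ($1).
lock_mode_desc() {
    case "$1" in
        0) echo "no lock" ;;
        1) echo "exclusive lock" ;;
        2) echo "read-only lock" ;;
        3) echo "multi-writer lock" ;;
        *) echo "unknown mode $1" ;;
    esac
}

# Read `lsof` output on stdin and print the unique world IDs (first column)
# of vmx processes whose line mentions the VM name ($1). The match is a
# plain substring, like `grep <vmname>`.
vmx_world_ids() {
    awk -v vm="$1" '$2 == "vmx" && index($0, vm) > 0 { print $1 }' | sort -u
}

# Example use on a host the VM is NOT registered to:
#   lsof | vmx_world_ids cent7_2
# Review each ID before running kill <PID>.
```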
When the VM is powered on, open files or active processes should appear only on the host the VM is registered to.

If neither lock command finds a lock, you can run lsof | grep <vmname>; ps | grep <vmname> on all hosts in the cluster to see whether a process for the VM exists on more than one host. If there are running processes, kill the process on any host that might have a hung process related to the VM.

Note: Make sure you kill the process only on hosts the VM is NOT registered to, especially if the VM is powered on.

If neither vmfsfilelockinfo -p nor vmkfstools -D finds a lock, lsof | grep <vmname>; ps | grep <vmname> finds no active process for the VM on any host, and you still get file lock errors, you are dealing with a phantom lock, and a rolling reboot of the cluster is required to clear it.
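The cluster-wide check described above comes down to two rules: only hosts other than the one the VM is registered to are candidates for killing a process, and if no lock owner and no process are found anywhere while file lock errors persist, the lock is treated as a phantom lock. A minimal shell sketch of those rules follows, assuming you have already collected "<host> <world-ID>" pairs from each host. The function names, the input format and the host names in the comments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for the cluster-wide check described above.
# Input (stdin): one "<host> <world-ID>" pair per line. One way to collect it
# (host names are placeholders):
#   for h in esxi1 esxi2 esxi3; do
#       ssh root@"$h" 'lsof | grep cent7_2' | awk -v h="$h" '{ print h, $1 }'
#   done

# Print the unique pairs for hosts other than the registered host ($1);
# only these are candidates for kill <PID>.
kill_candidates() {
    awk -v reg="$1" 'NF >= 2 && $1 != reg { print $1, $2 }' | sort -u
}

# Assuming file lock errors still occur: print "phantom-lock" when no lock
# owner was found ($1 = no) and no host reported a process for the VM
# (empty stdin); otherwise print "not-phantom".
phantom_check() {
    if [ "$1" = no ] && ! grep -q .; then
        echo phantom-lock
    else
        echo not-phantom
    fi
}
```

A "phantom-lock" result corresponds to the case where the article calls for a rolling reboot of the cluster.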

Workaround

To check all VM files and vSAN object lock files at once, and to see which files are locked and which host holds each lock, run the following command in the VM directory:

# for file in *; do echo ${file}; vmfsfilelockinfo -p ${file} |grep -i mode; done

Output example:

Test-3f9d789c.hlog
Test-ec315dde.vswp
Test-ec315dde.vswp.lck
"Test-ec315dde.vswp.lck" is locked in Exclusive mode by host having mac address ['00:XX:56:XX:11:XX']
Host owning the lock on file is <Hostname>, lockMode : Exclusive
Test.nvram
"Test.nvram" is locked in Exclusive mode by host having mac address ['00:XX:56:XX:11:XX']
Host owning the lock on file is <Hostname>, lockMode : Exclusive
Test.vmdk
Test.vmsd
Test.vmx

Normally, the output shows the host the VM is registered to as the lock owner. If you find a different host, note that host's name.

To check all .<uuid>.lck files, run:

# for file in .*lck; do echo ${file}; vmfsfilelockinfo -p ${file} |grep -i mode; done

To check all the files for VMs that have spaces in their names, quote the file name:

# for file in *; do echo "${file}"; vmfsfilelockinfo -p "${file}" |grep -i mode; done
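With many files, the loop output above can be long. The sketch below filters it down to the files whose lock owner differs from the host you expect, which is the host name worth saving. foreign_locks is a hypothetical helper; it assumes the "is locked in ... mode" and "Host owning the lock on file is <host>, lockMode : <mode>" line formats shown in the example, and you supply the expected host name yourself.

```shell
#!/bin/sh
# Hypothetical filter for the output of the vmfsfilelockinfo loops above.
# Prints "<file><TAB><host><TAB><mode>" for every lock held by a host other
# than the expected host ($1).
foreign_locks() {
    awk -v expected="$1" '
        /^".*" is locked/ {
            # The file name is the quoted text at the start of the line.
            file = $0
            sub(/^"/, "", file)
            sub(/" is locked.*/, "", file)
        }
        /^Host owning the lock on file is / {
            rest = $0
            sub(/^Host owning the lock on file is /, "", rest)
            host = rest; sub(/,.*/, "", host)
            mode = rest; sub(/.*lockMode : /, "", mode)
            if (host != expected) print file "\t" host "\t" mode
        }'
}

# Example use in the VM directory (esxi4 is a placeholder host name):
#   for file in * .*lck; do echo "${file}"; vmfsfilelockinfo -p "${file}" | grep -i mode; done | foreign_locks esxi4
```

Because the file name is taken from the quoted text rather than from a whitespace-separated field, names containing spaces are reported intact.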

Related Information

See the following KBs:
Restarting the Management agents in ESXi
Committing snapshots when there are no snapshot entries in the Snapshot Manager
Investigating virtual machine file locks on ESXi

Additional Resources / Links

Status

Unavailable
