Operational Defect Database

BugZero updated this defect 33 days ago.

VMware | 75256

Hosts show jump in used capacity on reboot due to recalculation with thick (OSR=100) objects on vSAN with deduplication

Last update date:

4/16/2024

Affected products:

vSAN

Affected releases:

6.x, 7.0, 8.0

Fixed releases:

No fixed releases provided.

Description:

Symptoms

VMware is aware of a scenario where, after a host reboot, vSAN initially shows a significantly higher amount of used space on a deduplicated disk group when using “thick” objects (i.e., a policy with Object Space Reservation / OSR = 100%). This issue occurs only when the reserved capacity on a cluster is close to or greater than the amount of free space in the cluster. It is specific to how vSAN evaluates space consumed by deduplicated objects with space reservations enabled (OSR > 0%). The impact is only significant or noticeable when the majority of objects on the vSAN datastore have a 100% space reservation set.
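The triggering condition described above (reserved capacity close to or greater than free space) can be sketched as a simple check. This is only an illustration, not a VMware tool or API: the `ClusterCapacity` type, the `reboot_jump_risk` function, and the `margin` cutoff for "close to" are all hypothetical, and the input figures are assumed to be gathered by the operator from the cluster's capacity view.

```python
from dataclasses import dataclass


@dataclass
class ClusterCapacity:
    """Capacity figures for one vSAN cluster, all in the same unit (e.g. TiB)."""
    reserved: float  # space reserved by objects with OSR > 0%
    free: float      # free space currently reported by the cluster


def reboot_jump_risk(cap: ClusterCapacity, margin: float = 0.9) -> bool:
    """Return True when reserved capacity is close to or above free space.

    `margin` is a hypothetical cutoff for "close to" (the source gives no
    number): 0.9 flags clusters whose reserved capacity is at least 90% of
    free space.
    """
    if cap.free <= 0:
        # No free space left: any reservation at all meets the condition.
        return cap.reserved > 0
    return cap.reserved >= margin * cap.free
```

For example, `reboot_jump_risk(ClusterCapacity(reserved=40.0, free=42.0))` flags the cluster, while a cluster with 10 TiB reserved and 50 TiB free is not flagged under the assumed 90% margin.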

Impact / Risks

If the jump in capacity consumption is sufficiently large, rebalance activity will begin, resulting in a large amount of resync traffic.

If any disk group reports 100% full after reboot, congestion will be raised to throttle incoming IO. This can also impact host management agents, leading to hosts disconnecting from vCenter Server or to host isolation from the vSAN cluster.

After reboot, it will take approximately 6-12 hours for the used capacity value to return to pre-reboot levels.

Resolution

Behavior is improved in vSAN 7.0 and higher; however, the fix itself will be released in a later version of 8.0 that is yet to be determined.

Workaround

Using thick disks is recommended for databases when using VMFS or NFS technologies; however, in vSAN, thick objects provide no performance benefit, and thin objects incur no “first-write” penalty. Although current documentation states that databases require “thick” provisioned objects in vSphere, VMware is working to change this guidance in future vSAN releases.

The following options can be used to entirely avoid the scenario presented in this KB article:

Reconfigure objects with an OSR of 0% (thin provisioning).

Disable deduplication (this is not feasible in all scenarios, depending on total consumption).

Move sufficient VMs off the vSAN cluster to reduce utilization below the threshold where the issue can occur.

Note: When disabling deduplication and/or compression, vSAN changes the disk format on each disk group of the cluster. This is done by evacuating data from the disk group, removing the disk group, and recreating it with a format that does not support deduplication and compression. The time required for this operation depends on the number of hosts in the cluster and the amount of data. The progress can be monitored in the Tasks and Events tab in the vSphere Client.

Additional Resources / Links

Original Vendor Announcement
