VAAI bug in HPE 3PAR OS exposes risk of data loss

A confirmed bug in HPE 3PAR OS relating to vSphere VAAI integration introduces a potential risk of data loss when using remote copy groups and peer persistence.

As I understand from my interactions with HPE Global Support, the issue stems from operations that use the hardware-accelerated data mover (FS3DM Hardware Accelerated). These are triggered by tasks that vSphere offloads to the 3PAR storage array through VAAI, most notably Storage vMotion, which uses xcopy at the array level.

The potential risk comes into play when you have VMs running on datastores backed by 3PAR LUNs that use remote copy groups (RCGs) with peer persistence. If the copy group is interrupted and enters a failsafe state, stopping the RCG sync, and an xcopy operation is then triggered, the 3PAR OS will not check which of the two LUNs in the group is active before initiating the xcopy.

What this means in real terms is that if a Storage vMotion is initiated on a VM residing on a peer-persistent LUN that is not currently in sync, the 3PAR OS could svMotion the VM disks from the inactive LUN. To compound this, once the svMotion task is reported back to vSphere as complete, the VM disks on the active LUN would be erased, as this operation is initiated by vSphere, which only has access to the active LUN at the time.

In the above instance, the net result would be that your running VM is replaced by an outdated copy of the VM disks from the time when the RCG sync was stopped. This is particularly worrying in environments leveraging Storage DRS, which can initiate Storage vMotions automatically, as you can well imagine.

HPE confirm that this bug is fixed in the latest 3PAR OS release, 3.2.1 MU5, which should be available now. My understanding is that all prior versions of the 3PAR OS are affected and therefore at risk.

To mitigate the risk until you can upgrade the 3PAR OS, I recommend you disable the FS3DM Hardware Accelerated Data Mover by running the following command on all of your ESXi hosts:

esxcli system settings advanced set --int-value 0 --option /DataMover/HardwareAcceleratedMove

This change can be applied dynamically and does not require a reboot of the ESXi host to take effect.
For additional information on disabling the data mover (including steps to perform this via the GUI), see KB1033665.
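
If you want to confirm the setting on a host after making the change, something along the following lines may help. This is only a minimal sketch: the function name xcopy_offload_state is my own (hypothetical), and it assumes that the output of "esxcli system settings advanced list --option /DataMover/HardwareAcceleratedMove" contains a line of the form "Int Value: <n>" holding the current value. Check the output on your own ESXi build before relying on it.

```shell
#!/bin/sh
# Sketch: report whether the hardware-accelerated data mover (xcopy offload)
# is enabled, based on the text output of:
#   esxcli system settings advanced list --option /DataMover/HardwareAcceleratedMove
# Assumption: the listing contains a line such as "   Int Value: 0".
# The function name is hypothetical, chosen for this example only.
xcopy_offload_state() {
    # Read the listing from stdin and extract the current integer value.
    # The pattern is anchored so "Default Int Value:" is not matched.
    value=$(sed -n 's/^[[:space:]]*Int Value:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | head -n 1)
    case "$value" in
        0) echo "disabled" ;;
        1) echo "enabled" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

On an ESXi host the function would be fed from the live command, for example: esxcli system settings advanced list --option /DataMover/HardwareAcceleratedMove | xcopy_offload_state. A result of "disabled" indicates the workaround is in place on that host.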

Confirmation of the fixes is listed under the OS modifications in the 3PAR OS 3.2.1 MU5 Release Notes, against Issue IDs 150221 and 161945.
In addition to these xcopy-related fixes, there is also a modification listed in the notes against ID 149344 to resolve ‘unexpected array restarts’ during xcopy operations.

To get the updated release, head to www.hpe.com/support/softwaredepot, or contact your support provider or HPE Global Support directly.

 

vM