Monday, October 5, 2015

ETA 207784: VNX: Storage Processors may restart if VMware vStorage APIs for Array Integration (VAAI) is enabled resulting in potential data unavailable



According to the EMC advisory, there is a high-severity, critical issue (potential data unavailability) affecting VNX systems running Operating Environment for Block 05.32.000.5.219 and earlier, and 05.33.006.5.102 and earlier. Current Version: 05.33.000.2.081

Data may intermittently be unavailable due to either a dual Storage Processor (SP) restart or a Non-Disruptive Upgrade (NDU) which does not complete due to a single or dual Storage Processor restart.

ISSUE
A restart occurred due to bug check [05900000 FF_ASSERT_PANIC].
The following event message is seen in the System Event logs:
A 04/15/15 10:54:36 DGSSP 76008106 The Storage Processor rebooted unexpectedly @ 10:38:27 on 04/15/2015: BugCheck 0,
{0000000000000000, 0000000005900000, 000000000000010e, 0000000000000000}, Failing Instruction: 0xfffff88195e4733b in disktarg.sys loaded @ 0xfffff88195e3e000 76008106 [FF_ASSERT_PANIC]

Environment 
EMC Hardware: VNX5100
EMC Hardware: VNX5300
EMC Hardware: VNX5500
EMC Hardware: VNX5700
EMC Hardware: VNX7500
EMC Hardware: VNX5200
EMC Hardware: VNX5400
EMC Hardware: VNX5600
EMC Hardware: VNX5800
EMC Hardware: VNX7600
EMC Hardware: VNX8000

CAUSE
The CAS (Compare and Swap) command is used by the VAAI Hardware Assisted Locking mechanism. Hardware Assisted Locking replaces LUN-level locking based on SCSI reservations. Initially, the ESX server reads the lock.
If the lock is free, the server sends a CAS command containing the lock data that the server wants to place into the lock and the original free contents of the lock. The storage array reads the lock again and compares the current data
in the lock to the original contents supplied in the Compare And Write command. If they match, the new data is written to the lock. This process is treated as a single atomic operation and is applied at the block level (not the LUN level), which makes parallel
Virtual Machine File System (VMFS) updates possible. Atomic Test and Set (ATS) is the VMware implementation of the SCSI CAS command.
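
The read, compare, and write sequence described above can be illustrated with a toy model. This is only a sketch of the semantics, not array or ESX code: the lock is a plain shell variable, and compare_and_write, LOCK, and the host names are hypothetical names chosen for the illustration. A real array performs the compare and the write as one atomic operation on the block, which a shell function does not reproduce.

```shell
#!/bin/sh
# Toy model of the Hardware Assisted Locking handshake (illustration only).

LOCK="free"   # current contents of the on-disk lock (illustrative value)

# compare_and_write EXPECTED NEW
# Writes NEW into the lock only if the lock still holds EXPECTED.
# Returns 0 on success and 1 on a miscompare.
compare_and_write() {
    if [ "$LOCK" = "$1" ]; then
        LOCK="$2"
        return 0
    fi
    return 1
}

# Both hosts read the lock first and see that it is free.
seen_by_a="$LOCK"
seen_by_b="$LOCK"

# Host A's request reaches the array first: the compare matches, so it wins.
if compare_and_write "$seen_by_a" "host-a"; then
    echo "host-a acquired the lock"
fi

# Host B still sends the stale "free" contents, so the compare fails
# and the lock is left untouched.
if ! compare_and_write "$seen_by_b" "host-b"; then
    echo "host-b miscompare, lock is held by $LOCK"
fi
```

In this model the second host has to re-read the lock and retry later, while no other block on the LUN is affected by the attempt.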

Workaround

Disable VAAI CAS commands on the ESX server. The CAS command is used by VMware hosts running vSphere 4.1 and later.

VMware usage of Compare and Swap is disabled with the following command line:

esxcfg-advcfg -s 0 /VMFS3/HardwareAcceleratedLocking

To verify the setting of this variable:

esxcfg-advcfg -g /VMFS3/HardwareAcceleratedLocking

For ESXi 5.0 and above, refer to the following KB article:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1033665

0 = disabled, using old style SCSI reservations for locking; 1 = enabled and using CAS for locking.
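
When checking several hosts, it can help to turn the reported value into a readable status. The helper below is a minimal sketch: describe_locking is a hypothetical name, and the sample line assumes the verify command prints something of the form "Value of HardwareAcceleratedLocking is 0". Only the last word of the line is inspected, so a bare 0 or 1 works as well.

```shell
#!/bin/sh
# describe_locking LINE
# LINE is the text printed by the verify command, or just the value itself.
describe_locking() {
    value="${1##* }"   # keep only the text after the last space
    case "$value" in
        0) echo "disabled: SCSI reservations are used for locking" ;;
        1) echo "enabled: CAS is used for locking" ;;
        *) echo "unexpected value: $value"; return 1 ;;
    esac
}

# Sample input in the assumed output format.
describe_locking "Value of HardwareAcceleratedLocking is 0"
```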

This command takes effect immediately. It is non-disruptive. No host restart is required.

This command must be run on every ESX server connected to the array to stop CAS commands from being sent to the array.
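
Because every connected host has to be changed, a small loop can reduce the chance of missing one. The following is a sketch under stated assumptions: ESX_HOSTS, DRY_RUN, run_on_host, and disable_cas_everywhere are hypothetical names, the host names are placeholders, and root SSH access to each host is assumed to be enabled. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: apply the workaround on every ESX host attached to the array.

ESX_HOSTS="${ESX_HOSTS:-esx01.example.com esx02.example.com}"   # placeholder hosts
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = run them over SSH
OPTION="/VMFS3/HardwareAcceleratedLocking"

# run_on_host HOST COMMAND...
run_on_host() {
    target="$1"
    shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "ssh root@$target $*"
    else
        ssh "root@$target" "$@"
    fi
}

disable_cas_everywhere() {
    for host in $ESX_HOSTS; do
        run_on_host "$host" esxcfg-advcfg -s 0 "$OPTION"   # disable CAS locking
        run_on_host "$host" esxcfg-advcfg -g "$OPTION"     # read the value back
    done
}

disable_cas_everywhere
```

Review the dry-run output first, then rerun with DRY_RUN=0 once the host list is complete.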
