Issue: I received alerts that a few VMs had rebooted and a few others were hung.
Findings: On checking vCenter, I found that all of these VMs were running on the same host, but the ESXi host itself seemed to be running fine, and some other VMs on the same host were also running fine.
However, when I looked closely at the events and task logs of the affected VMs, I noticed that when VADP triggered snapshot creation, every one of them got the same error shown below, failed to create the snapshot, and then rebooted.
Error: error message from ESXi Reason: 0 ( cannot allocate memory)
On checking the ESXi kernel warning logs, I saw that the COW heap had already reached its maximum size and could not expand.
So I checked the current free percentage of the COW heap, and it was 4% (which means 96% of it was in use).
How to check the current COW heap utilization:
Solution: To resolve this COW heap memory issue, I restarted the ESXi management services and manually consolidated all existing snapshots on the VMs. After that, the COW heap free percentage went back to 95% (which means only 5% of the COW heap memory is now in use).
I also increased the COW heap size on the ESXi host from the default to the maximum. http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004424
So what is the relation between the COW heap and multiple snapshots?
VMware's explanation is as follows:
If you use snapshots on virtual machines running on an ESX host, each snapshot delta disk is a COW (Copy On Write) disk. For each one in use by running virtual machines, their data structures take up ESX kernel memory. This allocation is known as the COW heap. This memory is used to store cached metadata, pointing to where in a VMDK or in a chain of VMDK files disk data to be accessed resides.
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1003156&sliceId=2&docTypeID=DT_KB_1_1&dialogID=161478702&stateId=1%200%20161488616
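To make the idea of "metadata pointing to where in a chain of VMDK files the data resides" more concrete, here is a toy model in Python. It is only an illustration under my own assumptions, not how ESXi actually implements delta disks: the `Disk` class, its methods, and the block numbering are all hypothetical. It shows that every delta layer has to keep a lookup table for the blocks written to it, so more snapshots and more writes mean more metadata to hold in memory.

```python
class Disk:
    """Toy disk layer. A layer with a parent plays the role of a snapshot delta."""

    def __init__(self, parent=None):
        self.parent = parent
        # Hypothetical metadata: block number -> data written at this layer.
        self.blocks = {}

    def write(self, block, data):
        # Copy-on-write: new data always lands in the current (top) layer.
        self.blocks[block] = data

    def read(self, block):
        # Walk the chain from the newest layer down to the base disk.
        disk = self
        while disk is not None:
            if block in disk.blocks:
                return disk.blocks[block]
            disk = disk.parent
        return None  # block was never written anywhere in the chain

    def metadata_entries(self):
        # Count lookup entries held by delta layers only (layers with a parent).
        total = 0
        disk = self
        while disk is not None and disk.parent is not None:
            total += len(disk.blocks)
            disk = disk.parent
        return total


base = Disk()
base.write(0, "a")
base.write(1, "b")

snap1 = Disk(parent=base)   # first snapshot delta
snap1.write(1, "B")         # overrides block 1 without touching the base

snap2 = Disk(parent=snap1)  # second snapshot delta
snap2.write(2, "c")

print(snap2.read(1), snap2.metadata_entries())
```

In this model, consolidating snapshots corresponds to merging the delta layers back into the base, which leaves no delta lookup tables to keep around.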
Moreover, to prevent this issue in the future, I have configured monitoring for failed consolidations and scripted a poll of the COW heap memory utilization every minute, with the results written to a datastore (they can also be sent to a syslog server).
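A minimal sketch of such a polling loop is below. It is not the script I actually deployed, and several things in it are assumptions: the heap statistics are expected to contain a line such as `percent free of max size:4 %`, the command that produces them is passed in as the `read_stats` callable rather than hard-coded, and the 20% warning threshold and the log line layout are arbitrary choices.

```python
import re
import time


def parse_free_percent(text):
    """Return the first 'percent free ...: N' value found in text, or None."""
    # Assumed output format; adjust the pattern to what your host prints.
    match = re.search(r"percent free[^:\n]*:\s*(\d+)", text, re.IGNORECASE)
    return int(match.group(1)) if match else None


def format_record(free_percent, warn_below=20, now=None):
    """Build one log line with a timestamp, free/used percentages and a level."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    used = 100 - free_percent
    level = "WARN" if free_percent < warn_below else "OK"
    return "{} cow_heap free={}% used={}% {}".format(stamp, free_percent, used, level)


def poll(read_stats, log_path, interval=60, iterations=None):
    """Call read_stats() every `interval` seconds and append a record to log_path.

    read_stats is a caller-supplied function returning the raw heap statistics
    as a string. iterations=None means run until interrupted.
    """
    count = 0
    while iterations is None or count < iterations:
        free = parse_free_percent(read_stats())
        if free is not None:
            with open(log_path, "a") as log:
                log.write(format_record(free) + "\n")
        count += 1
        if iterations is None or count < iterations:
            time.sleep(interval)
```

To use it, wrap whatever command reports the COW heap statistics on your host in a small function that returns its output as text, and pass that function as `read_stats` together with a log file path on the datastore.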
Could you share the script you used to monitor heap usage?
Hi Hunter Lemperle,
I just created a basic script under the /bin folder. Refer to http://myserverissues.blogspot.sg/2015/06/how-to-collect-cow-heap-free-size-and.html
Dhanaraj,
Thank you so much, what a great write-up! This was extremely informative!
Hunter