Sunday, November 06, 2011

Problem With Virtual Machine Backup–Part 1 of 2

I was trying to back up four virtual machines using Symantec Backup Exec 2010 R3 with the Agent for VMware Virtual Infrastructure, but I ran into some issues.  The four virtual machines were hosted on vSphere ESXi 4.1 Update 1.

I set up the backup job for the four virtual machines and ran it.  When I checked on the backup the next day, it was still trying to take a snapshot after 16 hours.  I logged on to vCenter, which showed that snapshot creation was in progress for the first virtual machine.  The problem was that it was not moving; it was stuck at 95%!

I had no choice but to cancel the backup since it was going to be stuck at 95% forever.  After the backup had been cancelled, I started googling for issues related to virtual machine backup, and there is a ton of information out there.  Specifically for Backup Exec, it was mentioned that the VMware Snapshot Provider should not be installed and the BE VSS Provider should be used instead.  Well, that was what I had.  It was also mentioned that for ESXi 4.0 and later, FREEZE.BAT in the C:\Program Files\VMware\VMware Tools\backupScripts.d folder is called before and after the snapshot process.  I ran FREEZE.BAT on the virtual machine and it failed to stop the Volume Shadow Copy service.  The Volume Shadow Copy service was set to manual start, so I guessed the backup must have started it and then it went “haywire”.  I rebooted the virtual machine and this time FREEZE.BAT ran successfully.

However, the problem was still far from being resolved.  I ran the backup job again and it failed within a minute with the following error.

Job ended: Tuesday, November 01, 2011 at 9:12:18 AM
Completed status: Failed
Final error: 0xe0009574 - Unable to create a snapshot of the virtual machine. The virtual machine may be too busy to quiesce to take the snapshot.
Final error category: Resource Errors

For additional information regarding this error refer to link V-79-57344-38260
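As a side note, the link ID and the hexadecimal error code in the log above appear to carry the same number: splitting 0xe0009574 into its high and low 16-bit halves gives 57344 and 38260, the last two parts of V-79-57344-38260.  The short Python sketch below only demonstrates that arithmetic.  The function name is made up for illustration, and the "V-79" prefix is copied from the log rather than derived from anything.

```python
from typing import Tuple


def split_error_code(code: int) -> Tuple[int, int]:
    """Return the high and low 16-bit halves of a 32-bit error code."""
    return (code >> 16) & 0xFFFF, code & 0xFFFF


# Error code taken from the job log above.
high, low = split_error_code(0xE0009574)

# "V-79" is copied from the log; only the last two numbers are computed.
print(f"V-79-{high}-{low}")  # prints V-79-57344-38260
```

This can be handy when searching for an error, since some pages quote the hexadecimal code and others quote the V-79 style ID.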

The vCenter showed the following event.

“Create virtual machine snapshot”  “Another task is already in progress.”

I suspected that the previous snapshot task was still “in progress” and would never end.  To confirm it, I followed the article “Collecting information about tasks in VMware ESX and ESXi”.

I logged on to the ESXi host console and enabled Local Tech Support, then pressed Alt+F1 to switch to the console window.

I ran the command vim-cmd vimsvc/task_list and got output similar to the following.

(ManagedObjectReference) [
   'vim.Task:haTask-162-vim.VirtualMachine.createSnapshot-3887',

I then ran the command vim-cmd vmsvc/getallvms to confirm that the snapshot task belonged to the virtual machine having the issue.
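The matching of a task to a virtual machine can be sketched in a few lines of Python.  This is only an illustration, not part of what I ran on the host.  It assumes that the number right after "haTask-" is the Vmid shown by vim-cmd vmsvc/getallvms, and that the trailing number is a task sequence number.  The names TASK_RE, VmTask and parse_vm_tasks are hypothetical, and the sample text is the truncated output shown above.

```python
import re
from typing import List, NamedTuple

# Assumed layout of a task identifier:
#   vim.Task:haTask-<vmid>-vim.VirtualMachine.<operation>-<sequence>
TASK_RE = re.compile(r"vim\.Task:haTask-(\d+)-vim\.VirtualMachine\.(\w+)-(\d+)")


class VmTask(NamedTuple):
    vmid: int        # assumed to match the Vmid column of getallvms
    operation: str   # e.g. createSnapshot
    sequence: int    # trailing number of the task identifier


def parse_vm_tasks(task_list_output: str) -> List[VmTask]:
    """Extract virtual machine tasks from vim-cmd vimsvc/task_list output."""
    return [
        VmTask(int(vmid), operation, int(sequence))
        for vmid, operation, sequence in TASK_RE.findall(task_list_output)
    ]


# Sample text copied from the (truncated) output above.
sample = """(ManagedObjectReference) [
   'vim.Task:haTask-162-vim.VirtualMachine.createSnapshot-3887',
"""

for task in parse_vm_tasks(sample):
    if task.operation == "createSnapshot":
        print(f"Vmid {task.vmid} has a pending createSnapshot task")
```

With the sample above, this reports Vmid 162, which is the number to look for in the first column of the getallvms output.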

The solution is to restart the management agents.

Restart Management Agents

I restarted the management agents without impacting the running virtual machines.  After restarting, I had to reconnect the ESXi host from the vCenter.

I ran the backup job again and it successfully took a snapshot of the first virtual machine.  The backup for the first virtual machine completed successfully, and just when I was about to celebrate, it failed for the second and third virtual machines.  The backup for the fourth virtual machine completed successfully as well.

The issues with the second and third virtual machines were different from the one with the first virtual machine, so I will cover them in part 2.