Friday, May 20, 2011

Cluster Disk Resources Failed To Go Online

Just three hours ago, I received a call complaining that the cluster disk resources on one of the cluster nodes had failed to go online. However, the same disk resources came online without trouble when moved to another node.

The event log was showing Event ID 1066 (Cluster disk resource Disk : is corrupt. Running ChkDsk /F to repair problems.). This didn’t really make sense to me, since those disks were working on another node. Nevertheless, I took the advice shown and ran chkdsk using the method I blogged about some time ago (Cluster Shared Disk Refused To Start). Well, chkdsk also found nothing wrong with the disks.

I was really frustrated with these contradictory symptoms. Then I saw the Cluster Diagnostic Tool icon on the desktop and ran it. In the cluster log, I saw errors like this:

00001260.000005dc::2011/05/20-11:49:28.063 ERR  Physical Disk <Disk Q:>: DiskspCheckPath: GetFileAttrs(Q:) returned status of 87.
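If the cluster log is long, it can help to filter it for this kind of entry instead of reading it by eye. Here is a minimal Python sketch. The line layout (hex process ID, hex thread ID, timestamp, level, message) is inferred only from the single sample line above, and the function names are my own, so treat it as a starting point rather than a complete parser. For reference, Win32 status 87 is ERROR_INVALID_PARAMETER.

```python
import re

# Layout inferred from the one sample line above (an assumption):
# <hex pid>.<hex tid>::<yyyy/mm/dd-hh:mm:ss.mmm> <LEVEL> <message>
LINE_RE = re.compile(
    r"^(?P<pid>[0-9a-fA-F]+)\.(?P<tid>[0-9a-fA-F]+)::"
    r"(?P<timestamp>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>\w+)\s+"
    r"(?P<message>.*)$"
)
STATUS_RE = re.compile(r"returned status of (\d+)")


def parse_line(line):
    """Split one cluster log line into fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    status = STATUS_RE.search(entry["message"])
    # The status code is only present on some messages.
    entry["status"] = int(status.group(1)) if status else None
    return entry


def disk_path_errors(lines):
    """Return the ERR entries that come from DiskspCheckPath."""
    found = []
    for line in lines:
        entry = parse_line(line)
        if entry and entry["level"] == "ERR" and "DiskspCheckPath" in entry["message"]:
            found.append(entry)
    return found


sample = (
    "00001260.000005dc::2011/05/20-11:49:28.063 ERR  Physical Disk <Disk Q:>: "
    "DiskspCheckPath: GetFileAttrs(Q:) returned status of 87."
)
errors = disk_path_errors([sample])
```

To scan a real log, pass the open file object to disk_path_errors() in place of the one-element list.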

I did a Google search and found the following KB article, which saved me a lot of pain.

A physical disk resource may not come online on a cluster node in Win2003

Basically, the problem is with Symantec Endpoint Protection (SEP) 11.0 Release Update 5 (RU5), version number 11.0.5002.333. The cluster node that could not bring the disk resources online was indeed running this version of SEP. The other node, which was working, was running Symantec Endpoint Protection 11.0.6200.754 (RU6 MP2). To confirm that SEP was causing the problem, I used the handle.exe utility mentioned in the KB article, and the result confirmed the cause.
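With more than two nodes, the same check can be done over a list of installed SEP versions. The sketch below is only an illustration: the node names are made up, and it flags only the exact RU5 build mentioned above, since that is the one version I know to be affected.

```python
# The one build identified as problematic in this post (RU5).
AFFECTED_BUILD = "11.0.5002.333"


def parse_version(text):
    """Turn a dotted version string such as '11.0.5002.333' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def nodes_on_affected_build(inventory):
    """Return the names of nodes whose SEP version equals the affected RU5 build."""
    affected = parse_version(AFFECTED_BUILD)
    return [name for name, version in inventory.items()
            if parse_version(version) == affected]


# Hypothetical node names; the versions are the two from this post.
inventory = {
    "NODE-A": "11.0.5002.333",   # RU5, disks would not come online
    "NODE-B": "11.0.6200.754",   # RU6 MP2, disks came online
}
flagged = nodes_on_affected_build(inventory)
```

Comparing parsed tuples rather than raw strings avoids false mismatches from stray whitespace around the version string.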

So I upgraded the SEP version on the problem node to 11.0.6200.754 (RU6 MP2) and the problem was resolved.

So much for TGIF!  Well, I am happy that the problem did not last through the weekend.
