Thursday, January 06, 2011

Refreshing File Server

Our old file server, running on Windows Server 2003 R2 (32-bit) and clustered with the print server using Microsoft Clustering, is hitting its maximum capacity and occasionally throws a “tantrum”.  The worst case was when the disk resources for the file server refused to come online because file system corruption was detected.  We spent almost 2 days running chkdsk to fix the issue, and that was 2 days of downtime.

So now we need to think about how we want to set up our new file server with the hardware refresh.  Of course, the most straightforward way is to replace the old hardware with new, more powerful hardware and retain the current clustering setup.  However, we don’t really favour clustering because it makes troubleshooting more tedious.  What other options do we have then?  These are some of the options we are considering, keeping in mind that we still need redundancy and want to improve file server performance.

First of all, we definitely want to move from Windows Server 2003 R2 32-bit to Windows Server 2008 R2 64-bit.  Obviously, there are benefits such as larger memory support and the improved file server features in Windows Server 2008.  There are some benefits we will not enjoy at the moment, such as SMB 2.0 and the improvements in DFS, because most of our clients are on Windows XP and we are still in a Windows 2003 domain.  One of the features that really interests me is NTFS self-healing: basically, Windows Server 2008 detects and repairs file system corruption on the fly instead of waiting for an offline chkdsk.

http://blogs.technet.com/b/doxley/archive/2008/10/29/self-healing-ntfs.aspx
http://blogs.technet.com/b/extreme/archive/2008/02/19/windows-server-2008-self-healing-ntfs.aspx
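
Self-healing should be on by default, but it can be checked (and toggled) from an elevated command prompt using fsutil.  A quick sketch, using C: as an example volume:

fsutil repair query C:
fsutil repair set C: 1

The first command shows the current self-healing flags on the volume, and the second enables general repair (setting the value to 0 would turn it off).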

We wanted to remove clustering but still hoped to achieve the same or even better redundancy, so we started to consider virtualization with VMware, where features like HA, DRS and Fault Tolerance can give us some level of redundancy.  Of course, virtualization is not only for the file server; we are also doing server consolidation to cope with the growing number of servers.  One worry most people (including us) have is performance, but based on our prior experience with virtualization we did not see any noticeable performance impact after virtualizing our servers.  Another decision we need to make is whether to put the data (the end-users’ files in this case) in a virtual disk (vmdk) or use Raw Device Mapping (RDM).  There is again a debate on performance and scalability between the two.  In terms of performance, VMware claims there is no difference between the two and recommends using RDM only for the following reasons:

Migrating an existing application from a physical environment to virtualization.
Using Microsoft Cluster Services (MSCS) for clustering in a virtual environment.

Not everyone buys into this claim, and the debate will continue.
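
For the record, if we do go down the RDM path, the mapping file itself is created from the ESX service console with vmkfstools.  A rough sketch (the device name and datastore path are just placeholders, not our actual environment):

vmkfstools -r /vmfs/devices/disks/naa.xxxxxxxx /vmfs/volumes/datastore1/fileserver/data-rdm.vmdk
vmkfstools -z /vmfs/devices/disks/naa.xxxxxxxx /vmfs/volumes/datastore1/fileserver/data-rdm.vmdk

The -r option creates a virtual compatibility RDM (which still allows VM snapshots), while -z creates a physical compatibility RDM, which is what MSCS clustering across hosts would require.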

As for scalability, the debate revolves around the 2TB limit.  The default block size for a VMFS datastore is 1MB, which only allows virtual disks of up to 256GB to be created.  To have bigger virtual disks, the block size needs to be changed, and it can only be set when the datastore is created.  The table below shows the available block sizes and the maximum disk size each supports.

Block Size    Maximum Virtual Disk Size
1MB           256GB
2MB           512GB
4MB           1024GB
8MB           2048GB

If there is a need for more than 2TB using vmdk, the workaround is to leverage the maximum of 32 extents per volume, which allows a volume of up to 64TB.  However, my experience with extents has been really bad, so I don’t really like that idea.
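
Since the block size is fixed at format time, it has to be decided up front.  A sketch of creating a VMFS3 datastore with the 8MB block size from the service console (the label and device path are placeholders):

vmkfstools -C vmfs3 -b 8m -S FileServerStore /vmfs/devices/disks/naa.xxxxxxxx:1

For completeness, vmkfstools -Z is the command that grows a datastore by adding an extent, which is exactly the part I would rather avoid.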

As for RDM, the maximum LUN size is 2TB.  In fact, the largest LUN that I can create on my current EMC SAN is also 2TB.  So, to get more than 2TB using RDM, we would need to rely on Windows Disk Management to span a volume across multiple LUNs (a quick diskpart sketch follows the links below).  Here are some resources for these debates.

 http://communities.vmware.com/thread/105383
http://communities.vmware.com/thread/215129;jsessionid=5576308061ECC6C2A42E05BC2C743096?tstart=0
http://communities.vmware.com/message/661499
http://communities.vmware.com/message/997158
http://communities.vmware.com/thread/201842
http://communities.vmware.com/thread/192713
http://www.vmware.com/pdf/vmfs-best-practices-wp.pdf
http://communities.vmware.com/thread/161330
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=3371739
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1012683
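
Coming back to the RDM-plus-spanning workaround: the spanning can be done in Disk Management or scripted with diskpart.  A rough sketch, assuming the two 2TB RDM LUNs show up inside the guest as disk 1 and disk 2 (the disk numbers, drive letter and label are placeholders):

diskpart
select disk 1
convert dynamic
select disk 2
convert dynamic
create volume spanned disk=1,2
format fs=ntfs label=UserData quick
assign letter=F

Bear in mind a spanned volume has the same weakness I dislike in VMFS extents: lose one LUN and the whole volume is gone.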

Now, back to the redundancy and performance considerations.  True, VMware HA does provide a certain level of redundancy, but it does not protect against corruption of the Windows OS itself, which clustering does; if the Windows OS is corrupted, the file server will be down because there is only one instance of the OS.  So after discussing with my colleague, we want to consider Distributed File System (DFS).  This is something we are not using now because DFS is not supported on our clustering setup.  Using DFS, we can make use of multiple namespace servers, multiple servers hosting the folder targets, and DFS Replication (DFSR) to replicate the data, in order to achieve the following:

Redundancy for the Windows OS.
Redundancy for the data.
Split the file shares across multiple servers to improve performance.
Distribute the risk of file system corruption and reduce the likelihood of the whole file server going down.
Replicated data can be used for DR purposes.

Resource on setting up DFS: http://technet.microsoft.com/en-us/library/cc753479(WS.10).aspx
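
Just to get a feel for the moving parts, here is a very rough sketch of what the namespace side could look like with dfsutil.  The namespace, domain and server names are made up for illustration, and the exact syntax should be double-checked against dfsutil /? before using any of it:

dfsutil root adddom \\FS01\users
dfsutil link add \\ourdomain.local\users\dept1 \\FS01\dept1
dfsutil target add \\ourdomain.local\users\dept1 \\FS02\dept1

The idea is that the first command creates a domain-based namespace rooted on FS01, the second adds a folder (link) pointing at a share on FS01, and the third adds FS02 as a second target for the same folder.  FS02 can also be added as a second namespace server for redundancy of the namespace itself, and the actual replication between \\FS01\dept1 and \\FS02\dept1 would then be set up as a DFSR replication group, either in the DFS Management console or with dfsradmin.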

We would need quite a few servers to achieve all that, so virtualization comes in handy here.  As for Windows Server licensing, one option is to get the Datacenter Edition, which allows an unlimited number of virtualized OS instances on each licensed host.

http://www.microsoft.com/licensing/about-licensing/virtualization.aspx

Okay, there is one more option that our vendor will be presenting to us, which is an EMC NAS head.  We shall listen to what they have to say.
