Thursday 21 February 2008

Datastore Design Considerations

Been a bit of a slow month, although hopefully it'll be a strong finish as I'll be at VMworld Europe next week :-).

The inspiration for this post is the flurry of marketing BS going on in the storage space currently - mostly due to VDI. In most cases I have seen or heard about, the storage cost ranges between 1/4 and 1/2 of the total cost of a virtual desktop. And it's more often than not a recurring charge.

So any business unit worth their salt will soon question just why they are paying this recurring charge that, over the course of 6 months, probably comes close to the once-off cost they used to pay for a physical desktop that they kept for 3 years. That question inevitably makes its way back to you as a demand to re-evaluate your current storage design, and inevitably ends with storage vendor marketing hyperbole flooding the internet.
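To see why the business unit gets twitchy, here's the back-of-envelope maths they'll do. Every figure below is an illustrative assumption of mine, not a quote from any vendor or customer:

```python
# Rough break-even: recurring virtual desktop storage charge vs. the
# once-off cost of a physical desktop. All figures are assumptions
# for illustration only.

physical_desktop_cost = 900.0     # once-off purchase, kept ~3 years (assumed)
storage_charge_per_month = 150.0  # recurring per-desktop storage charge (assumed)

months_to_match = physical_desktop_cost / storage_charge_per_month
print(f"Recurring storage matches the physical box in {months_to_match:.1f} months")
# -> 6.0 months, with 30 more months of charges still to come
```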

'Thin provisioning' and 'de-duplication / single instancing' are the buzzwords of the day, but what implications do these technologies have for your datastore design? The question needs more careful consideration than your CIO, who has just watched this or this, may care for.

Let's take the first video from incumbent FC champion, EMC. Aside from the fact that it's all whiteboard, the presenter either doesn't know the details of what he is talking about, very craftily steps around an important technical detail, or knows something that none of the storage engineers I work with know. Since the people I work with are a pretty damn smart bunch, I suspect it's not the latter.

The important technical detail is that he never actually says he is referring to a VMFS LUN. Why does that matter? Because SAN based snapshots are LUN based, of course. So when he takes a 'master' LUN, full-copies it 4 times for better performance, and then proceeds to snap off 1000 VMs from those 4 LUNs attached to the one ESX cluster... wait a minute, what was the LUN limit of an ESX host again?
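Run the numbers yourself. The ESX 3.x configuration maximums document a 256 LUN ceiling per host, and every host in the cluster has to see every LUN; the other figures are straight from the whiteboard:

```python
# LUN-based snapshots mean one LUN per snapped-off VM, and every LUN
# must be presented to every host in the cluster.

ESX_LUN_LIMIT = 256   # documented per-host maximum in ESX 3.x
master_luns = 4       # full copies of the 'master' LUN (from the video)
vms_to_snap = 1000    # one SAN snapshot, i.e. one LUN, per VM (from the video)

luns_required = master_luns + vms_to_snap
print(f"LUNs required: {luns_required}, host limit: {ESX_LUN_LIMIT}")
assert luns_required <= ESX_LUN_LIMIT, "whiteboard design does not fit on an ESX host"
# AssertionError: whiteboard design does not fit on an ESX host
```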

Which leads to the inevitable conclusion that either he must be using an NFS datastore, or VMware have finally rolled over to their parent company and EMC can now understand VMFS like no other storage vendor can. Cue the second video from the kings of NAS, NetApp.

Sure, the video is a little far fetched (there is obviously no way you could actually run all those VMs using no more than 10GB of disk), but it illustrates the power of using NFS as a datastore - file based clones are possible, and if you're gonna do NFS datastores then you could certainly do a lot worse than NetApp. And before you go throwing VMware NFS vs iSCSI vs FC performance papers at me, know that NONE of those papers are based on NFS presented via a NetApp box.

The flipside to NFS via NetApp, however, is that thin provisioning goes out the window, because NetApp's WAFL filesystem touches every block of the LUN it's presented with as soon as it gets it (hey, there's a reason why WAFL stands for 'Write Anywhere File Layout'). VMFS doesn't, thus allowing thin provisioned LUNs to work as expected - although you may need to create your .vmdk's as thin too, which is not without caveats. I say may because 3PAR can actually handle the default zeroedthick .vmdk format, so if you're using them you don't need to change anything on the VMware side.
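A toy model of why the two behave so differently on a thin LUN - this is a sketch of the behaviour described above, not NetApp's or VMware's actual allocation logic, and the block counts are made up:

```python
# A thin-provisioned LUN only consumes backend capacity for blocks that
# have actually been written. Purely illustrative model.

LUN_BLOCKS = 1_000_000

def format_wafl(written: set) -> None:
    # WAFL-style: touches every block of the LUN when it takes ownership,
    # so the array has to back the whole thing immediately.
    written.update(range(LUN_BLOCKS))

def format_vmfs(written: set) -> None:
    # VMFS-style: only writes its metadata region up front; data blocks
    # stay untouched until a .vmdk actually lands on them.
    written.update(range(1_000))  # metadata only (size is an assumption)

for name, fmt in [("WAFL", format_wafl), ("VMFS", format_vmfs)]:
    written_blocks: set = set()
    fmt(written_blocks)
    pct = 100 * len(written_blocks) / LUN_BLOCKS
    print(f"{name}: {pct:.1f}% of the thin LUN consumed at format time")
# WAFL: 100.0% -> thin provisioning defeated
# VMFS:   0.1% -> thin provisioning works as expected
```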

Which brings us to de-duplication / single instancing. This can't be handled by the underlying storage array without some kind of intermediary - like another filesystem or OS - to scan for duplicate blocks, track which de-duped blocks belong to what higher up the chain, and then work out what to do when something higher up needs to write to a previously de-duped block, and so on. The whole point of disk subsystems in enterprise storage arrays is to do precisely none of that - it should be done at a higher level, ideally the native filesystem level. VMFS can't do this (yet), but VMware's SVI will go some way to removing the need for de-dup in the first place. NetApp's sis-clone feature has the same effect; it will be interesting to see a performance comparison of the two when they go GA.
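To see why that bookkeeping belongs in a filesystem rather than a dumb disk subsystem, here's a minimal sketch of block-level de-dup with copy-on-write. It's purely illustrative - no vendor's array or filesystem works like this - but it shows the reference tracking that somebody up the stack has to own:

```python
import hashlib

# Minimal block-level de-dup: logical blocks map to content-hashed
# physical blocks with reference counts, and an overwrite of a shared
# block triggers copy-on-write. Illustrative only.

class DedupStore:
    def __init__(self):
        self.blocks = {}     # content hash -> (data, refcount)
        self.block_map = {}  # logical block number -> content hash

    def write(self, lbn: int, data: bytes) -> None:
        # Release whatever this logical block pointed at before
        # (the copy-on-write step for shared blocks).
        old = self.block_map.get(lbn)
        if old is not None:
            payload, refs = self.blocks[old]
            if refs == 1:
                del self.blocks[old]
            else:
                self.blocks[old] = (payload, refs - 1)
        # De-dup: identical content collapses onto one physical block.
        h = hashlib.sha1(data).hexdigest()
        payload, refs = self.blocks.get(h, (data, 0))
        self.blocks[h] = (payload, refs + 1)
        self.block_map[lbn] = h

store = DedupStore()
golden = b"identical guest OS block"
for vm in range(250):            # 250 clones writing the same OS blocks
    store.write(vm, golden)
print(len(store.blocks))         # 1 physical block backs all 250
store.write(0, b"patched block") # one VM diverges: copy-on-write
print(len(store.blocks))         # 2 physical blocks now
```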

Funny thing is, no one seems to be talking much about the performance implications of stacking all these clones and snaps on top of each other. When 250 VMs simultaneously do something that requires the IO to make it all the way down to the physical device (virus scan, anyone?), how's it gonna act? Better hope you've got a ton of cache and nothing else is using it.
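A back-of-envelope version of the worry - every figure here is an assumption I've picked for illustration, so plug in your own:

```python
# Rough maths on a simultaneous virus scan across cloned VMs.
# All figures are assumptions for illustration only.

vms = 250
iops_per_vm_scan = 80    # assumed read IOPS a scan drives per VM
cache_hit_rate = 0.90    # optimistic, and only if nothing else evicts it
spindle_iops = 150       # assumed IOPS per FC spindle
spindles = 56            # assumed spindles behind the datastores

demand = vms * iops_per_vm_scan
to_disk = demand * (1 - cache_hit_rate)
capacity = spindles * spindle_iops
print(f"{demand} IOPS of demand, {to_disk:.0f} hitting disk vs {capacity} available")
# 20000 IOPS of demand; even at a 90% hit rate the 2000 IOPS that miss
# eat a quarter of the spindle capacity - and at 50% hits you're sunk.
```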

Finally, NFS may offer an easier migration path, because any worthwhile NAS box can present both NFS and CIFS. So if you have a Citrix XenServer or *cough* Microsoft Hyper-V *cough* *cough* migration on the cards, and by that time all the vendors' virtual hard disk formats interoperate, then you'll save yourself a lot of pain on the storage side.

So what to do, what to do... as always, it depends on your environment. But I think more than ever, the old paradigm of 'FC is king' needs serious reconsideration, and VMware need to spend some time on VMFS rather than developing half-arsed patching solutions (ouch - more on Update Manager during or after VMworld!).