Saturday, 1 January 2011

Not Just Another Fucking 2011 Prediction Post

In fact, it's not a prediction post at all - it's just a list of things I would like to see happen, or at least start happening, this year. I'm not making any guesses as to whether they will or not - you can decide for yourselves how likely each one is.

Monday, 17 November 2008

Microsoft Azure Infrastructure Deployment Video

I've been a bit quiet of late, mainly because I've been digesting Microsoft's PDC 2008 content (what a great idea to provide the content for free after the conference - VMware could take a leaf out of that book). I meant to post this up yesterday as another SAAAP installment but missed the deadline... but what the hell, I'll tag this post anyway!

One of the more infrastructure-oriented PDC sessions was "Under the Hood - Inside the Windows Azure Hosting Environment". Skip forward to around 43 minutes into the presentation, where Chuck Lenzmeier goes into the deployment model used within the Azure cloud (you can stop watching at around 48 minutes).

Conceptually, this is _exactly_ the deployment model that people like Lance Berc and I envisage for ESXi. Rather than putting the base image onto local USB devices a la ESXi, Microsoft PXE boot a Windows PE "maintenance OS", drop a common base image onto the endpoint, dynamically build a personality (offline) as a differencing disk (i.e. a linked clone), drop that down to the endpoint, and then boot straight off the differencing disk VHD (booting directly off VHDs is a _very_ cool feature of Win7 / Server 2008 R2). I'm glad even Microsoft recognise the massive benefits of this approach - no installation, rapid rollback, etc.
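To make that flow a bit more concrete, here's a rough sketch of the differencing-disk dance driven from Python - this is emphatically not Microsoft's fabric controller code, just the standard Win7 / Server 2008 R2 native VHD boot plumbing, with all paths and names made up for illustration (run it elevated, and in your lab first):

```python
import re
import subprocess
import tempfile

BASE_VHD  = r"C:\images\base-os.vhd"            # the common base image (placeholder)
CHILD_VHD = r"C:\images\node-personality.vhd"   # per-endpoint differencing disk (placeholder)

def run(cmd):
    print("+", cmd)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)

# 1. Create the differencing VHD (the 'linked clone') against the shared parent.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as script:
    script.write(f"create vdisk file={CHILD_VHD} parent={BASE_VHD}\n")
run(f'diskpart /s "{script.name}"')

# 2. Clone the current BCD entry and point it at the differencing VHD, so the
#    next reboot comes up straight off the child disk - no installation at all.
out = run('bcdedit /copy {current} /d "Boot from personality VHD"').stdout
guid = re.search(r"\{[0-9a-fA-F-]+\}", out).group(0)
vhd_path = "[C:]" + CHILD_VHD[2:]               # bcdedit's vhd=[drive]\path syntax
run(f"bcdedit /set {guid} device vhd={vhd_path}")
run(f"bcdedit /set {guid} osdevice vhd={vhd_path}")
run(f"bcdedit /set {guid} detecthal on")
run(f"bcdedit /default {guid}")
```

Swap a freshly built personality VHD down to the endpoint, reboot, and you're on the new image; rolling back is just pointing the boot entry at the previous child disk.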

Now ESXi of course has one *massive* advantage over Windows in this scenario - it weighs in at around the same size as a typical Windows PE build, much smaller than a Hyper-V enabled Server Core build.

And if only VMware would support PXE booting ESXi, you could couple it with Lance's birth client and midwife, or our statelesx, and you wouldn't even need the 'maintenance OS'. You'd get an environment that is almost identical conceptually, but one that can be deployed much, much faster thanks to the ~150MB total size of ESXi (including the Windows, Linux and Solaris VMware Tools ISOs, plus customised configuration to enable kernel-mode options like swap or LVM.EnableResignature within ESXi that would otherwise require a reboot - chicken, meet egg :-)), versus close to a GB for Windows PE and Hyper-V with Server Core. Of course I don't even need to go into the superiority of ESXi over Hyper-V ;-)

With Windows Server 2008 R2 earmarked for a 2010 release, it will be some time before the sweet deployment scenario outlined in that video is available to Microsoft customers (gotta love how Microsoft keep the good stuff for themselves first - jeez, what a convenient way to ensure your cloud service is much more efficient than anything your customers could build with your own technology). But by that time they will have a helluva set of best-practice deployment knowledge for liquid infrastructure like this, and you can bet aspects of their "fabric controller" will find their way into the System Center management suite.

The liquid compute deployment gauntlet has been thrown, VMware - time to step up to the plate.

Sunday, 27 April 2008

SAAAP!? VDI - Why is Storage Such an Issue?

A rather contentious title for a post, but there are a few things on the storage front that have been bugging me lately. The body of this post will probably have very little to do with storage - just bear with me on this little derailed train of thought.

The most common things we see around the interwebs lately on the storage front all seem to be targeted at this issue of the storage cost of VDI. Single instancing, de-duplication, linked cloning, disk streaming... you all know what I'm talking about. But is this storage issue best solved by the storage vendors?

Maintaining state on the endpoints is the reason why we need so much storage for VDI. Even with some Jedi mind trickery on the storage side to reduce duplicate OS files, there are still the applications. These could be addressed to a similar degree with the same tricks and some kind of de-duplication, but I honestly don't think the technology is there yet. Not when Microsoft is releasing security patches on a monthly basis. The de-dup functionality from the likes of NetApp is way cool, no doubt, but it's a post-processing operation. And while it may be pure philosophy on my part, I just can't see that approach scaling too well - prevention is better than cure, after all. And while de-dup via post-processing is certainly a more viable option than what is currently available in a pre-processing engine, all I can say is keep an eye on that space.

Another kind of de-dup I'm sure we'll see bandied about more is Citrix Provisioning Server, the rebranded Ardence product. But again, while very, very cool, it's not very practical currently due to the non-persistence of the cache. One reboot and you're back to square 1 - which is great for a grid compute farm, but not so great for VDI.

Application virtualisation obviously goes a long way towards solving this problem. VMware knows this; otherwise the Thinstall acquisition wouldn't make much sense. With app virtualisation and presentation virtualisation, we're finally starting to address the storage problem. But a major piece is still missing - environment virtualisation.

Environment virtualisation is something that people in the presentation virtualisation space (read: Citrix folk) are very familiar with. I'm not talking about simple profile redirection or mandatory profiles - I'm talking about real environment virtualisation. The kind of thing that allows you to log on to a desktop, fire up Excel, then log on to a VDI machine on the other side of the world and open Excel on that, then fire up another Excel session from a presentation server in New York, and then another one from a presentation server in Sydney, and have all your application preferences individually delivered and saved accordingly. The kind of technology that streams your profile to wherever it needs to go. Mark it for replication around the globe or don't - the choice is yours.
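If that sounds abstract, here's a toy sketch of the idea - not any vendor's product, just an illustration of preferences being kept per user and per application in a central service, so any session anywhere can pull down exactly what it needs and push its changes back:

```python
# Toy illustration only - not any vendor's actual implementation.
class EnvironmentService:
    """Central store of per-user, per-application preferences."""

    def __init__(self):
        self._store = {}                       # (user, app) -> settings dict

    def checkout(self, user, app):
        """Deliver just this app's preferences to whichever session asks."""
        return dict(self._store.get((user, app), {}))

    def checkin(self, user, app, settings):
        """Persist changes back, independently of every other session."""
        self._store.setdefault((user, app), {}).update(settings)

svc = EnvironmentService()
# Excel opened on a Sydney presentation server and a VDI desktop in New York:
sydney  = svc.checkout("alice", "excel")
newyork = svc.checkout("alice", "excel")
sydney["default_font"] = "Calibri"
svc.checkin("alice", "excel", sydney)
# The next session to open Excel anywhere sees the Sydney change - no monolithic
# roaming profile to copy around, and no last-writer-wins clobbering of other
# applications' settings.
print(svc.checkout("alice", "excel"))          # {'default_font': 'Calibri'}
```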

Which leads me inexorably back to the title of this post. Why is storage such an issue for VDI? If the endpoint was stateless, would this issue still exist? In such a world, could we use, dare I say it, local storage on a large scale? Does this have greater implications - with a fully virtualised (machine, app, and environment) stack, could Microsoft finally have a leg to stand on with the "Quick Migration vs VMotion" debate? Comments are open - what do you think?

NOTE: I changed the title of this post after finishing it and realising the original title was just plain wrong. So apologies if you came here via an RSS feed!

Sunday, 13 April 2008

VirtualCenter 2.5 Update 1 - Upgrade Process Still Sucks!

Well what can I say... the old days, the bad days, the all-or-nothing days - they're back!

Once again, the VirtualCenter upgrade process is the equivalent of a full uninstall and reinstall. Someone had better let the developers of the VMware management tools suite (or at least whoever writes the installers) know that VMware are supposed to be an enterprise software company.

I have a video of the process, but until I find that elusive video host that allows resolutions of 800 x 600, I can't post it up. But the usual caveats apply... here are the main ones:

1. The VirtualCenter service is removed and reinstalled to run under the context of LocalSystem. So for all you people who have enterprise-class deployments with the database on another host and vpxd running as a domain account in order to use Windows Authentication to the database, you need to reset the service account credentials. I would strongly advise doing this after the VC installer has run but BEFORE proceeding with the database upgrade wizard (there's a rough scripted version of this and the next caveat sketched after the list).

2. The VirtualCenter database user must have sysadmin rights on the entire database server in order for the upgrade to work. Why this is the case is anyone's guess - a clean install only requires 'dbo' rights on the VC database itself. But that isn't enough for the upgrade.

3. The bug with ADDLOCAL=VPX is still present if you're doing a clean install. I can't remember if I logged a bug report for that already, but it looks like I need to raise another one. We don't use WebAccess in my organisation, and given the choice I'd rather not have to install Tomcat on the VC box - it's just one more bit of software that will require security patches. Might need to log a feature request for an IIS-based web console.
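As promised above, here's what caveats 1 and 2 look like scripted - a rough sketch only, with the service account, password and SQL Server name all placeholders for your environment, and assuming sqlcmd is available on the VC box (test in your lab first, and take the sysadmin role away again once the upgrade is done):

```python
import subprocess

SVC_ACCOUNT = r"MYDOMAIN\svc-virtualcenter"     # placeholder domain account
SQL_SERVER  = "sqlserver01"                     # placeholder database server

# Caveat 1: put the vpxd service back onto its domain account after the
# installer has reset it to LocalSystem (before running the DB upgrade wizard).
subprocess.run(
    ["sc", "config", "vpxd", "obj=", SVC_ACCOUNT, "password=", "ChangeMe!"],
    check=True)

# Caveat 2: temporarily grant the VC database user sysadmin on the whole
# server so the upgrade wizard can do its thing...
subprocess.run(
    ["sqlcmd", "-S", SQL_SERVER, "-E", "-Q",
     f"EXEC sp_addsrvrolemember '{SVC_ACCOUNT}', 'sysadmin'"],
    check=True)

# ...run the VirtualCenter database upgrade wizard, then revoke it again:
subprocess.run(
    ["sqlcmd", "-S", SQL_SERVER, "-E", "-Q",
     f"EXEC sp_dropsrvrolemember '{SVC_ACCOUNT}', 'sysadmin'"],
    check=True)
```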

The support matrix document still isn't updated to reflect whether managing ESX 3.5 U1 with VirtualCenter 2.5 GA is a supported configuration. It will be a small consolation if it is, and an absolute disappointment if it is not.

For anyone running VC 2.5 already, it's not worth upgrading IMHO. The release notes really don't provide any compelling reason to upgrade unless you're running a lot of ESXi and want non-experimental HA (which suggests that the HA code has been moved into the VC agent). For anyone doing fresh installs or upgrading from 2.0.x it would be worth going straight to Update 1 though.

Tuesday, 8 April 2008

VirtualCenter 2.5 & ESX / ESXi 3.5 Update 1 imminent - what caveats will there be?

I must admit, I'm a bit sceptical that VMware will really deliver on its new naming convention. My understanding is that they want to move away from dot-dot releases and go to 'updates' in order to imply a common code base or something, kind of like Microsoft are doing with their 'no new functionality in service packs' mantra. But like Windows Service Packs, I have no doubt 'updates' will still have all the hardware vendor re-certification requirements of dot releases. Here's hoping they can at least do away with forking patches based on 'update' level... otherwise, what is the bl__dy point?

Dot-dot releases have long been called 'maintenance' releases by VMware, but somehow they are considered significant enough to have serious interop requirements. For example, managing ESX 3.0.2 hosts with VC 2.0.1 is an unsupported configuration. If updating VirtualCenter to a new 'maintenance release' level wasn't tantamount to a full uninstall and reinstall of VC (kissing goodbye to your database in the process if you're not careful), it wouldn't be such a big deal... but we all know that is not the case! I really hope the same upgrade pain and support requirements don't carry through with the new 'update' nomenclature - guess we'll know in a couple of days ;-)

Thursday, 21 February 2008

Datastore Design Considerations

Been a bit of a slow month, although hopefully it'll be a strong finish as I'll be at VMworld Europe next week :-)

The inspiration for this post is the flurry of marketing BS going on in the storage space currently - mostly due to VDI. In most cases I have seen or heard about, the storage cost ranges between a quarter and a half of the total cost of a virtual desktop. And it's more often than not a recurring charge.

So any business unit worth their salt will soon question just why they are paying this recurring charge, which over the course of 6 months probably comes close to the once-off cost they used to pay for a physical desktop that they kept for 3 years. That question inevitably makes its way back to you to re-evaluate your current storage design, and ends up with storage vendor marketing hyperbole flooding the internet.

'Thin Provisioning' and 'de-duplication / single instancing' are the buzzwords of the day, but what implications do these various technologies have on your datastore design? The question needs more careful consideration than your CIO who has just watched this or this may care for.

Let's take the first video from incumbent FC champion, EMC. Aside from the fact that it's all whiteboard, the presenter either doesn't know the details of what he is talking about, very craftily steps around an important technical detail, or knows something that none of the storage engineers I work with know. Since the people I work with are a pretty damn smart bunch, I suspect it's not the latter.

The important technical detail to note is that he doesn't at any stage actually say he is referring to a VMFS LUN. Why is that important? Because, of course, SAN-based snapshots are LUN-based. So when he takes a 'master' LUN, makes 4 full copies of it for better performance, and then proceeds to snap off 1000 VMs from those 4 LUNs attached to the one ESX cluster... wait a minute, what was the LUN limit of an ESX host again? (256, if memory serves - a long way short of 1000 snapshot LUNs.)

Which leads to the inevitable conclusion that either he must be using an NFS datastore, or VMware have finally rolled over to their parent company and EMC can now understand VMFS like no other storage vendor can. Cue the second video from the kings of NAS, NetApp.

Sure the video is a little far-fetched (there is obviously no way you could actually run all those VMs using no more than 10GB of disk), but it illustrates the power of using NFS as a datastore - file-based clones are possible, and if you're gonna do NFS datastores then you could certainly do a lot worse than use NetApp. And before you go throwing VMware NFS vs iSCSI vs FC performance papers at me, know that NONE of their papers are based on NFS presented via a NetApp box.

The flipside to NFS via a NetApp, however, is that thin provisioning goes out the window, because NetApp's WAFL filesystem touches every block of the LUN it's presented with when it gets it (hey, there's a reason why WAFL stands for 'Write Anywhere File Layout'). VMFS doesn't, however, thus allowing thin-provisioned LUNs to work as expected - although you may need to create .vmdks as thin too, which is not without caveats. I say 'may' because 3PAR can actually handle the default zeroedthick .vmdk format, so if you're using them you don't need to change anything on the VMware side.
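For what it's worth, if you do go down the thin .vmdk path on VMFS, vmkfstools will happily create them from the service console - a minimal sketch, with the size, datastore and VM folder names made up:

```python
# Minimal sketch: create a 20GB thin-provisioned .vmdk on a VMFS datastore
# (the size, datastore name and folder are placeholders).
import subprocess

subprocess.call(
    ["vmkfstools", "-c", "20g", "-d", "thin",
     "/vmfs/volumes/datastore1/myvm/myvm_1.vmdk"])
```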

Which brings us to de-duplication / single instancing. This can't be handled by the underlying storage array without some kind of intermediary thing... like another filesystem or OS... to scan for duplicate blocks, manage which de-duped blocks belong to what higher up the chain, and then work out what to do if something higher up needs to write to a previously de-duped block, and so on. The whole point of disk subsystems in enterprise storage arrays is to do precisely none of that - it should be done at a higher level, ideally the native filesystem level. VMFS can't do this (yet), but VMware's SVI will go some way to removing the need for de-dup in the first place. NetApp's sis-clone feature has the same effect - it will be interesting to see a performance comparison of the two when they go GA.
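To make that bookkeeping burden concrete, here's a toy sketch - nothing to do with any vendor's actual implementation - of the reference tracking and copy-on-write handling a de-dup layer has to carry above the raw disks:

```python
import hashlib

class DedupLayer:
    """Toy block-level de-dup: logical (disk, block) -> shared physical block."""

    def __init__(self):
        self.store    = {}   # content hash -> block data
        self.refcount = {}   # content hash -> number of logical owners
        self.mapping  = {}   # (disk, lba)  -> content hash

    def write(self, disk, lba, data):
        # Release whatever this logical block pointed at before.
        old = self.mapping.get((disk, lba))
        if old is not None:
            self.refcount[old] -= 1
            if self.refcount[old] == 0:
                del self.store[old], self.refcount[old]
        # De-dup: identical content collapses onto one physical block.
        key = hashlib.sha1(data).hexdigest()
        if key not in self.store:
            self.store[key] = data
            self.refcount[key] = 0
        self.refcount[key] += 1
        self.mapping[(disk, lba)] = key

    def read(self, disk, lba):
        return self.store[self.mapping[(disk, lba)]]

layer = DedupLayer()
# 1000 clones writing the same OS block consume one physical block...
for vm in range(1000):
    layer.write(f"vm{vm}.vmdk", 0, b"identical OS block")
print(len(layer.store))    # 1
# ...until one of them patches it, and the layer has to break the sharing.
layer.write("vm42.vmdk", 0, b"patched block")
print(len(layer.store))    # 2
```

Trivial at toy scale; rather less trivial when it has to happen in the data path at tens of thousands of IOPS, which is exactly why it tends to end up as a post-process or pushed up the stack.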

Funny thing is, no one seems to be talking much about the performance implications of stacking all these clones and snaps on top of each other. When 250 VMs do something simultaneously that requires the IO to make it all the way down to the physical device (virus scan, anyone?), how's it gonna act? Better hope you've got a ton of cache and nothing else is using it.
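Some crude back-of-the-envelope arithmetic shows why that cache matters - every number below is an assumption, so plug in your own:

```python
# Back-of-the-envelope only - every figure here is an assumption.
vms              = 250    # clones ultimately sharing the same physical blocks
iops_per_vm      = 40     # say, an AV scan driving uncached random reads
iops_per_spindle = 180    # rough figure for one 15k FC spindle, random read

demand   = vms * iops_per_vm
spindles = demand / iops_per_spindle
print(f"aggregate demand : {demand} IOPS")      # 10000 IOPS
print(f"spindles needed  : {spindles:.0f}")     # ~56 spindles
```

Ten thousand IOPS landing on the handful of spindles behind the 'master' copies, the moment the cache stops saving you.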

Finally, NFS may lead to an easier migration path, because any worthwhile NAS box can present NFS and CIFS. So if you have a Citrix XenServer or *cough* Microsoft Hyper-V *cough* *cough* migration on the cards in the future, and by that time all the vendors' virtual hard disk formats interoperate, then you'll save yourself a lot of pain on the storage side.

So what to do, what to do... as always, it depends on your environment. But I think more than ever, the old paradigm of 'FC is king' needs serious reconsideration, and VMware need to spend some time on VMFS rather than developing half-arsed patching solutions (ouch - more on Update Manager during or after VMworld!).