vinternals: November 2008

Thursday 27 November 2008

Microsoft Offline Virtual Machine Servicing Tool 2.0

This one seems to have slipped by like a Vaseline coated ninja. Microsoft have released the long awaited version 2.0 of their Offline Virtual Machine Servicing Tool (jokes ppl, jokes).

Seriously though, what impresses me the most about this tool is its size, which is only a few MB - it's little more than a bunch of scripts cobbled together with a UI. Makes one wonder how easily it could be hacked to work with vCenter and WSUS... hmmmm...

Wednesday 26 November 2008

HA Allows Cake for All? No Wonder I'm Overweight!

I've got a cake. It's my favourite variety, caramel mud. As much as I'd like to eat the whole cake myself, in a shameless attempt to increase my blog audience I'm going to eat half and give half to the first person to email me after reading this post*.

Question: How much cake is left for the second person to email me after reading this post?

None. That's right, I ate half and gave the other half to the first mailer.

But if I was HA configured to allow a single host failure, some recently observed behaviour indicates that the first mailer would actually only get 1/4 of the cake, and there would be some cake left for every person who mailed me until infinity. This is because HA doesn't appear to have any concept of a 'known good state' - instead, it seems that if a single host fails (half cake eaten by me), it then re-evaluates the cluster size (the half cake left is now considered the whole cake) and adjusts its resource cap to allow for another single host to fail (first mailer takes half of the new cake size, the cake size is recalculated, next mailer gets half the new cake size, the cake size is recalculated, ad infinitum).
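
For anyone who wants to check the cake maths, here is a minimal Python sketch of the two outcomes. It only models the analogy above; the function names are made up for illustration and have nothing to do with any VMware tooling.

```python
from fractions import Fraction

def known_good_shares(mailers):
    """Cake size is fixed up front: I eat half, the first mailer gets the rest."""
    remaining = Fraction(1, 2)  # what is left after I eat my half
    shares = []
    for _ in range(mailers):
        shares.append(remaining)   # hand over everything that is left
        remaining = Fraction(0)    # nothing left for anyone after that
    return shares

def reevaluated_shares(mailers):
    """Whatever is left is treated as a new whole cake, and each mailer gets half of it."""
    remaining = Fraction(1, 2)  # what is left after I eat my half
    shares = []
    for _ in range(mailers):
        share = remaining / 2      # half of the "new" cake
        shares.append(share)
        remaining -= share         # the rest becomes the next "whole" cake
    return shares

if __name__ == "__main__":
    print([str(s) for s in known_good_shares(3)])
    print([str(s) for s in reevaluated_shares(3)])
```

With three mailers, the first function hands out 1/2, 0, 0 and the second hands out 1/4, 1/8, 1/16, so there is always a sliver left for the next person.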

In my mind, this behaviour is not correct. If I tell HA I want to allow for a single host failure, then it should have some idea of what my known good state is. When one host actually does fail, then my risk criterion has been met and all resource guarantees in the event of a subsequent host failure are off. But it seems that all HA knows is how many hosts are in a cluster at any point in time, and how many failures to allow for at any point in time. This obviously has some implications for cluster design, in particular the "do not allow violation of availability constraints" option. Previously I was a strong advocate of taking this option - in the enterprise, soft limits are hardly ever honoured. But now I'm looking at using a "master" resource pool to achieve my resource constraint and switching to the other option.
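
To put rough numbers on the cluster side of this, here is a deliberately simplified Python sketch. It assumes equal-sized hosts and a reserve of one whole host per tolerated failure; the function and parameter names are hypothetical, and this is a model of the behaviour described above, not VMware's actual admission control calculation.

```python
def usable_capacity(hosts_alive, host_capacity, failures_to_tolerate, baseline_hosts=None):
    """Capacity VMs may consume under strict admission control (simplified model).

    baseline_hosts=None models the observed behaviour: the reserve is always
    recomputed from the hosts currently alive.
    Passing baseline_hosts models a 'known good state': failures that have
    already happened are counted against the configured tolerance.
    """
    total = hosts_alive * host_capacity
    if baseline_hosts is None:
        reserve_hosts = min(failures_to_tolerate, hosts_alive)
    else:
        already_failed = baseline_hosts - hosts_alive
        reserve_hosts = max(failures_to_tolerate - already_failed, 0)
    return total - reserve_hosts * host_capacity

if __name__ == "__main__":
    # Hypothetical cluster: 4 hosts of 100 units each, tolerate 1 host failure.
    print(usable_capacity(4, 100, 1))                    # healthy cluster
    print(usable_capacity(3, 100, 1))                    # one host down, reserve re-evaluated
    print(usable_capacity(3, 100, 1, baseline_hosts=4))  # one host down, known good state
```

In this toy cluster the healthy cap is 300 units. After one host fails, the re-evaluated cap drops to 200, whereas a known good state of 4 hosts would leave all 300 remaining units available because the tolerated failure has already been used up.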

I'm hoping the first person to mail me after reading this will be Duncan or Mike telling me I don't know wtf I'm talking about, but I've come to this belief after some conversations with VMware during a post mortem of a production incident so I'm not entirely to blame if I'm wrong :-P. If anyone else out there has seen similar behaviour, email me now - that half-a-cake could be yours!

*The cake is a lie.

Thursday 20 November 2008

Symantec reverts to normal, well done. And well done community!

Unfortunately I couldn't break the news when I heard it as I was at work (no way I'm blogging from there, my boss reads this - hi Mark :D), but Symantec has updated the original KB article to something much, much more reasonable (although it seems to be offline at the minute).

No company is immune from the odd premature / alarmist KB article, and however this happened doesn't matter as much as the outcome. I'm sure a lot was going on behind the scenes before I broke the news, but still I'd like to think we all played some part in getting it cleared up so quickly. Power to the people :-)

Tuesday 18 November 2008

Symantec Does _NOT_ Support Vmotion... WTF!?!?!

Make sure you're sitting down before reading this Symantec KB article which is only 1 month old and clearly states they do _not_ support any current version of their product on ESX if Vmotion is enabled. No, it's not a joke.

It's a _fucking_ joke. Symantec have essentially pulled together a list of random issues that are almost certainly intermittent in nature and could have a near infinite number of causes, and somehow slapped the blame on Vmotion without as much as a single word of how they arrived at this conclusion. When Microsoft shut the AV vendors out of the Vista kernel, I actually felt a little sympathy for them (the AV vendors). But after reading this, I can't imagine the bullshit Microsoft must have had to put up with over the years. It's no wonder Microsoft tried to do their own thing with AV.

I urge every enterprise on the planet who are customers of both VMware and Symantec to rain fire and brimstone upon Symantec (I've already started), because your entire server and VDI infrastructure is at this time officially unsupported. The VMware vendor relationship team have already kicked into gear, we need to raise some serious hell as customers to drive this point home, and hard.

This is absolutely disgraceful and must not stand.

UPDATE The original KB article has been updated, support has been restored to normal!

Monday 17 November 2008

Microsoft Azure Infrastructure Deployment Video

I've been a bit quiet of late, mainly due to the digestion of Microsoft's PDC 2008 content (what a great idea to provide the content for free after the conference - VMware could take a leaf out of that book). I meant to post this up yesterday as another SAAAP installment but missed the deadline... but what the hell, I'll tag this post anyway!

One of the more infrastructure oriented PDC sessions was "Under the Hood - Inside the Windows Azure Hosting Environment". Skip forward to around 43 minutes into the presentation, where Chuck Lenzmeier goes into the deployment model used within the Azure cloud (you can stop watching at around 48 minutes).

Conceptually, this is _exactly_ the deployment model I and ppl like Lance Berc envisage for ESXi. Rather than put that base VHD onto local USB devices ala ESXi, Microsoft PXE boot a Windows PE "maintenance os", drop a common base image onto the endpoint, dynamically build a personality (offline) as a differencing disk (ie linked clone), drop that down to the endpoint, and then boot straight off the differencing disk VHD (booting directly off VHD's is a _very_ cool feature of Win7 / Server 2008 R2). I'm glad even Microsoft recognise the massive benefits of this approach - no installation, rapid rollback, etc.

Now ESXi of course has one *massive* advantage over Windows in this scenario - it weighs in at around the same size as a typical Windows PE build, much smaller than a Hyper-V enabled Server Core build.

And if only VMware would support PXE booting ESXi, you could couple it with Lance's birth client and midwife or our statelesx, and you don't even need the 'maintenance OS'. You get an environment that is almost identical conceptually, but can be deployed much much faster due to the ~150MB or so total size of ESXi (including Windows, Linux and Solaris vmtools iso's and customised configuration to enable kernel mode options like swap or LVM.EnableResignature within ESXi that would otherwise require a reboot - chicken, meet egg :-)) versus near a GB of Windows PE and Hyper-V with Server Core. Of course I don't even need to go into the superiority of ESXi over Hyper-V ;-)

With Windows Server 2008 R2 earmarked for a 2010 release, it will be some time before the sweet deployment scenario outlined in that video is available to Microsoft customers (gotta love how Microsoft get the good stuff to themselves first - jeez, what a convenient way to ensure your cloud service is much more efficient than anything your customer could build with your own technology). But by that time they will have a helluva set of best practice deployment knowledge for liquid infrastructure like this, and you can bet aspects of their "fabric controller" will find their way into the System Center management suite.

The liquid compute deployment gauntlet has been thrown, VMware. Time to step up to the plate.

Sunday 9 November 2008

VMware: Unshackle ESXi, or Follow Microsoft Into the Cloud...

One of the things that will be interesting to observe as the cloud matures will be the influence the internal and external clouds have on each other.

To this end, I think Microsoft have benefited from years of experience with the web hosting industry, and as a result have completely nailed the correct approach for them to take into the cloud. VMware doesn't have the benefit of this experience, and should watch Microsoft closely as the key to their strategy has more in common with what VMware needs to be successful than one might think. That key being cost.

For years, Microsoft never made a dent in the web hosting space. It would be naive to attribute this solely to the stability and security of the platform - IIS 5 had a few howlers sure, but so has Apache over the same time. IIS 6 however was entirely different - in its default configuration, not a single remotely exploitable vulnerability has been publicly identified to date. But there was the cost angle - Windows based hosting charges were far greater than comparable Linux based offerings, and they pretty much stayed that way until SWsoft came along with Virtuozzo for Windows in 2005. The massive advantage Virtuozzo had (and still has) over other Virtualisation offerings from a cost perspective is that only a single Windows license is required for the base OS - all the containers running on top of that base don't cost anything more. Now all of a sudden web hosting companies could apply the same techniques to Windows as they had used for Linux based platforms for years, and achieve massive cost reductions.

I can see a similar problem for VMware in the external cloud space. Let's compare 2 companies currently offering infrastructure based cloud services - Terremark and Amazon.

Terremark's 'Infinistructure' is based (among other things) on hardware from HP, a hypervisor from VMware (although it's not clear if this is the only hypervisor available), all managed by an in house developed application called digitalOps which apparently had a development cost of over $10mil USD to date.

Amazon's EC2 on the other hand uses custom built hardware from commodity components, uses the free open source Xen (and likely their own version of Linux for Dom0), and of course a fully developed management API based on open web standards.

Comparing those 2, which do you think will _always_ cost more? If VMware want to seriously compete in the external cloud space, they need to address this. Right now I see 2 options for them. One is to unshackle the free version of ESXi, so that the API is fully functional and hence 3rd parties could write their own management tools to sit on top and not have to pay for the VMware tools (and then have to write their own stuff anyway, as Terremark have done). The other is for VMware to enter the hosting space themselves, as they would be the only company in a position to avoid paying software licenses for ESX and their VI management layer.

I'm sure VMware realise they only have a limited time in which to establish themselves in this space, because 2 trends are in motion that have the potential to render the operating system insignificant, let alone the underlying hypervisor: application virtualisation, and cloud aware web based applications. But it will likely be some time before either of those 2 will be adopted broadly in the enterprise, and hence VMware are pushing the fact that their take on cloud is heavily infrastructure focused, and thus doesn't require any changes in the application layer.

Sure, unshackling free ESXi would mean missing out on some revenue from external cloud providers, but it will go a long way towards insulating them from any external cloud influence creeping into the internal cloud. What do I mean by that? Say I was a CIO, and I had 2 external cloud providers vying for my business. While they both offer identical SLA's surrounding performance and availability, one of them offers a near seamless internal/external cloud transition based on the software stack in my internal cloud. It is however 3 times the cost of the other provider. After signing a short-term agreement with the more expensive cloud provider, I would be straight on the phone to my directs asking for an investigation into the internal cloud, with a view to making it more compatible with the cheaper external offering.

Which is why Microsoft have got the play right. They know that no 3rd party host can pay for Windows and compete with Linux on price. So instead of going after it with Hyper-V on an infrastructure level, they're taking a longer term approach and going after the cloud aware web based application market. Personally, I can't ever see VMware becoming a cloud host, which means they need to do something about unshackling ESXi. And fast - the clock is ticking.
