jmbrinkman


Netscaler Load Balancing: Monitor TMG Webproxy with User Authentication

In Citrix, Netscaler, TMG 2010 on November 22, 2011 at 11:51

We use a Microsoft Forefront Threat Management Gateway 2010 server array as forward proxy servers. Instead of using an autoconfig script, WPAD or the firewall client we use a load balanced VIP on our Netscalers to direct clients towards the proxy. The setup is quite simple – a client connects to the VIP on port 8080 and the Netscaler sends the request on to TMG. Because we want the second proxy server to be passive we use a backup VIP instead of two services behind the first VIP.
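
For reference, the basic construction looks something like this from the Netscaler CLI – a sketch only, with hypothetical names and addresses:

add service svc_tmg_1 10.0.0.11 TCP 8080
add service svc_tmg_2 10.0.0.12 TCP 8080
add lb vserver vip_proxy TCP 10.0.0.10 8080
add lb vserver vip_proxy_backup TCP 0.0.0.0 0
bind lb vserver vip_proxy svc_tmg_1
bind lb vserver vip_proxy_backup svc_tmg_2
set lb vserver vip_proxy -backupVServer vip_proxy_backup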

Now one of the advantages of a hardware load balancer in this scenario over a software based load balancing solution (such as vanilla or TMG integrated MS Network Load Balancing) is that a Netscaler can be configured in such a way that it’s application aware – and even application performance aware if you want. We were only looking for application awareness – especially because we ran into situations where TMG said it was happy, SCOM said it was happy and there was more than enough CPU, memory, network resources and bandwidth to go around – but clients weren’t able to get a single page from the Internet. But TMG has such a special place in my heart that I’ll devote an entire post to it later this week.

Anyway – Netscaler to the rescue.

This is what I wanted to do: build a monitor that retrieves a website through the webproxy server. That’s been done before: How to Configure an HTTP-ECV Health Monitor for Internet Proxy Servers. But that was for an unauthenticated proxy server. It did give some pointers on how to configure it with authentication. And luckily we allow Basic Authentication (NTLM should be possible as well, I guess, using the right Perl script) so all seemed well.

First I’d like to point out that I’ve moved from using the GUI to using the CLI to configure things such as new vservers and monitors. I’ve been in a situation twice where a change made in the GUI didn’t come through properly – even after saving and refreshing everything.

Secondly – the method in the article mentioned above doesn’t work :(.

I tweaked the parameters and headers over and over but either TMG didn’t accept the request or the Netscaler couldn’t find the pattern in the response. I did some tracing with Network Monitor, but even when TMG sent back a proper 200 status code the Netscaler said the service was down. At some point, though, I found another Knowledge Center article: How to Configure a NetScaler Monitor to Authenticate with a User Name and Password.

I quote: “Do not use an HTTP-ECV monitor when sending additional headers such as authentication, host, and so on.”.

Wow silly me – how did I ever get that idea…??

Following the article, what I did was this:

add lb monitor Proxy_Monitor TCP-ECV -send “GET http://www.citrix.com/ HTTP/1.1\r\nProxy-Authorization: Basic Veryintimidatingbase64stringletsnotusepriviligedaccount\r\nHost:www.citrix.com\r\nCache-control: no-cache\r\n\r\n” -recv 302 -LRTM ENABLED -interval 30

Remarks:

  • The base64 string can be obtained by using Powershell (or from the Netscaler CLI – see the article):

# Encode a string (e.g. domain\username:password) to base64
function ConvertTo-Base64($string) {
    $bytes = [System.Text.Encoding]::UTF8.GetBytes($string)
    $encoded = [System.Convert]::ToBase64String($bytes)
    return $encoded
}
(Source)

  • You need to use the Proxy-Authorization header instead of the Authorization header
  • You can set the realm using a header or include it in the username (domain\username:password, then encode with base64 – see the sketch below)
  • TMG really wants you to give it the full GET, so include the whole URL, and it wants a Host header with the hostname of the destination URL
  • We are testing getting a page from the Internet – not from our cache – so I use a Cache-control header
  • The receive string here is not 200 but 302 because that’s the redirect we get when we request http://www.citrix.com (or http://www.google.com for that matter).
  • To prevent a failover when a single website is offline for some reason, I’ve made two monitors and bound them to each service, each going to a different URL and using a different user account, so that an account lockout can’t ruin our day either. Then by setting the -monThreshold parameter on the service to 1 and giving each monitor a weight of 1 I can ensure that the service is up as long as one of the monitors is successful.
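
Putting the pieces together, the encoding step and the dual-monitor setup could look something like this – account names, passwords and service names are hypothetical, and the bind/set syntax may vary slightly between firmware versions:

# Powershell: encode the second (low-privilege!) monitoring account
ConvertTo-Base64 "DOMAIN\svc-proxymon2:SomePassword"

add lb monitor Proxy_Monitor_2 TCP-ECV -send "GET http://www.google.com/ HTTP/1.1\r\nProxy-Authorization: Basic <output-of-the-encoding-step>\r\nHost:www.google.com\r\nCache-control: no-cache\r\n\r\n" -recv 302 -LRTM ENABLED -interval 30
bind service svc_tmg_1 -monitorName Proxy_Monitor -weight 1
bind service svc_tmg_1 -monitorName Proxy_Monitor_2 -weight 1
set service svc_tmg_1 -monThreshold 1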

I hope someone will find this information useful – one small disclaimer: Basic Authentication is not encrypted – just encoded – and therefore basically clear text.

Quest acquires VKernel

In Quest, Virtualization, Vmware on November 17, 2011 at 21:43

VKernel, the virtualization capacity management company, is now a part of Quest – now I really should rewrite my posts on cloud management… (statement on VKernel’s blog). Whether this will really “accelerate growth” for VKernel remains to be seen; however, as a firm believer in larger frameworks/ecosystems I applaud this addition to Quest’s already impressive, though scattered, line-up of management tools.

The Virtualization Practice seems to share my opinion – and I agree with them that the structure of the acquisition (VKernel will remain a separate entity) might hinder the integration of VKernel into a Quest management framework. However Microsoft and Opalis did something similar – and that seems to have turned out alright. I’m not sure yet where I stand in regards to the absolute necessity to, as the Virtualization Practice puts it, “..bubble all of the vSphere metrics up to three simple scores (Health, Efficiency and Risk)..” and I will get back to that some other time.

I did a mini-review of vScope Explorer not too long ago and am going to do one on vOperations as well. Maybe I’ll test drive some of the Quest stuff too, in order to form a well-grounded opinion on the acquisition and Quest’s position in the cloud management landscape.

Battle for Cloud City: Microsoft strikes back? Part II.

In Opalis, Operations Manager, SCOM 2012, SCVMM 2012, Service Manager, System Center, Virtualization, Vmware on November 16, 2011 at 22:28

Part I.

One of the biggest advantages of posting on a blog, compared with writing an official proposal or something similar, is that I get to ramble on about the things I feel are important. Or peculiar, alienating or just entertaining. Looking at private cloud management solutions in a more trivial way gives me the opportunity to talk about factors that might or might not matter to most people, but that do say something about how a product is perceived – a degree of brand value if you wish.

You might wonder where this will lead, considering the fact that the more serious part of this series started off with some dubious analogies – but don’t worry, I actually intend to make a point here. This is my comparison:

I’ve conjured five topics:

  • Names – If I have to explain stuff to my boss and I’ve taken the “cloud” and “virtual” hurdles, I want to have a nice set of abbreviations or an awe inspiring product name to work with
  • Powershell – Very important. Maybe a bit overrated by some – a general sense of logic, a search engine and Powergui are all that are needed to keep you from flipping burgers.
  • “Open” Standards – In what sense can each offering be accessed, extended and customized by both vendors and end-users?
  • Citrix, EMC – Alliances – The cloud and virtualization market seems rather peaceful, with what I perceive as a mutually beneficial status-quo between Microsoft and Vmware on the hypervisor front, between Microsoft and Citrix on the SBC/VDI front, and Cisco and EMC working with both Microsoft and Vmware to tie everything together. However if we are talking about clouds, unification and abstraction, a cloud management solution that provides more integration than the cliche “single-pane-of-glass” everyone seems to be selling might dictate the choice for a hypervisor the next time licenses expire…
  • Monitoring Sprawl? – We consolidated 200 servers onto 4 pieces of hardware but need 15 servers to monitor our environment…and we might feel unsure about hosting a monitoring solution on a platform that’s monitored by that solution…or which damn web interface do I need to do this…and of course – how can my VM status be green, my server object be in maintenance mode and my email still be stuck in my outbox?

My conclusion? If you are going “Vendor A unless” Microsoft scores the best on the “trivial” side of things. If you go “best-of-breed” vSphere, vCenter (and PowerCLI/CapacityIQ) are very strong at what they do.

Nota Bene:

The success of Powershell, pre-alpha stuff from EMC like project Orion, and SMI-S all show that there is a need for a universal API and framework for managing infrastructure. The sad part is that those initiatives are not new: many technologies have fallen and entered the eternal cloud – or are still there and still used, but deemed unworthy by some (such as SNMP).

The question that remains is – who will bring balance to the Force – the chosen but fallen Anakin, or the Light side’s counteraction, Luke?

And now for something completely different.

In Uncategorized on November 11, 2011 at 21:21

Before I started my own blog I had always relied heavily on the blog posts of others to find solutions for the day-to-day IT problems I was facing. However, I always looked for that information ad hoc. About 2, 2 1/2 years ago I started following some blogs on a more regular basis – first using RSS feeds in Outlook, which did the job relatively well. However, once I got an iPad I decided I wanted a dedicated app which I could use to read and store interesting posts.

There are quite a few RSS reading apps around but I quickly settled for Newsrack – a very basic but functional interface, no need for Google Reader and rather quick to navigate. I follow an awful lot of blogs, so what I basically do is read all posts periodically and star whatever interests me, or looks interesting but is too long/complicated to read quickly. Then every week or so I pick up all the starred posts, check them for relevance and decide whether I want to keep a post to do something with later (use the info for something in my daily work, forward it to friends or colleagues, or write a blog post), turn it into a task/reminder (or whatever Apple/Microsoft call it) or throw it away.

What I miss in Newsrack (or most RSS readers for that matter) is a way to easily comment on a post. Anyway, I thought it would be nice to share something about the way I look at other blogs, and to wrap it up I’ve attached an export of my feed list in the OPML format: Newsrack.OPML

It’s a mix of storage, virtualization, Microsoft, Powershell and Bruce Schneier.

A Heraclean Task? Active Directory, Kerberos and the krbtgt account.

In Active Directory, Kerberos on November 11, 2011 at 10:54

Hercules presenting Cerberus to Eurystheus

Does any of this sound familiar?

  • All domain controllers have been logically corrupted or physically damaged to a point that business continuity is impossible; for example, all business applications that depend on AD DS are nonfunctional.
  • A rogue administrator has compromised the Active Directory environment.
  • An attacker intentionally—or an administrator accidentally—runs a script that spreads data corruption across the forest.
  • An attacker intentionally—or an administrator accidentally—extends the Active Directory schema with malicious or conflicting changes.
  • None of the domain controllers can replicate with their replication partners.
  • Changes cannot be made to AD DS at any domain controller.
  • New domain controllers cannot be installed in any domain.

(Source).

When any of the above is true you know that you have a problem. It’s under these circumstances that a Forest Recovery could be necessary. I’ve never been in such a situation and I sincerely hope I never will be. But let’s assume that you are looking out over the smoking ruins of your Active Directory forest and have found someone else to blame it on – what to do next? Either you will actually have a good restore procedure in place based upon Technet or you will Bing for it 😉

Now the procedure itself is pretty straightforward – make sure all DC’s are down and out and restore one DC from a valid backup starting with your forest root domain. There are a lot of steps in between – but hey – if my forest is down and Microsoft tells me to jump through a flaming hoop in my underwear, smoking a cigar and shouting “Developers, Developers, Developers!” – I will do just that.

One part of the procedure has always interested me: the part where you try to make sure no rogue DC screws up your non-authoritative restore by replicating the corrupt data right back at you. You might wonder why. Well, first of all, if anything will enable you to really get an understanding of Kerberos and replication, trying to break them will. Secondly, let’s assume you make a mistake during the procedure (or perform any of the steps in the wrong environment when testing 😉 ). The steps are:

Pre-recovery

Step 3 – Shutting down all other writable domain controllers. That seems logical. If they are shut down they can’t replicate.

Recovery:

Restore the first writable domain controller for the forest root domain, steps 8 to 11

  • Delete server and computer objects for all other domain controllers in the forest root domain.
  • Reset the computer account password for the domain controller you are restoring (twice)
  • Reset the krbtgt password (twice – see the sketch after this list)
  • Reset the trust password (if any – and twice). This step seems fairly logical as well – it’s basically the same thing as deleting the computer accounts for the DC’s in your forest.
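
For what it’s worth, the resets themselves are simple. A minimal sketch, assuming the restored DC runs Windows Server 2008 R2 with the Active Directory Powershell module, and with hypothetical names – remember that the procedure demands each reset be done twice:

# Reset this DC's own machine account password; with all other DC's offline,
# /server points at the restored DC itself (netdom ships with Windows Server 2008 R2)
netdom resetpwd /server:dc1.contoso.com /userd:CONTOSO\administrator /passwordd:*

# Reset the krbtgt password; the value you type matters little - the KDC derives the actual keys from it
Import-Module ActiveDirectory
Set-ADAccountPassword -Identity krbtgt -Reset -NewPassword (Read-Host -AsSecureString "New krbtgt password")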

In a note, the procedure on Technet tells us that these steps are needed in order to prevent replication – but not how those steps will prevent it, let alone what will happen if you fail to perform any of them. Others have touched on this subject; Jane Lewis (in 2006..) devoted a very informative blog post to it: The KRBTGT Account – What is it?. However, that post only covers the krbtgt password reset – and doesn’t really tell us how this reset will stop replication. In order to grasp the whole concept we will have to take a closer look at the Kerberos authentication process.

Triple headed monster

Kerberos – named after the hell hound of classical mythology which guarded the underworld, making sure only the dead could enter – is an authentication protocol. For a good introduction to Kerberos as used in Windows see Kerberos Explained by Mark Walla, or check the IETF RFCs on the subject. The authentication process consists of three actors: a client, a server and a third party – the Key Distribution Center (KDC). Hence the analogy with a dog with three heads.

AD and Kerberos

Active Directory replication uses Kerberos mutual authentication before a replication operation can be performed. Now I want to describe the scenario where you have just restored the first writable DC in your forest root domain, and a rogue DC exists in the same domain which contains newer (but corrupt!) data and wants to replicate that data really, really badly.

The above diagram shows two ways authentication might occur – however, unless the KDC service on DC2 were broken or otherwise unavailable, I think that a DC, as a Kerberos client, will always connect to its own KDC. And since a service ticket is encrypted by the KDC – TGS with the service key, and the service key for DC1 on DC2 will be different from the service key on DC1, DC1 will never be able to decrypt it.

So why would you have to reset the krbtgt password as well? If DC1 and DC2 both have different keys for each other (DC1 had its password reset so it’s no longer equal to the password known by DC2, and DC1 doesn’t have a password for DC2 because we removed its computer account from the restored AD) there is no way in Hades they will be able to communicate.

The only scenario I could think of where not resetting the krbtgt password would have any serious consequences would be:

  • DC2 uses its TGT from its own KDC – AS to connect to the KDC – TGS on DC1
  • DC1 will issue a service ticket to DC2 – encrypted with the correct service key and with the client key contained in the TGT
  • DC2 will then contact DC1 in order to trigger replication
  • When DC2 presents the service ticket, DC1 will decrypt it (and validate the authenticator) and create an access token based upon the SID in the ticket

Now I’m not sure whether: a) a client would ever use its own KDC – AS but another KDC – TGS, or b) an access token could be created and/or authorization could be given to a client that no longer exists in AD. To determine whether this scenario is even possible I would have to test it in a lab, and in due time I will – but I’d also love to hear from someone else who has already tried something similar or has more theoretical knowledge about how this process works. Of course – even if resetting the krbtgt account has little or no effect, it won’t hurt to do it.

Even if you were to reset this account in an active and healthy AD you shouldn’t experience any real issues, except that all previously created TGTs are invalid on the DC you changed it on, and any new TGTs issued by that DC will be invalid on all other DC’s. This could lead to a service disruption for users, at least until the new krbtgt password has replicated through the forest (and that is something for which I have unfortunately seen “test” results from a production environment). The step is even mentioned in certain KB articles by MS as a way to resolve certain problems with Kerberos authentication – without any warning or information on the risk and impact.

Event ID 14 – Kerberos Key Integrity

Event ID 10 – KDC Password Configuration
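
As an aside: checking when the krbtgt password was last set is a one-liner, assuming the Active Directory Powershell module is available:

Import-Module ActiveDirectory
Get-ADUser krbtgt -Properties PasswordLastSet | Select-Object Name, PasswordLastSet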

But I do get an uneasy feeling performing steps in a procedure which is so complicated and crucial without knowing why I am performing said steps – and that’s where the analogy with Heraclean tasks comes into play. Hercules actually had to perform two extra tasks to atone for his sins (slaying his sons after being driven mad by Hera) because he didn’t read the fine print: two of his labours weren’t counted by Eurystheus…which forced him to confront Kerberos in the first place…


Mini-Review: Monitoring vSphere with SCVMM and SCOM 2012

In Powershell, SCOM 2012, SCVMM 2012, System Center, Virtualization, Vmware on November 7, 2011 at 23:03

Some time ago I posted my vSphere monitoring shoot-out. I recently had the time to install the RC of SCVMM 2012 and the beta of SCOM 2012. There are plenty of guides out there that describe how to get started with both products (SCOM 2012 beta in ten minutes, SCOM 2012 Beta step by step, SCVMM 2012 Survival Guide) so I won’t get into that too much. Some general remarks:

SCVMM

  • You need the Windows 7 AIK which is only downloadable as an ISO or IMG. That annoyed me.
  • I used SQL 2008 R2 Express as a database – in hindsight it would have been better to use a full SQL trial and host both SCVMM and SCOM’s databases.
  • Besides that the install was quick and painless

SCOM

  • Collation, Collation, Collation! Choose SQL_Latin1_General_CP1_CI_AS as your SQL collation, otherwise SCOM won’t find your SQL instance – and it will not tell you that you picked the wrong collation.
  • You need .NET 4
  • I had some issues installing the SCOM agent on the SCVMM server. I got this error:

Log Name:      Application
Source:        MsiInstaller
Date:          4-11-2011 17:53:33
Event ID:      1013
Task Category: None
Level:         Error
Keywords:      Classic
User:          ****\****
Computer:      FQ.DN
Description:
Product: System Center Operations Manager 2012 Agent — Microsoft ESENT Keys are required to install this application. Please see the release notes for more information.

Apparently this is not a SCOM 2012 specific error but a more general SCOM error on Windows 2008 R2 boxes. Running msiexec from an elevated command prompt solved the problem.
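
For completeness, the manual install boils down to something like this from an elevated command prompt – the path is hypothetical, MOMAgent.msi sits in the agent folder on the SCOM installation media:

msiexec /i D:\agent\AMD64\MOMAgent.msi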

Adding vSphere to SCVMM

This part is pretty straightforward as well. Open the Virtual Machine Manager console, go to the Fabric pane and choose Add Resource > VMware vCenter Server. Create a Run As account which has the required privileges (local admin on the vCenter server, according to Technet). After you’ve added the vCenter server you need to add each resource cluster (or individual host) as well, in much the same way as you added the vCenter server. But since you’re already connected to vCenter you don’t have to enter cluster or host names – you can just select them in a browsing dialog.

Strangely enough I wasn’t able to retrieve and accept the certificate for any of my hosts using a domain account – which does have root equivalent privileges on the hosts – so either the AD integration is flawed or I made a mistake configuring it. But with a second Run As account using the default vSphere root account I was able to retrieve and accept the certificates.

After that I was able to view all my hosts and vm’s in SCVMM. Same goes for templates and host networking. SCVMM even sees my dvSwitches and shows each as a single entity – but it does the same for my vSwitches…which is not really what I would like to see. Portgroups aren’t shown in the networking pane – but I was able to find them in the vm guest properties. I did a quick test to see if I could actually manage stuff – and I could – but since for now I’m more interested in monitoring vSphere, I’ll get down to managing it some other time.

Connecting SCVMM to SCOM

I followed this great post on the SCVMM blog to connect SCVMM to SCOM. Most notable improvement over the previous versions: no need to install the VMM console on the SCOM server. However, you still need to install the SCOMsole on the VMM server. Oh, and creating the connection is now a simple wizard in the VMM console :). I had some issues with not being able to search the online SCOM catalog, so I needed to download the prerequisite MP’s by hand.

Once I got that sorted out I completed the wizard and the connection was made.

And? Has it gotten any better?

Yes. Because vSphere and vCenter are represented just as vSphere and vCenter in both SCVMM and SCOM, instead of as weird vm’s on a mutated Hyper-V server, visibility and navigation are much better. But my SCOMsole immediately got filled up with alerts telling me my vm’s didn’t have VSG installed – and because everything is discovered through your VMM server (which SCOM does still seem to see as a Hyper-V server) it started complaining about the fact that I had more than 384 vm’s on a host.

Alerts are also a lot quicker. Views are a bit poor – especially when you consider that the way my vSphere datacenter hierarchy is displayed in SCVMM is pretty good. The fact that SCOM and SCVMM will allow me to view a diagram of a service as defined in SCVMM looks really promising, but I haven’t tested that yet. If you put a host into maintenance mode in SCVMM, its status is automatically propagated to SCOM. There is still no link between the vm as an instance running on vSphere and the Windows computer object in SCOM – that’s a real shame.

There isn’t a lot of Vmware specific stuff there either. I guess that remains, as MS likes to call it, a partner opportunity – or something you could develop yourself using vCenter and System Center’s common denominator, Powershell. But I believe even that might be less of a challenge than before because of the improved SNMP support in SCOM 2012 (so you can use that in addition to the information exposed by vCenter). Still, the biggest improvement seems to be on the managing side rather than on the monitoring side – which makes taking the monitoring shortcomings for granted much more plausible than before.

Battle for Cloud City: Microsoft strikes back? Part I.

In Opalis, Operations Manager, Service Manager, System Center, Virtualization, Vmware on November 7, 2011 at 10:57

A long, long time ago in a galaxy far away business thrived on the planet of Bespin. An almost unlimited source of revenue – clouds – secured the quiet life of Cloud City’s inhabitants 🙂

But those days are gone and The Empire is attempting to take control of the clouds with its hosts of Hyper-V fighters and the SCVDMM (System Central Virtual Destruction and Mayhem Manager) aka “Death Star”.

A day after the announcement of the GA of Vmware’s vCenter Configuration Manager, Vmware’s vOperations Suite and Microsoft’s System Center suite are facing off in their battle for the private cloud. Of course there are other vendors that provide similar management suites – but because both suites are directly linked with each vendor’s own hypervisor layer I think both will be an obvious choice for customers. Almost a year ago I already voiced my views on why I think Microsoft might have an advantage here – but in this post I want to take a brief look at both suites (and related products from both vendors) to see what areas of private cloud management they cover.

The term suite implies a set of tools built upon a central framework and using a single data set – however, each suite consists of several essentially different products that have been brought together with varying levels of integration. This is because of the different roots of each product, but also because each product is built to be used separately as well as in combination with the rest of the suite. This, and the fact that both suites are able to connect to other system management software, means that if a feature is missing from a suite you might be able to integrate another product with it just as well. Both suites have links with the EMC Ionix family, for instance.

I’m going to do that by comparing each offering in 3 different categories:

  • Configuration and Monitoring: the five infrastructure layers
  • Trivia 😉
  • Management and additional features

I’ve compiled a small table for each category highlighting 4 or 5 components that I believe make up that category – each category will get its own post.

This is in no way a complete or even refined comparison; it’s also a comparison based only on documented features and aspects of both products. However, I do intend to test and blog about the two suites extensively in the near future.

When I mention a product I am talking about its most recent version – unless stated otherwise. Most of the System Center 2012 stuff is still beta or RC; some might say that makes this comparison unfair – on both sides. But I think the fact that Microsoft might lack some features because the product isn’t finished is nullified by the fact that they don’t yet have to provide the quality and stability needed for a released product. And you could make the same argument the other way around.

C&M: The five infrastructure layers

First Star Wars and now this category that sounds like a Kung Fu movie..

In this part I want to look at which part of your “private cloud” infrastructure each suite can manage, configure and monitor. The layers that I have defined here are:

  • Storage
  • Network
  • Hypervisor
  • Guests
  • Applications

This leads to the following table (click to enlarge):


My conclusion: Microsoft is able to cover every layer with regard to monitoring and most with configuration/provisioning etc. Vmware is not. But if you can’t configure network devices from System Center and you need another application to do that chances are that application will also be able to monitor those devices.

Nota Bene:

  • Service Manager and Orchestrator really add value because they are the applications that tie all the data from SCOM and SCCM together and make it possible to use that data to build an intelligent management infrastructure.
  • As mentioned in other blogs and sources – dynamic discovery, self learning performance analysis and capacity analysis are key features in managing a highly abstracted/virtualized infrastructure. Vmware sees this and seems to have given such features priority over more “classical” features.

Sources:

vCenter Operations Docs

vCenter Configuration Manager Docs

Nice blog post comparing Vmware with other systems management applications

SCOM 2007 R2: Monitoring vSphere Shoot Out

In Operations Manager, Virtualization on November 1, 2011 at 20:52

Update: I’ve done a mini-review on SCVMM/SCOM 2012 and vSphere monitoring

We are a Microsoft shop. And a Vmware shop. We use SCOM to monitor everything and vSphere to host all our servers. So you can imagine how crucially important it is for us to properly monitor vSphere. With SCOM. Of course Virtual Center does a great job of giving us basic information about our hypervisor environment and the state of our virtual machines. But without the information about our applications that SCOM provides, and with no real way to relate the two sets of data, we really needed a way to get that information into one system.

Of course, there are other monitoring solutions, both for vSphere and for Microsoft applications. But we want to take advantage of our investment in SCOM, and we firmly believe that SCOM is the best option to monitor a 99% Microsoft infrastructure.

We were not the first facing this challenge. Because a challenge it was. We did our best to look at as many options as we could and in the end made a choice based on both functionality and price.

In this post I want to give a short overview of the solutions we looked at and give my personal opinion on each of them.

The contenders

In no particular order:

  • Jalasoft’s Smart MP
  • Quest Management Xtensions (QMX) for vSphere
  • Microsoft SCVMM
  • Veeam’s management pack

We also expressed some interest in a management pack created by Bridgeways, but they were very slow to respond to our request for an evaluation, and once we got a response the amount of information we had to provide in order to evaluate the pack was so huge that we decided it was not worth the effort.

Small disclaimer: we really did our best to give each solution a fair shot; however, it is possible that additional configuration or tweaking would increase the performance or the quality of the data. On the other hand, we didn’t take into account how hard it was to actually get the solutions working – because the installation process (especially under Windows 2008) wasn’t always easy, though nothing we couldn’t handle.

Round 1: What do they monitor – and how?

All of the solutions work through vCenter, with the exception of QMX, which is able to monitor vSphere hosts directly through SNMP and SSH. I guess you could configure Jalasoft or even SCOM itself to treat a host as a generic SNMP device, or build your own sets of monitors and rules, but in general you will still need vCenter as a middle man to monitor your hosts.

None of them consists of just a management pack – they all need a service running on either a SCOM server or a separate server with access to SCOM. Jalasoft and QMX are frameworks – so it’s possible to monitor other devices as well, which makes it easier to digest that you need to add another component to your monitoring infrastructure – and SCVMM can also be used to monitor Hyper-V or to manage vSphere and Hyper-V.

Jalasoft’s Smart MP monitors just vCenter. Hosts are discovered as part of the vCenter server but aren’t represented as separate entities. SCVMM monitors vCenter, hosts and virtual machines; however, it will not give you any vSphere specific data such as CPU ready times, memory swapping etc. During our tests a vSphere host failed and we had fixed the problem before SCVMM alerted us. QMX gives you an awful lot of options – it can monitor VMware logs, syslogs on the ESX servers and esxtop data (my personal favourite), and it also gives you the possibility to create custom filters on log files to trigger an alert if an entry matching the filter is logged. It is also aware of vCenter alerts and events, but I didn’t find any monitors or alerts relating to DRS or HA.

Veeam monitors just about everything that makes vSphere vSphere. A lot of work has been put into the knowledge in the alerts as well – and the alerting is really quick and accurate. Therefore Veeam wins this round.

Round 2: Pricing

vSphere is expensive – period. And since vCenter has its own monitoring capabilities it could be hard to justify another large investment. As always, it’s hard to define an ROI on solutions that mitigate risks, if it is possible at all. QMX for vSphere is free. Extensions for other devices are not, and are generally somewhat more expensive than other solutions (for instance for networking devices) – but I’ll talk more about that in round three.

With Jalasoft you pay per device. If you have one vCenter server, you pay for one device. SCVMM is a part of the System Center suite. If you have the proper agreement with Microsoft you get it for “free” once you’ve joined the dark side.

Veeam is closely aligned with vSphere – they even have (or at least had with vSphere 4.*) the same pricing model. And the price per socket is quite high. But you could ask yourself: if proper monitoring, performance analysis and trend based alerting can increase my consolidation ratio, I will be able to host more servers per physical host and need fewer sockets, fewer vSphere licenses and fewer Veeam licenses.

QMX is completely free – except for the OS license for the machine you host it on – so QMX wins this round.

Round 3: Vision, Tactics, Strategy..whatever

This round is about how the solution fits in a management or monitoring vision. So the outcome is going to be very subjective. But hey – when vendors talk about a journey to the cloud they are talking about just that – a vision or even a paradigm if you want about how to manage infrastructure to properly deliver services to users.

If you are virtualizing your infrastructure you are consolidating. So one thing you don’t want to do is introduce monitoring server sprawl. Despite the name, the current incarnation of the System Center suite is not at all an organic whole. Still, using SCVMM makes sense, especially if you also use Hyper-V in your environment – but you would still need to check vCenter regularly as well, because otherwise you are going to miss crucial information about the state of your environment.

Jalasoft and QMX are frameworks. QMX also gives you the possibility to extend System Center Configuration Manager and has the broadest support for other non-Microsoft platforms and devices. Jalasoft is very network oriented but has great integration with another add-on to SCOM, Savision LiveMaps.

Veeam – as described in the previous rounds – is very vSphere oriented. It does vSphere, it does it very well, but you will still need something of a framework next to Veeam and SCOM to monitor the other layers of your infrastructure such as your SAN storage or your network.

I put my faith in the frameworks. And I think it’s inevitable that a solution like Veeam’s will be built by either Vmware themselves or one of the vendors that offer a monitoring framework at some point in the near future. This round goes to QMX because of the integration with SCCM and the support for just about any non-Windows platform or application out there.

So the winner is..and some final thoughts

I think QMX is the best option available today if you are looking for a solution that is very configurable, affordable and has enough promise for the future to justify investing time and money into making the framework part of your monitoring infrastructure. But….

  • There are other options – vKernel has quite a nice toolset and claims to connect to SCOM – I will be testing that soonish
  • SCVMM 2012 is said to provide better vSphere integration and SCOM 2012 is said to have improved network device monitoring. I will look at those two in detail as well and report back with my findings.
  • You could build your own MP – you can get all the relevant data from vCenter using Powershell and SNMP gets and traps
  • SCVMM 2008 has a nasty habit of setting custom properties on your virtual machines – but you can use Powershell (isn’t that ironic) to get rid of those properties – for more info: VCritical article
  • Since Powershell and vSphere are so compatible I’m really surprised that I haven’t found a solution based on just Powershell to link SCOM and vSphere together – a rough sketch of the idea follows below.
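
To close off, a very rough sketch of what such a Powershell-only link might look like: PowerCLI pulls a vSphere specific counter and returns it to SCOM as a property bag, the way a SCOM script based monitor expects. All names, the counter choice and the threshold are my own assumptions, not part of any existing solution:

# Sketch: feed vSphere CPU ready data into a SCOM script monitor (all names hypothetical)
Add-PSSnapin VMware.VimAutomation.Core
Connect-VIServer -Server "vcenter.fq.dn" -User "svc-scom" -Password "****" | Out-Null

# Average CPU ready (in milliseconds) over the last 15 realtime samples of one host
$esx = Get-VMHost -Name "esx01.fq.dn"
$ready = Get-Stat -Entity $esx -Stat "cpu.ready.summation" -Realtime -MaxSamples 15 |
    Measure-Object -Property Value -Average |
    Select-Object -ExpandProperty Average

# Hand the value back to SCOM as a property bag
$api = New-Object -ComObject "MOM.ScriptAPI"
$bag = $api.CreatePropertyBag()
$bag.AddValue("CpuReadyMs", $ready)
$bag.AddValue("State", $(if ($ready -gt 1000) { "Warning" } else { "Healthy" }))
$api.Return($bag)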