jmbrinkman

Posts Tagged ‘Microsoft’

Start Me Up? – Windows 8 Consumer Preview First Thoughts

In Proxy, Tech Ed on July 3, 2012 at 20:16

After two years of being highly skeptical about everything Microsoft – especially if you consider how positive I was after attending TechEd 2010 – attending another edition of TechEd sorta won me over again. The promise of an integrated, holistic (yes, even MS itself uses this word now) management platform finally seems to be fulfilled with System Center 2012, and even Server 2012 without the whole suite seems to be all about integration, open standards and the acknowledgement of the fact that for some people and companies there is no cloud like their private cloud.

I even installed Windows 8. Most of the reviews I've seen have been ambiguous to say the least. At TechEd 2012 I saw tablets running Win8 – on that platform the Metro UI looks and feels more modern than iOS. Obviously most of the way you interface, the strong connection to cloud apps and the ability to federate data from different sources have been, well, stolen from Apple. But true multitasking (even if you can only run two apps next to each other on one screen) is a big plus. The fact that you can use your regular Desktop apps on those (non-ARM) devices might be an advantage as well – but a lot will depend on how well they are suited to being used with a touch interface.

Now running it on your desktop…or laptop is a whole different matter. The absence of integration between the Metro and Desktop worlds is a big problem. I don't mind having nice looking apps to do certain jobs – like reading a book or watching a movie – but I do mind having to ALT-TAB through both these full screen apps and my desktop apps. And I don't mind having 10 ways to alter my settings – as long as they all lead to the same set of settings. I need to know what to change and where to change it.

If you read any of my previous articles you know I have a special interest in proxy servers. Well, hang on to yourselves – Metro gives us another way to define a proxy. Metro apps, bar Metro IE 10, don't use WinINET or WinHTTP – their proxy is defined in a Group Policy setting. If you want to know which one, look here.
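
For contrast, here is a quick sketch of where the two classic proxy stores live and how to inspect them (standard locations, shown purely for illustration):

# WinINET: the per-user proxy used by desktop IE and most desktop apps
Get-ItemProperty 'HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings' |
    Select-Object ProxyEnable, ProxyServer
# WinHTTP: the machine-wide proxy used by many services
netsh winhttp show proxy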

Then there are the so-called improvements for multi monitor setups. When I ran the pre-install wizard it told me Ultramon wasn't supported, so I crossed my fingers and hoped for the best. What I got was:

– A customizable dual screen taskbar. Finally.

– Hotkeys to move windows around like in Windows 7 – but no buttons in the right hand corner of each window like in Ultramon or similar utilities

– Metro on one screen, the Desktop on the other. Now at first that made me really happy. If they won’t integrate maybe I can run them next to each other on different screens. But no – selecting a Desktop app will minimize my Metro….

The Desktop itself is faster and more responsive, and I don't miss the Start button that much. I wonder if the Rolling Stones are still getting royalties from way back when MS used Start Me Up at the Windows 95 launch – but I doubt they will care either. Press the Windows button and start typing – you get a nice quick list of suggestions, be it regular programs, applets that change settings or individual files.

The Metro apps are good to have on my laptop at home when I want to look at some photos, chat or look up random stuff on Wikipedia. But I hope I'll be able to turn it off on my workstation at work, unless Microsoft finds a way to access both worlds in a unified and seamless manner.

Netscaler Load Balancing: Monitor TMG Webproxy with User Authentication

In Citrix, Netscaler, TMG 2010 on November 22, 2011 at 11:51

We use a Microsoft Forefront Threat Management Gateway 2010 server array as forward proxy servers. Instead of using an autoconfig script, WPAD or the firewall client we use a load balanced VIP on our Netscalers to direct clients towards the proxy. The setup is quite simple – a client connects to the VIP on port 8080 and the Netscaler sends the request over to TMG. Because we want the second proxy server to be passive we use a backup VIP instead of two services behind the first VIP.

Now one of the advantages of a hardware load balancer in this scenario over a software based load balancing solution (such as vanilla or TMG integrated MS Network Load Balancing) is that a Netscaler can be configured in such a way that it's application and even application performance aware if you want. We were only looking for application awareness – especially because we ran into situations where TMG said it was happy, SCOM said it was happy and there was more than enough CPU, memory, network resources and bandwidth to go around – but clients weren't able to get a single page from the Internet. But TMG has such a special place in my heart that I'll devote an entire post to it later this week.

Anyway – Netscaler to the rescue.

This is what I wanted to do: build a monitor that retrieves a website through the webproxy server. That's been done before: How to Configure an HTTP-ECV Health Monitor for Internet Proxy Servers. But that was for an unauthenticated proxy server. It did give some pointers on how to configure it with authentication. And luckily we allow Basic Authentication (NTLM should be possible as well, I guess, using the right Perl script) so all seemed well.

First I'd like to point out that I've moved from using the GUI to using the CLI to configure things such as new vservers and monitors. I've been in a situation twice where a change made in the GUI didn't come through properly – even after saving and refreshing everything.

Secondly – the method in the article mentioned above doesn’t work :(.

I tweaked the parameters and headers over and over but either TMG didn't accept the request or the Netscaler couldn't find the pattern in the response. I did some tracing with Network Monitor but even when TMG sent back a proper 200 status code the Netscaler said the service was down. But at some point I found another Knowledge Center article: How to Configure a NetScaler Monitor to Authenticate with a User Name and Password.

I quote: “Do not use an HTTP-ECV monitor when sending additional headers such as authentication, host, and so on.”.

Wow silly me – how did I ever get that idea…??

Following the article, what I did was this:

add lb monitor Proxy_Monitor TCP-ECV -send "GET http://www.citrix.com/ HTTP/1.1\r\nProxy-Authorization: Basic Veryintimidatingbase64stringletsnotusepriviligedaccount\r\nHost: www.citrix.com\r\nCache-control: no-cache\r\n\r\n" -recv 302 -LRTM ENABLED -interval 30

Remarks:

  • The base64 string can be obtained by using Powershell (or from the Netscaler CLI – see the article):

function ConvertTo-Base64($string) {
    $bytes = [System.Text.Encoding]::UTF8.GetBytes($string)
    $encoded = [System.Convert]::ToBase64String($bytes)
    return $encoded
}
(Source)
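
For example, to build the header value (a purely hypothetical account, shown only to illustrate the domain\username:password format):

# encode the whole credential string and prepend the scheme
$cred = 'MYDOMAIN\svc-nsmon:S0mePassw0rd'   # hypothetical monitoring account
'Proxy-Authorization: Basic ' + (ConvertTo-Base64 $cred)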

  • You need to use the Proxy-Authorization header instead of the Authorization header
  • You can set the realm using a header or include it in the username (domain\username:password and then encode with base64)
  • TMG really wants you to give it the full GET, so include the whole URL, and it wants a Host header with the hostname of the destination URL
  • We are testing getting a page from the Internet – not from our cache – so I use a Cache-control header
  • The receive string here is not 200 but 302, because that's the redirect we get when we request http://www.citrix.com (or http://www.google.com for that matter).
  • To prevent a failover when a single website is offline for some reason, I've made two monitors and bound them to each service, each going to a different URL and using a different user account so that we can prevent an account lockout ruining our day as well. Then by setting the -monThreshold parameter on the service to 1 and giving each monitor a weight of 1 I can ensure that the service is up if one of the monitors is successful.

I hope someone will find this information useful – one small disclaimer: Basic Authentication is not encrypted – just encoded – and therefore basically clear text.

A Heraclean Task? Active Directory, Kerberos and the krbtgt account.

In Active Directory, Kerberos on November 11, 2011 at 10:54

Hercules presenting Cerberus to Eurystheus

Does any of this sound familiar?

  • All domain controllers have been logically corrupted or physically damaged to a point that business continuity is impossible; for example, all business applications that depend on AD DS are nonfunctional.
  • A rogue administrator has compromised the Active Directory environment.
  • An attacker intentionally—or an administrator accidentally—runs a script that spreads data corruption across the forest.
  • An attacker intentionally—or an administrator accidentally—extends the Active Directory schema with malicious or conflicting changes.
  • None of the domain controllers can replicate with their replication partners.
  • Changes cannot be made to AD DS at any domain controller.
  • New domain controllers cannot be installed in any domain.

(Source).

When any of the above is true you know that you have a problem. It’s under these circumstances that a Forest Recovery could be necessary. I’ve never been in such a situation and I sincerely hope I never will be. But let’s assume that you are looking out over the smoking ruins of your Active Directory forest and have found someone else to blame it on – what to do next? Either you will actually have a good restore procedure in place based upon Technet or you will Bing for it 😉

Now the procedure itself is pretty straightforward – make sure all DC’s are down and out and restore one DC from a valid backup starting with your forest root domain. There are a lot of steps in between – but hey – if my forest is down and Microsoft tells me to jump through a flaming hoop in my underwear, smoking a cigar and shouting “Developers, Developers, Developers!” – I will do just that.

One part of the procedure has always interested me: the part where you try to make sure no rogue DC screws up your non-authoritative restore by replicating the corrupt data right back at you. You might wonder why. Well, first of all, if anything will enable you to really get an understanding of Kerberos and replication – trying to break it will. Secondly, let's assume you make a mistake during the procedure (or perform any of the steps in the wrong environment when testing 😉 ). The steps are:

Pre-recovery

Step 3 – Shutting down all other writable domain controllers. That seems logical. If they are shut down they can’t replicate.

Recovery:

Restore the first writable domain controller for the forest root domain, steps 8 to 11

  • Delete server and computer objects for all other domain controllers in the forest root domain.
  • Reset the computer account password for the domain controller you are restoring (twice)
  • Reset the krbtgt password (twice – see the sketch after this list)
  • Reset the trust password (if any – and twice). This step seems fairly logical as well – it's basically the same thing as deleting the computer accounts for the DC's in your forest.
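
A minimal sketch of that krbtgt reset, assuming the ActiveDirectory PowerShell module is available on the restored DC (the reset is done twice because the KDC will still honour tickets issued under the previous krbtgt key):

Import-Module ActiveDirectory
# reset twice with throwaway random passwords; AD derives the actual Kerberos keys
1..2 | ForEach-Object {
    $pw = ConvertTo-SecureString ([guid]::NewGuid().Guid) -AsPlainText -Force
    Set-ADAccountPassword -Identity krbtgt -NewPassword $pw -Reset
}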

In a note, the procedure on Technet tells us that these steps are needed in order to prevent replication – but not how those steps will prevent it, let alone what will happen if you fail to perform any of them. Others have touched on this subject; back in 2006 Jane Lewis devoted a very informative blog post to it: The KRBTGT Account – What is it?. However that only covers the krbtgt password reset – and doesn't really tell us how this reset will stop replication. In order to grasp the whole concept we will have to take a closer look at the Kerberos authentication process.

Triple headed monster

Kerberos – named after the hell hound of classical mythology which guarded the underworld, making sure only dead people could enter – is an authentication protocol. For a good introduction to Kerberos as used in Windows see Kerberos Explained by Mark Walla or check the IETF RFCs on the subject. The authentication process consists of three actors: a client, a server and a third party – the Key Distribution Center (KDC). Hence the analogy with a dog with three heads.

AD and Kerberos

Active Directory replication uses Kerberos Mutual Authentication to authenticate before a replication operation can be performed. Now I want to try to describe the scenario where you have just restored the first writable DC in your forest root domain and a rogue DC exists in the same domain which contains newer (but corrupt!) data and wants to replicate that data really really bad.

The above diagram shows two ways authentication might occur – however unless the KDC service on DC2 were broken or otherwise unavailable, I think that a DC, as a Kerberos client, will always connect to its own KDC. And since a service ticket will be encrypted with the service key by the KDC – TGS, and the service key for DC1 on DC2 will be different from the service key on DC1 – DC1 will never be able to decrypt it.

So why would you have to reset the krbtgt password as well? If DC1 and DC2 both have different keys for each other (DC1 had its password reset so it's not equal to the password known by DC2, and DC1 doesn't have a password for DC2 because we removed its computer account from the restored AD) there is no way in Hades they will be able to communicate.

The only scenario I could think of where not resetting the krbtgt password would have any serious consequences would be:

  • DC2 uses its TGT from its own KDC – AS to connect to the KDC – TGS on DC1
  • DC1 will issue a service ticket to DC2 – encrypted with the correct service key and the client key contained in the TGT
  • DC2 will then contact DC1 in order to trigger replication
  • When DC2 presents the service ticket, DC1 will decrypt it (and validate the authenticator) and create an access token based upon the SID in the ticket.

Now I'm not sure if: a) a client would ever use its own KDC – AS but another KDC – TGS, or b) an access token could be created and/or authorization could be given to a client that no longer exists in AD. To determine whether this scenario is even possible I would have to test it in a lab, and in due time I will – but I'd also love to hear from someone else who has already tried something similar or has more theoretical knowledge about how this process works. Of course – even if resetting the krbtgt account has little or no effect, it won't hurt to do it.

Even if you were to reset this account on an active and healthy AD you shouldn't experience any real issues, except that all previously created TGTs are invalid on the DC you changed it on, and any new TGTs issued by that DC will be invalid on all other DCs. This could lead to a service disruption for users, at least until the new krbtgt password has replicated through the forest (and that is something for which I unfortunately have seen "test" results from a production environment). The step is even mentioned in certain KB articles by MS to resolve certain problems with Kerberos authentication – without any warning or information on the risk and impact:

Event ID 14 – Kerberos Key Integrity

Event ID 10 – KDC Password Configuration

But I do get an uneasy feeling performing steps in a procedure which is so complicated and crucial without knowing why I am performing said steps – and that's where the analogy with Heraclean tasks comes into play. Hercules actually had to perform two extra tasks to atone for his sins (slaying his sons after being driven mad by Hera) because he didn't read the fine print and two of his labours weren't counted by Eurystheus…which is what forced him to confront Kerberos in the first place…


Battle for Cloud City: Microsoft strikes back? Part I.

In Opalis, Operations Manager, Service Manager, System Center, Virtualization, Vmware on November 7, 2011 at 10:57

A long, long time ago in a galaxy far away business thrived on the planet of Bespin. An almost unlimited source of revenue – clouds – secured the quiet life of Cloud City’s inhabitants 🙂

But those days are gone and The Empire is attempting to take control of the clouds with its hosts of Hyper-V fighters and the SCVDMM (System Central Virtual Destruction and Mayhem Manager) aka “Death Star”.

A day after the announcement of the GA of Vmware's vCenter Configuration Manager, Vmware's vOperations Suite and Microsoft's System Center Suite are facing off in their battle for the private cloud. Of course there are other vendors that provide similar management suites – but because both suites are directly linked with each vendor's own hypervisor layer I think both will be an obvious choice for customers. Almost a year ago I already voiced my views on why I think Microsoft might have an advantage here – but in this post I want to take a brief look at both suites (and related products from both vendors) to see which areas of private cloud management they cover.

The term suite implies a set of tools built upon a central framework and using a single data set – however each suite consists of several essentially different products that have been brought together with varying levels of integration. This is because of the different roots of each product, but also because each product is built to be used separately as well as in combination with the rest of the suite. This, and the fact that both suites are able to connect to other systems management software as well, means that if a feature is missing from a suite you might be able to integrate another product with it just as well. Both suites have links with the EMC Ionix family, for instance.

I’m going to do that by comparing each offering in 3 different categories:

  • Configuration and Monitoring: the five infrastructure layers
  • Trivia 😉
  • Management and additional features

I’ve compiled a small table for each category highlighting 4 or 5 components that I believe make up that category – each category will get its own post.

This is in no way a complete or even refined comparison, and it's a comparison based on documented features and aspects of both products – however I do intend to test and blog about the two suites extensively in the near future.

When I mention a product I am talking about its most recent version – unless stated otherwise. Most of the System Center 2012 stuff is still beta or RC; some might say that that makes this comparison unfair – on both sides. But I think the fact that Microsoft might lack some features because the product isn't finished is nullified by the fact that they don't have to provide the quality and stability needed for a released product. And you could make the same argument the other way around.

C&M: The five infrastructure layers

First Star Wars and now this category that sounds like a Kung Fu movie..

In this part I want to look at which part of your “private cloud” infrastructure each suite can manage, configure and monitor. The layers that I have defined here are:

  • Storage
  • Network
  • Hypervisor
  • Guests
  • Applications

This leads to the following table:


My conclusion: Microsoft is able to cover every layer with regard to monitoring and most with configuration/provisioning etc. Vmware is not. But if you can’t configure network devices from System Center and you need another application to do that chances are that application will also be able to monitor those devices.

Nota Bene:

  • Service Manager and Orchestrator really add value because they are the applications that really tie all the data from SCOM and SCCM together and make it possible to use that data to build an intelligent management infrastructure.
  • As mentioned in other blogs and sources – dynamic discovery, self learning performance and capacity analysis are key features in managing a highly abstracted/virtualized infrastructure. Vmware sees this and seems to have given such features priority over more "classical" features.

Sources:

vCenter Operations Docs

vCenter Configuration Manager Docs

Nice blog post comparing Vmware with other systems management applications

SCOM 2007 R2: Monitoring vSphere Shoot Out

In Operations Manager, Virtualization on November 1, 2011 at 20:52

Update: I’ve done a mini-review on SCVMM/SCOM 2012 and vSphere monitoring

We are a Microsoft shop. And a Vmware shop. We use SCOM to monitor everything and vSphere to host all our servers. So you can imagine how crucially important it is for us to properly monitor vSphere. With SCOM. Of course Virtual Center does a great job in giving us basic information about our hypervisor environment and the state of our virtual machines. But without the information about our applications SCOM provides and no real way to relate the two sets of data we really needed a way to get that information into one system.

Of course, there are other monitoring solutions, both for vSphere and for Microsoft applications. But we want to take advantage of our investment in SCOM and we firmly believe that SCOM is the best option to monitor a 99% Microsoft infrastructure.

We were not the first facing this challenge. Because a challenge it was. We did our best to look at as many options as we could and in the end made a choice based on both functionality and price.

In this post I want to give a short overview of the solutions we looked at and give my personal opinion on each of them.

The contenders

In no particular order:

  • Jalasoft's Smart MP
  • QMX for vSphere
  • Microsoft SCVMM
  • Veeam's management pack

We also expressed some interest in a management pack created by Bridgeways, but they were very slow to respond to our request for an evaluation, and once we got a response the amount of information we had to provide in order to evaluate the pack was so huge that we decided it was not worth the effort.

Small disclaimer: we really did our best to give each solution a fair shot, however it is possible that additional configuration or tweaking would increase the performance or the quality of the data. On the other hand we didn't take into account how hard it was to actually get the solutions working – because the installation process (especially under Windows 2008) wasn't always easy, though nothing we couldn't handle.

Round 1: What do they monitor – and how?

All of the solutions work through vCenter, with the exception of QMX, which is able to monitor vSphere hosts directly through SNMP and SSH. I guess you could let Jalasoft or even SCOM itself treat a host as a generic SNMP device and build your own sets of monitors and rules, but in general you will still need vCenter as a middle man to monitor your hosts.

None of them consists of just a management pack – they all need a service running on either a SCOM server or a separate server with access to SCOM. Jalasoft and QMX are frameworks – so it's possible to monitor other devices as well, which makes it easier to digest that you need to add another component to your monitoring infrastructure – and SCVMM could also be used to monitor Hyper-V or to manage vSphere and Hyper-V.

Jalasoft's Smart MP monitors just vCenter. Hosts are discovered as part of the vCenter server but aren't represented as separate entities. SCVMM monitors vCenter, hosts and virtual machines, however it will not give you any vSphere specific data such as CPU ready times, memory swapping etc. During our tests a vSphere host failed and we had fixed the problem before SCVMM alerted us. QMX gives you an awful lot of options – it can monitor Vmware logs, syslogs on the ESX servers and esxtop data (my personal favourite), and also gives you the possibility to create custom filters on log files to trigger an alert if an entry matching the filter is logged. It is also aware of vCenter alerts and events, but I didn't find any monitors or alerts relating to DRS or HA.

Veeam monitors just about everything that makes vSphere vSphere. A lot of work has been put into the knowledge in the alerts as well – and the alerting is really quick and accurate. Therefore Veeam wins this round.

Round 2: Pricing

vSphere is expensive – period. And since vCenter has its own monitoring capabilities it could be hard to justify another large investment. As always it's hard to define an ROI on solutions that mitigate risks, if it is possible at all. QMX for vSphere is free. Extensions for other devices are not, and are generally somewhat more expensive than other solutions (for instance for networking devices) – but I'll talk more about that in round three.

With Jalasoft you pay per device. If you have one vCenter server, you pay for one device. SCVMM is a part of the System Center Suite. If you have the proper agreement with Microsoft you get it for "free" once you've joined the dark side.

Veeam is closely aligned with vSphere – they even have (or at least had with vSphere 4.*) the same pricing model. And the price per socket is quite high. But you could ask yourself: if proper monitoring, performance analysis and trend based alerting can increase my consolidation ratio, I will be able to host more servers per physical host and need fewer sockets, fewer vSphere licenses and fewer Veeam licenses.

QMX is completely free – except for the OS license for the machine you host it on – so QMX wins this round.

Round 3: Vision, Tactics, Strategy..whatever

This round is about how the solution fits in a management or monitoring vision. So the outcome is going to be very subjective. But hey – when vendors talk about a journey to the cloud they are talking about just that – a vision or even a paradigm if you want about how to manage infrastructure to properly deliver services to users.

If you are virtualizing your infrastructure you are consolidating. So one thing you don't want to do is introduce a monitoring server sprawl. Despite the name, the current incarnation of the System Center Suite is not at all an organic whole. Still, using SCVMM makes sense, especially if you also use Hyper-V in your environment – but you would still need to check vCenter regularly as well, because otherwise you are going to miss crucial information about the state of your environment.

Jalasoft and QMX are frameworks. QMX also gives you the possibility to extend System Center Configuration Manager and has the broadest support for other non-Microsoft platforms and devices. Jalasoft is very network oriented but has great integration with another add-on to SCOM, Savision LiveMaps.

Veeam – as described in the previous rounds – is very vSphere oriented. It does vSphere, it does it very well, but you will still need something of a framework next to Veeam and SCOM to monitor the other layers of your infrastructure such as your SAN storage or your network.

I put my faith in the frameworks. And I think it's inevitable that a solution like Veeam's will be built by either Vmware themselves or one of the vendors that offer a monitoring framework at some point in the near future. This round goes to QMX because of the integration with SCCM and the support for just about any non-Windows platform or application out there.

So the winner is..and some final thoughts

I think QMX is the best option available today if you are looking for a solution that is very configurable, affordable and has enough promise for the future to justify investing time and money into making the framework part of your monitoring infrastructure. But….

  • There are other options – vKernel has quite a nice toolset and claims to connect to SCOM – I will be testing that soonish
  • SCVMM 2012 is said to provide better vSphere integration and SCOM 2012 is said to have improved network device monitoring. I will look at those two in detail as well and report back with my findings.
  • You could build your own MP – you can get all the relevant data from vCenter using Powershell and SNMP gets and traps
  • SCVMM 2008 has a nasty habit of setting custom properties on your virtual machines – but you can use Powershell (isn't that ironic) to get rid of those properties – for more info: VCritical article
  • Since Powershell and vSphere are so compatible I'm really surprised that I haven't found a solution based on just Powershell to link SCOM and vSphere together – see the sketch below for how little it takes to get the raw data out.
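
To illustrate that last point, a rough PowerCLI sketch (the server name is hypothetical) that pulls a vSphere-only counter like CPU ready – exactly the kind of data SCVMM won't surface – which a custom MP or script could then feed into SCOM:

# requires VMware PowerCLI
Add-PSSnapin VMware.VimAutomation.Core
Connect-VIServer -Server vcenter01.example.local
# grab the most recent realtime CPU ready sample for every host
Get-VMHost | ForEach-Object {
    Get-Stat -Entity $_ -Stat cpu.ready.summation -Realtime -MaxSamples 1
}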

Monitoring Citrix Netscaler Load Balancers with SCOM 2007 R2 Part II.

In Citrix, Netscaler, Operations Manager on October 20, 2011 at 22:07

This is part two of my series on monitoring Citrix Netscalers with SCOM 2007 R2 ( Part I ).

In the previous post I discussed why we decided to use SCOM to monitor the Netscalers, the MP’s installation and the Netscaler’s configuration. In this post I will discuss discovering the Netscalers in SCOM and the general usage of the MP.

Discovery

The Netscalers need to be discovered as generic network devices. After they’ve been discovered a scheduled discovery will discover them as Netscaler devices based on their SNMP OID. After that another discovery runs to identify the installed features and modes.

  • Open the SCOM console, choose Administration and start the Discovery wizard.
  • Choose Network Devices
  • Specify an IP range that includes both your NSIPs.
  • Select SNMP v2, specify your community string and Management Server

  • Now start the discovery; if you've configured the Netscalers correctly the wizard will detect two network devices. You will be able to see them both listed under Administration/Network Devices

The discoveries that are run automatically against all network devices run every 21600 seconds (six hours). So you can either wait until the discovery starts or override it. The discovery simply discovers all SNMP devices with a certain OID (I've included a screenshot of the XML as a reference):

After the Netscalers have been identified as Netscaler Devices they will show up under Monitoring/Citrix Netscaler Devices/All Devices, and the following discoveries, which are targeted at the Citrix NetScaler Device class, will start to discover additional classes and add some properties to the Citrix Netscaler Device class:

  • Citrix Netscaler Feature Discovery – this will detect all features and their state (Load Balancing, Access Gateway etc.)
  • Citrix Netscaler Mode Discovery – this will detect all modes and their state (L2 versus L3 etc.)
  • Citrix Netscaler Device Discovery – this will add the Node State (Primary/Secondary), Host Name, HA Peer IP and hardware version

This is the point where we ran into some issues. Discovering the Citrix Netscaler Device class went fine, but the other classes weren't discovered at all and the extra attributes weren't populated. Looking at the event logs on the management server I discovered an event with the following error message:

Error Message: 91\2600\Citrix.NetScaler.VirtualServerState.vbs(44, 9) Microsoft VBScript runtime error: ActiveX component can’t create object: ‘SScripting.SNMPManager’

This led me to the Citrix Knowledge Center article I mentioned earlier (Case Study: When installing…Error Message). I downloaded the MP from the Citrix Community page and installed it over the version I had downloaded from MyCitrix, and after a reboot the discoveries did identify the modes, features and attributes.

Configuring the MP

When we look at the Monitoring view – the Netscaler MP has 4 main nodes:

  • The root node – this contains an alerts view, a config changes view, an events view and a Network Diagram.
  • The Device state node – this has two views: Active Devices, which lists all the primary nodes, and All Devices, which shows all nodes.
  • The License & Modes node – this gives a state view of all the features and modes as they are configured on each appliance
  • The Performance node – this has a rather large number of performance views

The Alerts view seems pretty self-explanatory, however it is important to note that the alerts contain little information. You'll know a rule has triggered an alert, but not why. The same goes for the Config Changes view. Both will tell you there has been an alert or a config change, but the actual data is in the events view. Here all events (be it triggered alerts, SNMP traps or config saves, changes, reboots etc.) are logged with all the data provided by the SNMP GET or trap.

The Network Diagram was a bit of a disappointment; I would have hoped to see the vservers and the services in there as well.

The License and Modes views aren't too pretty but they do the job. Licenses:

Unfortunately you'll need to select a row to see which appliance it belongs to when looking at licenses. The modes view is much better:

The performance views are grouped into several categories: ACL, IP, SSL etc. None of the rules and monitors are enabled by default. Which brings me to a point of criticism – why are all rules and monitors disabled by default and then overridden with an override that's stored in the main Citrix Netscaler MP? Again something that goes against best practices.

Actually most performance counters aren't active (or have an override by default) when you install the pack – you'll need to override them one-by-one to be able to get that data into SCOM. This is where a tool such as OverrideExplorer (I used v3.3) can prove to be invaluable, since for each category there are several SNMP GET rules, and in order to fully populate the performance views you'll need to override all of them.

One clue – when you open the authoring pane in SCOM and limit the scope to include only the Netscalers, you can find the rules needed for each category by looking at their name. They will start with the name of the performance view in the monitoring pane, beginning with a capital. In the picture below you can see all the TCP rules, and if you look at the Override Management Pack column you can see I used a custom override pack, which means they weren't enabled by default:

Using this information you can override the performance rules in bulk using Override Explorer.
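
If you want to check your work from the SCOM 2007 R2 Command Shell, a quick sketch like this lists a category's rules and their state (the 'TCP' pattern follows the naming convention described above):

# list the Netscaler TCP performance rules and whether they are enabled
Get-Rule | Where-Object { $_.DisplayName -like 'TCP*' } |
    Select-Object DisplayName, Enabled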

Then you are ready to go. In the next part I will show the MP in action and show how you can configure and enable/disable the SNMP traps sent by the Netscalers.

Monitoring Citrix Netscaler Load Balancers with SCOM 2007 R2 Part I.

In Citrix, Netscaler, Operations Manager on October 19, 2011 at 19:19

Introduction

(Part II , Part III)

We recently introduced two Citrix Netscaler clusters into our environment. The first cluster was already running as a Citrix Access Gateway cluster (as an upgrade from our Secure Gateway – needed to support Citrix Receiver on iOS devices); we purchased a load balancing license for that cluster and are using it to load balance servers in our DMZ. The other cluster is used to load balance servers in our internal network.

We mainly use the load balancers to create what I call "controlled redundancy", but we do use them for several critical applications, such as the aforementioned XenApp environment. And one of the key elements in achieving this state of controlled redundancy, in my humble opinion, is being able to monitor these clusters.

Citrix offers an excellent application to monitor and administer their line of networking products called "Command Center". But our central monitoring solution is Microsoft SCOM 2007. Of course we could have decided to use both products side-by-side or try to engineer some connector between Command Center and SCOM. But since the number of management tasks we have to perform on our Netscalers is very small – and since Citrix has a SCOM MP for the Netscalers – we are managing the two clusters using the GUI and SSH for the time being, and installed the SCOM MP.

In this series of posts I am going to show how we installed, configured and tuned the management pack. I'm also going to cover the configuration of the Netscalers and the usage of the Netscaler pack – mainly because its structure is a little different than most standard Microsoft MPs.

We use vSphere as our virtualization platform, so I have no experience with the PRO MPs that are provided for use with SCVMM PRO Tips – all I can say about that is that it's unfortunate that there is no comparable feature for vSphere.

Installation

The SCOM pack can be downloaded from MyCitrix if you have the proper licenses associated with your account. However – the same pack can also be obtained from the following Citrix Community blog post 🙂

http://community.citrix.com/pages/viewpage.action?pageId=79463085

I found that link in this Citrix KB article: http://support.citrix.com/article/CTX122844 – which discusses an issue with this pack and an x64 OS. We actually ran into this issue, but more about that later.

By the way, both downloads will get you the 2.0 version of the MP – there is a 1.0 version out there for older firmware builds. We have both a classic 9.2 build and an nCore 9.2 build in our environment and we use the 2.0 pack for both.

The installation is pretty straightforward. We do all SNMP based monitoring from a separate management server, so it made sense for us to install the MP there. The management pack can do SNMP GETs and receive SNMP traps, so you'll have to enable the built-in SNMP service on the management server.
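
On Server 2008 R2 that service can be added with a couple of lines of PowerShell (a sketch; feature names can differ between OS versions, so check Get-WindowsFeature first):

Import-Module ServerManager
Get-WindowsFeature *SNMP*        # verify the exact feature name on your OS version
Add-WindowsFeature SNMP-Services # installs the built-in SNMP service the pack relies on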

You run the installer and then import the MP into SCOM. Now it's time to configure the Netscalers!
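
If you prefer to script the import, the 2007 R2 Command Shell can do that too – a sketch, assuming its Install-ManagementPack cmdlet and a hypothetical local path:

# import the management pack file into the management group
Install-ManagementPack -FilePath 'C:\MPs\Citrix.NetScaler.mp'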

Netscaler Configuration

In order to configure the Netscalers to be monitored by SCOM there are a couple of things you'll need to configure. One of the things that really bugged me was the fact that in order to properly monitor the cluster I needed to be able to add both nodes to SCOM – which basically means that you have to create your NSIPs in a routed part of your network, which is against Citrix best practices (or somehow multi-home your management server, of course).

So besides configuring your NSIP so that it's reachable and has SNMP enabled, everything you need to configure is in the System\SNMP node of the Netscaler GUI. I'm not familiar with the CLI yet, however you can probably configure it there just as easily.

  • First there is the SNMP community:

To monitor the Netscalers only a GET permission is needed; choose Add, input your SNMP string and choose the permission.

  • Then you'll need to add the SCOM server(s) or their IP range as SNMP Manager:

Choose Management Host to use a single IP, network for multiple. In our case we have a dedicated VLAN for our monitoring and management servers.

  • Next up are SNMP traps:

This is the part where I ran into some issues – it took me some time to figure out I needed to use Specific as the type instead of Generic. You also need to define the trap destination and port. Before, I mentioned you needed to use the NSIP to monitor the Netscalers, but that's only for the SNMP GETs, because you are able to set a cluster wide SNIP or MIP as the source address for traps. Minimum severity and Community name are obvious, however don't be fooled by the parentheses in the Community Name field – you actually have to enter your own string without parentheses!

That's most of the configuration on the Netscalers – in the next two parts I'll discuss discovering the Netscalers, how to tune and configure the monitoring process on both SCOM and the Netscaler, and I'll try to show a little bit about the structure and the usage of the MP – especially because it's a little different than your ordinary Microsoft MP.

(Part II , Part III)

Sharepoint 2010 List Alerts and Managed Metadata bug fixed

In Sharepoint 2010 on May 19, 2011 at 06:59

In a previous post I discussed a bug with alerts on a Sharepoint 2010 list that contains more than one managed metadata column. We've just been informed that this bug has been solved in this cumulative update package. The package is for Sharepoint Foundation and Server; there is a separate update package for Sharepoint Foundation-only available as well.

We’ve tested the update package and the issue has indeed been solved. After applying the package you will need to run the Sharepoint Products Configuration wizard – so there will be some downtime on your farm.
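
If you'd rather run that step from the command line, a sketch of the usual in-place upgrade after a CU (run elevated on each server in the farm; switches vary per scenario):

# psconfig lives in the standard SP2010 "14 hive"
& "$env:CommonProgramFiles\Microsoft Shared\Web Server Extensions\14\BIN\psconfig.exe" -cmd upgrade -inplace b2b -wait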

Microsoft Server Activesync, Iphone and client certificates issues

In Exchange, Unified Communications on November 16, 2010 at 16:32

At my company we are currently running a pilot to see if we can offer corporate email to users through an iPhone. We decided to go for a simple setup: one dedicated Exchange 2007 Client Access Server facing the Internet (behind a firewall of course) using HTTPS and client certificates on the iPhones. There are plenty of guides out there that discuss this topic (with or without client certificates and with a reverse proxy server in front of the CAS server) so I'm not going to elaborate on that. We did take some standard security measures like using the Microsoft SCW, enabling the Windows Firewall on the external interface and installing a virus scanner on the CAS server itself. We were using iPhone 4s with iOS 4.1 and Exchange 2007 SP1 RU9.

After we had rolled out the profiles the iPhones were syncing fine, but sometimes users weren't able to connect to the server. One particular symptom that came up every now and then was that messages containing attachments wouldn't send but would get stuck in the device's Outbox. Some messages would remain stuck indefinitely while others would send after a certain time period.

On the CAS server itself I noticed the following error in the Application event log:

And in the System Log:

There were also some entries in the httperr1.log:

2010-11-15 22:47:19 109.34.215.23 60140 192.168.2.40 443 HTTP/1.1 POST /Microsoft-Server-ActiveSync?User=MY USERNAME&DeviceId=MYDEVICE&DeviceType=iPhone&Cmd=Ping – 1 Connection_Abandoned_By_AppPool MSExchangeSyncAppPool

At times we would also see Connection_Dropped_By_AppPool MSExchangeSyncAppPool and the same error as above but with the actual send and save command string.

Doing some research (aka using Google/Bing) gave me some information about IIS deadlocks and I found the following suggestions:

– Add another CPU if you have a single CPU VM

– Adjust the machine.config file for the .NET version mentioned in the event log

We tested both and that had no impact.

Additional troubleshooting steps we took were:

– Removed the antivirus, disabled the Windows Firewall -> No effect whatsoever

– We checked the session time-out on the firewall, because Direct Push uses very long-lived HTTP sessions -> The firewall had a time-out value of 30 minutes and since the Direct Push sessions last about 15 minutes that couldn't be the cause of our problems either

– Upgraded one of the iPhones to the iOS 4.2 GM -> Nada

After that I contacted PSS in order to jointly investigate the issue. They looked at the logs and we performed a trace, but nothing really came up.

Then I decided to have another look myself. I fired up Wireshark, exported the key of the SSL certificate and traced and decrypted the conversations between the device and the CAS server. In the conversations I noticed the following HTTP response:

HTTP/1.1 413 Request Entity Too Large


So apparently the web server had problems with the size of the request. Searching Technet I found this article:

http://technet.microsoft.com/en-us/library/cc737382%28WS.10%29.aspx:

"If a client sends a long HTTP request, for example, a POST request, to a Web server running IIS 6.0, the IIS worker process might receive enough data to parse request headers, but not receive the entire request entity body. When the IIS worker process detects that client certificates are required to return data to the client, IIS attempts to renegotiate the client connection. However, the client cannot renegotiate the connection because it is waiting to send the remaining request data to IIS.

If client renegotiation is requested, the request entity body must be preloaded using SSL preload. SSL preload will use the value of the UploadReadAheadSize metabase property, which is used for ISAPI extensions. However, if UploadReadAheadSize is smaller than the content length, an HTTP 413 error is returned, and the connection is closed to prevent deadlock. (Deadlock occurs because a client is waiting to complete sending a request entity, while the server is waiting for renegotiation to complete, but renegotiation requires that the client be able to send data, which it cannot do.)"

I've tried enlarging the UploadReadAheadSize to 64k, but as could be expected (the attachment was much larger than that) that didn't help. And just as the article says, increasing this value would create an attack surface on our server. So I followed the link at the bottom of the article to this article:

http://technet.microsoft.com/en-us/library/cc778630%28WS.10%29.aspx:

"The SSLAlwaysNegoClientCert property controls SSL client connection negotiations. If this property is set to true, any time SSL connections are negotiated, the server will immediately negotiate a client certificate, preventing an expensive renegotiation. Setting SSLAlwaysNegoClientCert also helps eliminate client certificate renegotiation deadlocks, which may occur when a client is blocked on sending a large request body when a renegotiation request is received."

I then used the adsutil script to set that value and voilà! The messages were sent normally and the errors stopped occurring.
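
For reference, a sketch of the commands involved (assuming the default Web Site, metabase ID 1 – adjust the ID for your CAS site; adsutil.vbs requires the IIS 6 compatibility components):

# set the metabase property, then restart the IIS Admin service
cscript.exe C:\Inetpub\AdminScripts\adsutil.vbs set w3svc/1/SSLAlwaysNegoClientCert true
Restart-Service IISADMIN -Force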

If you want to apply either of those settings you should remember to restart the IIS Admin service and not just reset IIS.

I've seen several posts on the web dealing with the same issue or at least the same symptom. They might be related to our issue, and I think that the UploadReadAheadSize could also affect sending email messages when no client certificates are being used.

Persistence is Futile

In Opalis, System Center on November 11, 2010 at 17:43

 

Opalis

In my earlier post I mentioned Opalis. Now what is Opalis? Opalis is an IT process automation tool. It gives you the possibility to visually design workflows that orchestrate, manage and monitor your whole process. By using integration packs Opalis is able to communicate with a host of different systems, vendors and platforms. You can get data out of systems, get data into systems and base your workflow's logic on the responses you get from those systems.

In the breakout session I attended, Opalis was compared to a mainframe run book: a formalization of all the steps involved in a process from start to end. And because of the great interoperability you can start by taking your "informal" processes and putting them into Opalis – no change in functionality, but now you let Opalis handle the execution (for instance calling Powershell), the monitoring/logging (by raising an alert in SCOM if something goes wrong, or even creating an incident in Service Manager) and the decision making logic. So instead of incorporating all of that in every script you find in your environment, you create a template which you can then reuse for every task.

Opalis itself was a so-called third party tool vendor but is now a fully owned subsidiary of Microsoft and has been included in the System Center suite. In later posts I will try to get into the technical details of Opalis and how it relates to Microsoft's cloud management solution.