How to use OpsMgr to leverage monitors on groups as overall health indicators
Hello, my name is Claudio Broglia, and I’m a Dataplatform Consultant with Microsoft Services Italy. Many customers ask me for a monitor on a group of servers that provides an overall health indicator for the collection as a whole. Usually they want something that works like this:
- A “Yellow” state when more than x% of the servers are unhealthy
- A “Red” state when more than y% of the servers are unhealthy
The request typically comes up when designing a monitoring dashboard, because customers want a single health index for the whole farm.
My first answer is usually that what matters is not the number of unhealthy servers, but the type of problem affecting them. A single alert on a single server (for example, an expired certificate) can cause downtime or a service disruption. That said, this kind of health indicator is often requested not by the technical staff but by management, who want to see the overall state of their infrastructure at a glance.
Unfortunately, a monitor of this kind cannot be built in Operations Manager with the standard tools. The only monitor that aggregates health states and rolls up based on a percentage of objects is the dependency rollup monitor, and it simply rolls up the worst state among the specified percentage of members that are in the best health. This sounds harder to understand than it actually is. It is best explained with an image, and the best one is the illustration on the Health Rollup tab when creating a new monitor, shown below.
To meet the original request, we need something more flexible for our health indicator. We will use groups to hold the server collection, with a monitor on each group that rolls up the state of its members. The idea is to create several groups with the same members, where each group has a monitor that rolls up the health state using a different percentage. In our “service dashboard” we can then link each group to a shape, so that together the shapes resemble a health indicator. Finally, we apply a custom Data Graphic to these shapes so that their color changes based on the number of objects that become unhealthy.
To start, let’s say we want our “health index” to have three levels: all objects healthy, more than 33% of objects unhealthy, and more than 66% of objects unhealthy.
First, we create two groups: one for the 33% indicator and one for the 66% indicator. The steps below use the default Windows Server class as an example.
1. Create a new group, save it in your custom management pack, and give it a mnemonic name. For example, for the 33% index, add “– 33%” as the name suffix.
2. Next, add your objects, using either Explicit or Dynamic Membership, and complete the wizard.
3. Now create another group with the same members for the 66% index state, and name it accordingly.
4. Go to the Authoring -> Monitors section, find each group, right-click its Entity Health monitor, and create a new Dependency Rollup Monitor.
Follow the wizard, giving the monitor a name and saving it in your custom management pack. Then, in the Monitor Dependency section, choose “Entity Health” under “Object (Contains Entities)”. This makes the monitor state depend on the Entity Health of the group members.
In the Health Rollup Policy section, select “Worst state of the specified percentage of members in good health state” and set the percentage for the group. For the 33% index, specify 67%, because the monitor only considers the specified percentage of members that are in the best state. As long as at least 67% of the members are healthy, that subset contains only healthy members and the group stays healthy; once more than 33% of the members are unhealthy, an unhealthy member falls into the subset and its state is rolled up. Accordingly, for the 66% index group, specify 34%.
Now you have two groups with the same members that become unhealthy at two different thresholds: one when more than 33% of the objects are unhealthy, and one when more than 66% are.
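To make the rollup arithmetic concrete, here is a small Python model of the “Worst state of the specified percentage of members in good health state” policy. It is a sketch, not Operations Manager code: the `Health` enum and the function names are hypothetical, and it assumes the monitor keeps the healthiest ceil(p% of n) members and reports the worst state among them. The exact rounding Operations Manager applies is an assumption here.

```python
from enum import IntEnum


class Health(IntEnum):
    """Hypothetical model of health states, ordered from best to worst."""
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


def rollup_percentage(unhealthy_threshold: int) -> int:
    """Percentage to enter in the Health Rollup Policy for an 'x% unhealthy' index."""
    return 100 - unhealthy_threshold


def worst_of_best(states: list[Health], percent: int) -> Health:
    """Sketch of 'worst state of the specified percentage of members in good health state'.

    Assumption: keep the best ceil(percent% of n) members, then report the
    worst state among them. OpsMgr's real rounding behavior is not confirmed.
    """
    if not 1 <= percent <= 100:
        raise ValueError("percent must be between 1 and 100")
    if not states:
        return Health.HEALTHY  # assumption: an empty group counts as healthy
    keep = (len(states) * percent + 99) // 100  # integer ceiling, avoids float rounding
    best_first = sorted(states)  # healthiest members first
    return max(best_first[:keep])  # worst state within the kept subset


if __name__ == "__main__":
    total = 10
    p33, p66 = rollup_percentage(33), rollup_percentage(66)  # 67 and 34
    for unhealthy in range(total + 1):
        members = [Health.CRITICAL] * unhealthy + [Health.HEALTHY] * (total - unhealthy)
        print(f"{unhealthy:2d}/{total} unhealthy -> "
              f"33% group: {worst_of_best(members, p33).name:8s} "
              f"66% group: {worst_of_best(members, p66).name}")
```

Under these assumptions, with ten servers the 67% group turns unhealthy when the fourth server becomes unhealthy (40%), and the 34% group when the seventh does (70%).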
Next, we switch to Visio. The following steps require the Visio 2010 Extensions for System Center 2012 to be installed.
Create a new Visio document, choose your favorite shape, and add three of them to the drawing. Color the first one green, as it represents the (hopefully) default state of “everything OK”. Something like this:
Then, select the middle shape, go to the Operations Manager tab, and select Link Shape.
Select All Management Packs, clear the Show only commonly-used classes check box, and find your 33% group. Link it to the shape.
Now we need to apply a custom Data Graphic to this shape. We want the shape to stay “off” while the 33% group is healthy and to switch on otherwise. To do that, go to Data Graphics and select Create New Data Graphic.
Choose New Item.
In the Data field, specify “Health State”, and in the Displayed as field, choose “Color by Value”. Next, fill in the possible Health State values and give a neutral color to all of them except the Warning and Error states. Color both of these yellow.
To make it easier to recognize, give the Data Graphic a meaningful name.
Repeat the process for the third shape, linking it to the 66% group, but this time choose red as the color in the Data Graphic.
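The Color by Value rules of the two Data Graphics boil down to a simple mapping from each group’s health state to a shape color. The Python sketch below only models the dashboard’s behavior and is not a Visio or Operations Manager API: the state strings, the neutral color “gray”, and the function names are hypothetical, and it assumes the first shape is simply colored green and not linked to any group.

```python
NEUTRAL = "gray"  # hypothetical "off" color for the indicator shapes
ALERT_STATES = ("Warning", "Error")  # the two states that light up a shape


def shape_color(health_state: str, alert_color: str) -> str:
    """Color by Value rule: Warning and Error get the alert color, everything else stays neutral."""
    return alert_color if health_state in ALERT_STATES else NEUTRAL


def indicator(state_33: str, state_66: str) -> list[str]:
    """Colors of the three shapes: static green, 33% group (yellow), 66% group (red)."""
    return [
        "green",                           # first shape: not linked, always "everything OK"
        shape_color(state_33, "yellow"),   # middle shape, linked to the 33% group
        shape_color(state_66, "red"),      # third shape, linked to the 66% group
    ]


if __name__ == "__main__":
    for s33, s66 in [("Healthy", "Healthy"), ("Error", "Healthy"), ("Error", "Error")]:
        print(s33, s66, indicator(s33, s66))
```

Because the 66% group can only be unhealthy when the 33% group already is, the shapes light up left to right as more servers become unhealthy.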
All done. We now have a graphical indicator of the number of unhealthy objects (servers, in our example) that changes color dynamically as that number increases. You can repeat the process to add as many indicator levels as needed.
Hope this helps!
Claudio Broglia | Dataplatform Consultant | Microsoft