Operational Maturity part two – Belts and braces

Following on from the previous post from our guest blogger Nic Blank on operational maturity and storage, I’d like to call out another area in operational maturity – Transporting and Routing Mail.

Exchange 2010’s transport High Availability model is rather simple: add more Hub Transport servers and they become redundant, shadow transport ensures mail delivery, and so on.

However, this requires some planning. Shadow transport is a great feature; it really is. It allows a Hub Transport server to fail without the loss of messages in transit. That said, two caveats come to mind:

  1. You need more than one Hub Transport server in a given Active Directory site in order for Hub Transport servers to load balance mail traffic.
  2. You need an Exchange server on either side of the Hub Transport server: an Edge server or another Hub Transport server, or a mail server acting as the originator or destination.

So where’s the issue? Nic defined Operational Maturity as the absence or presence of the technology and processes required in order to absorb and mitigate a failure in an acceptable time frame (normally the SLA).

He also made the point that Exchange is really good at absorbing failure if it’s built to do so. Shadow Transport on the Hub Transport server is one of those features. In simple terms, if a failure of a Hub Transport server is detected, the messages which that Hub Transport server was responsible for are re-assigned by the previous hop or message originator to another Hub Transport server in the same site.
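To make the re-assignment idea concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not how Exchange implements it: the `PreviousHop` class, its method names and the server names are all made up for illustration.

```python
def pick_hub(hubs, failed):
    """Return the first hub that is not marked as failed, or None."""
    for hub in hubs:
        if hub not in failed:
            return hub
    return None


class PreviousHop:
    """Toy sender that keeps a shadow record until delivery is confirmed."""

    def __init__(self, hubs):
        self.hubs = list(hubs)   # Hub Transport servers in the same site
        self.shadow_queue = {}   # message id -> hub currently responsible

    def send(self, message_id, failed=()):
        hub = pick_hub(self.hubs, set(failed))
        if hub is None:
            raise RuntimeError("no Hub Transport server available in site")
        self.shadow_queue[message_id] = hub
        return hub

    def confirm_delivery(self, message_id):
        # The next hop reported success, so the shadow record is discarded.
        self.shadow_queue.pop(message_id, None)

    def handle_failure(self, failed_hub):
        """Resubmit every unconfirmed message that failed_hub was holding."""
        moved = {}
        for message_id, hub in list(self.shadow_queue.items()):
            if hub == failed_hub:
                moved[message_id] = self.send(message_id, failed=[failed_hub])
        return moved
```

With two hypothetical hubs, `PreviousHop(["HUB01", "HUB02"])` hands an unconfirmed message to HUB02 when HUB01 fails. With only one hub in the list, `handle_failure` raises an error, which is the first caveat above in miniature.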

If operational maturity is low, i.e. if an organization doesn’t have sufficient reporting and remedial measures in place to determine that failures have occurred, then under extreme circumstances Hub Transport servers would keep on failing, unnoticed, until mail flow fails altogether.

We can mitigate this in part by introducing message routing intelligence on the outside of the organization. However, we need to make sure that we are not exacerbating the problem by simply relieving a pain point and not addressing the cause.

Moving on, where do we start with operational maturity? We don’t have to start with a full-blown SCOM implementation to ensure mail is flowing between two points. If you have nothing at all, start with:

  • Simple mail testing tools that send mail between two mailboxes at nominated points in the org.
  • Free tools and/or scripts capable of pinging servers at regular intervals, checking for service availability, etc.
  • More tools or scripts to monitor for disk availability and disk free space usage
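As a rough idea of how small such a script can be, here is a minimal Python sketch that checks whether a server accepts connections on a given port and whether a volume is running low on space. The function names and the 20% threshold are assumptions made for illustration, not part of any Exchange tooling.

```python
import shutil
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def disk_free_percent(path):
    """Return the percentage of free space on the volume that holds path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def run_checks(servers, paths, min_free_percent=20.0):
    """Return a list of readable problems; an empty list means all clear.

    servers is a list of (host, port) pairs and paths a list of directories.
    The 20% default threshold is an arbitrary example value.
    """
    problems = []
    for host, port in servers:
        if not port_is_open(host, port):
            problems.append(f"{host}:{port} is not accepting connections")
    for path in paths:
        free = disk_free_percent(path)
        if free < min_free_percent:
            problems.append(f"{path} has only {free:.1f}% free space")
    return problems
```

Something like `run_checks([("hub01.example.com", 25)], ["."])`, with your own server names and paths in place of these placeholders, could then be run from a scheduled task. The important part is that somebody reads the output.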

Often starting somewhere with basic belts and braces is better than not doing anything, but it is quite critical to combine simple tools and simple processes into simple standards and thereby raise the operational maturity from zero.

Anything greater than zero is a short term win and will leave your organization with something to work on and improve.


Cheap discs require Operational maturity

Guest Blogger Nicolas Blank is an Exchange MVP and Microsoft Infrastructure Architect specializing in Exchange, Active Directory, architecture, systems management, migration and scripting. Nicolas spends what spare time he has writing, blogging and talking about Exchange and associated technologies. His blog can be found at:


In the past we’ve blogged about the use of SATA disks; one post was even cheekily entitled “Give your SAN to the SQL team”. That’s a great idea, but how do we turn storage features into operational reality?

There’s no denying that Exchange 2010 offers more storage flexibility than any of its predecessors. Exchange can now be deployed on Big, Fast and Expensive storage (AKA SANs), on disk shelves comprising RAID SAS or near-line SAS, or even on JBOD Enterprise SATA disks.

Let us look at the last one of these, JBOD, or the use of individual disks to store Exchange databases.

In this configuration, an Exchange sizing calculation determines the number of IOPS required per server, depending on user activity, divides it roughly by the known IOPS capability of the disk specified, and returns the number of disks required. Each disk then holds a unique database, which is then mirrored over the network to similar servers, with similar disks.
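The arithmetic behind that calculation can be sketched in a few lines of Python. This is a simplification: it looks at IOPS only and ignores capacity, and the `headroom` margin and all of the figures in the example are assumptions chosen for illustration rather than sizing guidance.

```python
import math


def disks_required(mailboxes, iops_per_mailbox, iops_per_disk, headroom=0.2):
    """Estimate how many disks are needed to serve the required IOPS.

    headroom is a hypothetical safety margin added on top of the raw figure.
    """
    required_iops = mailboxes * iops_per_mailbox * (1 + headroom)
    # Round up: part of a disk still means buying a whole disk.
    return math.ceil(required_iops / iops_per_disk)
```

For example, 4000 mailboxes at 0.1 IOPS each with 20% headroom need 480 IOPS, so with disks assumed to deliver 50 IOPS each, `disks_required(4000, 0.1, 50)` returns 10.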

The first hurdle is simply the number of disks. When producing a Highly Available (HA) configuration, each database is mirrored to two or more servers. Medium to large configurations will require many more than 20 disks per server, which means using mount points to compensate for the lack of available drive letters.

The next hurdle is standardizing the servers required in order to have identical enough storage and configurations across all servers, in order to create the HA configuration in the first place.

The next consideration is critical. Assume your build processes are standardised, your servers, storage and HA configurations are identical and as clean as a whistle; how do you know when you’ve had a failure?

Exchange 2010 – assuming it’s been designed and built to do so – is really good at absorbing failures – disk, database, transport, client access, you name it. The downside is – how do you know if something’s failed OR you’re down to your last good copy after the other two or three database copies failed?
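To picture the “last good copy” question, here is a small Python sketch that counts healthy copies per database and flags any database that has dropped below a minimum. The input format, the set of statuses treated as healthy and the server names are assumptions for illustration; in practice the data might be exported from something like Get-MailboxDatabaseCopyStatus.

```python
from collections import defaultdict

# Statuses treated as healthy in this sketch (an assumption).
HEALTHY = {"Mounted", "Healthy"}


def copies_at_risk(copy_status, min_healthy=2):
    """Return {database: healthy_copy_count} for databases below min_healthy.

    copy_status is an iterable of (database, server, status) tuples.
    """
    healthy = defaultdict(int)
    for database, _server, status in copy_status:
        healthy[database] += status in HEALTHY  # a bool counts as 0 or 1
    return {db: count for db, count in healthy.items() if count < min_healthy}


# Hypothetical sample data: DB01 is down to its last good copy.
sample = [
    ("DB01", "MBX01", "Mounted"),
    ("DB01", "MBX02", "Failed"),
    ("DB01", "MBX03", "Failed"),
    ("DB02", "MBX01", "Healthy"),
    ("DB02", "MBX02", "Mounted"),
    ("DB02", "MBX03", "Healthy"),
]
```

Here `copies_at_risk(sample)` returns `{"DB01": 1}`. A database with no healthy copies at all is reported with a count of 0.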

The answer involves many co-dependent factors – monitoring software, monitoring personnel and operational procedure. There’s no point in deploying SCOM to predict disk failure, alert on the disk failure and report on the reduced SLA if no one is consuming the data and acting on it appropriately.

For the purposes of this article then, we can define Operational Maturity as the absence or presence of the technology and processes required in order to absorb and mitigate a failure in an acceptable time frame (normally the SLA).

Don’t think that this pain is a JBOD pain only. JBOD lessens the storage cost, but it also removes SOME of the storage management which the SAN team would otherwise absorb, and from which the Exchange administrator would otherwise be insulated.

Irrespective of the storage model used, the combination of monitoring, processes, plans, documentation and activities defines the operational maturity of an IT organisation. Due to the criticality of mail, Exchange has massive visibility in the average business, especially during an outage.

While cheap storage is a valid option for Exchange, it may not be a great option for IT shops that are relatively new to the concepts mentioned in this post. Just from a storage point of view, a more traditional RAID storage shelf with lower IOPS may be a better consideration.


Just Enough On-Site: IT Strategy for the New Information Age

We’re on the verge of a New Information Age. The old one has been around for thirty years or more, and its legacy is not all that wonderful. There’s been an explosion in the volume of data produced, sent and stored on servers, desktops and laptops around the world. Companies have tried to manage by keeping pace, adding servers, amassing file stores and updating PCs every few months.

Email, not surprisingly, has been at the heart of this digital big bang, with 97% of written business communication based on email, and some 84% of corporate IP being held in email systems. For IT Directors and managers of corporate email systems, then, the Information Age has resulted in a complex and costly IT infrastructure, accompanied by huge levels of risk, given the critical value of the information held in these systems.