All posts tagged Continuity

In April 2010, Mimecast released a report entitled “Keeping the Enterprise Agile and Mobile” in which we examined the growing pressure to keep BlackBerry services up and running at all times.

At the time, we thought the results were pretty interesting and events over the past few days have played them out pretty well.

Our report found that the expectations of BlackBerry users are extremely high: 66% of respondents claimed that as much as one hour of downtime per month is not acceptable, and a further 22% said NO downtime is acceptable at all! I can only imagine how these users feel about the last three days’ worth of interruptions…

With the reported impact on support desks and the board-level fallout that BlackBerry outages seem to cause, we were, at the time, surprised by the high percentage of organizations (41%) that had no provisions for high availability in place at all. A further 59% said they couldn’t provide continuity for their users and 61% didn’t have an internal BlackBerry availability SLA.

So with these numbers, the corporate world breathed a collective sigh of relief when RIM announced that the outages that they have been having are only affecting their BIS and BBM users… Well, they sighed until their corporate users started complaining about service unavailability.

Enterprise Consultant
Mimecast

Exchange migrations tend to be complex. Even smaller organizations running Small Business Server with fewer than 75 users may take a week or more to plan, prepare and execute their email migration.

Any business that’s been through a migration at least once will remember that most of the migration effort was spent in planning. Otherwise they may remember the large mop-up operation and the time spent visiting desktops, recovering mail and rolling aspects of the migration backwards and forwards.

Data loss (what PSTs?), client upgrades and wrongly migrated data tend to come to mind when thinking about what can go wrong, as well as the mail server that crashed during the migration. During a migration a fair amount of change is introduced and additional processing is forced onto both the source and target Exchange platforms. For an older platform at the limits of its lifespan or operational capacity, the extra overhead an email migration introduces may be the straw that breaks the camel’s back.

Cloud based email continuity may act as insurance in this regard by enabling client continuity and transactional continuity in case the migration wobbles or breaks. Let’s explore that in a bit more detail.

Migrations are heavily process driven. In order to migrate, a fair amount of surveying, planning, lab testing and so on needs to be done. It makes sense to use the desktop visit in the plan/survey stage to install the agents that make client continuity possible.

If an Exchange server in the source or the target organization were to fail during the migration, Outlook clients would be redirected to the cloud, with little or no disruption to service or, crucially, the user experience. This allows the outage to be addressed, and mail flow and client mail service to be restored, without the pressure of fighting two fires concurrently, i.e., a broken environment and a broken migration.

Cloud based email continuity allows you to benefit from the scale of the cloud as a side effect of leveraging continuity in the cloud, provided of course your users have the required network or internet connectivity to beat a path to the cloud.

In our day to day lives we’re generally quite comfortable accepting the argument for personal insurance, which guards us against any number of possible scenarios, such as breaking a leg while skiing, medical bills, theft, and so on. All of these boil down to paying a small amount of money to a much larger entity and thereby being guaranteed the benefit of that entity’s scale and reach in the case of something unfortunate happening.

As the idea of cloud on demand becomes more pervasive, insuring your migration in the short term against loss of email continuity makes as much sense as taking out insurance on your car before you take it on the road.

Enterprise Consultant
Mimecast

Last week I was reading Robin Gaddum’s post for Continuity Central describing how he communicates the concept of risk management in business continuity by applying Murphy’s Laws. Gaddum proposes three laws, as follows:

If it can go wrong, it will go wrong
If it cannot possibly go wrong, it’ll still go wrong
In real life, puppies die… Get over it. (Or, disasters always have an impact)

Gaddum’s post can be summed up as follows: you must always conduct a risk assessment and invest in risk prevention, but there is always residual risk and there is always an impact. He’s not wrong!

The thing about Murphy’s law is that there is really only one. The adage usually goes “Anything that can go wrong, will go wrong.” There are also less polite ways of putting it, but we humans generally accept that ‘stuff’ does happen, and there is little we can do about it. Murphy’s law helps our brains rationalize and bring order to what is otherwise a wildly chaotic universe. To an extent we try to control that chaos, but not always or all ways.

Of course, I understand that Gaddum is looking for the best way to communicate the concepts of risk management in relation to business continuity, but I’m inclined to think that thinking about risk in relation to “stuff happens” only really achieves an in-depth risk analysis.

The Risk Assessment is a vital part of a Business Continuity Plan and should never be underestimated; all too often I have seen senior managers dismiss a risk because their preconceived ideas are still stuck in the “It’ll never happen to us” or “we’ll deal with it when it happens… until then” mentality. In this situation I always like to ask them how they would feel if the Captain and Co-pilot on their next commercial flight had the same attitude.

When it comes to business continuity, and aviation for that matter, there’s plenty that can go wrong; regardless of how well we prepare, things still go wrong and accidents still happen. More often than not, when the contributing factors and cause of an incident are examined after the fact, human error is identified as the most significant contribution. As they say, “aeroplanes don’t have accidents, pilots do.” As a result, Human Factors makes up a significant part of the aviation industry, where planning, assessing, designing, building and monitoring around the way humans do things and behave is the key.

Gaddum makes a point to remind us of the importance of a business continuity plan, which he describes as:

“…our last ditch defense to enable recovery once that most improbable and unforeseen event has taken us out.”

But I find this quite alarmist. After all, how many BCP documents include an Emergency Action Plan for meteor strikes, or herds of marauding donkeys? Those are “most improbable” and certainly “unforeseen”. Why not think about this in terms of human failures instead: what are the most likely human failings that will cause your business to suffer an outage?

Reliance on Murphy and his (or her) tendency to be right in hindsight will leave us worrying about those donkeys. Instead, thinking about what your admins might get wrong when they’re overly tired, or when they have made multiple changes at once, will make your BCP doc much more relevant. It’ll also mean your BCP Planning Team has considered the individuals in your organization and how their actions could affect your continuing business. Looking for a human cause and effect angle takes time but is well worth it in the long run; just ask a pilot.

This is a much more powerful place to be; better than staying awake at night wondering how high a donkey-proof fence needs to be.

Google and Microsoft have recently been poking holes in each other’s uptime SLAs (Service Level Agreements). The squabble has been summed up here by Paul Thurrott from Windows IT Pro.

In short, Google claimed its Google Apps service had achieved 99.984% uptime in 2010 and, citing an independent report, went on to say this was 46 times more available than Microsoft’s Exchange Server. Microsoft retaliated by saying BPOS achieved 99.9% (or better) uptime in 2010 and this was in line with their SLA. Microsoft quite rightly protested at Google’s definitions of uptime and what should or should not be included.

The discussion continues.

Uptime is one of those things included in your service provider’s SLA that you never really give much attention to, unless it’s alarmingly low: 90%, for example. Most Cloud, SaaS or hosted providers will give uptime SLA figures of between 99.9% (three nines) and 99.999% (five nines). Mimecast proudly offers a 100% uptime SLA.

All of these nines represent different levels of ‘guaranteed’ service availability. For example, one nine (90%) allows for 36.5 days of downtime per year. As I said, alarming. Two nines (99%) would give you 3.65 days of downtime per year, three nines (99.9%) 8.76 hours, four nines (99.99%) 52.56 minutes and five nines (99.999%) 5.26 minutes per year. Lastly six nines, which is largely academic, gives a mere 31.5 seconds.
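If you want to check those figures, or work out the allowance for an SLA percentage that isn’t a neat row of nines, the arithmetic is short. What follows is only a rough sketch: it assumes a 365-day year, and the function name is my own invention rather than anything from a vendor’s SLA.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year, no leap day


def allowed_downtime_seconds(uptime_percent: float) -> float:
    """Seconds of downtime per year that an uptime percentage still permits."""
    return (1 - uptime_percent / 100) * SECONDS_PER_YEAR


# Print the yearly allowance for one nine through six nines.
for sla in (90, 99, 99.9, 99.99, 99.999, 99.9999):
    seconds = allowed_downtime_seconds(sla)
    print(f"{sla}% uptime allows {seconds / 3600:.2f} hours "
          f"({seconds / 60:.2f} minutes) of downtime per year")
```

The same function works for any figure a vendor quotes, so you can translate a percentage in a contract into hours your users might actually spend without service.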

What does all of this mean to you as a consumer of these services? In terms of actual service, very little, unless you happen to be in the minority percentage; that is to say, everything has gone dark and quiet and you’re suffering a service outage.

What is much more important is how the vendor treats you in the event they don’t achieve 100%. It is hard for any vendor to absolutely guarantee 100% uptime all of the time, so you must make sure there is a provision for service credits or financial compensation in the event of an outage. If not, the SLA is worthless. Any reputable SaaS or Cloud vendor will have absolute confidence in their infrastructure, so based on historical performance a 100% availability SLA will be justifiable. Mimecast offers 100% precisely for this reason. We have spent a large amount of R&D time on getting the infrastructure right so it can be used to back up our SLA, and as a result we win many customers from vendors whose SLAs have flattered to deceive.

Standards?

A larger issue perhaps we ought to consider is highlighted by the arrows Google is flinging in Microsoft’s direction: namely, how do vendors really define uptime? What sort of event do they class as an outage? Does the event have to occur for any length of time to qualify? Is planned downtime included in the calculation? And so on.
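To see how much those definitions matter, here is a small sketch of one outage log measured under two different sets of rules. Everything in it is hypothetical: the outage figures are invented for illustration, and the `count_planned` and `min_minutes` rules are assumptions standing in for whatever a real vendor’s SLA small print says.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    planned: bool


def measured_uptime(outages, period_minutes, count_planned=True, min_minutes=0.0):
    """Uptime percentage for a period under one set of counting rules."""
    counted = sum(
        o.minutes
        for o in outages
        if (count_planned or not o.planned) and o.minutes >= min_minutes
    )
    return 100 * (1 - counted / period_minutes)


# Hypothetical outage log for one year; these numbers are made up.
log = [Outage(240, planned=True), Outage(8, planned=False), Outage(45, planned=False)]
YEAR_MINUTES = 365 * 24 * 60

# Rule set A counts every minute of downtime.
strict = measured_uptime(log, YEAR_MINUTES)
# Rule set B ignores planned maintenance and any outage shorter than 10 minutes.
lenient = measured_uptime(log, YEAR_MINUTES, count_planned=False, min_minutes=10)

print(f"counting everything:       {strict:.3f}%")
print(f"excluding planned + short: {lenient:.3f}%")
```

Same service, same year, same outages, and the two rule sets still report different numbers of nines, which is roughly the shape of the Google and Microsoft disagreement.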

There is no standard by which uptime is defined, and common sense isn’t always applied either. In other markets, consumers are reasonably protected from spurious vendor claims by independent third parties like Consumer Reports or Which?. Not so with the claims tech companies make regarding the effectiveness of their solutions, and the result is a great deal of spin, which in turn inevitably leads to misinterpretation and confusion.

Fortunately, we’re not the only ones to see the need for standards here. Although it’s early days still, you can get an overview of ongoing efforts at cloud-standards.org.

Google and Microsoft’s argument is based largely on differences in measurement rather than any meaningful difference in level of service. In a highly competitive market, any small differentiation can be a perceived bonus (by the vendor), but if we’re all using different tape measures to mark our lines, the only reliable way to tell who comes out on top is to talk to the long-term customers.

A recent survey by CDW reminded me of how I used to feel about continuity when I ran my businesses.

I thought problems wouldn’t happen, and that the cost of preventing them would outweigh the cost of dealing with them. It turns out that I wasn’t alone in my approach:

In a poll of 200 IT managers at medium and large businesses who had experienced network disruptions in the past year, 82 percent said that prior to the outage they’d felt confident their IT infrastructure could handle disruptions and support users effectively. Despite their optimism, nearly all—97 percent—admitted the network disruptions had a detrimental impact on business in the last year.

I genuinely thought it wouldn’t happen to me. Until I lost the disks from our Exchange server and the recent backups were found to be corrupt. We suffered two days of downtime. That was four years ago; I’ve still got one of the disk platters as a reminder.

But one of the frustrating aspects of actually working in the continuity business is that people are overconfident about how resilient their network is to problems, despite the evidence otherwise.

How do you explain to people that stuff happens out of their control and to be prepared for the worst without spreading FUD?

“The survey confirms that while many businesses believe they are prepared for an unplanned network disruption, many are not,” said Norm Lillis, CDW vice president of system solutions.

A broader survey of 7,099 CDW customers revealed a little more than a quarter of the companies experienced a significant network disruption of at least 4 hours within the last year.

What surprises me is how often I go into continuity mode using Mimecast. In the past I would have assumed there were simply no emails, when in fact Exchange had gone to sleep and forgotten to deliver them; now they come through in continuity mode. And it’s not as if it’s an overly stretched Exchange box either. It just is what it is. I find continuity particularly useful on the train too, as continuity mode is considerably more efficient at maintaining a connection than Outlook talking natively to Exchange.

Maybe I’m just biased. But I’ve got the disk platter handy to remind me of those two awful days of downtime.

I know from my perspective that once you’ve got it, you’ll never go back.

Cloud Strategist
Mimecast

Mike Vizard mentioned something on his blog a few weeks ago that I thought had been missed by many. I had been thinking about a series of posts for this blog under the umbrella of email continuity and was putting together a list of common outages businesses have to deal with; here in the US, for the Gulf States in particular, the hurricane is the biggie.

Vizard, like me, had spotted that NOAA is predicting an “above-normal hurricane season”, but he does go on to warn:

The predictions are rarely on target, but the havoc wrought by Hurricanes Katrina and Andrew prompts people to take the issue seriously.

Which is quite true. Of course the weather is an archetypal example of Chaos Theory at work, and that makes predicting its patterns and movement almost impossible; but what we do know is that if Danielle, Earl, Fiona or Gaston make landfall this year, everything in their path will be subject to a new type of Chaos.

For many it’s a case of board up and move out. Everything grinds to a halt for a few days until the threat has passed. If you’re running a business this is not good, but you may have already thought about a way to keep your essential services like email up and running. I know of IT managers who simply turn off their Exchange Servers, unplug them and drive them away – and that works, but leaves your users and customers with nothing.

And this is where a cloud based email continuity service would step in. Vizard points out that advances in cloud computing can help you mitigate the impact of any disaster, not just a hurricane. In his words:

The key thing to remember is that servers in the cloud are usually thousands of miles away from the actual disaster, and as long as you can provide people with access to them, you can be back in business…

Admittedly, if I were facing down a large enough threat, I would be telling my users to collect their things and go, and it’s likely all my local services such as email, Internet and power would be unusable anyway. But relying on a continuity solution based, as Vizard points out, thousands of miles away means that once we’re safely inland we can get back on the air.

And that’s the important part, getting back on the air! Telling my customers we’re still in business and we’re still able to respond to them, regardless of the situation outside, means I don’t lose business or, worse, simply vanish.

Keeping a weather eye out is always a challenge, but the last thing you need to do is vanish.

Last week I was reading this article on Cloud Recovery, by our fellow cloud vendor Geminare’s CEO, Joshua Geist.

The thrust of Joshua’s excellent writing is the concept of Cloud Recovery or, as some are calling it, RaaS (Recovery as a Service). As Joshua quite rightly points out, the cloud makes a perfect platform from which to launch disaster recovery and business continuity efforts.

But, should the discussion be limited to just ‘Recovery’? I think there are many more important aspects to consider, not least continuity or the ability for your users to ‘continue’ working during the outage. Recovery, in my mind, picks up the pieces afterwards. Continuity is king!

Enter the Cloud

The article makes a solid argument for using the Cloud to the best of its advantage, especially for services like backup and recovery. The market seems to agree: when a move to the cloud promises reduced cost and complexity, it’s hard for it not to be in the headlines. As CRN reported this week from the Nth Generation Technical Symposium, some 23% of the attendees claimed to be using cloud services for DR and BCP.

And this is not just limited to businesses either. Many of today’s RaaS models grew up on lessons learned from consumers, who have had access to cloud based backup and recovery services for some time, the latest being directed at their social networking personas by the likes of the excellent Boston-based Backupify.

What’s missing?

I can’t help but think that Cloud Backup, RaaS or just plain old Backup and Recovery is missing something, and that just limiting the discussion to “Recovery” is, …well… limiting.

Geist touches on this slightly at the end of the article…

Imagine, businesses that operate in high-risk areas such as hurricane alley engaging a Cloud Recovery provider minutes after notification of impending risks, deploying a high availability solution on the fly, and unplugging their servers the same day that a warning is issued.

…the use of the phrase “deploying a high availability solution” is what I’m after; that’s where the real value is added: the ability for your users to continue working whilst the outage and subsequent recovery are occurring.

A solid recovery strategy based on RaaS is going to make life a lot easier than relying on a dubious tape based strategy, but the organization is still subject to a Recovery Point Objective (RPO) and a Recovery Time Objective (RTO) for each service or piece of infrastructure.

Enter CaaS

CISSP, CCSK
Mimecast

RIM & their BlackBerry handsets (other mobile devices are available) are having a hard time of it in some Middle-Eastern countries at the moment. As this blog and other outlets have reported, the encryption used to protect data in transit isn’t agreeable to the governments of those states.

Whilst this is alarming news for those users that might be affected by the controls, the rest of us won’t have to worry so much about the sudden loss of functionality on our handhelds. The regulation wrapped around encryption is more lenient here in the US and most of Europe. Why then do I still have a twinge of worry in the back of my mind?

What this saga does highlight is our reliance on a solution or service that can act as a single point of failure in its own right. Access to mobile email is ubiquitous these days, regardless of your choice of mobile device; the rise of smartphones has allowed everyone to access everything from anywhere. And when “anywhere and everything” don’t line up with access, boy do we know about it.

As someone who has been on the end of irate phone calls from a C-level manager about lack of access to mobile email, I understand that balancing availability is a bit like stacking slices of Swiss Cheese. Line up all the holes and all is right with the world; but move a single slice out of place and access stops. Of course those slices can represent many things, from device to service provider to my own email infrastructure.

Mobile email is quick and easy to deploy, and unbelievably useful, but how many of us consider the wider impact of that service grinding to a halt? I don’t want to pick on RIM or BlackBerry, but they have had some well publicized network outages; other devices suffer too, perhaps not in the same way because they are more reliant on our local network being up and running.

Many of the BES and BlackBerry solutions I see were first installed some time ago, probably when the early 5000 or 6000 series devices were introduced. Remember those, with the integrated phone you could only use with a headset? Back then we only installed a single BES server; today we’re looking at clustering them for full resilience. But even then we’re still limiting the extent to which we can continue to provide service.

I know complete site outages are rare, but loss of service to a device isn’t so. Hand on heart, is a simple on-site cluster the best we can do? Have we sat down and examined every small part of that service to make sure we can cope in the event of an outage? If I sent you my Swiss Cheese model, would all of your slices line up?

Townsend uses a cloud-based approach for its email environment to reduce email storage and to ensure email business continuity. Its email is now archived off site automatically and its employees have continuous access to email even during power or network outages.
