Ensuring Cloud Service Continuity During Disasters
Posted on Thu, Nov 10, 2011 @ 12:32 PM
Is your security infrastructure prepared for the worst?
It’s a distressing feeling when you’re in the middle of an unfolding disaster, and suddenly it hits you—your plans and resources are being overwhelmed by the event. At this point, you’ll need to ask yourself: How and when will we re-establish services for customers and employees? What will this disaster cost us in terms of revenue, reputation, and recovery expenses? How can we avoid being in this position again?
As a SaaS provider of security management systems, our business is based on the availability of our services around the clock. Failure of a disaster backup or recovery plan has a direct, immediate impact on our ability to deliver the services our customers depend on. We are continuously evaluating and upgrading our contingency plans, but there is nothing like a real emergency to put those theoretical plans to the test.
Earthquakes and Hurricanes
In August, we had the unusual opportunity to test our plans against two very different types of emergencies: the kind that strikes without warning and the kind you know is coming. The earthquake of August 23rd was an example of the former. Other examples include technological failures, terrorist incidents, hacking events, fires, and loss of key personnel. These events occur without warning and test your processes as they exist. Hurricane Irene, by contrast, falls into the second category, where we have some advance warning. Other examples include blizzards, tornadoes, work stoppages, planned technical work, and large disruptive public gatherings.
Both types of disasters can have significant, even disastrous, impacts on providers who are responsible for delivering their services around the clock. Let’s examine some factors we have found important in managing each type of event:
Unannounced events: These events are a demanding test of your plans, so advance planning is critical. Your contingency plan should consider the following:
- Who is responsible for what aspects of ensuring continuity and recovery?
- How will the team communicate internally and with customers?
- What happens if normal communication paths are disrupted?
- Who takes over if we lose access to key personnel?
- What happens if you lose key assets expected to be available for recovery (people, facilities, communication methods, etc.)?
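One way to keep these questions answerable under pressure is to record each responsibility with an ordered chain of backups and check it for gaps ahead of time. The following is a minimal sketch under that assumption, not a description of our own tooling; the role names, people, and helper functions are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Role:
    """One continuity responsibility with an ordered chain of owners."""
    name: str
    owners: List[str]  # primary first, then backups in order


def resolve_owner(role: Role, unavailable: Set[str]) -> Optional[str]:
    """Return the first reachable owner for a role, or None if nobody is left."""
    for person in role.owners:
        if person not in unavailable:
            return person
    return None


def coverage_gaps(roles: List[Role], unavailable: Set[str]) -> List[str]:
    """List the roles that have no reachable owner."""
    return [r.name for r in roles if resolve_owner(r, unavailable) is None]


# Hypothetical plan: every role name and person here is a placeholder.
PLAN = [
    Role("customer communication", ["alice", "bob"]),
    Role("data center failover", ["carol", "dave", "alice"]),
    Role("facilities and evacuation", ["erin"]),
]

if __name__ == "__main__":
    # Simulate losing contact with two people and report uncovered roles.
    print(coverage_gaps(PLAN, {"erin", "carol"}))
```

Running a check like this against "what if we lose this person" scenarios during planning highlights roles with a single owner, which is exactly where an unannounced event tends to hurt.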
Forecast events: These disasters should be simpler to manage because you have a reasonable amount of information ahead of time. With the key elements for unannounced events already in place, you can focus on additional steps such as:
- Ensure availability of key employees by staggering shifts and dispersing locations
- Plan for accommodations, food, and communication in case of extended events
- Check with partners on the status of their preparation
- Tell employees and customers what is expected of them during preparation and what communication to expect during and after the event
Getting Ready for Our Disaster Close-up
Both events this past August necessitated activation of our emergency response plans. For the earthquake, we were forced to evacuate our building and re-establish essential services from alternate locations. Our plan had anticipated this occurrence, so we had the infrastructure and procedures in place. Our biggest challenge was communication. Landlines, cell phones, and some Internet providers were down, so access to multiple, redundant communication paths was essential.
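The idea of redundant communication paths can be sketched in a few lines: try each channel in a fixed order and stop at the first one that works. This is only an illustration under stated assumptions; the channel names and the stub sender functions below are hypothetical stand-ins, not real integrations or a description of our systems.

```python
from typing import Callable, List, Optional, Tuple

# A channel is a name plus a function that returns True when delivery succeeds.
Channel = Tuple[str, Callable[[str], bool]]


def send_with_fallback(message: str, channels: List[Channel]) -> Optional[str]:
    """Try each channel in order; return the name of the first that succeeds."""
    for name, send in channels:
        try:
            if send(message):
                return name
        except OSError:
            # Treat a network or I/O failure as "channel down" and move on.
            continue
    return None


# Hypothetical stubs that simulate the outages described above.
def landline(message: str) -> bool:
    raise ConnectionError("line down")  # ConnectionError is a subclass of OSError


def cell(message: str) -> bool:
    return False  # network reachable, but delivery failed


def satellite(message: str) -> bool:
    return True


if __name__ == "__main__":
    order = [("landline", landline), ("cell", cell), ("satellite", satellite)]
    print(send_with_fallback("Evacuating; regroup at alternate site.", order))
```

A return value of None means every path failed, which is the case the plan most needs an answer for, such as a pre-agreed physical meeting point.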
Our data centers remained operational after the earthquake, but we prepared for a full switch-over to our disaster recovery center in case aftershocks affected the primary sites. Our data center personnel supported our decision-making by proactively reporting their operational status immediately after the earthquake and until normal operations resumed.
As Hurricane Irene barreled up the East Coast, we had the opportunity to coordinate backup plans in advance of the event. Our operations team staged technical support and data center resources and prepared to switch operations to our data center across the country, if necessary. Fortunately, Irene passed over and our operations were not impacted at all.
These events provided valuable, live tests of our emergency preparedness and disaster recovery plans. Continuous upgrading and evaluation of these plans are essential in helping you avoid that queasy feeling that an unfolding disaster just got the better of you.
- John Szczygiel