Disaster Recovery Planning: Part 1

Having a Disaster Recovery Plan in place for your telecommunications systems is a very good thing. It is a good exercise to go through, to consider all the possible disaster scenarios and how you would respond. Today we have many new technologies to bring to bear on this question. VoIP, SIP Trunking, Hosted VoIP Services and IPPBXs all have the capability to deliver great disaster recovery services at reduced (or even zero) cost. But even if you have an older traditional PBX, you can take advantage of the new technology to create a cost-effective strategy.

NOTE: I originally planned this post as a single article. However, it got so long I decided to break it into two pieces. See Part 2 here.

Whoa, Trigger!

Keep in mind that having a plan in place is one thing; making the decision to ‘pull the trigger’ and activate the plan is quite another. Some of the things that we refer to as “Disasters” in telecommunications are really more like “Service Outages of Unknown Duration”. How long you should wait before switching to Plan B is something to decide before the disaster happens, not during it.
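One way to make that decision ahead of time is to write the trigger down as an explicit rule. The snippet below is only a minimal sketch of the idea: the `TriggerPolicy` name, the `should_activate` helper and the 30-minute threshold are all hypothetical, chosen for illustration, and the right wait time depends on what an outage costs your business.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TriggerPolicy:
    """Hypothetical rule: how long an outage may last before Plan B is activated."""
    max_wait: timedelta


def should_activate(policy: TriggerPolicy, outage_start: datetime, now: datetime) -> bool:
    """Return True once the outage has lasted at least as long as the agreed wait."""
    return now - outage_start >= policy.max_wait


# Illustrative values only; pick a threshold that matches your own cost of downtime.
policy = TriggerPolicy(max_wait=timedelta(minutes=30))
outage_start = datetime(2024, 1, 1, 9, 0)

print(should_activate(policy, outage_start, datetime(2024, 1, 1, 9, 10)))  # False
print(should_activate(policy, outage_start, datetime(2024, 1, 1, 9, 45)))  # True
```

The point is less the code than the habit: if the threshold is agreed on in advance, nobody has to argue about it in the middle of an outage.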

The kinds of disasters you need to plan for can be divided into three broad categories. In order of increasing probability, they are:

Site Failure, where an entire site becomes unusable or inaccessible (Fire/Flood/Act of God).
Equipment Failure, where your PBX becomes completely unusable.
Service Failure, where your telecom services (T1/PRI) are down.

I want to talk about each of these scenarios individually, but readers of this blog will not be surprised to learn that SIP trunking will play a central role here. I have talked before about the Disaster Recovery features of SIP trunks and the fact that they are robust, automatic and free.

Site Failure

While a complete site failure could be the most devastating scenario, it is also the least likely. It is also the most expensive to recover from, thus the most expensive to accommodate. Two options are to provide a Hot Standby Site or a Cold Standby System.

Hot Standby

A Hot Standby Site is a redundant facility, ready to go at a moment's notice should a disaster occur. This option may make sense if you are in a critical industry (military) or providing critical services (medical). It may also make sense if you have a very high value per call or a high cost of outage, such as when you are providing an SLA (Service Level Agreement) for your call center.

Hot Standby sites are expensive. Not only are you paying for the facility itself, utilities, furnishings, etc., but from a telecom standpoint you are paying for duplicate hardware, duplicate software (and subscriptions), duplicate licenses, maintenance, etc. Another large cost here can be duplicate trunking.

In addition, the concept itself is cumbersome and risky. Will the standby system work when you need it? The only way to really know for sure is to test it periodically. Doing a monthly fail-over and restore to your standby system can be costly and disruptive.

Cold Standby System

A Cold Standby System involves having duplicate hardware offline. When a disaster occurs, the hardware must be deployed at some new location. The Cold Standby System is less expensive, but still has many of the same costs as a Hot Standby. Licenses, software and a system state backup will need to be loaded into the system prior to taking it live. This all takes time, making this impractical for certain situations. The issue of telecom services also has to be addressed in the Disaster Recovery Plan. When the main site goes offline, how do you provision trunks at the new site? Cold Standby also has many of the same risks and concerns as Hot Standby.

Re-Route

Today, there is a third option, one that is quite cost-effective and, for many businesses, amounts to prudent planning without breaking the bank. That option is to re-route calls to alternate destinations and plan for answering those calls off-site until regular service can be restored. Read Part 2 to learn more about your options.

Equipment Failure

Telecommunication systems, generally speaking, are engineered for a large MTBF (Mean Time Between Failures). And, with today’s modular system designs, even a major failure can often be addressed by simply swapping a module. That being said, catastrophic failures still do occur. For example, losing a hard disk drive can leave a system completely inoperable. Here, your choice of telecom vendor is key.

Get What You Paid For

If you bought your system from the lowest-cost, out-of-state vendor who shipped it to you and let you install it yourself, good luck! A local vendor who stocks spare parts and can get a technician on-site quickly will be worth their weight in gold at a time like this. Work with your vendor to establish contingency plans for a catastrophic failure. If your PBX has the capability to image the hard disk on a scheduled basis, be sure that is happening reliably and that you test restoring from the image. It is an essential part of your Disaster Recovery Plan to make a backup copy of the live image and keep that stored off-site.
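Part of "happening reliably" can be checked automatically. As a rough sketch, assuming your PBX writes its disk image to a file you can read and that the off-site copy is reachable as a file too, a small script can confirm that the image is recent and that the off-site copy matches it byte for byte. The function names, the paths and the `max_age_hours` parameter are my own illustrative choices, not a feature of any particular PBX.

```python
import hashlib
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large disk images do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_healthy(local: Path, offsite: Path, max_age_hours: float) -> bool:
    """True if both copies exist, the local image is recent, and the copies match."""
    if not local.is_file() or not offsite.is_file():
        return False
    age_hours = (time.time() - local.stat().st_mtime) / 3600
    if age_hours > max_age_hours:
        return False  # the scheduled imaging job has probably stopped running
    return sha256_of(local) == sha256_of(offsite)
```

A matching checksum only tells you the copy is intact. It does not prove the image will restore, so you still need to test an actual restore from time to time.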

Becoming Redundant

You could, of course, provide a Hot Standby System or Redundant Hardware. There are various levels of redundancy. Some companies will opt for redundant power supplies, redundant common control or some other level that addresses their specific concerns. Similar to the Hot Standby Site decision, this option can get expensive and only makes sense for companies where service uptime is mission-critical. Many SMBs will find that the redundant hardware option is just not cost-effective for them.

Do you have a Disaster Recovery Plan? Have you ever had to use it? Have you thought about what triggers would cause you to activate your plan? Share your thoughts in the comments.
