What to consider when using disaster recovery services

In a previous post, I discussed how disaster recovery planning can give a business a competitive edge by helping to identify risks and how best to manage them. Two of the potential approaches, risk transference and risk mitigation, can involve using the services of a Disaster Recovery (DR) service provider. In this post, I write about some of the important requirements that you will need to consider when using such services.

Different types of DR services

There is a wide variety of DR service providers in terms of size, degree of specialisation and scope of services. This is in line with the varying needs of companies. Some companies require only a secure facility to store a backup tape, plus access to a single server on which to restore the backup within twenty-four hours in an emergency. Other companies require more immediate access to data and a range of servers to host a number of different applications.

[Image: Different DR services to manage risk]

In the previous DR-related post, I spoke about completing a risk assessment, including a business impact analysis, which will provide you with a list of risks and a ranking for each risk according to the company’s potential exposure (business impact × probability of occurrence). Based on the threshold decided by the company, the management of the risks with the highest exposure can then be part of the offering from the DR service provider.
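
As a rough illustration of the exposure ranking, here is a minimal Python sketch. The risk names, impact scores, probabilities and threshold are all made-up figures for the example, and the 1 to 5 impact scale is an assumption rather than anything prescribed by a particular methodology.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    impact: float        # business impact score (assumed scale: 1 = minor, 5 = severe)
    probability: float   # probability of occurrence, between 0.0 and 1.0

    @property
    def exposure(self) -> float:
        # Exposure as described in the text: business impact x probability
        return self.impact * self.probability


def risks_above_threshold(risks, threshold):
    """Return the risks whose exposure meets the threshold, highest exposure first."""
    selected = [r for r in risks if r.exposure >= threshold]
    return sorted(selected, key=lambda r: r.exposure, reverse=True)


# Hypothetical figures, for illustration only
risks = [
    Risk("Server room flood", impact=5, probability=0.1),
    Risk("Email server failure", impact=4, probability=0.5),
    Risk("Laptop theft", impact=2, probability=0.25),
]

for risk in risks_above_threshold(risks, threshold=1.0):
    print(f"{risk.name}: exposure {risk.exposure:.2f}")
```

With these figures, only the email server failure clears the threshold, so it would be the first candidate to discuss with a DR service provider.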

RPO and RTO help define your recovery needs

To help you weigh the relative merits of different DR service providers, and to help them propose an appropriate solution, you need to work with your business colleagues to agree the recovery point and recovery time objectives for each of the business’s IT services that are to be covered by the DR plan, such as email, access to shared folders, or the ERP or CRM systems. The criticality of each service to the business has an understandable bearing on its RPO and RTO values:

  • Recovery point objective (RPO) defines how much data the business can accept losing, expressed as a period of time. For example, an RPO for a shared folder could be 60 minutes, which means that if a problem should occur, the business owners accept that any updates to documents in the shared folder made within the last hour may be lost. For critical IT services such as email and key transaction systems, RPO values can be as low as a matter of minutes.
  • Recovery time objective (RTO) defines the maximum amount of time that a business owner will allow for a service to be restored. A common example is email, where a typical RTO could be a matter of minutes: access to email must be restored within minutes of an outage.
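
To make the two definitions concrete, the sketch below checks a single incident against an RPO and an RTO. The timestamps and function names are hypothetical and chosen only for the example.

```python
from datetime import datetime, timedelta


def meets_rpo(last_good_copy: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True if the data lost (time since the last good copy) is within the RPO."""
    return failure - last_good_copy <= rpo


def meets_rto(failure: datetime, service_restored: datetime, rto: timedelta) -> bool:
    """True if the service was restored within the RTO after the failure."""
    return service_restored - failure <= rto


# Hypothetical incident on a shared folder
failure = datetime(2024, 1, 15, 14, 30)
last_copy = datetime(2024, 1, 15, 13, 45)   # last successful backup or replica
restored = datetime(2024, 1, 15, 16, 0)     # moment users regained access

print(meets_rpo(last_copy, failure, rpo=timedelta(minutes=60)))  # True: 45 minutes of data lost
print(meets_rto(failure, restored, rto=timedelta(hours=1)))      # False: the outage lasted 90 minutes
```

In this example the data loss is acceptable but the outage is too long, which shows why the two objectives need to be agreed separately for each service.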

RPO and replication

Traditionally, a tape backup would be used to back up data at the end of the working day, and the tape would be stored off-site for security purposes. One weakness of this approach is that if a problem occurs during the day, any data transacted on a system that day will not have been backed up, as the backup only runs at the end of the day. Increasingly, for many businesses, a typical RPO is shorter than the working day and can be of the order of minutes.

To support low RPO requirements, a replication technology is required that enables data to be replicated from one system to another (off-site in this scenario). Replication technology can be operated in a number of modes: continuously, where data is copied on a quasi-real-time basis, or at set intervals that correspond to the RPO requirements (e.g. if an RPO is 4 hours, then every 3½ to 4 hours).
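
One way to reason about the replication interval is to allow for the time a replication run takes to finish. The sketch below rests on my own simplifying assumption that the off-site copy only becomes usable once a run has completed, so the interval plus the transfer time must fit inside the RPO. The 30-minute transfer time is a hypothetical figure.

```python
from datetime import timedelta


def max_replication_interval(rpo: timedelta, transfer_time: timedelta) -> timedelta:
    """Longest gap between replication starts that keeps the off-site copy within the RPO.

    Assumes a copy is only usable once its transfer has completed, so the
    worst-case data loss is the interval plus the transfer time.
    """
    if transfer_time >= rpo:
        raise ValueError("A single transfer takes as long as the RPO; consider continuous replication")
    return rpo - transfer_time


# Hypothetical: 4-hour RPO, each replication run takes about 30 minutes
interval = max_replication_interval(timedelta(hours=4), timedelta(minutes=30))
print(interval)  # 3:30:00
```

Under these assumptions, a 4-hour RPO with a 30-minute transfer gives an interval of 3½ hours, the lower end of the range mentioned above.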

Depending on the volume of data and the frequency of replication, a broadband or leased-line connection will be required to ensure that the data is replicated successfully and quickly to your off-site storage facility. Replication solutions use software agents for different applications (e.g. Microsoft Exchange, SAP, Oracle CRM, Sage) and for servers (e.g. file server, DNS server) to ensure that the data specific to that application or server is copied successfully.
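
A back-of-the-envelope calculation can help when sizing the link. The sketch below estimates the line speed needed to move a given volume of changed data within a replication window. The data volume, window and the 70% link efficiency factor are assumptions for illustration; real figures depend on your data change rate, compression and protocol overhead.

```python
def required_mbps(changed_gb: float, window_hours: float, efficiency: float = 0.7) -> float:
    """Estimate the line speed (megabits per second) needed to replicate
    `changed_gb` gigabytes within `window_hours`.

    `efficiency` is an assumed fraction of the nominal line speed that is
    actually usable for replication traffic.
    """
    bits = changed_gb * 8 * 1000**3      # decimal gigabytes to bits
    seconds = window_hours * 3600
    return bits / seconds / 1_000_000 / efficiency


# Hypothetical: 20 GB of changed data must be copied within 30 minutes
print(f"{required_mbps(20, 0.5):.0f} Mbps")  # 127 Mbps
```

A result like this suggests that a basic broadband connection would struggle and a leased line may be needed, whereas a few hundred megabytes every four hours would fit comfortably on a far slower link.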

There is a range of replication solutions, of which I have listed four by way of example:

How to meet a strict RTO (i.e. get the systems up and running again)

Having the most up-to-date data available according to your RPO requirements is one consideration, but in a disaster it’s also important that the systems are up and available as per the business requirements. Some replication technologies support copying a complete image of the server, including the operating system, application software and transaction data, as part of the replication process.

[Image: Recovery time is important]

This means that a fully functional application can be up and running relatively quickly on a virtual or physical server by applying the replicated image. With other replication solutions that don’t capture a full image, OS and application patches would need to be applied to the off-site servers, or up-to-date snapshot images would need to be available at the DR service provider’s site.

One other consideration when selecting the appropriate DR solution to meet RTO requirements is whether to use syndicated servers or dedicated servers at your DR service provider’s site. With syndicated servers, the service provider commits to making servers available within an agreed time frame for your IT team to prepare in the event of an emergency. As the name suggests, dedicated servers are available to your IT team on a 24/7 basis, but at a higher cost.
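
To compare these options against an RTO, it can help to add up the steps that stand between the disaster and a working service. The sketch below does this for three combinations of server type and replication approach. Every duration is a hypothetical figure, and the simple additive model (server lead time + restore time + patch time) is my own simplification.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class RecoveryOption:
    name: str
    server_lead_time: timedelta            # wait until servers are made available
    restore_time: timedelta                # applying the image or restoring the data
    patch_time: timedelta = timedelta(0)   # OS/application patching when no full image exists

    @property
    def estimated_recovery(self) -> timedelta:
        return self.server_lead_time + self.restore_time + self.patch_time


def options_meeting_rto(options, rto):
    """Return the options whose estimated recovery time fits within the RTO."""
    return [o for o in options if o.estimated_recovery <= rto]


# Hypothetical durations, for illustration only
options = [
    RecoveryOption("Syndicated servers, data-only replication",
                   timedelta(hours=4), timedelta(hours=2), timedelta(hours=3)),
    RecoveryOption("Syndicated servers, full-image replication",
                   timedelta(hours=4), timedelta(hours=1)),
    RecoveryOption("Dedicated servers, full-image replication",
                   timedelta(0), timedelta(hours=1)),
]

for option in options_meeting_rto(options, rto=timedelta(hours=2)):
    print(f"{option.name}: {option.estimated_recovery}")
```

With a 2-hour RTO, only the dedicated option qualifies in this example; relaxing the RTO to 5 hours would bring the syndicated full-image option back into consideration at a lower cost.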

Disaster recovery is a large topic, and in this post I have covered some of the important items to consider when designing an appropriate solution for your business, having first listed your risks and decided on an appropriate risk management approach.
