Reliant Technology - Recovery Times for Deduped Data

Data deduplication has been a mainstream technology in enterprise datacenters for over seven years. Because it’s been so widely adopted, IT decision makers sometimes treat dedupe as an afterthought.

But the way deduplication is implemented can have a major impact on the success of an organization’s overall backup strategy, especially when considering recovery time objectives.

Latency Issues With Deduplication

For all its benefits for backup retention and replication, deduplication does have drawbacks. It is both memory- and CPU-intensive, and the deduplication algorithms take time to run before data is written to disk.
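To make the CPU and memory cost concrete, here is a minimal sketch of hash-based deduplication, assuming fixed-size chunks and SHA-256 fingerprints. The chunk size, the `deduplicate` function, and the use of a single dictionary as both fingerprint index and chunk store are illustrative assumptions, not how any particular appliance works. Every chunk must be hashed (CPU) and checked against the index (memory) before it can be written.

```python
import hashlib

# Hypothetical fixed chunk size; many real products use variable-size chunking.
CHUNK_SIZE = 4096


def deduplicate(data: bytes, store: dict[str, bytes]) -> list[str]:
    """Split data into fixed-size chunks, keep each unique chunk once,
    and return the ordered list of fingerprints (the backup's "recipe").

    In this toy model, `store` is both the fingerprint index and the chunk
    storage; a real system keeps the index in RAM and chunk data on disk.
    """
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        # CPU cost: every chunk is hashed before it can be written.
        fingerprint = hashlib.sha256(chunk).hexdigest()
        # Memory cost: the index is consulted for every chunk.
        if fingerprint not in store:
            store[fingerprint] = chunk
        recipe.append(fingerprint)
    return recipe


store: dict[str, bytes] = {}
backup = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
recipe = deduplicate(backup, store)
print(len(recipe), len(store))  # 4 2  (four chunks referenced, two stored)
```

The space saving comes from storing the repeated "A" chunk once; the price is a hash computation and an index lookup on every chunk written.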

Some backup appliances try to solve this problem by throwing more CPU power at it. But that approach only increases the cost of disk-based backup and makes it harder to justify over traditional tape systems.

The bigger latency culprit, however, is the time required to recover data from a deduplicated copy. Restoring deduplicated data to its original format takes time, especially if the restore is large or consists of many separate files.

When data is recovered from a backup image, it is usually unavailable anywhere else, and the affected application stays inaccessible until the data is fully restored. The rehydration step (reassembling deduplicated data into its original form) therefore has a direct, detrimental impact on Recovery Time Objectives (RTO).
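Rehydration is the reverse of the write path: the restore must look up every chunk referenced by the backup and read it from wherever it was first stored. The toy model below is a sketch under stated assumptions: chunks are appended to a log in the order they are first seen, and `random_reads` is a crude, hypothetical proxy that counts every jump to a non-adjacent log position as a seek. The function names and the 4-byte chunk size are illustrative only.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical, tiny so the example is easy to follow


def write_backup(data: bytes, index: dict[str, int], log: list[bytes]) -> list[str]:
    """Deduplicate data into an append-only chunk log; return the backup's recipe."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in index:
            index[fingerprint] = len(log)  # position where the new chunk lands on "disk"
            log.append(chunk)
        recipe.append(fingerprint)
    return recipe


def rehydrate(recipe: list[str], index: dict[str, int], log: list[bytes]) -> tuple[bytes, int]:
    """Reassemble the original bytes and count non-sequential chunk reads."""
    parts = []
    random_reads = 0
    previous = None
    for fingerprint in recipe:
        position = index[fingerprint]
        # Reading anything other than the next log position stands in for a seek.
        if previous is not None and position != previous + 1:
            random_reads += 1
        parts.append(log[position])
        previous = position
    return b"".join(parts), random_reads


index: dict[str, int] = {}
log: list[bytes] = []
monday = write_backup(b"AAAABBBBCCCCDDDD", index, log)
tuesday = write_backup(b"AAAAXXXXCCCCYYYY", index, log)
print(rehydrate(monday, index, log))   # (b'AAAABBBBCCCCDDDD', 0)
print(rehydrate(tuesday, index, log))  # (b'AAAAXXXXCCCCYYYY', 3)
```

The first backup reads back sequentially, but the second one shares chunks with it, so its restore jumps back and forth across the log. Scaled up to millions of chunks, that scattered I/O is one reason restores from deduplicated disk can be slow.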

Hybrid Solutions

One way to address these latency issues is a hybrid disk backup solution: a platform with both non-deduplicated disk storage for fast recoveries and a separate, deduplicated partition that stores backup images for longer-term retention and offsite replication.

You can think of it as having both a recovery zone and a backup archive zone. The non-deduplicated partition only needs to hold a week's worth of backup images (for example, one full backup and four nightly incrementals), since the majority of restore requests occur within seven days of data creation.
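The routing logic implied by this design can be sketched in a few lines. This is a hypothetical illustration, not a feature of any specific backup product: the zone names, the `choose_restore_source` function, and the use of a simple seven-day age window (rather than tracking the specific full and incremental images) are all assumptions.

```python
from datetime import datetime, timedelta

# Assumption: the recovery zone keeps roughly the last seven days of native
# (non-deduplicated) backup images, as described above.
RECOVERY_ZONE_WINDOW = timedelta(days=7)


def choose_restore_source(restore_point: datetime, now: datetime) -> str:
    """Return which (hypothetical) zone should serve a restore request."""
    if now - restore_point <= RECOVERY_ZONE_WINDOW:
        return "recovery_zone"  # native copy: no rehydration needed
    return "archive_zone"       # deduplicated copy: must be rehydrated first


now = datetime(2024, 1, 10, 9, 0)
print(choose_restore_source(datetime(2024, 1, 8, 22, 0), now))   # recovery_zone
print(choose_restore_source(datetime(2023, 12, 1, 22, 0), now))  # archive_zone
```

If most restore requests fall inside the window, most restores take the fast path and never pay the rehydration cost.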

This approach sacrifices some storage efficiency, but it removes the deduplication performance penalty from restores of recent data. And because only a single week's worth of data is kept in the non-deduplicated area, that portion of the disk footprint can stay small. The remaining disk backup capacity is reserved for deduplicated backup images.
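A rough capacity split can be estimated with simple arithmetic. The sketch below assumes each incremental is about the size of the full backup times a daily change rate, and that the archive zone achieves a fixed deduplication ratio; the function names and every input value in the example are hypothetical, chosen only to show the calculation.

```python
def recovery_zone_tb(full_tb: float, daily_change_rate: float, incrementals: int = 4) -> float:
    """Raw capacity for one full backup plus N incrementals.

    Simplifying assumption: each incremental is full_tb * daily_change_rate.
    """
    return full_tb * (1 + incrementals * daily_change_rate)


def capacity_split(total_tb: float, full_tb: float, daily_change_rate: float,
                   dedupe_ratio: float, incrementals: int = 4) -> dict[str, float]:
    """Split raw disk between the native recovery zone and the deduplicated zone."""
    native = recovery_zone_tb(full_tb, daily_change_rate, incrementals)
    archive_raw = total_tb - native
    if archive_raw <= 0:
        raise ValueError("recovery zone does not fit in the available capacity")
    return {
        "recovery_zone_raw_tb": native,
        "archive_zone_raw_tb": archive_raw,
        # Logical backup data the archive can hold at the assumed dedupe ratio.
        "archive_zone_logical_tb": archive_raw * dedupe_ratio,
    }


# Hypothetical inputs: 100 TB of raw disk, a 10 TB full backup,
# 5% daily change, and a 10:1 deduplication ratio.
print(capacity_split(100, 10, 0.05, 10))
```

With these made-up inputs, the native zone takes about 12 TB of the 100 TB, leaving the rest for deduplicated long-term retention, which illustrates why the recovery zone's share of the footprint can stay modest.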

This small change in architecture can have a big impact on how quickly organizations can respond to critical restore requests, especially large ones such as a full virtual machine recovery. In some cases, restoring a large data set from non-deduplicated storage may take only minutes, versus several hours to restore the same data from deduplicated disk.

Booting From Backup

There are other reasons to consider a backup solution with a non-deduplicated disk partition. Virtualization backup applications such as Veeam and vRanger can boot virtual machines directly from backup images on disk. This capability, known as “instant recovery,” lets admins point users directly to VM backup images on disk for rapid application recovery.

Instant recovery can be a great fit for environments that lack high-availability clustering. But if VM OS backup images and application data are stored only in deduplicated format, it may be a non-starter: the time required to ‘rehydrate’ the data before it can be presented back to the VM on the backup platform would make it impractical.

So “in-place” recoveries on deduplicated disk may be infeasible because of the latency introduced into the process. With blocks of data continuously being un-deduplicated and re-deduplicated, the resulting random I/O would be a huge drain on performance.

Performance can be so poor that some deduplication hardware vendors recommend installing standalone disk in front of their systems purely for instant recovery. This adds cost and management complexity, but having backup images immediately available in their native format allows virtualization administrators to conduct fast recoveries or use in-place recovery directly from the backup image.

Final Take

Deduplication is widely adopted because it’s a proven, valuable technology. But organizations considering dedupe for their data center need to also consider its potential effects on recovery times.

Want to Know More?

To manage this latency issue, combining a non-deduplicated disk staging (landing) area with a separate zone for extended retention offers the best features of both native disk and deduplicated storage. As the World's #1 Reseller of Certified Pre-Owned Storage Hardware & Support, Reliant Technology has a team of experts who can help. If you want to learn more about data backup and recovery, reach out to our storage and support specialists at 1.877.227.0828.