Backing up the House: Why Backup isn’t Archive

During a recent meeting at a client site, a data protection manager summed up the world of backup in a nutshell: “In the past I was told to back up the house, now they want me to know about everything in the house, how many paintings, rugs, chairs, etc. and be able to get them out at a moment’s notice.”

Backup was never designed to provide the level of detail about data that today’s data governance requirements demand. Clients use backup as an archive of their sensitive data, yet it does not provide the knowledge needed to support legal, eDiscovery, compliance and regulatory needs. How are you going to support an eDiscovery request from a 10-year-old backup tape when you no longer have the backup software that created the tape?

Backup is not archive, but it could be. Backup captures all enterprise data: user files, email, archives and so on. If it exists, backup knows about it. However, as my friend the data protection manager stated, there is no way to know what is contained in backup. Sure, you have a catalog, but finding specific emails from a user is not an easy task. Additionally, as the backup data ages, it becomes harder and harder to know what you have and to get it back from tape or disk.

Extracting knowledge of what is in backup is the first step in leveraging this data for archiving. This knowledge goes well beyond the backup catalog and includes detailed metadata for documents, spreadsheets and presentations. Beyond metadata, certain data governance requirements demand knowledge of content, including keyword search of email and files to find sensitive content.
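To make the idea of content-level knowledge concrete, here is a minimal sketch of a keyword index in Python. It assumes the files and emails have already been extracted from the backup image into simple records; the Item record, the sample paths and the function names are hypothetical and do not represent any vendor's actual format or API.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One file or email recovered from a backup image (hypothetical record)."""
    path: str
    owner: str
    modified: str  # ISO date string, e.g. "2012-03-01"
    text: str


def build_index(items):
    """Map each lowercase word to the set of item paths that contain it."""
    index = defaultdict(set)
    for item in items:
        for word in re.findall(r"[a-z0-9]+", item.text.lower()):
            index[word].add(item.path)
    return index


def search(index, *keywords):
    """Return the paths of items that contain every keyword."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()


# Made-up sample data standing in for extracted backup content.
items = [
    Item("mail/alice/0001.eml", "alice", "2012-03-01", "Contract renewal for Acme Corp"),
    Item("files/bob/budget.xlsx", "bob", "2011-11-15", "Quarterly budget forecast"),
    Item("mail/bob/0042.eml", "bob", "2012-04-09", "Acme invoice and contract terms"),
]
index = build_index(items)
hits = search(index, "acme", "contract")
```

With an index like this, a request such as "all email mentioning Acme and contract" becomes a lookup rather than a full restore. A production index would also need to handle attachments, mail database formats and far larger volumes.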

Security and compliance also require finding content based on patterns, such as personally identifiable information (PII) and protected health information (PHI). Without this level of knowledge, users back up the whole “house” and it becomes an assumed archive once its disaster recovery role is complete. This results in a “save everything” strategy, which is neither a smart nor an economical governance strategy.
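As an illustration of pattern-based searching, the sketch below scans extracted text for two common PII patterns: US Social Security numbers and payment card numbers. The regular expressions are deliberately simplified, the function names are hypothetical, and the Luhn checksum is used only to weed out digit runs that cannot be valid card numbers. Real PII and PHI detection needs many more patterns and validation rules.

```python
import re

# Simplified, illustrative patterns (assumptions, not a complete PII definition).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")  # 13 to 16 digits, optional separators


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan(text: str):
    """Return a list of (kind, value) pairs for each pattern match in the text."""
    findings = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):
            findings.append(("card", digits))
    return findings


# Made-up sample text; 4111 1111 1111 1111 is a widely used test card number.
sample = "Employee SSN 123-45-6789, card 4111 1111 1111 1111, order 1234 5678 9012 3456."
findings = scan(sample)
```

Here the order number has the shape of a card number but fails the checksum, so it is not reported. Knowing which backup items contain matches like these is what allows sensitive content to be singled out instead of saving everything.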

The second step to leveraging backup for archiving is access to information stored in proprietary backup formats. Restoring backup data from last week’s tapes or disk images is not very complex; however, finding a specific user’s mailbox containing specific keywords is impossible without restoring far more than you need.

So when legal calls and says to find all email related to a specific client or set of keywords, the backup manager is forced to restore full backups just to find a small set of content. As the backup data ages, this becomes even more complex. Companies change backup software or transition to new backup strategies. Over time, getting access to this legacy backup data becomes very time-consuming and expensive, if not impossible.

Leveraging Backup for Archiving

Delivering knowledge of backup data is complex. However, Index Engines provides not only knowledge, including detailed content and keywords, but also access. Finding and restoring a specific email from a 10-year-old tape no longer requires the original backup software or a full restore. Index Engines has cracked the code and is able to leverage backup data to support archiving of data for legal and compliance needs.

Organizations have learned that backup is not an archive. Storing old tapes in a salt mine or accumulating backup images on disk will become problematic down the road.

A proper archive does not leave you without knowledge of, or access to, the data it holds. Additionally, archiving everything by storing all backup content is not a sound strategy for organizations that face frequent lawsuits, regulatory requirements and strict data governance policies. These backup archives will result in risk, liabilities and fines down the road that tarnish the company’s reputation.

Eliminating the proprietary lock that backup software has on data, Index Engines delivers knowledge of what is in backup images and provides intelligent access to the data that has value and should be archived. Finding and archiving data without the need for the software that generated the backup is now possible. This allows backup to be leveraged for archiving and delivers support for today’s growing information governance requirements.

Index Engines supports direct indexing of backup tapes and disk images. It supports all common backup formats, and data can be indexed at a high level, capturing metadata only, or down to full-text content, capturing keywords from user email deep within Exchange and Notes databases. Beyond indexing, data can then be restored from backup, maintaining all the important metadata, without the need for the original software.

There are two classic use cases for Index Engines technology. The first is cleaning up legacy backup data on tape or disk for clients that were using backup as an archive. The second is eliminating the need to make tapes from disk-based backups (or to send recent disaster recovery tapes to offsite storage as an archive).

Index Engines delivers the intelligence and access to these environments to extract what is needed according to policy, which is typically a small volume of the total capacity, and archive it on disk according to retention requirements. Once data is archived and secured it is searchable and accessible to support even the most complex data governance requirements.
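The policy-driven extraction described above can be pictured with a small Python sketch. It assumes a catalog of backup entries with basic metadata and a policy that keeps only certain kinds of content inside a retention window. The Entry and Policy records, the sample catalog and the seven-year window are all hypothetical and are not taken from any product.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Entry:
    """Metadata for one item found in a backup (hypothetical record)."""
    path: str
    kind: str        # e.g. "email", "document", "system"
    modified: date
    size_bytes: int


@dataclass(frozen=True)
class Policy:
    """Which kinds of content to archive and for how long (hypothetical)."""
    kinds: frozenset
    retention_years: int


def select_for_archive(entries, policy, today):
    """Return entries of an allowed kind modified inside the retention window."""
    # Approximate cutoff: 365-day years, ignoring leap days.
    cutoff = today - timedelta(days=365 * policy.retention_years)
    return [e for e in entries if e.kind in policy.kinds and e.modified >= cutoff]


# Made-up catalog standing in for the contents of a backup image.
catalog = [
    Entry("mail/alice/0001.eml", "email", date(2019, 5, 1), 200_000),
    Entry("images/vm-disk.img", "system", date(2021, 1, 1), 50_000_000_000),
    Entry("files/bob/contract.docx", "document", date(2010, 2, 1), 1_000_000),
    Entry("files/bob/budget.xlsx", "document", date(2020, 3, 1), 800_000),
]
policy = Policy(kinds=frozenset({"email", "document"}), retention_years=7)
selected = select_for_archive(catalog, policy, today=date(2024, 1, 1))

# Share of total backup capacity that the policy actually keeps.
fraction = sum(e.size_bytes for e in selected) / sum(e.size_bytes for e in catalog)
```

In this made-up catalog the policy keeps two small items and leaves the system image and the expired document behind, so only a tiny fraction of the total capacity moves into the archive.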