An Intelligent Approach to Managing Risk and Liability of Legacy Backup Data

How eliminating tape as a long-term archive mitigates legacy data risks

The most significant expense and pain associated with stockpiling legacy data on backup tape or disk archives is the risk of the unknown. Backup images are archives of all user data, including email, text documents and PDFs from the CEO, to contracts from legal and manufacturing, to documents from research and discovery.

When these highly-sensitive records are not managed properly – archived, encrypted or even purged – they could be requested to support litigation and regulatory compliance. These potential “smoking guns” could cost your organization millions in fines along with even more in embarrassment and loss of public trust.

Managing legacy business records properly will allow mitigation of risk and control of future potential expense.
In the past, data on old backup tapes has been difficult to access. As tapes age they become more and more inaccessible, and if you need to know what is on these tapes, or restore files and emails in support of legal and compliance requirements, it can be extremely complex and expensive.

Index Engines provides an intelligent method of managing and restoring content from legacy backup tapes. By eliminating the need for the backup software that created the tape, Index Engines can ingest legacy backup catalogs and provide reports and analysis of the contents in order to determine a disposition strategy.

Additionally, Index Engines can scan tapes and provide deep intelligence, including content metadata, so individual files and mailboxes can be extracted without the original backup software. Leveraging this new approach towards backup data access, Index Engines eliminates traditional tape restoration and intelligently manages legacy data using a more cost-effective approach.

With intelligent knowledge and access to legacy backup data there is no need to maintain non-production backup software. Additionally, tape and disk archives can be analyzed and a disposition strategy can be defined that secures sensitive data and eliminates what no longer has value to the business.

As a result, the number of data center inefficiencies can be reduced and wasted costs can be recovered and reallocated to other critical initiatives.

The process of determining what legacy backup content has value and what is redundant, outdated and trivial is not as complex as one may think. If you can develop a policy of what should be preserved, which requires input from the legal and records management team, this would result in restoring less than 1% of the legacy backup content.

Even if you haven’t determined the policy of what to keep, a single instance of legacy tape data can be restored to a disk based archive and then retention policies can be defined. The right scenario is dependent on how sound your data retention policy is.

Some organizations can be very specific as to what should be preserved (based on content type, owner, and date range); others may not have a detailed policy, or may have a “save everything” policy. Either way, a solution exists to migrate and secure data of value online and eliminate the use of tape as a long-term archive.

Developing a Data Disposition Strategy: A Case Study in User Shares

Join us for a 30-minute Tech Break
Jan. 29 at 1 pm ET/10 am PT

One international financial services company discovered that 34% of their data was duplicate, 68% of the data had not been touched in over 3 years and they had nearly 17,200 individual files containing PII on their main network.

While most organizations share similar network profiles, what this company did next reduced their costs and mitigated untold millions in risk.

Leveraging metadata and the latest technology, discover how they developed a data disposition strategy and reclaimed 54% of their capacity and implemented a value-based archive.

Join us for a 30-minute tech break Thursday, Jan. 29 at 1 pm ET/10 am PT to see how they did it and what you can learn from their user data mistakes.

Index Engines Backup Migration Solutions Now Available through EMC

As a Select Partner in the EMC Business Partner Program for Technology Connect Partners, Index Engines today announced its Catalyst platform is now available through EMC and its channel partners.

Catalyst enables EMC clients to seamlessly transition to EMC®’s Data Protection Suite featuring NetWorker® and Avamar® solutions while still maintaining access to their legacy backup data without the need for the original software.

Catalyst offers a range of solutions including catalog management, data restoration without the need of original backup software, and single instance migration of tape data to disk to support retention requirements. Intelligent management and access to legacy backup data supports information governance and compliance requirements, while controlling data center costs.

“We’re thrilled with our enhanced partnership with EMC and the value we’ll jointly be able to deliver to customers,” Index Engines Vice President Jim McGann said. “Companies are no longer forced to maintain non-production backup environments out of fear of compliance or eDiscovery requests. They can now maintain easy, searchable access to backup data while transitioning to state-of-the-art EMC technology.”

Catalyst enables clients to deploy EMC’s Data Protection Suite, retire their existing backup vendor and still maintain access to legacy data without the need for the original backup software. The Catalog Engine searches, manages and generates reports on legacy data with the Catalyst Tape Indexes utilized to deeply index the content so it can be searched and restored.

Catalyst also delivers comprehensive metadata search and migration tools in order to restore a single instance of tape data to disk without the need for the original backup software. This allows clients facing the risks and costs of using tape as a long-term archive to efficiently migrate this sensitive data to EMC disk for long-term preservation and management. Once this migration is complete, the legacy tapes can be remediated, recouping offsite tape storage expenses.

“EMC is pleased that Index Engines has joined EMC Technology Connect as a Select Partner, demonstrating its commitment to excellence in technology innovation for data protection and migration,” said Don Lamburn, Director, EMC Technology Connect. “We look forward to working with Index Engines to ensure that our mutual customers have the highest level of support possible for their information infrastructure initiatives.”

10 Mission-Critical Unstructured Data Projects To Control Costs and Streamline Operations in 2015

Everyone’s talking about unstructured data lately – the cost, the risk, the massive growth – but little is being done to manage it.
Analyst group IDC estimates unstructured data growth at 40-60 percent per year, a statistic that is not only startling, but puts a great deal of emphasis on the need to start managing it today or at least have it on the schedule for 2015.

With budgets tightening – often to pay for storage costs – data center managers are struggling to find the highest impact projects that will see an immediate ROI. While there’s no one project that will reclaim all of the unstructured data rotting away in the data center, there are 10 crucial projects that will help streamline and control costs in the data center.

1. Clean up abandoned data and reclaim capacity: When employees leave the organization, their files and email languish on networks and servers. With the owner no longer available to manage and maintain the content it remains abandoned and clogs up corporate servers. Data centers must manage this abandoned data to avoid losing any valuable content and to reclaim capacity.

2. Migrate aged data to cheaper storage tiers: As data ages on the network it can become less valuable. Storing data that has not been accessed in three years or longer is a waste of budget. Migrate this data to less expensive storage platforms. Aged data can represent 40% or more of current server capacity.

3. Defensibly remediate legacy backup tapes and recoup offsite storage expenses: Old backup tapes that have piled up in offsite storage are a big line item on your annual budget. Using unstructured data profiling technology, these tapes can be scanned without the need for the original backup software, and a metadata index of the contents generated. Using this metadata profile, relevant content can be extracted and archived and the tapes can be defensibly remediated, reclaiming offsite storage expenses.

4. Purge redundant and outdated files and free up storage: Network servers can easily consist of 35-45% duplicate content. This content builds over time and results in wasted storage capacity. Once duplicates are identified, a policy can be implemented to purge what is no longer required, such as redundant files that have not been accessed in over three years or those owned by ex-employees.

5. Profile and move data to the cloud: Many data centers have cloud initiatives where aged and less useful business data is migrated to more cost-effective hosted storage. Finding the data and on-ramping it to the cloud, however, is a challenge if you lack understanding of your data: who owns it, when it was last accessed, types of files, etc.

6. Audit and remove personal multimedia content (e.g., music, video) from user shares: User shares become a repository not only of aged and abandoned files, but also of personal music, photo and video content that has no value to the business and may in fact be a liability. Once this data is classified, reports can be generated showing the top 50 owners of this content, total capacity and location. This information can be used to set and enforce quotas and work with the data owners to clean up the content and reclaim capacity.

7. Archive sensitive content and support eDiscovery more cost effectively: Legal and compliance requests for user files and email can be disruptive and time consuming. Finding the relevant content and extracting it in a defensible manner is the key challenge. Streamlining access to critical data so you can respond to legal requests more quickly not only lessens their time burden but also saves you time and money when locating the content.

8. Audit and secure PII to control risk: Users don’t always abide by corporate data policies. Sharing sensitive information containing client Social Security and credit card numbers, such as tax forms, credit reports and applications, can easily happen. Find this information, audit email and servers, and take the appropriate action to ensure client data is secure. Some content may need to be moved to an archive, encrypted or even purged from the network. Managing PII ensures compliance with corporate policies and controls liability associated with sensitive data.

9. Manage and control liability hidden in PSTs: Email contains sensitive corporate data including communications of agreements, contracts, private business discussions and more. Many firms have email archives in place to monitor and protect this data, however, users can easily create their own mini-archive or PST of the content that is not managed by corporate. PSTs have caused great pain when involved in litigation as email that was thought to be no longer in existence suddenly appears in a hidden PST.

10. Implement accurate chargebacks based on metadata profiles and Active Directory ownership: Chargebacks allow the data center to accurately recoup storage expenses and work with departments to develop a more meaningful data policy, including purging what they no longer require.

There are a number of ways companies can approach these projects, but to maximize impact, consider file-level metadata tools, an approach sometimes referred to as unstructured data profiling.

Through this file-level information, the date, owner, location, file type, number of copies and last-accessed time can be determined, which helps data center managers classify data and put disposition policies in place.
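As a rough illustration of how "number of copies" can be derived, the sketch below groups files by size and then by SHA-256 content hash. This is a generic technique, not a description of how any particular profiling product works; the function names (`find_duplicates`, `reclaimable_bytes`) are hypothetical and chosen only for this example.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups of paths under root whose contents are identical."""
    # Pass 1: group by size, since files of different sizes cannot match.
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                by_size[os.path.getsize(full)].append(full)
            except OSError:
                continue  # unreadable or vanished file: skip it

    # Pass 2: hash only the files that share a size with another file.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                by_hash[sha256_of(path)].append(path)
            except OSError:
                continue
    return [sorted(group) for group in by_hash.values() if len(group) > 1]


def reclaimable_bytes(groups):
    """Bytes freed if one copy of each duplicate group is kept."""
    return sum(os.path.getsize(group[0]) * (len(group) - 1) for group in groups)
```

Grouping by size first keeps the expensive hashing step limited to files that could plausibly be duplicates. A real purge policy would still combine these groups with owner and last-accessed information before anything is deleted.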

The benefits of managing unstructured data include reduced risk, reclaimed capacity and recovered data center budget. With finances already tight and data growing rapidly, don’t leave these projects off the schedule in 2015.

Legacy Backup Catalog and Data Management

Catalyst from Index Engines unlocks content in IBM and Symantec backup formats (with Commvault support coming in 1H2015) allowing access and management of the data without the need for the original software.

Clients that currently use IBM’s TSM or Symantec’s NBU now have a cost-effective and intelligent migration strategy to best-of-breed backup platforms. If non-production TSM/NBU instances, resulting from a merger or acquisition, are maintained in order to provide access to legacy tape data, these environments can be retired and replaced with a single solution that provides simplified access to the data going forward.

Additionally, clients who have large volumes of legacy backup tapes in offsite storage vaults (Iron Mountain, Recall, etc.), containing data required to support legal and compliance, can now intelligently restore this data and extract the records that are relevant to disk for simplified access and management. This is typically a small portion of what is contained on tape, which differentiates Catalyst from a migration tool where all data is moved from tape.


Index Engines Offers Complimentary 1 TB Software Licenses to Shed Light on User Data Growth

Index Engines has announced it now offers complimentary 1 TB licenses of its Catalyst product so organizations can get a better grasp on their unstructured user data before they start budgeting for 2015.

This VMware-based plug and play download gives organizations 30 days to perform a metadata and full-text profile on the LAN file data of their choice, giving them information on last accessed time, owner, created date, number of duplicates, file type and more while performing PII pattern searches for credit card and Social Security numbers.
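To make the idea of a PII pattern search concrete, here is a minimal sketch that scans text for Social Security numbers and credit card numbers. The regular expressions and the `find_pii` helper are illustrative assumptions, not Catalyst's actual patterns; the Luhn checksum is the standard check used to discard digit runs that cannot be valid card numbers.

```python
import re

# Illustrative pattern: dashed SSN, excluding area/group/serial values
# that are never issued (000, 666, 9xx, 00, 0000).
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")

# Illustrative pattern: 13 to 19 digits, optionally separated by spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_pii(text):
    """Return (kind, matched_text) pairs for likely SSNs and card numbers."""
    hits = [("ssn", match.group()) for match in SSN_RE.finditer(text)]
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(("card", match.group()))
    return hits
```

Pattern matching of this kind produces candidates rather than confirmed findings, so results would normally be reviewed before content is archived, encrypted or purged.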

“Most storage executives don’t know what they have, if it has value, if it poses a risk or liability, if it is a security violation or if it’s employee vacation photos and music libraries,” Index Engines VP Jim McGann said. “Catalyst will give them a peek into their data, understand growth sources, and develop intelligent disposition strategies in order to control costs and liability hidden in user content.”

Index Engines’ Catalyst software is designed to deliver a file-centric view of the data center. Catalyst processes all forms of unstructured files and document types, creating a searchable index of what exists, where it is located, who owns it, when it was last accessed and, optionally, what key terms are in it.

Leveraging the rich metadata or full text index in conjunction with Active Directory integration, content can be profiled and analyzed with a single click. High-level summary reports allow instant insight into enterprise storage providing unprecedented knowledge of data assets.

The limited-use license features the data profiling features of Catalyst, but does not include the automated disposition and archiving capabilities of the full version of the enterprise-ready software.

The no-cost Catalyst licenses are currently available to enterprise data centers on Index Engines’ website.

The Cost of Abandoned Data

The average corporate turnover rate for employees is 15.1 percent across all industries, with some specific verticals experiencing as high as 30%. For an organization with 10,000 employees this can account for 1,500 to 3,000 people annually (Compensation Force: 2013 Turnover Rates by Industry).

When an employee leaves an organization, the IT department will typically wipe or recycle their hard drive, which contains their digital files and email; however, IT often neglects to clean and manage former employees’ data on corporate networks and servers.

In this scenario, a company of 10,000 with a conservative annual turnover of 1,500 employees could easily see 60 TB of data abandoned in the data center each year. Over 10 years this explodes to beyond half a petabyte.

Abandoned data is unstructured files, email and other data owned by ex-employees that languishes on networks and servers. Gartner estimates that the 2013 average Annual Storage Cost per Raw TB of capacity is $3,212 (Gartner: IT Key Metrics Data 2014: Key Infrastructure Measures: Storage Analysis: Current Year, Dec. 2013). This can account for millions of wasted expenses each year.
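The arithmetic behind these figures can be sketched as follows. The inputs are the numbers quoted above; the model itself is a simplifying assumption for illustration: nothing is ever cleaned up, the cost per TB stays flat, decimal units are used (1 TB = 1,000 GB), and the Gartner raw-capacity cost is applied directly to the abandoned data.

```python
LEAVERS_PER_YEAR = 1_500       # conservative turnover for 10,000 employees
ABANDONED_TB_PER_YEAR = 60     # abandoned data added each year
COST_PER_TB_USD = 3_212        # Gartner 2013 annual storage cost per raw TB


def gb_per_leaver():
    """Abandoned data implied per departing employee, in GB."""
    return ABANDONED_TB_PER_YEAR * 1_000 // LEAVERS_PER_YEAR


def abandoned_tb_after(years):
    """Total abandoned data held after a number of years with no cleanup."""
    return ABANDONED_TB_PER_YEAR * years


def annual_cost_usd(terabytes):
    """Yearly cost of keeping the given volume on storage."""
    return terabytes * COST_PER_TB_USD


def cumulative_cost_usd(years):
    """Total spent over the period as the abandoned volume keeps growing."""
    return sum(annual_cost_usd(abandoned_tb_after(year)) for year in range(1, years + 1))


if __name__ == "__main__":
    print(gb_per_leaver())                          # 40 GB per departing employee
    print(abandoned_tb_after(10))                   # 600 TB after ten years
    print(annual_cost_usd(abandoned_tb_after(1)))   # 192720 USD in year one
    print(annual_cost_usd(abandoned_tb_after(10)))  # 1927200 USD in year ten
    print(cumulative_cost_usd(10))                  # 10599600 USD over ten years
```

Under these assumptions the yearly bill approaches two million dollars by year ten, and the cumulative spend over the decade exceeds ten million dollars.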

Abandoned data consists of old working documents that have long outlived their business value: revisions of letters, old spreadsheets, presentations and aged email. However, a small percentage of this content can easily contain sensitive files and email. It is this small percentage of contracts, confidential email exchanges, client records and other similar documents, which adds a level of risk and liability for the corporation.

The bulk of the data is typically what is known as redundant, outdated and trivial content – or ROT – that is simply taking up space and resulting in unnecessary management and data center costs.

The following are factors you will need to take into account in order to understand the cost impact of abandoned data:

Risk and Liability

The number one expense associated with abandoned data is the legal exposure created by not managing abandoned user data. The risk and liability inherent in sensitive data including client records, personally identifiable information (PII), or records required for eDiscovery or compliance can cost a company millions along with unwanted negative press and exposure.

Managing sensitive records is always a challenge; however, managing this content when the owner of the data is no longer an employee and no one knows it exists is an even more complex challenge. Think of the CEO’s former admin creating a PST archive of their email and storing it on some obscure server. It is difficult to put a value on this exposure, but it is something that should be keeping your legal and compliance teams up at night.

Storage Costs

In the example above, 60 TB of abandoned data can accumulate on corporate servers each year for a company of 10,000 employees. At the same time this data is cluttering the data center, organizations are increasing their storage capacity at a rate of 40-60 percent annually. Reclaiming this capacity by cleaning up abandoned data (most of it could disappear tomorrow and no one would miss it) is equivalent to getting free storage capacity. Since most IT budgets are decreasing, this is an easy way to make every dollar count.

Backup and Disaster Recovery

One of the hidden costs of not managing and controlling abandoned data is in corporate disaster recovery costs. The cost and resources required to ensure all data is backed up and protected is one of the more expensive line items on an IT budget.

Compressed backup windows, offsite storage costs and management of backup content all contribute to ever-growing demands on data center resources. With abandoned data accounting for tens, even hundreds of terabytes, it has become a significant component of the expenses associated with disaster recovery. Assuming conservatively that each year 15 percent of the data being backed up no longer has any business value and should be moved offline or even remediated, this can easily reduce disaster recovery costs and expenses by up to 50 percent on a server over five years old.

Management Costs

Data is constantly migrated to new platforms or consolidated in order to streamline operations. Migrating and consolidating data is a constant and painful operation. It becomes even more painful when you know that much of the data no longer has value. If 30-50 percent of the data migrated from a five-year-old storage platform to a new storage platform, or even the cloud, is owned by ex-employees, much of this effort is wasted.

Beyond a migration of data, day-to-day management of servers is a key task in any corporate data center. Reducing the volume of data under management will have a lasting impact on budgets and resources required to support the explosive growth of unstructured user data.

Untapped Knowledge

When a knowledge worker leaves the organization and their content converts from active to abandoned data it instantly “disappears” into the network. Since no one owns this content, even content that has long-term value to the organization, it can no longer be exposed and leveraged by existing employees. Research data, competitive analysis and historical reports all get lost and can no longer provide value to the organization.

The cost of not leveraging existing corporate knowledge can be significant. In today’s competitive market staying one step ahead is critical to maintaining and gaining market share. Arming your knowledge workers with all the data they need, including value added content generated by ex-employees, will help maintain leadership in the market.

Data Profiling

Data profiling, also known as file analysis, uncovers abandoned data so it can be managed. Understanding what abandoned data exists is the first step in defining a data policy that can reclaim wasted expense and control long-term risk and liability of this unknown and unmanaged content.

In “Market Guide for File Analysis Software”, published September 23, 2014, Gartner recommends profiling data to gain a better understanding of the unstructured data environment and ROT including abandoned data, stating:

“Data visualization maps created by file analysis can be presented to other parts of the organization and be used to better identify the value and risk of the data, enabling IT, line of business, compliance, etc., to make more-informed decisions regarding classification, information governance, storage management and content migration. Once known, redundant, outdated and trivial data can be defensibly deleted, and retention policies can be applied to other data.”

Data profiling works by processing all forms of unstructured files and document types, creating a searchable index of what exists, where it is located, who owns it, when it was last accessed and, optionally, what key terms are in it.
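A metadata-only version of this profiling step can be sketched in a few lines. The example below is an assumption-laden illustration, not the vendor's implementation: `FileRecord`, `profile_tree` and `summarize` are hypothetical names, ownership is read as the numeric `st_uid` from the file system (Active Directory resolution is out of scope), and full-text indexing of key terms is omitted.

```python
import os
import time
from collections import Counter
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileRecord:
    path: str
    owner_uid: int        # numeric owner id; 0 on platforms without uids
    size_bytes: int
    last_accessed: float  # seconds since the epoch
    extension: str


def profile_tree(root):
    """Walk a directory tree and collect one metadata record per file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                info = os.stat(full)
            except OSError:
                continue  # unreadable or vanished file: skip it
            records.append(
                FileRecord(full, info.st_uid, info.st_size,
                           info.st_atime, Path(name).suffix.lower())
            )
    return records


def summarize(records, stale_after_days=3 * 365, now=None):
    """Build a high-level summary, flagging files not accessed recently."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86_400
    stale = [record for record in records if record.last_accessed < cutoff]
    return {
        "file_count": len(records),
        "total_bytes": sum(record.size_bytes for record in records),
        "stale_count": len(stale),
        "stale_bytes": sum(record.size_bytes for record in stale),
        "by_extension": Counter(record.extension for record in records),
    }
```

The three-year staleness threshold is only a default for the example. Last-accessed times depend on how the file system is mounted, so they should be treated as an approximation when setting disposition policy.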

High-level summary reports allow instant insight into enterprise storage, providing unprecedented knowledge of data assets. Through this process, mystery data can be managed and classified, including content that has outlived its business value or that is owned by ex-employees and is now abandoned on the network.

This simple and analyst-recommended process helps organizations reclaim up to 40% of active data capacity and mitigates legal and compliance risks associated with unmanaged data.

Becoming Litigation Ready with Index Engines

Organizations have hoarded massive volumes of unstructured user files and email over decades. This unmanaged data is currently clustered away in enterprise servers, network computers, user shares, SharePoint, email databases, and even legacy backup tapes, with much of it consisting of mystery content. Within this content there is a significant volume of data with no business value that can be purged, as well as sensitive content that contains intellectual property or liabilities in future lawsuits.

Without detailed knowledge of this user content, organizations will continue to spend significant money and time managing and hoarding unknown data, stockpiled on massive servers, which only increases eDiscovery costs as well as future risk and liability.

Litigation Readiness® means actively managing data as well as the growing costs of storage and eDiscovery according to policy. Legal professionals are discovering the best way to battle this data growth is to leverage technology.

Using Index Engines’ Catalyst platform, organizations gain a high-level view into corporate user data including files and email. Only with this knowledge can they easily manage corporate assets and identify and take action on responsive or sensitive data when necessary.

Discover more about litigation readiness here:

Meet us at ILTA

Are you attending ILTA’s 37th Annual Educational Conference?

We are, and we’d love to see you there! Index Engines has some exciting new product enhancements and a pricing model built for partners to meet and exceed their clients’ evolving needs, including:

– Producing data from tapes in response to a legal event or court orders.
– Creating a repository of legal hold data that is easily accessible, forensically defensible and cost effective.
– Producing information about their data for assessments in a manageable platform.
– Determining what, if any, legal liability may reside in the data contained in their infrastructure.

If you’re attending ILTA and would like to learn more about our technology or discuss opportunities to work together, please contact

Also, if you’re not attending, but would still like to talk, contact:

Michelle King
Business Development Manager
Index Engines

Email Lives Forever, Except When it’s Gone: An information technology take on Lois Lerner’s and all other lost email

When headlines hit that the IRS was missing data, most information technology professionals jumped to the same conclusion: how could data ever really be gone?

Data loss is not common in this day and age, as millions of dollars have gone into most organizations’ data centers just to make sure data doesn’t get lost. In fact, the opposite can actually be the issue: there are too many copies of the same data.

For example, an email sent is immediately stored on the sender’s and receiver’s PCs as well as the server. Nightly, each email box is backed up on the company’s server for disaster recovery, in case of a computer crash or something more disruptive. After a week goes by, there are seven or more copies of that email stored somewhere. In addition, there’s a good chance that a copy of that email has been copied to an archive for long-term retention based on Lerner’s senior status.

If that email becomes lost from the desktop or email server, there should be many copies that exist in other locations. There are few feasible ways data could ever really be gone, even if a company attempted to destroy it (Enron taught us that), but breakdowns in policy and a lack of information management and data center search solutions could make it lost.

Where does data go

Most mid to large-sized organizations store copies of their legacy data on a troublesome format called backup tape. Data is backed up nightly from all the servers to sets of these tapes, which resemble a VHS cassette that has been cut in half.

Unlike a VHS that may be recorded over many times, these tapes are permanent and quickly fill up. Some are stored within a company’s data center, but the bulk of this data is sent to offsite vaults meant to house and protect these tape archives.

Retrieving that data isn’t as simple as putting a VHS in a VCR. Systems advanced and organizations changed storage vendors over the last 20 years, making many of these tapes inaccessible as they were originally recorded on proprietary technology that wasn’t compatible with other vendors.

The original software to access some tapes hasn’t been around for over 10 years, so reading them requires either specialty direct tape indexing technology or expensive restoration of the original software. In addition, knowing which data is where at a company with thousands of employees, years later, is no easy task and can be claimed as burdensome in a less high-profile situation.

The less certainty there is about where the data is, the longer and more costly finding it is, but the data still exists – somewhere.

Why can’t we find data

The backup environment at these organizations is massive and finding needed data is traditionally a long, expensive process that is only compounded by the breakdown in corporate policy.

Managing corporate data should be a unified effort between the IT department, legal team and records management, but in reality each handles only its own part, which causes large policy gaps.

Without this proactive communication and a partnership between legal and IT organizations, IT will continue to store information that no longer has business value but can turn into a liability. eDiscovery costs, finding and collecting data, will also remain high as every time a request is made a new and time consuming search must be commenced through thousands of legacy tapes.

In the past if legal asked IT what data exists where, there would be a blank response. If IT asked legal about data policies, what they should keep and what they can dispose of, the answer would not come easily and each department did “their” job. IT stores the data. Legal requests data. Records management recommends policy. Legal and IT can’t decide who implements policy so no one does.

The data stays on legacy tape, but no one knows exactly where.

Perception versus reality

The lost Lerner emails should serve as a wakeup call for enterprises to understand the lifecycle of data. In order for data to truly “no longer exist,” an organization would need to access all environments (all those backup tapes) and apply a defensible deletion policy. Otherwise claiming that data is “gone” is a weak excuse.

However, permanently removing email can be done, and is actually a beneficial way to control the long-term risk of aged data once it outlives its business, compliance and legal value. This isn’t a back-door ad-hoc job of users hitting a delete key or dumping tape in shredders, but firm policy dictated by those who are charged with protecting the company from any liability.

Deleting corporate data must be done under the guidance of legal and records management professionals – with the key challenge of ensuring the enterprise is keeping what is required for regulatory, compliance and legal purposes, while disposing of data that can be misinterpreted or cause a security breach.

The only accurate and defensible way to get rid of data in a corporation is to define a solid policy, and apply it to not only your current production data, but the legacy data as well. By ignoring the legacy data, all an organization does is lose some of the copies. No data should ever be lost. It should be archived, managed or purged.

***for more information on backup tapes, or for quotes contact***