Deletion has to be Defensible, even for the IRS

The painful lesson learned by ignoring backup tapes in your defensible deletion and data governance policies

Lois Lerner’s emails are gone. We know this, but it was more than a server issue or a hard drive crash: the backup tapes that archived the complete, untampered record of those emails were destroyed.

Now it could cost IRS Commissioner John Koskinen his job. Eighteen US Congressmen are seeking to impeach Koskinen on the grounds of his “failure to check Lerner’s cell phone and backup tapes that contained missing emails related to the scandal.”

According to a Wall Street Journal article, Koskinen is accused on a few points, all of which could have been avoided with a proper, documented data governance policy.

  1. “In February 2014 Congress instructed Koskinen to supply all emails related to Lerner… A few weeks after the subpoena, IRS employees in West Virginia erased 422 backup tapes, destroying up to 24,000 Lerner emails.”

Tapes need to be incorporated into governance policies. Had these tapes been part of a defensible deletion or information governance policy, they likely would have been managed properly and treated as records or defensibly deleted as a part of the normal IT process.

  2. “The second charge cites ‘a pattern of deception’ and three ‘materially false’ statements Koskinen has made to Congress, under oath, including his assurances that no Lerner emails had been lost. In fact Lerner’s hard drive had crashed and employees erased tapes.”

After disaster recovery, tapes can become a de facto archive. Once a tape is no longer useful for disaster recovery, it is nothing more than a snapshot of data. Despite any legal claims to the contrary, such tapes serve no purpose other than as a de facto archive and should be treated as such. Arguments of financial burden and inaccessibility are also losing their force.

  3. “A final charge accuses Koskinen of incompetence, noting how despite his insistence that his agency had gone to ‘great lengths’ to retrieve lost Lerner emails, the IRS failed to search disaster backup tapes, a Lerner BlackBerry and laptop, the email server and its backup tapes. When the Treasury Inspector General did his own search, he found 1,000 new Lerner emails in 14 days.”

Data – email included – never dies (easily). When creating policy, it’s important to understand where the data goes: desktop, secondary hard drive, server, backup tapes, disk, archive. By understanding this, and by creating (and auditing) rules restricting portable devices, PSTs and the other places data can go, an organization can build an enforceable policy and manage risk and liability more effectively.

Data, including what is archived on backup tapes, must be properly audited and managed. When data is deleted without an understanding of why, how and when, problems inherently arise, especially if this data is at the heart of high profile litigation. All data – especially data on backup tapes – should have a governance policy surrounding it to make it defensible and avoid the pitfalls of the IRS.

Webinar: 10 Reasons Tape Is a Lousy Archive

Join us Thursday, October 8th, at 1 pm ET/10 am PT for this educational webinar. Register Now

One of the most frequent misuses of backup tape is as an archive for sensitive user data.

Organizations don’t typically design tape as their archive; however, it inadvertently becomes one when old backup tapes are sent to offsite storage after cycling out of their disaster recovery usefulness.

Join us Thursday, October 8th for a 45-minute webinar to explore 10 Reasons Tape Is a Lousy Archive and discover how to secure your data, mitigate risk and simplify tape restores in support of legal and compliance.

What could have happened to Hillary’s emails?

Judge Reggie Walton of the U.S. District Court for the District of Columbia is expected to hear arguments to order the State Department to question Hillary Clinton about the existence of emails on backup tape archives, The Hill reported. Information management company Index Engines can explain exactly what this means and how it is done.

When Clinton implemented an email server to control and manage her correspondence, her team hired Platte River Networks to host the environment. This third-party organization likely has procedures in place to protect data, copying all email ever created onto backup tapes so it can be restored in the case of a disaster such as a flood or fire.

This standard “IT” process produces a snapshot of what actually happened. It is secure and tamper-proof, represents a factual record of the past, and is far more reliable than records stored on local servers and hard drives, which can be accessed by many and easily spoiled.

In this case, the backup of the email server most likely occurred at an offsite location chosen by the hosting provider, Platte River Networks, and the data was placed on tapes that are typically preserved in offsite storage vaults. When the main server was shut down, the tapes could have been forgotten about.

Index Engines has software that can quickly scan backup tapes, index the contents of the email, and make it searchable and accessible without any other third-party software or infrastructure. Through this process, content matching specific keywords, time frames and file types can be quickly identified and extracted without corruption.
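To make that concrete, here is a minimal, purely illustrative sketch of the kind of query such an index supports. This is not Index Engines’ actual API; the record type, field names and search function below are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class IndexedEmail:
    """Hypothetical record for one indexed message (not Index Engines' schema)."""
    sender: str
    subject: str
    body: str
    sent: date
    attachment_types: list = field(default_factory=list)   # e.g. [".pdf", ".xlsx"]

def search(index, keyword, start, end, attachment_ext=None):
    """Return messages matching a keyword within a date range,
    optionally limited to those carrying a given attachment type."""
    keyword = keyword.lower()
    hits = []
    for msg in index:
        if not (start <= msg.sent <= end):
            continue
        if keyword not in msg.subject.lower() and keyword not in msg.body.lower():
            continue
        if attachment_ext and attachment_ext not in msg.attachment_types:
            continue
        hits.append(msg)
    return hits

# Example: all 2012 messages mentioning "schedule" that carry PDF attachments.
# results = search(index, "schedule", date(2012, 1, 1), date(2012, 12, 31), ".pdf")
```

The point is that once tape contents are indexed, a request from counsel becomes a simple query rather than a series of full restores.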

“Data never dies,” said Tim Williams, CEO of Index Engines. “All modern organizations have robust data protection processes that make copies of everything and archive it on backup media to ensure it can survive a disaster. In cases like this, those copies represent the factual truth. They can’t be changed after the fact.

“When an email is sent, it is copied and archived and preserved many times over. This is a disaster recovery feature standard in any data center. What Hillary Clinton probably didn’t know is that exact copies of what existed are archived in data center disaster recovery archives, or backup tapes, that allow an email server to be rebuilt in case of a failure.”

Data Governance in the Back Seat of Your Car

The IT manager at Cancer Care Group, P.C. thought nothing of tossing a backup tape containing the names, addresses, dates of birth, Social Security numbers, insurance information and clinical information of approximately 55,000 patients into the back seat of a car (read the article).

They probably did this every week for years, starting before HIPAA even existed, in order to comply with their disaster recovery procedures. What they didn’t consider was what their legal and compliance teams would think of it.

When an organization’s IT department works in a vacuum, not understanding the implications of preserving and archiving data to tape and then carelessly transporting those records outside the protection of the corporate environment, it puts the organization at financial risk.

Data governance should not take place in the back seat of a car. It should happen in corporate conference rooms, where IT and legal collaborate to determine the best course of action to protect and manage sensitive corporate records. Data governance means knowing what exists, where it is (even backup tapes), and how it is managed according to policy. I am assuming that for most organizations this would not include the back seat of a car.

When IT made the decision to move data offsite via backup tapes in order to fulfill its disaster recovery strategy, it cost the organization $750,000 in fines, years of litigation and a multiyear corrective action plan monitored by the Department of Health and Human Services (HHS), not to mention public embarrassment.

In today’s legal and regulatory climate it is astounding that IT organizations have the freedom to manage sensitive corporate records so carelessly. Decades of corporate records archived on backup tapes are stored in salt mines, basement cabinets, employees’ garages and, apparently, even the back seats of cars.

How will organizations implement sound policies and procedures in compliance with regulations like HIPAA if they don’t even know what they have or where it is?

Tapes are a great, cost-effective tool for backup, but disaster recovery tapes aren’t a capable archive. Archiving data from tape, including the legacy stockpiles, is critical in forming a sound data governance policy and securing data from compliance issues, data breaches and the back seat of a well-meaning employee’s car.

What Hillary Clinton Can Teach Us about Backup Tapes and Archiving

Backup tapes are often the ignored child of the data governance and eDiscovery world. It is widely understood that tapes are burdensome and expensive to collect data from, and that they exist only for the remote chance of being needed for disaster recovery, not for legal purposes.

This week Judge Reggie Walton of the U.S. District Court for the District of Columbia is expected to order the review of emails from backup tape archives of Hillary Clinton’s email server. This will once again put backup tapes front and center in a high-profile event.

When Clinton implemented an email server to control and manage her correspondence, her team hired Platte River Networks to host this environment. This is a third-party organization that has procedures in place to protect data and ensure it can be restored in the case of a disaster such as a flood or fire.

When the server was set up and Platte River engaged, all of Clinton’s email was captured by standard backup procedures and copied to backup tapes or disk-based backups. This standard “IT” process is a snapshot of what actually happened; it is secure, tamper-proof, and represents a factual record of the past.

In the case of Clinton and her email, these backup tapes are much more reliable than the records stored on local servers and hard drives, which are accessible by many and easily spoiled (remember Lois Lerner). So, as Judge Walton knows, when push comes to shove, let’s go to the backup tapes to understand what really happened.

Despite backup tapes having a reputation of being inaccessible and burdensome, information management company Index Engines makes them as available for collection as online records. Index Engines can quickly scan these tapes, index the contents of the email, and make it searchable and accessible without the use of complex third-party software or risk of corrupting the data.

“Data never dies,” said Tim Williams, CEO of Index Engines. “All modern organizations have robust data protection processes that make copies of everything and archive it on backup media to ensure it can survive a disaster. In cases like this, those copies represent the factual truth. They can’t be changed after the fact.

“We’ve assisted in countless legal cases where data was thought to be long gone, yet with a simple search of backup tapes using our software, the ‘smoking gun’ is quickly found.”

Organizations need to include these backup tapes in their data governance strategies and ensure they preserve and secure or properly remediate sensitive content before they are forced to produce it.

 

To learn more about securing your organization’s legacy tape data, contact info@indexengines.com.

What’s Abandoned Data Costing You?

The average corporate turnover rate for employees is 15.1 percent across all industries, with some verticals experiencing rates as high as 30 percent. For an organization with 10,000 employees this can account for 1,500 to 3,000 departures annually (Compensation Force: 2013 Turnover Rates by Industry).

When an employee leaves an organization, the IT department will typically wipe or recycle the hard drive containing their digital files and email; however, it neglects to clean up and manage the former employee’s data on corporate networks and servers.

In this scenario, a company of 10,000 with a conservative annual turnover of 1,500 employees could easily see 60 TB of data abandoned in the data center each year. Over 10 years this explodes to beyond half a petabyte.

Abandoned data is unstructured files, email and other data owned by ex-employees that languishes on networks and servers. Gartner estimates that the 2013 average Annual Storage Cost per Raw TB of capacity is $3,212 (Gartner: IT Key Metrics Data 2014: Key Infrastructure Measures: Storage Analysis: Current Year, Dec. 2013). This can add up to millions of dollars in wasted expense each year.
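As a rough back-of-the-envelope check, the math looks like this. The 40 GB left behind per departing employee is an assumption chosen to be consistent with the 1,500-employee, 60 TB scenario above; the storage cost is the Gartner figure just cited.

```python
# Back-of-envelope cost of abandoned data. The per-employee volume is an
# assumption consistent with the 1,500 departures -> ~60 TB scenario above.
EMPLOYEES = 10_000
TURNOVER_RATE = 0.15             # conservative 15% annual turnover
GB_PER_DEPARTURE = 40            # assumed data left behind per ex-employee
COST_PER_TB_YEAR = 3_212         # Gartner 2013 annual cost per raw TB (USD)

departures = EMPLOYEES * TURNOVER_RATE                         # 1,500 people/year
abandoned_tb_per_year = departures * GB_PER_DEPARTURE / 1_000  # ~60 TB/year
first_year_cost = abandoned_tb_per_year * COST_PER_TB_YEAR     # ~$193,000

# Abandoned data accumulates: after a decade roughly 600 TB is being carried,
# and the storage bill for that year alone approaches $2 million.
ten_year_tb = abandoned_tb_per_year * 10
tenth_year_cost = ten_year_tb * COST_PER_TB_YEAR

print(f"{abandoned_tb_per_year:.0f} TB abandoned per year, ${first_year_cost:,.0f} to store")
print(f"{ten_year_tb:.0f} TB after ten years, ${tenth_year_cost:,.0f} per year to store")
```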

Abandoned data consists of old working documents that have long outlived their business value: revisions of letters, old spreadsheets, presentations and aged email. However, a small percentage of this content can easily contain sensitive files and email. It is this small percentage of contracts, confidential email exchanges, client records and other similar documents that adds a level of risk and liability for the corporation.

The bulk of the data is typically what is known as redundant, outdated and trivial content – or ROT – that is simply taking up space and resulting in unnecessary management and data center costs.

The following are data factors to consider…

Keep Reading: Download our free whitepaper.

Defensible Deletion Methodology

How to Control Risk and Manage Expenses Associated with User Content

What is Defensible Deletion?

For decades, organizations have had strategies in place to protect and safeguard data. Enforcing those strategies, however, has been one of the greatest challenges corporations have faced over the past two decades. Users hoard data on desktops, including archiving email in local repositories known as PSTs. Every year, storage administrators add massive silos of disk to allow employees to save more and more files, even maintaining space for users who left the company years ago.

Archiving and records managers continually copy important documents and email into proprietary repositories for long term retention. Business continuity and backup administrators replicate all content on a weekly basis and archive data to offsite storage for safekeeping in case of a disaster.

Data is continually replicated, re-stored and hidden throughout an enterprise. Even with sound policies and procedures in place, simply finding data so that it can be managed is an enormous undertaking. Defensible deletion is a process, within an overall information governance policy, that provides comprehensive knowledge and access to all user data so that policy can be applied and data can be managed according to specific compliance requirements.

Implementing a defensible deletion methodology not only mitigates long term risks and liabilities related to enterprise data assets, but also saves time and expense in supporting ongoing litigation and eDiscovery efforts, while reducing data center budget used for storing and managing data that is no longer useful.

Keep Reading: Download our free Defensible Deletion whitepaper.

Backing up the House: Why Backup isn’t Archive

When meeting with a data protection manager at a client site recently, they summed up the world of backup in a nutshell: “In the past I was told to back up the house; now they want me to know about everything in the house, how many paintings, rugs, chairs, etc. and be able to get them out at a moment’s notice.”

Backup was never designed to provide the level of detail about data that today’s data governance requirements demand. Clients use backup as an archive of their sensitive data, yet it does not provide the knowledge needed to support legal, eDiscovery, compliance and regulatory needs. How are you going to support an eDiscovery request from a 10-year-old backup tape when you no longer have the backup software that created the tape?

Backup is not archive – but it could be. Backup captures all enterprise data: user files, email, archives and more. If it exists, backup knows about it. However, as my friend the data protection manager stated, there is no easy way to know what is contained in backup. Sure, you have a catalog, but finding specific emails from a user is not an easy task. Additionally, as the backup data ages, it becomes more and more complex to know what you have and to get it back from tape or disk.

Extracting knowledge of what is in backup is the first step in leveraging this data for archiving – knowledge well beyond the backup catalog, such as detailed metadata for documents, spreadsheets and presentations. Beyond metadata, certain data governance requirements call for knowledge of content, including keyword search of email and files to find sensitive material.

Security and compliance also require finding content based on patterns such as PII and PHI. Without this level of knowledge, users back up the whole “house,” and it becomes a de facto archive once its disaster recovery role is complete. This results in a “save everything” strategy, which is neither a smart nor an economical governance strategy.
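To illustrate what pattern-based scanning means in practice, here is a minimal sketch in Python. It is not Index Engines’ implementation, and real PII/PHI detection needs far broader patterns, validation and context, but the principle is the same: flag content that matches sensitive patterns before it quietly becomes part of the “archive.”

```python
import re

# Illustrative patterns only; production PII scanning needs far broader
# coverage, validation and context than two regular expressions.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")           # 123-45-6789
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")         # 13-16 digit runs

def luhn_valid(candidate: str) -> bool:
    """Luhn checksum, used to discard digit strings that only look like card numbers."""
    digits = [int(d) for d in re.sub(r"\D", "", candidate)][::-1]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan_for_pii(text: str) -> dict:
    """Return candidate SSNs and Luhn-valid card numbers found in extracted text."""
    return {
        "ssn": SSN_PATTERN.findall(text),
        "credit_card": [c for c in CARD_PATTERN.findall(text) if luhn_valid(c)],
    }
```

Run against text extracted from backup images, a scan like this flags files and messages for review before they are archived or remediated.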

The second step in leveraging backup for archiving is gaining access to information locked in proprietary backup formats. Restoring backup data from last week’s tapes or disk images is not very complex; finding a specific user’s mailbox containing specific keywords, however, is all but impossible.

So when legal calls and asks for all email related to a specific client or set of keywords, the backup manager is forced to restore full backups just to find a small set of content. As the backup data ages, it becomes even more complex. Companies change backup software or transition to new backup strategies, and over time getting access to this legacy backup data becomes very time-consuming and expensive, if not impossible.

Leveraging Backup for Archiving

Delivering knowledge of backup data is complex; Index Engines, however, provides not only that knowledge, including detailed content and keywords, but also access. Finding and restoring a specific email from a 10-year-old tape no longer requires the original backup software or a full restore. Index Engines has cracked the code and is able to leverage backup data to support archiving for legal and compliance needs.

Organizations have learned that backup is not an archive. Storing old tapes in a salt mine or accumulating backup images on disk will become problematic down the road.

Lack of knowledge and access to the data are not characteristics of a proper archive. Additionally, archiving everything, by storing all backup content, is not a sound strategy for organizations that face frequent lawsuits, regulatory requirements and strict data governance policies. These backup archives will result in risk, liabilities and fines down the road that tarnish the company’s reputation.

Eliminating the proprietary lock that backup software has on data, Index Engines delivers knowledge of what is in backup images and provides intelligent access to the data that has value and should be archived. Finding and archiving data without the need for the software that generated the backup is now possible. This allows backup to be leveraged for archiving and delivers support for today’s growing information governance requirements.

Index Engines supports direct indexing of backup tapes and disk images. Supporting all common backup formats, it can index data at a high level of metadata or down to full-text content, capturing keywords from user email deep within Exchange and Notes databases. Beyond indexing, data can then be restored from backup, maintaining all the important metadata, without the need for the original software.

Two classic use cases of Index Engines technology are to clean up legacy backup data on tape or disk for clients that were using backup as an archive and to stop the need for making tapes from disk-based backups (or to stop archiving recent disaster recovery tapes out to offsite storage).

Index Engines delivers the intelligence and access to these environments to extract what is needed according to policy, which is typically a small volume of the total capacity, and archive it on disk according to retention requirements. Once data is archived and secured it is searchable and accessible to support even the most complex data governance requirements.

Index Engines Adds One-Click Data Profiling Reports to Catalyst Express, the Company’s Free 5 TB Enterprise Data Management Software

Catalyst Express gives organizations the ability to automate reports on file content and metadata including location, name, size, extension, dates, duplicates, PII and more

HOLMDEL, NJ – Index Engines has announced the addition of stored reports and automation to Catalyst Express, the information management company’s free user data management software.

These reports allow one-click access to detailed knowledge of up to 5 TB of user data, including aged data, abandoned and active data, duplicates, large files, PII and more. Reports can be run on demand or scheduled to run as needed.

“Most organizations don’t know what they have, if it has value, if it’s stored in the correct place, if it poses a risk or liability, or if it’s employee vacation photos and music libraries,” Index Engines VP Jim McGann said. “Catalyst will give them this insight into their data and help them determine and execute data policies.”

These canned reports can be used to understand what exists and develop an appropriate disposition strategy, or they can be customized according to the user’s needs.

Customized reports can include file metadata attributes such as path, file name, size, extension, accessed date, modified date, host names, Active Directory group membership, as well as security metadata including read, write, and browse access to files.

Reports included in this new product (see the sketch after this list) include:
• Abandoned files, those not accessed in more than 3 years
• Active files, those accessed or modified within 90 days
• Duplicate content, files with the same document signature
• Large files, files larger than 1 GB or 4 GB
• Multimedia files, all video, music and image files
• PII, files containing credit card and Social Security numbers
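As a rough, hypothetical sketch (not Catalyst’s implementation), the report criteria above map directly onto ordinary file metadata; the thresholds below mirror the list, and the multimedia extension set is an illustrative assumption.

```python
import hashlib
import os
import time

MULTIMEDIA_EXT = {".mp3", ".wav", ".mp4", ".avi", ".mov", ".jpg", ".jpeg", ".png"}
DAY = 24 * 3600
YEAR = 365 * DAY

def classify(path: str) -> list:
    """Tag a file according to the canned report categories listed above."""
    st = os.stat(path)
    now = time.time()
    tags = []
    if now - st.st_atime > 3 * YEAR:
        tags.append("abandoned")     # not accessed in more than 3 years
    if now - max(st.st_atime, st.st_mtime) <= 90 * DAY:
        tags.append("active")        # accessed or modified within 90 days
    if st.st_size > 1024 ** 3:
        tags.append("large")         # larger than 1 GB
    if os.path.splitext(path)[1].lower() in MULTIMEDIA_EXT:
        tags.append("multimedia")
    return tags

def signature(path: str) -> str:
    """Content hash; files sharing a signature are duplicates."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()
```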

Index Engines’ Catalyst product line scales to large global enterprise data center environments consisting of petabytes of unstructured data. The new Catalyst Express software is a no-cost entry point that allows clients to leverage the value of the Catalyst platform and begin to control costs and risk associated with unstructured user data.

By leveraging rich metadata or full-text indexing in conjunction with Active Directory integration and security analysis through indexing of file ACLs, content can be managed with a single click.

High-level reports allow instant insight into enterprise storage providing unprecedented knowledge of data assets so decisions can be made on disposition, governance policies and even data security.

Upgrade options for Catalyst Express include:
• Additional terabytes of capacity
• Advanced data management policies
• Integrated forensic archiving and eDiscovery workflows
• Detailed indexing of file system audit trails
• Metadata and full content indexing of Exchange, Notes, and SharePoint
• Federated search for distributed environments
• Support for data within backup images (tape or disk)

“Catalyst is implemented worldwide to help manage petabytes of critical business data assets,” McGann said. “With this new product Index Engines is providing a great opportunity to begin managing risk and costs associated with user data at an attractive $0.”

Catalyst Express is available for download at http://www.indexengines.com/catalyst-express
###