10 User Data Projects Not to Leave Off the Schedule in 2016

With budgets tightening – often to pay for storage costs – data center managers are struggling to find the highest impact projects that will see an immediate ROI.

While there’s no one project that will reclaim all of the unstructured data rotting away in the data center, there are 10 crucial data projects not to leave off the schedule in 2016.

– Clean up abandoned data and reclaim capacity: When employees leave the organization, their files and email languish on networks and servers. With the owner no longer available to manage and maintain the content, it remains abandoned and clogs up corporate servers. Data centers must manage this abandoned data to avoid losing any valuable content and to reclaim capacity.

– Migrate aged data to cheaper storage tiers: As data ages on the network it can become less valuable. Storing data that has not been accessed in three years or longer is a waste of budget. Migrate this data to less expensive storage platforms. Aged data can represent between 40% and 70% of current server capacity.

– Implement accurate chargebacks based on metadata profiles and Active Directory ownership: Chargebacks enable data centers to recoup storage expenses and to work with departments to develop a more meaningful data policy, including purging of what they no longer require.

– Defensively remediate legacy backup tapes and recoup offsite storage expenses: Old backup tapes that have piled up in offsite storage are a big line item on your annual budget. Using unstructured data profiling technology, these tapes can be scanned without the need for the original backup software, and a metadata index of their contents generated. Using this metadata, profile and extract the content that must be retained so the remaining tapes can be defensibly remediated and offsite storage expenses recouped.

– Purge redundant and outdated files to free up storage: Network servers can easily consist of 35–45% duplicate content. This content builds up over time and wastes storage capacity. Once duplicates are identified, a policy can be implemented to purge what is no longer required, such as redundant files that have not been accessed in over three years, or those owned by ex-employees.

– Audit and remove personal multimedia content (e.g., music, video) from user shares: User shares become a repository not only of aged and abandoned files, but of personal music, photo and video content that has no value to the business and may in fact be a liability. Once this data is classified, reports can be generated showing the top 50 owners of this content, its total capacity and its location. This information can be used to set and enforce quotas and to work with the data owners to clean up the content and reclaim capacity.

– Profile and move data to the cloud: Many data centers have cloud initiatives where aged and less useful business data is migrated to more cost-effective hosted storage. Finding the data and on-ramping it to the cloud, however, is a challenge if you lack an understanding of your data: who owns it, when it was last accessed, what types of files it contains, etc.

– Archive sensitive content and support eDiscovery more cost effectively: Legal and compliance requests for user files and email can be disruptive and time consuming. Finding the relevant content and extracting it in a defensible manner are the key challenges. Streamlining access to critical data so you can respond to legal requests quicker not only lessens their time burden but saves you time and money during location efforts.

– Audit and secure PII to control risk: Users don’t always abide by corporate data policies. Sharing sensitive information containing client Social Security and credit card numbers, such as tax forms, credit reports and applications, can easily happen. Find this information, audit email and servers, and take the appropriate action to ensure client data is secure. Some content may need to be relocated to an archive, encrypted or even purged from the network. Managing PII ensures compliance with corporate policies and controls the liability associated with sensitive data.

– Manage and control liability hidden in PSTs: Email contains sensitive corporate data, including communications of agreements, contracts, private business discussions and more. Many firms have email archives in place to monitor and protect this data; however, users can easily create their own mini-archive, or PST, of the content that is not managed by corporate. PSTs have caused great pain in litigation, when email that was thought to no longer exist suddenly appears in a hidden PST.
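As a rough sketch of how a cleanup project like those above might begin, the script below walks a file share and totals the capacity of files untouched for three years, grouped by owner. This is a minimal illustration, not a production profiler: it assumes POSIX last-access times (which some filesystems disable via `noatime`) and numeric UIDs as a stand-in for the metadata profiling and Active Directory ownership described above.

```python
import os
import time
from collections import defaultdict

# "Not accessed in three years" is the aging policy suggested above.
AGE_THRESHOLD_DAYS = 3 * 365

def profile_aged_files(root):
    """Walk a file share and total the bytes of aged files, keyed by owner UID."""
    cutoff = time.time() - AGE_THRESHOLD_DAYS * 86400
    capacity_by_owner = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # unreadable or vanished file; skip it
            if st.st_atime < cutoff:
                capacity_by_owner[st.st_uid] += st.st_size
    return dict(capacity_by_owner)
```

A report built from this dictionary (owner, total capacity, location) is the raw material for the quota-setting and chargeback conversations described above.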

Deletion has to be Defensible, even for the IRS

The painful lesson learned when ignoring backup tapes as part of your defensible deletion and data governance policies

Lois Lerner’s emails are gone. We know this, but it was more than a server issue or hard drive crash: the backup tapes that archived the complete, untampered records of those emails were destroyed.

Now, it could cost IRS Commissioner John Koskinen his job. Eighteen US Congressmen are seeking to impeach Koskinen on the grounds of his “failure to check Lerner’s cell phone and backup tapes that contained missing emails related to the scandal.”

According to a Wall Street Journal article, Koskinen faces a few charges, all of which could have been avoided with a proper data governance policy and documentation of that policy.

  1. “In February 2014 Congress instructed Koskinen to supply all emails related to Lerner… A few weeks after the subpoena, IRS employees in West Virginia erased 422 backup tapes, destroying up to 24,000 Lerner emails.”

Tapes need to be incorporated into governance policies. Had these tapes been part of a defensible deletion or information governance policy, they likely would have been managed properly and treated as records or defensibly deleted as a part of the normal IT process.

  2. “The second charge cites “a pattern of deception” and three “materially false” statements Koskinen has made to Congress, under oath, including his assurances that no Lerner emails had been lost. In fact Lerner’s hard drive had crashed and employees erased tapes.”

After disaster recovery, tapes become a de facto archive. Once a tape is no longer useful for disaster recovery, it is nothing more than a snapshot of data. Despite any legal claim stating otherwise, it serves no purpose other than as a de facto archive and should be treated as such. Arguments of financial burden and inaccessibility are also becoming null and void.

  3. “A final charge accuses Koskinen of incompetence, noting how despite his insistence that his agency had gone to “great lengths” to retrieve lost Lerner emails, the IRS failed to search disaster backup tapes, a Lerner BlackBerry and laptop, the email server and its backup tapes. When the Treasury Inspector General did his own search, he found 1,000 new Lerner emails in 14 days.”

Data – email included – never dies (easily). When creating policy, it’s important to understand where the data goes: desktop, secondary hard drive, server, backup tapes, disk, archive. By understanding this and creating (and auditing) policy restricting portable devices, PSTs and other places data can go, an organization can more effectively create an enforceable policy and manage risk and liability.

Data, including what is archived on backup tapes, must be properly audited and managed. When data is deleted without an understanding of why, how and when, problems inherently arise, especially if this data is at the heart of high profile litigation. All data – especially data on backup tapes – should have a governance policy surrounding it to make it defensible and avoid the pitfalls of the IRS.

Webinar: 10 Reasons Tape Is a Lousy Archive

Join us Thursday October 8th at 1 pm ET/10 am PT for this educational webinar. Register Now

One of the most frequent misuses of backup tape is as an archive for sensitive user data.

Organizations don’t typically design tape as their archive; however, it inadvertently becomes one when old backup tapes are sent to offsite storage after cycling out of their disaster recovery usefulness.

Join us Thursday, October 8th for a 45-minute webinar to explore 10 Reasons Tape Isn’t a Good Archive and discover how to secure your data, mitigate risk and simplify tape restores in support of legal and compliance.

What could have happened to Hillary’s emails?

Judge Reggie Walton of the U.S. District Court for the D.C. Circuit is expected to hear arguments to order the State Department to question Hillary Clinton on the existence of emails on backup tape archives, The Hill reported, but information management company Index Engines can explain exactly what this means and how it is done.

When Clinton implemented an email server to control and manage her correspondence, her team hired Platte River Networks to host this environment. This is a third-party organization that likely has procedures in place to protect data and ensure it can be restored in the case of a disaster such as a flood or fire by copying all email ever created onto backup tapes.

This standard “IT” process produces a snapshot of what actually happened. It is secure and tamper-proof, represents a factual record of the past, and is far more reliable than the records stored on local servers and hard drives, which can be accessed by many people and easily spoiled.

In this case, the backup of the email server most likely occurred at an offsite location chosen by the hosting provider, Platte River Networks, and the data was placed on tapes that are typically preserved in offsite storage vaults. When the main server was shut down, the tapes could have been forgotten about.

Index Engines has software that can quickly scan backup tapes, index the contents of the email, and make it searchable and accessible without the use of any other third party software or infrastructure. Through this process keywords, time frames and file types can be quickly produced and extracted without corruption.

“Data never dies,” said Tim Williams, CEO of Index Engines. “All modern organizations have robust data protection processes that make copies of everything and archive it on backup media to ensure it can survive a disaster. In cases like this, those copies represent the factual truth. They can’t be changed after the fact.

“When an email is sent, it is copied and archived and preserved many times over. This is a disaster recovery feature standard in any data center. What Hillary Clinton probably didn’t know is that exact copies of what existed are archived in data center disaster recovery archives, or backup tapes, that allow for a rebuilding of an email server in case of a failure.”

Data Governance in the Back Seat of Your Car

The IT manager at Cancer Care Group, P.C. thought nothing of throwing a backup tape containing the names, addresses, dates of birth, Social Security numbers, insurance information and clinical information of approximately 55,000 patients into the back seat of their car (read the article).

They had probably done this every week for years, since before HIPAA existed, in order to comply with their disaster recovery procedures. What they didn’t consider was what their legal and compliance teams would think of it.

When organizations’ IT departments work in a vacuum and don’t understand the implications of preserving and archiving data to tape, then carelessly transporting those records outside the protection of the corporate environment, they put their organization at financial risk.

Data governance should not take place in the back seat of a car. It should be in corporate conference rooms where IT and legal collaborate to determine what is the best course of action to protect and manage sensitive corporate records. Data governance means knowing what exists, where it is (even backup tapes), and how it is managed according to policy. I am assuming for most organizations this would not include the back seat of a car.

When IT made the decision to move data offsite via backup tapes in order to fulfill their disaster recovery strategy, they cost the organization $750,000 in fines, years of litigation and a multiyear corrective action plan that is to be monitored by Department of Health and Human Services (HHS), not to mention public embarrassment.

In today’s legal and regulatory climate it is astounding that IT organizations have the freedom to carelessly manage sensitive corporate records. Decades of corporate records archived on backup tapes are stored in salt mines, basement cabinets, employees’ garages and, apparently, even the back seats of cars.

How will organizations implement sound policies and procedures in compliance with regulations like HIPAA if they don’t even know what they have or where it is?

Tapes are a great, cost-effective tool for backup, but disaster recovery tapes aren’t a capable archive. Archiving data from tape, including the legacy stockpiles, is critical in forming a sound data governance policy and securing data from compliance issues, data breaches and the back seat of a well-meaning employee’s car.

What Hillary Clinton Can Teach Us about Backup Tapes and Archiving

Backup tapes are often the ignored child of the data governance and eDiscovery world. It has been widely understood that tapes are burdensome and expensive to collect data from, and that they exist only for the remote chance of being needed for disaster recovery, not for legal purposes.

This week Judge Reggie Walton of the U.S. District Court for the D.C. Circuit is expected to order the review of emails from backup tape archives of Hillary Clinton’s email server. This will once again put backup tapes front and center in a high profile event.

When Clinton implemented an email server to control and manage her correspondence, her team hired Platte River Networks to host this environment. This is a third-party organization that has procedures in place to protect data and ensure it can be restored in the case of a disaster such as a flood or fire.

When the server was set up and Platte River engaged, all copies of Clinton’s email were captured by standard backup procedures and copied onto backup tapes or disk-based backups. This standard “IT” process is a snapshot of what actually happened; it is secure and tamper-proof and represents the factual record of the past.

In the case of Clinton and her email, these backup tapes are much more reliable than the records stored on local servers and hard drives that are accessible by many and easily spoiled (remember Lois Lerner). So, as Judge Walton knows, when push comes to shove, let’s go to the backup tapes to understand what really happened.

Despite backup tapes having a reputation of being inaccessible and burdensome, information management company Index Engines makes them as available for collection as online records. Index Engines can quickly scan these tapes, index the contents of the email, and make it searchable and accessible without the use of complex third-party software or risk of corrupting the data.

“Data never dies,” said Tim Williams, CEO of Index Engines. “All modern organizations have robust data protection processes that make copies of everything and archive it on backup media to ensure it can survive a disaster. In cases like this, those copies represent the factual truth. They can’t be changed after the fact.

“We’ve assisted in countless legal cases where data was thought to be long gone, yet with a simple search of backup tapes using our software, the ‘smoking gun’ is quickly found.”

Organizations need to include these backup tapes in their data governance strategies and ensure they preserve and secure or properly remediate sensitive content before they are forced to produce it.


To learn more about securing your organization’s legacy tape data, contact info@indexengines.com.

What’s Abandoned Data Costing You?

The average corporate turnover rate for employees is 15.1 percent across all industries, with some verticals experiencing rates as high as 30 percent. For an organization with 10,000 employees, this can mean 1,500 to 3,000 departures annually (Compensation Force: 2013 Turnover Rates by Industry).

When an employee leaves an organization, the IT department will typically wipe or recycle the hard drive containing their digital files and email; however, it often neglects to clean up the former employee’s data on corporate networks and servers.

In this scenario, a company of 10,000 with a conservative annual turnover of 1,500 employees could easily accumulate 60 TB of abandoned data in the data center each year. Over 10 years this explodes to beyond half a petabyte.

Abandoned data is unstructured files, email and other data owned by ex-employees that languishes on networks and servers. Gartner estimates that the 2013 average annual storage cost per raw TB of capacity is $3,212 (Gartner: IT Key Metrics Data 2014: Key Infrastructure Measures: Storage Analysis: Current Year, Dec. 2013). This can account for millions in wasted expense each year.
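The arithmetic behind these estimates can be checked in a few lines. Note that the roughly 40 GB-per-departing-employee figure is an assumption, implied by the article’s 1,500-employee / 60 TB numbers rather than stated directly.

```python
# Worked version of the abandoned-data estimate above.
EMPLOYEES = 10_000
TURNOVER_RATE = 0.15            # conservative annual turnover
DATA_PER_EMPLOYEE_GB = 40       # assumed average network footprint per leaver
COST_PER_TB_PER_YEAR = 3_212    # Gartner 2013 annual storage cost per raw TB

leavers_per_year = int(EMPLOYEES * TURNOVER_RATE)                        # 1,500
abandoned_tb_per_year = leavers_per_year * DATA_PER_EMPLOYEE_GB / 1_000  # 60 TB
ten_year_total_tb = abandoned_tb_per_year * 10                           # 600 TB, beyond half a PB
annual_storage_cost = abandoned_tb_per_year * COST_PER_TB_PER_YEAR       # ~$193K/yr

print(leavers_per_year, abandoned_tb_per_year, ten_year_total_tb, annual_storage_cost)
```

Even at the conservative 15 percent turnover rate, a single year’s abandoned data carries a six-figure annual storage bill; across a decade of accumulation, the waste compounds into the millions.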

Abandoned data consists of old working documents that have long outlived their business value: revisions of letters, old spreadsheets, presentations and aged email. However, a small percentage of this content can easily contain sensitive files and email. It is this small percentage of contracts, confidential email exchanges, client records and other similar documents that adds a level of risk and liability for the corporation.

The bulk of the data is typically what is known as redundant, outdated and trivial content – or ROT – that is simply taking up space and resulting in unnecessary management and data center costs.

The following are data factors to consider…

Keep Reading: Download our free whitepaper.

Defensible Deletion Methodology

How to Control Risk and Manage Expenses Associated with User Content

What is Defensible Deletion?

For decades, organizations have had strategies in place to protect and safeguard data. Enforcing those strategies, however, has been one of the greatest challenges faced by corporations over the past two decades. Users hoard data on desktops, including archiving email in local repositories known as PSTs. Every year, storage administrators add massive silos of disk to allow employees to save more and more files, even maintaining space for users who left the company years ago.

Archiving and records managers continually copy important documents and email into proprietary repositories for long term retention. Business continuity and backup administrators replicate all content on a weekly basis and archive data to offsite storage for safekeeping in case of a disaster.

Data is continually replicated, re-stored and hidden throughout an enterprise. Even with sound policies and procedures in place, simply finding data so that it can be managed is an enormous undertaking. Defensible deletion is a process, within an overall information governance policy, that provides comprehensive knowledge and access to all user data so that policy can be applied and data can be managed according to specific compliance requirements.

Implementing a defensible deletion methodology not only mitigates long term risks and liabilities related to enterprise data assets, but also saves time and expense in supporting ongoing litigation and eDiscovery efforts, while reducing data center budget used for storing and managing data that is no longer useful.

Keep Reading: Download our free Defensible Deletion whitepaper.

Backing up the House: Why Backup isn’t Archive

When meeting with a data protection manager at a client site recently, they summed up the world of backup in a nutshell stating: “In the past I was told to backup the house, now they want me to know about everything in the house, how many paintings, rugs, chairs, etc. and be able to get them out at a moment’s notice.”

Backup was never designed to provide the level of detail about data needed to support today’s data governance requirements. Clients use backup as an archive of their sensitive data, yet it does not provide the knowledge needed to support legal, eDiscovery, compliance and regulatory needs. How are you going to support an eDiscovery request from a 10-year-old backup tape when you no longer have the backup software that created the tape?

Backup is not archive – but it could be. Backup captures all enterprise data: user files, email, archives and more. If it exists, backup knows about it. However, as my friend the data protection manager stated, there is no way to know what is contained in backup. Sure, you have a catalog, but finding specific emails from a user is not an easy task. Additionally, as backup data ages it becomes more and more complex to know what you have and to get it back from tape or disk.

Extracting knowledge of what is in backup is the first step in leveraging this data for archiving: knowledge well beyond the backup catalog, such as detailed metadata of documents, spreadsheets and presentations. Beyond metadata, certain data governance requirements demand knowledge of content, including keyword search of email and files to find sensitive material.

Security and compliance also require finding content based on patterns such as PII and PHI. Without this level of knowledge, users back up the whole “house” and it becomes an assumptive archive once its disaster recovery role is complete. This results in a “save everything” strategy, which is neither a smart nor an economical governance strategy.
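Pattern-based PII detection of the kind described above can be sketched with naive regular expressions for US Social Security numbers and 16-digit card numbers. These patterns are illustrative assumptions only; production scanners layer on checksum validation (such as the Luhn test for card numbers) and contextual rules to cut down false positives.

```python
import re

# Naive, illustrative patterns — not production-grade PII detection.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # 123-45-6789
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),    # 16 digits, optional separators
}

def find_pii(text):
    """Return {pattern_name: [matches]} for any PII-like strings found in text."""
    return {name: pattern.findall(text)
            for name, pattern in PII_PATTERNS.items()
            if pattern.findall(text)}
```

Run against extracted file and email content, a scan like this flags documents for relocation to an archive, encryption or purging, in line with the policy actions described earlier.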

The second step to leveraging backup for archiving is access to information in the proprietary backup formats. Restoring backup data from last week’s tapes or disk images is not very complex; however, finding a specific user’s mailbox containing specific keywords is all but impossible.

So when legal calls and asks for all email related to a specific client or set of keywords, the backup manager is forced to restore full backups just to find a small set of content. As the backup data ages it becomes even more complex: companies change backup software or transition to new backup strategies. Over time, getting access to this legacy backup data becomes very time-consuming and expensive, if not impossible.

Leveraging Backup for Archiving

Delivering knowledge of backup data is complex; however, Index Engines provides not only knowledge, including detailed content and keywords, but also access. Finding and restoring a specific email from a 10-year-old tape no longer requires the original backup software or a full restore. Index Engines has cracked the code and is able to leverage backup data to support archiving for legal and compliance needs.

Organizations have learned that backup is not an archive. Storing old tapes in a salt mine or accumulating backup images on disk will become problematic down the road.

Lack of knowledge and access to the data are not characteristics of a proper archive. Additionally, archiving everything, by storing all backup content, is not a sound strategy for organizations that face frequent lawsuits, regulatory requirements and strict data governance policies. These backup archives will result in risk, liabilities and fines down the road that tarnish the company’s reputation.

Eliminating the proprietary lock that backup software has on data, Index Engines delivers knowledge of what is in backup images and provides intelligent access to the data that has value and should be archived. Finding and archiving data without the need for the software that generated the backup is now possible. This allows backup to be leveraged for archiving and delivers support for today’s growing information governance requirements.

Index Engines supports direct indexing of backup tapes and disk images. Supporting all common backup formats, it can index data at a high level of metadata or down to full-text content, capturing keywords from user email deep within Exchange and Notes databases. Beyond indexing, data can then be restored from backup, maintaining all the important metadata, without the need for the original software.

Two classic use cases of Index Engines technology are to clean up legacy backup data on tape or disk for clients that were using backup as an archive and to stop the need for making tapes from disk-based backups (or to stop archiving recent disaster recovery tapes out to offsite storage).

Index Engines delivers the intelligence and access to these environments to extract what is needed according to policy, which is typically a small volume of the total capacity, and archive it on disk according to retention requirements. Once data is archived and secured it is searchable and accessible to support even the most complex data governance requirements.
