Webinar: 10 Reasons Tape Is a Lousy Archive

Join us Thursday, October 8th at 1 pm ET/10 am PT for this educational webinar. Register Now

One of the most frequent misuses of backup tape is as an archive for sensitive user data.

Organizations don’t typically design tape as their archive; however, it inadvertently becomes one when old backup tapes are sent to offsite storage after they cycle out of their disaster recovery usefulness.

Join us Thursday, October 8th for a 45-minute webinar to explore 10 Reasons Tape Isn’t a Good Archive and discover how to secure your data, mitigate risk and simplify tape restores in support of legal and compliance requirements.

What could have happened to Hillary’s emails?

Judge Reggie Walton of the U.S. District Court for the District of Columbia is expected to hear arguments on ordering the State Department to question Hillary Clinton about the existence of emails on backup tape archives, The Hill reported. Information management company Index Engines can explain exactly what this means and how it is done.

When Clinton implemented an email server to control and manage her correspondence, her team hired Platte River Networks to host this environment. This is a third-party organization that likely has procedures in place to protect data by copying all email ever created onto backup tapes, ensuring it can be restored in case of a disaster such as a flood or fire.

This standard “IT” process produces a snapshot of what actually happened. It is secure and tamper-proof, represents a factual record of the past, and is far more reliable than the records stored on local servers and hard drives, which can be accessed by many people and easily spoiled.

In this case, the backup of the email server most likely occurred at an offsite location chosen by the hosting provider, Platte River Networks, and the data was placed on tapes that are typically preserved in offsite storage vaults. When the main server was shut down, the tapes could have been forgotten about.

Index Engines has software that can quickly scan backup tapes, index the contents of the email, and make it searchable and accessible without the use of any other third-party software or infrastructure. Through this process, keywords, time frames and file types can be quickly produced and extracted without corruption.

“Data never dies,” said Tim Williams, CEO of Index Engines. “All modern organizations have robust data protection processes that make copies of everything and archive it on backup media to ensure it can survive a disaster. In cases like this, those copies represent the factual truth. They can’t be changed after the fact.

“When an email is sent, it is copied and archived and preserved many times over. This is a disaster recovery feature standard in any data center. What Hillary Clinton probably didn’t know is that exact copies of what existed are archived in data center disaster recovery archives, or backup tapes, that allow for rebuilding an email server in case of a failure.”

Data Governance in the Back Seat of Your Car

The IT manager at Cancer Care Group, P.C. thought nothing of throwing a backup tape containing the names, addresses, dates of birth, Social Security numbers, insurance information and clinical information of approximately 55,000 patients into the back seat of their car (read the article).

They probably did this every week for years, starting years before HIPAA existed, in order to comply with their disaster recovery procedures. What they didn’t consider was what their legal and compliance teams would think about this.

When organizations’ IT departments work in a vacuum, failing to understand the implications of preserving and archiving data to tape and then carelessly transporting those records outside the protection of the corporate environment, they expose their organization to financial harm.

Data governance should not take place in the back seat of a car. It should be in corporate conference rooms where IT and legal collaborate to determine what is the best course of action to protect and manage sensitive corporate records. Data governance means knowing what exists, where it is (even backup tapes), and how it is managed according to policy. I am assuming for most organizations this would not include the back seat of a car.

When IT made the decision to move data offsite via backup tapes in order to fulfill their disaster recovery strategy, they cost the organization $750,000 in fines, years of litigation and a multiyear corrective action plan monitored by the Department of Health and Human Services (HHS), not to mention public embarrassment.

In today’s legal and regulatory climate it is astounding that IT organizations have the freedom to carelessly manage sensitive corporate records. Decades of corporate records archived on backup tapes are stored in salt mines, basement cabinets, employees’ garages, even the back seats of cars, apparently.

How will organizations implement sound policies and procedures in compliance with regulations like HIPAA if they don’t even know what they have or where it is?

Tapes are a great, cost-effective tool for backup, but disaster recovery tapes aren’t a capable archive. Archiving data from tape, including the legacy stockpiles, is critical in forming a sound data governance policy and securing data from compliance issues, data breaches and the back seat of a well-meaning employee’s car.

What Hillary Clinton Can Teach Us about Backup Tapes and Archiving

Backup tapes are often the neglected child of the data governance and eDiscovery world. It has been widely understood that tapes are burdensome and expensive to collect data from, and that they are kept only for the remote chance of needing them for disaster recovery, not for legal purposes.

This week Judge Reggie Walton of the U.S. District Court for the District of Columbia is expected to order the review of emails from backup tape archives of Hillary Clinton’s email server. This will once again put backup tapes front and center in a high-profile event.

When Clinton implemented an email server to control and manage her correspondence, her team hired Platte River Networks to host this environment. This is a third-party organization that has procedures in place to protect data and ensure it can be restored in the case of a disaster such as a flood or fire.

When the server was set up and Platte River engaged, all copies of Clinton’s email were captured by standard backup procedures and copied onto backup tapes or disk-based backups. This standard “IT” process produces a snapshot of what actually happened; it is secure, tamper-proof and represents the factual record of the past.

In the case of Clinton and her email, these backup tapes are much more reliable than the records stored on local servers and hard drives that are accessible by many and easily spoiled (remember Lois Lerner). So, as Judge Walton knows, when push comes to shove, let’s go to the backup tapes to understand what really happened.

Despite backup tapes having a reputation of being inaccessible and burdensome, information management company Index Engines makes them as available for collection as online records. Index Engines can quickly scan these tapes, index the contents of the email, and make it searchable and accessible without the use of complex third-party software or risk of corrupting the data.

“Data never dies,” said Tim Williams, CEO of Index Engines. “All modern organizations have robust data protection processes that make copies of everything and archive it on backup media to ensure it can survive a disaster. In cases like this, those copies represent the factual truth. They can’t be changed after the fact.

“We’ve assisted in countless legal cases where data was thought to be long gone, yet with a simple search of backup tapes using our software, the ‘smoking gun’ is quickly found.”

Organizations need to include these backup tapes in their data governance strategies and ensure they preserve and secure or properly remediate sensitive content before they are forced to produce it.

 

To learn more about securing your organization’s legacy tape data, contact info@indexengines.com.

What’s Abandoned Data Costing You?

The average corporate turnover rate for employees is 15.1 percent across all industries, with some verticals experiencing rates as high as 30 percent. For an organization with 10,000 employees, this can amount to 1,500 to 3,000 departures annually (Compensation Force: 2013 Turnover Rates by Industry).

When an employee leaves an organization, the IT department typically wipes or recycles their hard drive, which contains their digital files and email; however, it often neglects to clean up and manage the former employee’s data on corporate networks and servers.

In this scenario, a company of 10,000 with a conservative annual turnover of 1,500 employees could easily see 60 TB of data abandoned in the data center each year. Over 10 years this grows to beyond half a petabyte.

Abandoned data is the unstructured files, email and other data owned by ex-employees that languishes on networks and servers. Gartner estimates the 2013 average annual storage cost per raw TB of capacity at $3,212 (Gartner: IT Key Metrics Data 2014: Key Infrastructure Measures: Storage Analysis: Current Year, Dec. 2013). This can amount to millions of dollars in wasted expense each year.
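
To make the math above concrete, here is a short back-of-the-envelope calculation in Python. It assumes roughly 40 GB of network data left behind per departing employee, a figure implied by the 60 TB estimate rather than stated explicitly:

# Rough estimate of abandoned-data volume and storage cost.
# Assumptions (illustrative only): ~40 GB of network data per departing
# employee and Gartner's $3,212 annual cost per raw TB.

employees = 10_000
turnover_rate = 0.15                          # ~15% annual turnover
departures = int(employees * turnover_rate)   # 1,500 people per year

gb_per_employee = 40                          # assumed per-employee footprint
abandoned_tb_per_year = departures * gb_per_employee / 1_000   # 60 TB/year

cost_per_tb = 3_212                           # USD per raw TB per year (Gartner)
years = 10
total_tb = abandoned_tb_per_year * years      # 600 TB after a decade
annual_cost_year10 = total_tb * cost_per_tb   # ~$1.9M per year by year 10

print(f"Abandoned data per year: {abandoned_tb_per_year:.0f} TB")
print(f"After {years} years: {total_tb:.0f} TB, costing about "
      f"${annual_cost_year10:,.0f} per year to store")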

Abandoned data consists of old working documents that have long outlived their business value: revisions of letters, old spreadsheets, presentations and aged email. However, a small percentage of this content can easily contain sensitive files and email. It is this small percentage of contracts, confidential email exchanges, client records and other similar documents that adds a level of risk and liability for the corporation.

The bulk of the data is typically what is known as redundant, outdated and trivial content – or ROT – that is simply taking up space and resulting in unnecessary management and data center costs.

The following are data factors to consider…

Keep reading: download our free whitepaper.

Defensible Deletion Methodology

How to Control Risk and Manage Expenses Associated with User Content

What is Defensible Deletion?

For decades, organizations have had strategies in place to protect and safeguard data. Enforcing those strategies, however, has been one of the greatest challenges corporations have faced over the past two decades. Users hoard data on desktops, including archiving email in local repositories known as PSTs. Every year, storage administrators add massive silos of disk to allow employees to save more and more files, even maintaining space for users who left the company years ago.

Archiving and records managers continually copy important documents and email into proprietary repositories for long term retention. Business continuity and backup administrators replicate all content on a weekly basis and archive data to offsite storage for safekeeping in case of a disaster.

Data is continually replicated, re-stored and hidden throughout an enterprise. Even with sound policies and procedures in place, simply finding data so that it can be managed is an enormous undertaking. Defensible deletion is a process, within an overall information governance policy, that provides comprehensive knowledge and access to all user data so that policy can be applied and data can be managed according to specific compliance requirements.

Implementing a defensible deletion methodology not only mitigates long term risks and liabilities related to enterprise data assets, but also saves time and expense in supporting ongoing litigation and eDiscovery efforts, while reducing data center budget used for storing and managing data that is no longer useful.

Keep reading: download our free Defensible Deletion whitepaper.

Backing Up the House: Why Backup Isn’t Archive

When meeting with a data protection manager at a client site recently, they summed up the world of backup in a nutshell, stating: “In the past I was told to backup the house, now they want me to know about everything in the house, how many paintings, rugs, chairs, etc. and be able to get them out at a moment’s notice.”

Backup was never designed to provide the level of detail about data needed to support today’s data governance requirements. Clients use backup as an archive of their sensitive data, yet it does not provide the knowledge needed to support legal, eDiscovery, compliance and regulatory needs. How are you going to support an eDiscovery request from a 10-year-old backup tape when you no longer have the backup software that created the tape?

Backup is not archive – but it could be. Backup captures all enterprise data: user files, email, archives and more. If it exists, backup knows about it. However, as my friend the data protection manager stated, there is no way to know what is contained in backup. Sure, you have a catalog, but finding specific emails from a user is not an easy task. Additionally, as backup data ages, it becomes more and more complex to know what you have and to get it back from tape or disk.

Extracting knowledge of what is in backup is the first step in leveraging this data for archiving: knowledge well beyond the backup catalog, such as detailed metadata of documents, spreadsheets and presentations. Beyond metadata, certain data governance requirements call for knowledge of content, including keyword search of email and files to find sensitive content.

Security and compliance also require finding content based on patterns such as PII and PHI. Without this level of knowledge, users back up the whole “house” and it becomes an assumptive archive once its disaster recovery role is complete. This results in a “save everything” strategy, which is not a smart or economical governance strategy.
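
As a simple illustration of this kind of pattern-based scanning (a hypothetical sketch in Python, not Index Engines’ implementation), common PII such as Social Security and credit card numbers can be flagged in extracted text with regular expressions:

import re

# Illustrative-only PII patterns; real scanners use stricter validation
# (e.g., Luhn checks for card numbers) and many more pattern types.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def find_pii(text):
    """Return any PII-like strings found in a block of extracted text."""
    hits = {name: pattern.findall(text) for name, pattern in PII_PATTERNS.items()}
    return {name: matches for name, matches in hits.items() if matches}

sample = "Employee SSN 123-45-6789 was emailed card number 4111 1111 1111 1111."
print(find_pii(sample))
# {'ssn': ['123-45-6789'], 'credit_card': ['4111 1111 1111 1111']}

Once backup content has been indexed as text, patterns like these can be searched the same way as any other keyword.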

The second step to leveraging backup for archiving is access to information locked in proprietary backup formats. Restoring backup data from last week’s tapes or disk images is not very complex; however, finding a specific user’s mailbox containing specific keywords is effectively impossible.

So when legal calls and says to find all email related to a specific client or set of keywords, the backup manager is forced to restore full backups just to find a small set of content. As the backup data ages it becomes even more complex. Companies change backup software or transition to new backup strategies. Over time, getting access to this legacy backup data becomes very time-consuming and expensive, if not impossible.

Leveraging Backup for Archiving

Delivering knowledge of backup data is complex; however, Index Engines provides not only that knowledge, including detailed content and keywords, but also access. Finding and restoring a specific email from a 10-year-old tape no longer requires the original backup software or a full restore. Index Engines has cracked the code and is able to leverage backup data to support archiving for legal and compliance needs.

Organizations have learned that backup is not an archive. Storing old tapes in a salt mine or accumulating backup images on disk will become problematic down the road.

Lacking knowledge of and access to the data, backup does not have the characteristics of a proper archive. Additionally, archiving everything by storing all backup content is not a sound strategy for organizations that face frequent lawsuits, regulatory requirements and strict data governance policies. These backup archives will result in risks, liabilities and fines down the road that tarnish the company’s reputation.

Eliminating the proprietary lock that backup software has on data, Index Engines delivers knowledge of what is in backup images and provides intelligent access to the data that has value and should be archived. Finding and archiving data without the need for the software that generated the backup is now possible. This allows backup to be leveraged for archiving and delivers support for today’s growing information governance requirements.

Index Engines supports direct indexing of backup tapes and disk images. Supporting all common backup formats, it can index data at a high level of metadata or down to full-text content, capturing keywords from user email deep within Exchange and Notes databases. Beyond indexing, data can then be restored from backup, maintaining all the important metadata, without the need for the original software.

Two classic use cases of Index Engines technology are cleaning up legacy backup data on tape or disk for clients that were using backup as an archive, and eliminating the need to create tapes from disk-based backups (or to stop sending recent disaster recovery tapes to offsite storage).

Index Engines delivers the intelligence and access to these environments to extract what is needed according to policy, which is typically a small volume of the total capacity, and archive it on disk according to retention requirements. Once data is archived and secured it is searchable and accessible to support even the most complex data governance requirements.


Index Engines Adds One-Click Data Profiling Reports to Catalyst Express, the Company’s Free 5 TB Enterprise Data Management Software

Catalyst Express gives organizations the ability to automate reports on file content and metadata including location, name, size, extension, dates, duplicates, PII and more

HOLMDEL, NJ – Index Engines has announced the addition of stored reports and automation to Catalyst Express, the information management company’s free user data management software.

These reports allow one-click access to detailed knowledge of up to 5 TB of user data, including aged data, abandoned and active data, duplicates, large files, PII, and more. Reports can be run on demand or scheduled to run as needed.

“Most organizations don’t know what they have, if it has value, if it’s stored in the correct place, if it poses a risk or liability, or if it’s employee vacation photos and music libraries,” Index Engines VP Jim McGann said. “Catalyst will give them this insight into their data and help them determine and execute data policies.”

These canned reports can be used to understand what exists and develop an appropriate disposition strategy, or they can be customized according to the user’s needs.

Customized reports can include file metadata attributes such as path, file name, size, extension, accessed date, modified date, host names, Active Directory group membership, as well as security metadata including read, write, and browse access to files.

Reports included in this new product include (see the sketch after this list):
• Abandoned files, those not accessed in more than three years
• Active files, those accessed or modified within 90 days
• Duplicate content, files with the same document signature
• Large files, files larger than 1 GB or 4 GB
• Multimedia files, all video, music and image files
• PII, files containing credit card and Social Security numbers
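
To illustrate how report categories like those above can be derived from basic file metadata, here is a minimal, hypothetical Python sketch. It is not Catalyst Express itself; the share path, thresholds and file extensions are assumptions used only for illustration:

import hashlib
import time
from pathlib import Path

YEAR = 365 * 24 * 3600
DAY = 24 * 3600
NOW = time.time()

def classify(path, seen_signatures):
    """Bucket a file into report categories based on age, size and content."""
    st = path.stat()
    categories = []

    if NOW - st.st_atime > 3 * YEAR:
        categories.append("abandoned")        # not accessed in 3+ years
    if NOW - max(st.st_atime, st.st_mtime) < 90 * DAY:
        categories.append("active")           # accessed or modified within 90 days
    if st.st_size > 1024**3:
        categories.append("large")            # larger than 1 GB
    if path.suffix.lower() in {".mp3", ".mp4", ".avi", ".jpg", ".png"}:
        categories.append("multimedia")

    # Duplicate detection via a content signature (hash of the file bytes).
    signature = hashlib.sha256(path.read_bytes()).hexdigest()
    if signature in seen_signatures:
        categories.append("duplicate")
    seen_signatures.add(signature)

    return categories

seen = set()
for f in Path("/shares/users").rglob("*"):    # hypothetical user share
    if f.is_file():
        print(f, classify(f, seen))

The PII report would layer content-level pattern matching on top of this metadata pass.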

Index Engines’ Catalyst product line scales to large global enterprise data center environments consisting of petabytes of unstructured data. The new Catalyst Express software is a no-cost entry point that allows clients to leverage the value of the Catalyst platform and begin to control costs and risk associated with unstructured user data.

By leveraging the rich metadata or full-text indexing in conjunction with Active Directory integration and security analysis through indexing of file ACLs, content can be managed with a single click.

High-level reports allow instant insight into enterprise storage providing unprecedented knowledge of data assets so decisions can be made on disposition, governance policies and even data security.

Upgrade options for Catalyst Express include:
• Additional terabytes of capacity
• Advanced data management policies
• Integrated forensic archiving and eDiscovery workflows
• Detailed indexing of file system audit trails
• Metadata and full content indexing of Exchange, Notes, and SharePoint
• Federated search for distributed environments
• Support for data within backup images (tape or disk)

“Catalyst is implemented worldwide to help manage petabytes of critical business data assets,” McGann said. “With this new product Index Engines is providing a great opportunity to begin managing risk and costs associated with user data at an attractive $0.”

Catalyst Express is available for download at http://www.indexengines.com/catalyst-express
###

5 Things I Found in My Garage that Suggest You Need a Data Center Intervention

When my car could no longer comfortably fit in the garage, I figured it was time to bite the bullet and see exactly what was forcing me to upgrade my garage capacity.

After I pulled everything from the garage out onto my driveway and stood looking at my collection of stuff, I realized I had amassed exactly what I warn data center admins about keeping in their data centers: stuff of value mixed in with redundant, outdated and trivial junk.

Sensitive documents. First there was a large box sitting out in the open. I remember rummaging through it last February. It has tax documents, pay stubs, doctor receipts, credit card bills and similar financial statements. Sure, it contains tons of my PII, but is it really at risk in my garage?

Of course it is. Most of this could be shredded and I’d never miss my June 2011 American Express bill. The documents I need – W2s, tax returns – easily fit into one folder that can be archived safely in the safe deposit box I pay the bank for anyway. By organizing this, I can reclaim about six square feet of space and eliminate the risk of my nosy house sitter wandering into my garage and seeing the box labeled “Financial and Tax Records”.

It’s the same in the data center: your networks and backup data are likely crawling with PII and PHI issues. Depending on age, industry and company policies, much of that should be remediated. The rest needs to go into a secure archive or be encrypted.

Redundant, Outdated, Trivial Data. Then there was a four-shelf rack of stuff that I thought I needed, can’t use right now, but may use again one day: crock pots (two of them), tools, a snow blower, three shovels, old propane tanks and a few boxes of old household stuff.

I could use it. I likely won’t. I definitely don’t need all of it. Toss out the snow blower that doesn’t quite work, retire the boxes of old lamps, radios and other outdated items, relocate the three snow shovels to the storage shed to get them out of the way, and I start making progress. The crockpot came in handy last year, and you can never have too many tools, right? Condensed to two shelves.

ROT (redundant, outdated, trivial data) isn’t active data. It’s a mix of junk, outdated files and some things that may need to be kept just in case. If it hasn’t been accessed in the last two or three years, it’s probably safe to move it offline and reclaim some server capacity. (I’m betting on your user share server.)

Active Data. There are some freshly placed bags from the local home improvement store. I have grass seed, some mulch, a few gallons of pool shock and some bath tub sealant. While the best place for it probably isn’t along the passenger side of my car, I need these products today and over the next few weeks.

Active data needs to be managed in place, so it is not lost and I can take advantage of it. Cleaning up all the junk around it makes it easier and allows me to leverage what has value.

Duplicate Data. A few garbage bags and shelves filled with bulk warehouse items: cases of water, toilet paper, canned vegetables, bags of charcoal and laundry detergent.

To me this is value, but when you have 96 copies of something that isn’t bottled water, it’s a waste of storage budget. Remediate these copies. I’ve seen organizations reclaim 25% of their network capacity just by getting rid of duplicates.

Aged and Former Employee Data. Behind the fourth case of water is a mystery box I haven’t seen in a while. It’s old training and marketing material from a former job. It was outdated long before I left and sits next to some old dry cleaning I haven’t worn in seven years… and will probably never wear again. Next to this are a dozen boxes from my kids’ rooms, old books and stuff they will never use. They moved out five years ago and have no plans to reclaim this stuff, nor does anyone know it exists.

It happens at data centers too. Employees move around within the organization. Others move on to different companies. Sometimes the data is just outdated and abandoned.

Aged and former employee data can make up as much as 50% of an organization’s network data. Find out how much of your data either hasn’t been accessed in three years, or is more than two years old and owned by inactive or former employees. My aged and former employee stuff is going in the garbage. Yours may be better off remediated or at least moved offline.
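
As a rough, hypothetical sketch (not a product feature), a scan like the following can size that problem on a file share by checking last-access times and file owners against a list of former employees. The share path and owner IDs are assumptions, and owner lookup is simplified to POSIX UIDs; a real scan would resolve Active Directory accounts instead:

import time
from pathlib import Path

THREE_YEARS = 3 * 365 * 24 * 3600
TWO_YEARS = 2 * 365 * 24 * 3600
FORMER_EMPLOYEE_UIDS = {1104, 1187}   # assumed example owner IDs
now = time.time()

aged_bytes = 0
former_employee_bytes = 0

for f in Path("/shares/users").rglob("*"):    # hypothetical user share
    if not f.is_file():
        continue
    st = f.stat()
    if now - st.st_atime > THREE_YEARS:
        aged_bytes += st.st_size                    # untouched for 3+ years
    elif now - st.st_mtime > TWO_YEARS and st.st_uid in FORMER_EMPLOYEE_UIDS:
        former_employee_bytes += st.st_size         # old data owned by ex-employees

print(f"Aged data:            {aged_bytes / 1024**4:.2f} TB")
print(f"Former-employee data: {former_employee_bytes / 1024**4:.2f} TB")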

Cleaning up. In one afternoon I was able to clear out over half the contents of my garage. While cleaning up the data center might take a little longer, it is just as simple.

Data profiling technology helps categorize and define user data based on metadata and individual file content so you can make decisions on it. Tier to the cloud. Archive. Remediate. Manage in place. Move offline.

I can even help you get started. Try Catalyst Express, a free download from Index Engines that enables you to understand and manage up to 5 TB of LAN data. Start on a user share server or one used by the sales/services department. Those tend to be hot spots for ROT, ex-employee data and PII.

From there we can help get the rest of your LAN, email and legacy data in order.

As for your garage, you’re on your own.