Managing ESI to control Risk and Liability

Uncover how unstructured data profiling can provide true information governance

Join eDiscovery Journal analyst Greg Buckles and Index Engines Vice President Jim McGann as they explore how unstructured data profiling technology is revolutionizing the way we look at ESI.

In less than 60 minutes, you’ll:

– Explore how data profiling works to mitigate risks and control liability associated with stored data,
– Discover how others are using this new technology to solve complex compliance and regulatory problems, and
– Evolve your information governance and data policies with immediately actionable and implementable strategies.

[Embedded video: YouTube ID 4Q8KblI8TZg]

The Enron PII Fallout: What dirty data really causes

Since we at Index Engines announced that Nuix’s re-release of the Enron PST data set still contained PII despite its press release’s claim it was ‘cleansed,’ a lot of questions have been posed and many reactions raised – ethically, legally and morally.

Our first reaction to finding PII was disappointment over the distribution of the PST data set before it was audited or validated by a third party, especially since it was for public consumption. Despite what lawyers say about the legal accountability of republishing this set, we easily found names, addresses, birthdates and social security numbers in the SAME document. The eDiscovery community knows the ramifications of breaches better than anyone. Why allow this to happen?

We were confused about why so few seemed to care that there was PII in a data set being publicly promoted. “It’s been around for a long time and I don’t think anyone’s been harmed, so oh well, it’s public.” That is the strangest logic and attitude we’ve ever seen come out of the legal community, no matter what some prior ruling stated. The world is a far different place than it used to be and we don’t believe in data breaches for the ‘greater good.’

Then our disappointment turned to fear. Much of what we found was buried deep within attachments, sent folders and Outlook notes, which happens as data ages – it becomes buried and harder to find. eDiscovery tools are supposed to make finding this information easier, but if they’re missing PII, they could be missing vital evidence. Is, for argument’s sake, finding 99% of the needed files enough? What about 95%? Or 97%? Where’s the accountability, and what happens when what’s missing is the deciding factor in a case? The mortgage industry is likely going to be the first to experience this issue. Emails sent by loan originators who haven’t worked for the company in five or more years are going to be needed. How many tools can find ALL of them? There’s a difference between missing a few documents while mitigating risk beforehand and being unable to produce all the information needed during eDiscovery.

Hindsight may be 20-20, but there’s some regret that this wasn’t a vendor-blind community effort. EDRM is a great group that does a lot of good work. What if a handful of vendors could locate PII, then EDRM could remove it without vendors knowing who found what? Sure, there may be a missed marketing opportunity or two, but that would have had the best chance of actually producing a truly cleansed data set. Until this clean data set can be achieved, we don’t support the publishing of any data breach and can’t figure out why it’s still published.

Then there’s a bit of advice for all the law firms and service providers. Use caution if you’re using a new vendor to uncover information for litigation readiness or eDiscovery. If neither you nor another company you trust has audited this third party, get a second look. Depending on the depth of the job and the accuracy needed, the vendor you want to use may change. Every vendor has different strengths, so make sure you find one with the right tools for the job. Ask the tough questions about validation, where their software comes from and whether they can complete the job you need.

Index Engines finds more dirt on Nuix’s ‘cleansed’ Enron data set

Enron’s republished PST data set still contains numerous personally identifiable information violations despite Nuix’s ‘efforts,’ Index Engines finds

The Enron PST data set has been a point of controversy for the legal community, and the latest self-touted cleansing of this data set by information management company Nuix has rekindled the discussion – why facilitate and publish a data breach?

A simple review by Index Engines found that the Nuix-cleansed and republished data set is still littered with social security numbers, legal documents and other information that should not be made public.

Index Engines indexed the cleansed data set through its Catalyst Unstructured Data Profiling Engine and ran one general PII search that looks for number patterns and different variations of the words “social security number.”
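To give a sense of what a general PII search of that kind can look like, here is a minimal Python sketch. It is not the Catalyst engine’s query: the two regular expressions, the `find_pii_hits` name and the sample text are assumptions made up for illustration, and patterns this loose will return false positives that still need a human review.

```python
import re

# Assumed number pattern: nine digits grouped 3-2-4, with optional
# hyphens or spaces between the groups (e.g. 123-45-6789 or 123456789).
SSN_NUMBER = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")

# Assumed keyword variants of "social security number".
SSN_KEYWORD = re.compile(
    r"\b(?:social\s+security(?:\s+(?:number|no\.?|#))?|ssn|ss\s*#)",
    re.IGNORECASE,
)


def find_pii_hits(text):
    """Return (kind, matched_text) pairs for every candidate hit in text."""
    hits = [("number", m.group(0)) for m in SSN_NUMBER.finditer(text)]
    hits += [("keyword", m.group(0)) for m in SSN_KEYWORD.finditer(text)]
    return hits


if __name__ == "__main__":
    # Fabricated sample text, not taken from any real data set.
    for kind, value in find_pii_hits("Employee SSN: 123-45-6789"):
        print(kind, value)
```

Every hit is only a candidate. A nine-digit account or order number matches the same number pattern, which is why responsive records still have to be reviewed before anything is called a breach.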

After a cursory review of the responsive hits, it was easy to find many violations. Allowing that some hits could be false positives, we reviewed the first 100 records and found dozens of confirmed data breaches. These breaches were buried deep in email attachments, sent folders and Outlook notes.

Examples of the missed breaches are below – but we took the liberty of blacking out PII. You don’t serve dinner on partially cleaned plates because people can get sick. You don’t release a partially cleaned data set because people’s identities can be stolen.

The most troubling part of how much PII Index Engines still found is the risk of identity theft these people face from having their information published. With their name, former employer and social security number already exposed, a quick search of social media can reveal their marital status, town, college, friends and current employer, making them an easy target for identity theft. If I were one of those people, I’d call a lawyer.

Then there’s the troubling legal question: even when you think your data’s clean, is it? In this case it wasn’t, and that should make companies, law firms and service providers question the tools they use for eDiscovery and litigation readiness.

In case you missed it, according to Nuix’s press release, they, along with EDRM, took the well-known Amazon Web Services Public Data Set and used a series of investigative workflows to uncover and remove PII. The findings returned 60 credit card numbers, 572 social security or other national identity numbers and 292 birth dates, the release said. The uncovered items were then removed and a cleansed data set was republished.

It’s truly a scary thought when technology is supposed to do a job and can’t.

[Images: Enron 2 and Enron 1]

Index Engines’ data risk mitigation capabilities featured in the Wall Street Journal

Index Engines was interviewed for a story in the Wall Street Journal’s Risk and Compliance Section. The article looked at how data growth relates to compliance and regulatory issues and how Index Engines’ data profiling tools help mitigate these risks.

Read the Wall Street Journal article here

Also, for more on Index Engines’ data profiling capabilities, go here


Webinar: Achieving Information Governance through Profiling

Records managers are faced with a seemingly never-ending challenge. They must: understand data to classify data, classify data to enforce policy, and enforce policy by managing data.

Join Index Engines Vice President, Jim McGann, as he shows you how records managers are leveraging the latest technology to understand, classify and govern data – and automating much of the process too.


Topic: Achieving Information Governance through Profiling

Speaker: Jim McGann, Index Engines vice president

Date: Thursday, May 23 at 2:00 pm ET

Duration: 60 minutes

Cost: Free


By the end of the webinar, you’ll have gained valuable insight into your data environment and will walk away with practical strategies that can be incorporated into your data policies immediately, including:

  • Understanding what data exists, where, for how long & who owns it,
  • Defining data so it can be classified for regulatory and business needs, and
  • Setting retention policies that mitigate risk and reduce the storage capacity you need.

Don’t miss out on how you can achieve true information governance through data profiling.

Where there’s smoke there’s fire: Cutting off the oxygen to big data

Discover how to reclaim your data center and storage budget while mitigating risk
To gain control of your data center, you need to understand what data exists and develop policies around that data.

Join Jim McGann, Index Engines vice president, and Lisa J. Berry-Tayman, Esq., Information Consulting founder, as they go through the best practices of uncovering unstructured data and creating sound policies to support your data center.

Topic: Cutting off the oxygen to big data
Speakers: Jim McGann, Index Engines Vice President, and
Lisa J. Berry-Tayman, Esq., Information Consulting founder

Date: Tuesday, May 14, 2013 at 11:30 AM ET

Duration: 60 minutes


In less than 60 minutes, you’ll have the knowledge you need to develop:

• A comprehensive understanding of what data exists, where it lives and what risks it poses so decisions on its disposition can be made,
• The ability to reclaim storage capacity by uncovering duplicate content, employee-owned multimedia files and other sources of wasted storage capacity, and
• Policies to mitigate regulatory and compliance risks by uncovering highly sensitive documents and allowing them to be properly archived.

As an added bonus, attendees are eligible to receive a sample report on their unstructured data and an introductory consultation on what it means for their organization, compliments of Index Engines and Information Consulting.

Register now to take advantage of this exclusive offer.

New data profiling engine released, now the conversation can start between legal and IT

Today Index Engines released its Catalyst Unstructured Data Profiling Engine. You can find the press release here.

Basically, the Catalyst Data Profiling Engine processes all forms of unstructured storage, email and document types, creating a searchable index of what exists, where it is located, who owns it, when it was last accessed and what key terms are in it. Through this process, unknown – dark – or lost data is found and decisions can be made on its disposition.

But what this really does is provide knowledge of what data exists and give different departments a chance to have a balanced discussion about their data. Before data profiling, it was nearly impossible to understand what exists, where and for how long.

Data profiling allows conversations to take place between IT and legal. These conversations allow disposition to be decided. Aged data that has no business value and has not been accessed in more than a decade is easily classified and purged. Sensitive email such as PSTs that are hidden on the network can be easily uncovered and monitored in order to determine the best course of action. PII can be searched for and encrypted before a breach happens. Systems can be audited for compliance.
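As a rough illustration of how a disposition rule like the “not accessed in more than a decade” one can work once a profile exists, here is a short Python sketch. It is not how the Catalyst engine is implemented: the `FileRecord` fields, the `find_aged` helper and the sample paths and owners are hypothetical, chosen only to mirror the kind of metadata described above (where the data is, who owns it, when it was last accessed).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileRecord:
    """One row of a hypothetical data profile."""
    path: str                # where the data is located
    owner: str               # who owns it
    last_accessed: datetime  # when it was last accessed


def find_aged(records, now, years=10):
    """Return the records last accessed more than `years` years before `now`."""
    cutoff = now - timedelta(days=365.25 * years)
    return [r for r in records if r.last_accessed < cutoff]


if __name__ == "__main__":
    # Made-up records for illustration only.
    profile = [
        FileRecord("/share/legal/old_contract.doc", "jdoe", datetime(2000, 1, 1)),
        FileRecord("/share/sales/forecast.xls", "asmith", datetime(2012, 6, 1)),
    ]
    for record in find_aged(profile, now=datetime(2013, 5, 1)):
        print(record.path, record.owner)
```

A list like this is a starting point for the conversation between legal and IT, not an automatic purge: anything on legal hold or with remaining business value would need to be excluded before disposition.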

Legal can now view and profile data and collaborate with IT to determine the next step. Even when the next eDiscovery event occurs, legal can just ask IT where “John Doe’s” email is, and IT can provide a quick answer and preserve the data under legal hold.

As legal and IT begin to collaborate and discuss policies and information governance strategies, they will find that much of the data that they are spending significant money to store and maintain is of no value.

On-demand webinar: Managing ESI to control risk and liability

To control risk and liability within email communications and other documents, you need to understand what information exists.

Join eDiscovery Journal analyst Greg Buckles and Index Engines Vice President Jim McGann as they explore how unstructured data profiling technology is revolutionizing the way we look at and manage ESI.

Uncover how you can take the mystery out of unknown data to protect your organization and your clients.

In less than 60 minutes, you’ll:
• Explore how data profiling works to mitigate risks and control liability,
• Discover how others are solving complex compliance and regulatory problems, and
• Evolve your information governance and data policies immediately.

[Embedded video: YouTube ID 4Q8KblI8TZg]

Leveraging Data Profiling For Achievable Projects

More than ever before, organizations want to know what kinds of information are stored within the IT infrastructure. Why? Because this information bogs down critical production systems like email and collaborative document management, costs money to store, and presents massive risk if not managed correctly. Few organizations truly understand the makeup of their digital landfills. But that is soon to change. According to a recent eDiscoveryJournal survey, more than 50% of organizations plan shared drive migration and clean-up projects – more than any other named information governance project.

These projects aim to defensibly delete unnecessary, outdated, or duplicative information while keeping valuable knowledge or content that is on Legal Hold. This is not just a nice corporate “housekeeping” idea; it is now a necessity due to the high growth rate forecast by business analysts. McKinsey Global Institute, for example, projects data to grow at 40% per year, making it virtually impossible to effectively and economically store and manage organizational information without some form of culling.
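To put that 40% figure in perspective, here is a small back-of-the-envelope calculation in Python. The growth rate is the one cited above. The 100 TB starting point and the function names are assumptions for illustration, and the sketch assumes the rate simply compounds once a year.

```python
import math


def projected_volume(current_tb, annual_growth, years):
    """Compound growth: volume after `years` at a fixed annual growth rate."""
    return current_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth):
    """Years needed for the volume to double at a fixed annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


if __name__ == "__main__":
    # Hypothetical 100 TB environment growing at the cited 40% per year.
    for year in range(1, 6):
        print(year, round(projected_volume(100, 0.40, year), 1))
    print(round(doubling_time_years(0.40), 2))
```

At that rate the volume roughly doubles every two years and is more than five times its starting size after five, which is the arithmetic behind the case for culling.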

In order to do this, organizations need to efficiently profile data. This insight into information can help get past analysis paralysis to clear digital landfills. In this webinar, eDJ Analyst Greg Buckles and Jim McGann will examine practical approaches to data profiling and how to set organizational goals. We will explore approaches to information governance, such as managing data in-place, dealing with Legal Holds, selecting targets for profiling, and information classification. We will examine case studies focused on PST audits and profiling for disposition. Finally, this webinar will offer pragmatic advice on how to use data profiling to achieve immediate results today while building out a larger information governance strategy and plan.

Space is limited. Register now.

7 signs a data breach could be looming

Data breaches have made the headlines much too often lately and left many IT, legal and compliance departments wondering how they would react to a breach.

But instead of reacting, you can proactively assess your risk of a data breach and work to address any vulnerable areas during a self-audit. Look to see if any of these red flags live in your data environment.

  1. Mystery data. Do you know the type of data located on every server and backup tape, and even in hidden email files such as PSTs? Different custodians within the organization create and maintain different types of data at different levels of sensitivity. Not knowing who created what and where it is leaves the door open for files to get lost and fall into the wrong hands.
  2. Poor archiving. Do you practice value-based archiving or an archive-everything strategy? The latter leaves your important, sensitive data lost among a network of junk. Data gets misplaced and forgotten until it turns up where it shouldn’t.
  3. Duplicates. How do you manage your duplicate data and do you know where your duplicates are? It doesn’t make much sense to protect one document when hundreds of copies of it exist in the enterprise. Understand and manage duplicate data.
  4. Personally Identifiable Information. Does your sales or service team routinely handle credit cards, Social Security numbers or other PII? Could any of that information have been sent over email by someone who does not understand the risks? Audit your system for PII.
  5. Un-interpretable data. Data that belonged to an ex-employee and was created a number of years ago likely has little business value, but it is a compliance risk. It can no longer be properly interpreted in its original context. Jokes can be crimes. Misunderstandings can become lawsuits. How much turnover does your business have?
  6. PSTs. These sensitive little email files don’t live with the rest of the emails, often creating copies or mini archives that go unmanaged. Where do they live, who owns them and when were they last accessed?
  7. Executive data. How the former CEO’s email is handled and how last summer’s interns’ email is handled should be dramatically different. Is it held in an archive under retention policies with set expiration dates, or is it still on the computer they used?
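For the duplicates flag in particular, a self-audit can start with something as simple as grouping files by a hash of their content. The Python sketch below is one way to do that, not a description of any vendor’s tool: the function names are made up for illustration, and it only catches byte-for-byte copies, not near-duplicates or the same document saved in a different format.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's content in chunks so large files are not read at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return groups of files under `root` that have identical content."""
    by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    # Keep only hashes shared by more than one file.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Calling `find_duplicates` on a shared drive folder returns one list per set of identical files, which gives you a concrete place to start asking which copy needs protecting and which can go.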

You likely recognized at least one flag that exists in your data center, and if you found four or five, you’re in the same position as the majority of large companies. There’s help out there. Email for more information or visit: