Index Engines finds more dirt on Nuix’s ‘cleansed’ Enron data set

Enron’s republished PST data set still contains numerous personally identifiable information violations despite Nuix’s ‘efforts,’ Index Engines finds

The Enron PST data set has been a point of controversy for the legal community and the latest self-touting of this data set being cleansed by information management company, Nuix, has rekindled the discussion – why facilitate and publish a data breach?

The Nuix-cleansed and republished document is still littered with many social security numbers, legal documents and other information that should not be made public as found after a simple review by Index Engines.

Index Engines indexed the cleansed data set through its Catalyst Unstructured Data Profiling Engine and ran one general PII search which looks for number patterns and different variations of the words “social security number.”

After a cursory review of the responsive hits it was easy to find many violations. Understanding that some could be false positives, a review of the first 100 records found dozens of confirmed data breaches. These breaches were buried deep in email attachments, sent folders and Outlook notes.

Examples of the missed breaches are below – but we took the liberty of blacking out PII. You don’t serve dinner on partially cleaned plates because people can get sick. You don’t release a partially cleaned data set because people’s identity can be stolen.

The most troubling part of how much PII Index Engines still found is the risk of identity theft these people face from having their information published. Already having their name, former employer and social security number, a quick search of social media can show their marital status, town, college, friends, current employer and make them an easy target for identity theft. If I was one of those people – I’d call a lawyer.

Then, there’s the troubling thought, legally, that even when you think your data’s clean, is it? In this case it wasn’t and should make companies, law firms and service providers question the tools they use for eDiscovery and litigation readiness.

In case you missed it, according to Nuix’s press release, they, along with EDRM, took the well known Amazon Web Services Public Data Set and used a series of investigative workflows to uncover and remove PII. The findings returned 60 credit card numbers, 572 social security or other national identity numbers and 292 birth dates, the release said, the uncovered items were then removed and a cleansed data set was republished.

It’s truly a scary thought when technology is supposed to do a job and can’t.

Enron4

 

Enron1

 

 

Enron2

 

 

Enron3

 

This entry was posted in Uncategorized. Bookmark the permalink.

One Response to Index Engines finds more dirt on Nuix’s ‘cleansed’ Enron data set

  1. Pingback: The Enron PII Fallout: What dirty data really causes |

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>