Using Data Profiling to Mitigate 7 “Red Flag” Information Risks

Data profiling technology can help an organization identify what electronic information it has and where it is located – the first step to ensuring that information governance policies are applied to it, reducing the organization’s costs and mitigating its seven greatest information risks.

Uncover these red flags in the summer edition of ARMA’s Information Management magazine.

Read Using Data Profiling to Mitigate 7 “Red Flag” Information Risks

Read the entire July/August 2013 issue of Information Management

Whitepaper: Leverage Data Profiling to Support Intelligent Disposition

Only with an understanding of unstructured data – owner, age, last accessed, file type – can decisions be made on its value and disposition.

While file-level analysis of data was previously nearly impossible to achieve, new technology now enables organizations to classify data into categories, allowing manageable, simplified disposition strategies to be implemented.
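To make the idea concrete, here is a minimal sketch of file-level profiling of the kind described above – collecting owner, age, last-accessed time and file type so disposition decisions can be made. The function names and the three-year staleness threshold are illustrative assumptions, not Index Engines’ actual product behavior:

```python
import time
from pathlib import Path

def profile_files(root: str) -> list[dict]:
    """Walk a directory tree and collect the basic file-level metadata a
    profiling tool reports: owner, age, last access and file type.
    (Illustrative only; a commercial profiler indexes at far larger scale.)"""
    records = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        records.append({
            "path": str(path),
            "type": path.suffix.lower() or "(none)",
            "owner_uid": st.st_uid,  # numeric owner; resolve to a name via pwd on POSIX
            "age_days": (time.time() - st.st_mtime) / 86400,
            "last_accessed": time.ctime(st.st_atime),
            "size_bytes": st.st_size,
        })
    return records

def stale_candidates(records: list[dict], years: int = 3) -> list[dict]:
    """Files untouched for N years become candidates for disposition review."""
    return [r for r in records if r["age_days"] > years * 365]
```

With the metadata in hand, records can be bucketed by type, owner or age and fed into a retention or disposition policy.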

Download this complimentary whitepaper from Index Engines to learn more.

Why time to data matters more than ever

Time to data has always been a big push for us at Index Engines, as we know that service providers and counsel need to have confidence that the ESI they need can be found and delivered on deadline.

But as data volumes grew and queries became more in-depth, it became much harder for some vendor technology to keep up with demand and deliver the needed information.

Now, more than ever, we see the legal ramifications of failing to complete ESI culling, as one vendor is being held financially and legally accountable.

This shines a light back on accelerating time to data. Some ESPs still consider 20GB/hour quick – it’s not when terabytes or even petabytes of data need to be processed. That data then needs to be culled, deduped, deNISTed and compared across platforms before being moved into legal hold for review.
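The culling steps mentioned above can be sketched in a few lines. This is a simplified illustration, not any vendor’s pipeline: deduplication keeps one copy of each unique document by content hash, and deNISTing drops files whose hashes appear in a known-file list such as NIST’s National Software Reference Library set (the `nist_hashes` argument here stands in for that list):

```python
import hashlib
from pathlib import Path

def sha1_of(path: Path) -> str:
    """Content hash of a file, read in 1MB chunks to handle large files."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def cull(paths: list[str], nist_hashes: set[str]) -> list[str]:
    """Drop known system files (deNIST) and exact duplicates (dedupe),
    keeping one copy of each unique document for review."""
    seen, kept = set(), []
    for p in paths:
        digest = sha1_of(Path(p))
        if digest in nist_hashes:   # known OS/application file -> not evidence
            continue
        if digest in seen:          # exact duplicate -> already collected
            continue
        seen.add(digest)
        kept.append(p)
    return kept
```

At 20GB/hour, even this simple pass over a single terabyte implies roughly 50 hours of processing before review can begin – which is why throughput matters.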

Time to data reflects not just the technology, but the service provider. ESPs need to do their homework before accepting a job and partnering with a technology vendor, as they will be linked to that technology’s performance. Poor performance from the technology will ultimately lead to less work for the ESP and less trust within the legal community.

The ability to provide defensible and auditable ESI in a timely, cost effective manner has never been more important, and neither has the technology vendor ESPs choose to work with.

Is internet and data privacy a thing of the past?

Privacy has been in the news and on our minds of late. The NSA entered the privacy debate when Edward Snowden exposed the fact that it was monitoring cell phone calls in order to uncover terror plots. If the government monitors private citizens’ records in the name of safety, is this OK? What about when Google or Facebook is required to hand over records to find criminals? If records are accessed by the government in order to protect and secure our citizens, is that OK? Many people would welcome this and feel more secure.

Where is the line drawn on privacy? How do organizations manage private and sensitive data? People constantly submit private data to websites when they buy goods or services. When you obtain a mortgage significant details of your life are delivered to trusted providers. Is this data secure? What happens when this content gets in the wrong hands? Have we become too trusting with our personal information?

What about those that grew up on Facebook? Facebook owns everything you post on their site. Does the average Facebook user understand the contract they accepted when they created an account? Can you accept that contract at 13? Is Facebook chipping away at privacy and making it more acceptable to share private details of our lives? Is the information shared only bad when it gets in the wrong hands? Are we relying on complexity and technology to hide personal data and hope no one will ever see it?

The recent dialog regarding the Enron data set shows how our community treats privacy. Many stated that it was common knowledge that private data, including personal tax records, was in the data set. The difference is that here we didn’t have an Edward Snowden to blow the whistle. Was privacy an issue in this case? I would think that if it were your credit card or social security number, it would be. If not, then you can make a statement like: the value of the data set outweighed any issues related to privacy.

As technology provides more streamlined access to all data – content created today and content created many years ago – privacy must be front and center. Without privacy and control we harm people. The NSA is using private data for the protection of citizens. Others would like to use the private data they can hack for evil and harm.

Managing ESI to control Risk and Liability

Uncover how unstructured data profiling can provide true information governance

Join eDiscovery Journal analyst Greg Buckles and Index Engines Vice President Jim McGann as they explore how unstructured data profiling technology is revolutionizing the way we look at ESI.

In less than 60 minutes, you’ll:

– Explore how data profiling works to mitigate risks and control liability associated with stored data,
– Discover how others are using this new technology to solve complex compliance and regulatory problems, and
– Evolve your information governance and data policies with immediately actionable and implementable strategies.

Watch the webinar recording: http://www.youtube.com/watch?v=4Q8KblI8TZg

The Enron PII Fallout: What dirty data really causes

Since we at Index Engines announced that Nuix’s re-release of the Enron PST data set still contained PII, despite its press release’s claim that it was ‘cleansed,’ a lot of questions have been posed and many reactions raised – ethical, legal and moral.

Our first reaction to finding PII was disappointment that the PST data set was distributed before it was audited or validated by a third party, especially since it was for public consumption. Despite what lawyers say about the legal accountability of republishing this set, we easily found names, addresses, birthdates and social security numbers in the SAME document. The eDiscovery community knows the ramifications of breaches better than anyone. Why allow this to happen?

We were confused that few really seemed to care that there was PII in a data set being actively promoted. “It’s been around for a long time and I don’t think anyone’s been harmed, so oh well, it’s public.” That is the strangest logic and attitude I’ve ever seen come out of the legal community, no matter what some prior ruling stated. The world is a far different place than it used to be, and we don’t believe in data breaches for the ‘greater good.’

Then our disappointment turned to fear. Much of what we found was buried deep within attachments, sent folders and Outlook notes, which happens as data ages – it becomes buried and harder to find. eDiscovery tools are supposed to make finding this information easier, but if they’re missing PII, they could be missing vital evidence. Is, for argument’s sake, finding 99% of the needed files enough? What about 97%? Or 95%? Where’s the accountability, and what happens when what’s missing is the deciding factor in a case? The mortgage industry is likely to be the first to experience this issue. Emails sent by loan originators who haven’t worked for the company in five or more years are going to be needed. How many tools can find ALL of them? There’s a difference between mitigating risks beforehand and missing some documents, and not being able to produce all the information needed during eDiscovery.

Hindsight may be 20-20, but there’s some regret that this wasn’t a vendor-blind community effort. EDRM is a great group that does a lot of good work. What if a handful of vendors had located PII, and EDRM had then removed it without knowing who found what? Sure, there may be a missed marketing opportunity or two, but that approach would have had the best chance of actually producing a truly cleansed data set. Until a clean data set can be achieved, we don’t support the publishing of any data breach and can’t figure out why this one is still published.

Finally, a bit of advice for all the law firms and service providers: use caution if you’re engaging a new vendor to uncover information for litigation readiness or eDiscovery. If you, or another company you trust, haven’t audited this third party, get a second look. Depending on the depth of the job and the accuracy needed, the vendor you want to use may change. Every vendor has different strengths; just make sure you find one with the right tools for the job. Ask the tough questions about validation, where their software comes from and whether they can complete the job you need.

Index Engines finds more dirt on Nuix’s ‘cleansed’ Enron data set

Enron’s republished PST data set still contains numerous personally identifiable information violations despite Nuix’s ‘efforts,’ Index Engines finds

The Enron PST data set has been a point of controversy for the legal community, and the latest self-touting by information management company Nuix that the data set has been cleansed has rekindled the discussion – why facilitate and publish a data breach?

The Nuix-cleansed and republished data set is still littered with social security numbers, legal documents and other information that should not be made public, as a simple review by Index Engines found.

Index Engines indexed the cleansed data set with its Catalyst Unstructured Data Profiling Engine and ran one general PII search, which looks for number patterns and different variations of the words “social security number.”

After a cursory review of the responsive hits, it was easy to find many violations. Understanding that some could be false positives, a review of the first 100 records still found dozens of confirmed data breaches, buried deep in email attachments, sent folders and Outlook notes.
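A search like the one described – number patterns plus variations of the phrase “social security number” – can be sketched with two regular expressions. This is a hypothetical illustration of the technique, not the actual Catalyst query; and as the review above shows, pattern hits always need human validation:

```python
import re

# An SSN written as 123-45-6789 or 123 45 6789; \b boundaries trim partial hits.
SSN_PATTERN = re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b")
# Variations of the label that often accompany the number in email text.
SSN_LABEL = re.compile(r"\b(social\s+security\s+(number|no\.?|#)|ssn)\b", re.IGNORECASE)

def flag_pii(text: str) -> list[str]:
    """Return candidate SSN strings found in the text.
    Every hit still needs review: nine-digit numbers are not always SSNs."""
    return SSN_PATTERN.findall(text)

def is_high_confidence(text: str) -> bool:
    """A number pattern plus a 'social security number' phrase is a stronger signal."""
    return bool(SSN_PATTERN.search(text) and SSN_LABEL.search(text))
```

Running the pattern alone produces responsive hits to review; requiring the label as well separates likely breaches from false positives such as phone or invoice numbers.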

Examples of the missed breaches are below, with the PII blacked out. You don’t serve dinner on partially cleaned plates, because people can get sick. You don’t release a partially cleaned data set, because people’s identities can be stolen.

The most troubling part of the PII Index Engines still found is the risk of identity theft these people now face from having their information published. With a name, former employer and social security number already in hand, a quick search of social media can reveal their marital status, town, college, friends and current employer, making them easy targets for identity theft. If I were one of those people, I’d call a lawyer.

Then there’s the troubling legal question: even when you think your data is clean, is it? In this case it wasn’t, and that should make companies, law firms and service providers question the tools they use for eDiscovery and litigation readiness.

In case you missed it: according to Nuix’s press release, it and EDRM took the well-known Amazon Web Services public data set and used a series of investigative workflows to uncover and remove PII. The findings returned 60 credit card numbers, 572 social security or other national identity numbers and 292 birth dates, the release said; the uncovered items were then removed and a cleansed data set was republished.

It’s truly a scary thought when technology is supposed to do a job and can’t.

[Screenshots Enron1–Enron4: redacted examples of the PII found in the republished data set]

Index Engines’ data risk mitigation capabilities featured in the Wall Street Journal

Index Engines was interviewed for a story in the Wall Street Journal’s Risk and Compliance Section. The article looked at how data growth relates to compliance and regulatory issues and how Index Engines’ data profiling tools help mitigate these risks.

Read the Wall Street Journal Article here

Also, for more on Index Engines’ data profiling capabilities, go here

 

Webinar: Achieving Information Governance through Profiling

Records managers are faced with a seemingly never-ending challenge. They must understand data to classify it, classify data to enforce policy, and enforce policy to manage data.

Join Index Engines Vice President Jim McGann as he shows you how records managers are leveraging the latest technology to understand, classify and govern data – while automating much of the process.

Details

Topic: Achieving Information Governance through Profiling

Speaker: Jim McGann, Index Engines vice president

Date: Thursday, May 23 at 2:00 pm ET

Duration: 60 minutes

Cost: Free

Register: https://www4.gotomeeting.com/register/510965903

By the end of the webinar, you’ll have gained valuable insight into your data environment and will walk away with practical strategies that can be incorporated into your data policies immediately, including:

  • Understanding what data exists, where it resides, for how long and who owns it,
  • Defining data so it can be classified for regulatory and business needs, and
  • Setting retention policies that mitigate risk and reduce storage capacity.

Don’t miss out on how you can achieve true information governance through data profiling.