Why time to data matters more than ever

Time to data has always been a big push for us at Index Engines, as we know that service providers and counsel need to have confidence that the ESI they need can be found and delivered on deadline.

But as data volumes increased and queries became more in-depth, it became much harder for some vendor technology to keep up with demand and deliver the needed information.

Now, more than ever, we see the legal ramifications of not being able to complete ESI culling as one vendor is being held financially and legally accountable.

This puts the spotlight back on accelerating time to data. Some ESPs still consider 20GB/hour fast – it isn’t when terabytes or even petabytes of data need to be processed. That data then needs to be culled, deduped, deNISTed and compared across platforms before being moved into legal hold for review.
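The culling steps above (dedupe, deNIST) both come down to content hashing. A minimal sketch of the idea, not any vendor’s implementation – it assumes you already have a set of known-file hashes such as those published in the NIST NSRL, and all names here are illustrative:

```python
import hashlib
from pathlib import Path

def sha1_of(path: Path) -> str:
    """Hash file contents in chunks so large files don't exhaust memory."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def cull(paths, nsrl_hashes):
    """Drop known system files (deNIST) and repeated content (dedupe),
    yielding only the first copy of each unique, non-system file."""
    seen = set()
    for path in paths:
        digest = sha1_of(path)
        if digest in nsrl_hashes:   # known OS/application file -> deNIST
            continue
        if digest in seen:          # identical content already kept -> dedupe
            continue
        seen.add(digest)
        yield path
```

At terabyte scale the same logic holds, but the hashing is parallelized and the hash sets live in a database rather than in memory.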

Time to data is not just reflective of technology, it’s reflective of the service provider. ESPs need to do their homework before accepting a job and partnering with a technology vendor as they will be linked to that technology’s performance. Poor performance from the technology will ultimately lead to less work for the ESP and less trust among the legal community.

The ability to provide defensible and auditable ESI in a timely, cost effective manner has never been more important, and neither has the technology vendor ESPs choose to work with.

Is internet and data privacy a thing of the past?

Privacy has been in the news and on our minds of late. The NSA entered the privacy debate when Edward Snowden exposed the fact that it was monitoring cell phone calls to uncover terror plots. If the government monitors private citizens’ records in the name of safety, is that OK? What about when Google or Facebook is required to hand over records to find criminals? If records are accessed by the government to protect and secure our citizens, is that OK? Many people would welcome this and feel more secure.

Where is the line drawn on privacy? How do organizations manage private and sensitive data? People constantly submit private data to websites when they buy goods or services. When you obtain a mortgage significant details of your life are delivered to trusted providers. Is this data secure? What happens when this content gets in the wrong hands? Have we become too trusting with our personal information?

What about those that grew up on Facebook? Facebook owns everything you post on their site. Does the average Facebook user understand the contract they accepted when they created an account? Can you accept that contract at 13? Is Facebook chipping away at privacy and making it more acceptable to share private details of our lives? Is the information shared only bad when it gets in the wrong hands? Are we relying on complexity and technology to hide personal data and hope no one will ever see it?

The recent dialog regarding the Enron data set shows how our community treats privacy. Many stated that it was common knowledge that private data, including personal tax records, was in the data set. The difference here is that we didn’t have an Edward Snowden to blow the whistle. Was privacy an issue in this case? I would think that if it were your credit card or social security number, it would be. If not, then you can make a statement like: the value of the data set outweighed any issues related to privacy.

As technology provides more streamlined access to all data – content created today as well as content created many years ago – privacy must be front and center. Without privacy and control we harm people. The NSA is using private data for the protection of citizens. Others would like to hack private data and use it to do harm.

Managing ESI to control Risk and Liability

Uncover how unstructured data profiling can provide true information governance

Join eDiscovery Journal analyst Greg Buckles and Index Engines Vice President Jim McGann as they explore how unstructured data profiling technology is revolutionizing the way we look at ESI.

In less than 60 minutes, you’ll:

– Explore how data profiling works to mitigate risks and control liability associated with stored data,
– Discover how others are using this new technology to solve complex compliance and regulatory problems, and
– Evolve your information governance and data policies with immediately actionable and implementable strategies.

[embedplusvideo height="298" width="480" standard="http://www.youtube.com/v/4Q8KblI8TZg?fs=1" vars="ytid=4Q8KblI8TZg&width=480&height=298&start=&stop=&rs=w&hd=0&autoplay=0&react=1&chapters=&notes=" id="ep8474" /]

The Enron PII Fallout: What dirty data really causes

Since we at Index Engines announced that Nuix’s re-release of the Enron PST data set still contained PII, despite its press release’s claim that it was ‘cleansed,’ many questions have been posed and reactions raised – ethical, legal and moral.

Our first reaction to finding PII was disappointment over the distribution of the PST data set before it was audited or validated by a third party, especially since it was intended for public consumption. Whatever lawyers say about the legal accountability of republishing this set, we easily found names, addresses, birth dates and social security numbers in the SAME document. The eDiscovery community knows the ramifications of breaches better than anyone. Why allow this to happen?

We were confused why so few seemed to care that there was PII in a data set being actively promoted. “It’s been around for a long time and I don’t think anyone’s been harmed, so oh well, it’s public.” That is the strangest logic and attitude I’ve ever seen come out of the legal community, no matter what some prior ruling stated. The world is a far different place than it used to be, and we don’t believe in data breaches for the ‘greater good.’

Then our disappointment turned to fear. Much of what we found was buried deep within attachments, sent folders and Outlook notes, which happens as data ages – it becomes buried and harder to find. eDiscovery tools are supposed to make finding this information easier, but if they’re missing PII, they could be missing vital evidence. Is, for argument’s sake, finding 99% of the needed files enough? What about 95%? Or 97%? Where’s the accountability, and what happens when what’s missing is the deciding factor in a case? The mortgage industry is likely to be the first to experience this issue. Emails sent by loan originators who haven’t worked for the company in five or more years will be needed. How many tools can find ALL of them? There’s a difference between mitigating risks beforehand and missing some documents, and not being able to produce all the information needed during eDiscovery.

Hindsight may be 20-20, but there’s some regret that this wasn’t a vendor-blind community effort. EDRM is a great group that does a lot of good work. What if a handful of vendors could locate PII, then EDRM could remove it without vendors knowing who found what? Sure, there may be a missed marketing opportunity or two, but that approach would have had the best chance of producing a truly cleansed data set. Until a clean data set can be achieved, we don’t support the publishing of any data breach and can’t figure out why it’s still published.

Finally, a bit of advice for all the law firms and service providers: use caution if you’re using a new vendor to uncover information for litigation readiness or eDiscovery. If you or a company you trust hasn’t audited this third party, get a second opinion. Depending on the depth of the job and the accuracy needed, the vendor you want to use may change. Every vendor has different strengths; just make sure you find one with the right tools for the job. Ask the tough questions about validation, where their software comes from and whether they can complete the job you need.

Index Engines finds more dirt on Nuix’s ‘cleansed’ Enron data set

Enron’s republished PST data set still contains numerous personally identifiable information violations despite Nuix’s ‘efforts,’ Index Engines finds

The Enron PST data set has been a point of controversy for the legal community, and the latest claim by information management company Nuix that the data set has been cleansed has rekindled the discussion: why facilitate and publish a data breach?

A simple review by Index Engines found that the Nuix-cleansed and republished data set is still littered with social security numbers, legal documents and other information that should not be made public.

Index Engines indexed the cleansed data set through its Catalyst Unstructured Data Profiling Engine and ran one general PII search that looks for number patterns and different variations of the words “social security number.”
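A search of that shape – a trigger phrase plus a nine-digit number pattern – can be approximated with regular expressions. The actual Catalyst query logic isn’t public, so this is only an illustrative sketch of the general technique:

```python
import re

# 9-digit SSN-like patterns: 123-45-6789, 123 45 6789, or bare 123456789
SSN_NUMBER = re.compile(r"\b\d{3}[- ]?\d{2}[- ]?\d{4}\b")

# Variations of the phrase "social security number"
SSN_PHRASE = re.compile(r"social\s+security\s+(?:number|no\.?|#)|\bSSN\b",
                        re.IGNORECASE)

def flag_possible_pii(text: str) -> bool:
    """Flag a document as a responsive hit when both a trigger phrase and a
    candidate number appear; hits still need human review for false positives."""
    return bool(SSN_PHRASE.search(text)) and bool(SSN_NUMBER.search(text))
```

Requiring both the phrase and the number keeps the false-positive rate manageable, which is why a human review of responsive hits, as described below, is still the final step.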

After a cursory review of the responsive hits it was easy to find many violations. Understanding that some could be false positives, a review of the first 100 records found dozens of confirmed data breaches. These breaches were buried deep in email attachments, sent folders and Outlook notes.

Examples of the missed breaches are below – but we took the liberty of blacking out PII. You don’t serve dinner on partially cleaned plates because people can get sick. You don’t release a partially cleaned data set because people’s identity can be stolen.

The most troubling part of the PII Index Engines found is the risk of identity theft these people face from having their information published. With their name, former employer and social security number already public, a quick search of social media can reveal their marital status, town, college, friends and current employer, making them an easy target for identity theft. If I were one of those people, I’d call a lawyer.

Then there’s the troubling legal thought: even when you think your data is clean, is it? In this case it wasn’t, and that should make companies, law firms and service providers question the tools they use for eDiscovery and litigation readiness.

In case you missed it, according to Nuix’s press release, they, along with EDRM, took the well-known Amazon Web Services public data set and used a series of investigative workflows to uncover and remove PII. The findings returned 60 credit card numbers, 572 social security or other national identity numbers and 292 birth dates, the release said; the uncovered items were then removed and a cleansed data set was republished.

It’s truly a scary thought when technology is supposed to do a job and can’t.

[Images: Enron data set breach examples 1 and 2, with PII blacked out]

Index Engines’ data risk mitigation capabilities featured in the Wall Street Journal

Index Engines was interviewed for a story in the Wall Street Journal’s Risk and Compliance Section. The article looked at how data growth relates to compliance and regulatory issues and how Index Engines’ data profiling tools help mitigate these risks.

Read the Wall Street Journal Article here

Also, for more on Index Engines’ data profiling capabilities, go here


Webinar: Achieving Information Governance through Profiling

Records managers are faced with a seemingly never-ending challenge. They must: understand data to classify data, classify data to enforce policy, and enforce policy by managing data.

Join Index Engines Vice President, Jim McGann, as he shows you how records managers are leveraging the latest technology to understand, classify and govern data – and automating much of the process too.


Topic: Achieving Information Governance through Profiling

Speaker: Jim McGann, Index Engines vice president

Date: Thursday, May 23 at 2:00 pm ET

Duration: 60 minutes

Cost: Free

Register: https://www4.gotomeeting.com/register/510965903

By the end of the webinar, you’ll have gained valuable insight into your data environment and will walk away with practical strategies that can be incorporated into your data policies immediately, including:

  • Understanding what data exists, where, for how long and who owns it,
  • Defining data so it can be classified for regulatory and business needs, and
  • Setting retention policies that mitigate risk and reduce storage consumption.

Don’t miss out on how you can achieve true information governance through data profiling.

Where there’s smoke there’s fire: Cutting off the oxygen to big data

Discover how to reclaim your data center and storage budget while mitigating risk
To gain control of your data center, you need to understand what data exists and develop policies around that data.

Join Jim McGann, Index Engines vice president, and Lisa J. Berry-Tayman, Esq., Information Consulting founder, as they go through the best practices of uncovering unstructured data and creating sound policies to support your data center.

Topic: Cutting off the oxygen to big data
Jim McGann, Index Engines Vice President, and
Lisa J. Berry-Tayman, Esq., Information Consulting founder

Date: Tuesday, May 14, 2013 at 11:30 AM ET

Duration: 60 minutes


In less than 60 minutes, you’ll have the knowledge you need to develop:

• Comprehensive understanding of what data exists, where it lives and what risks it poses so decisions on its disposition can be made,
• The ability to reclaim storage capacity by uncovering duplicate content, employee-owned multimedia files and other sources of wasted storage capacity, and
• Policies to mitigate regulatory and compliance risks by uncovering highly-sensitive documents and allowing them to be properly archived.

As an added bonus, attendees are eligible to receive a sample report of their unstructured data and an introductory consultation of what it means to their organization, compliments of Index Engines and Information Consulting.

Register now to take advantage of this exclusive offer.

New data profiling engine released, now the conversation can start between legal and IT

Today Index Engines released its Catalyst Unstructured Data Profiling Engine. You can find the press release here.

The Catalyst Data Profiling Engine processes all forms of unstructured storage, email and document types, creating a searchable index of what exists, where it is located, who owns it, when it was last accessed and what key terms it contains. Through this process, unknown (‘dark’) or lost data is found and decisions can be made on its disposition.
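The inventory described above – what exists, where, who owns it, when it was last accessed – maps onto basic filesystem metadata. A minimal sketch of that idea using only standard library calls (not Index Engines’ implementation, which also extracts key terms from file contents):

```python
import time
from pathlib import Path

def profile_tree(root):
    """Walk a directory tree and record the metadata a data profile is
    built from: location, size, owner (uid), and last-access time."""
    records = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        records.append({
            "path": str(path),
            "size_bytes": st.st_size,
            "owner_uid": st.st_uid,          # resolve to a username via pwd on Unix
            "last_access": time.strftime("%Y-%m-%d",
                                         time.localtime(st.st_atime)),
        })
    return records
```

Even a simple report like this is enough to start the legal–IT conversation: sorting the records by last-access date immediately surfaces the aged, untouched data that is a candidate for disposition.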

But what this really does is provide knowledge of what data exists and give different departments a chance to have a balanced discussion about their data. Before data profiling, it was nearly impossible to understand what exists, where it lives and for how long.

Data profiling allows conversations to take place between IT and legal. These conversations allow disposition to be decided. Aged data that has no business value and has not been accessed in more than a decade is easily classified and purged. Sensitive email, such as PSTs hidden on the network, can be easily uncovered and monitored to determine the best course of action. PII can be searched for and encrypted before a breach happens. Systems can be audited for compliance.

Legal can now view and profile data and collaborate with IT to determine the next step. When the next eDiscovery event occurs, legal can simply ask IT where “John Doe’s” email is, and IT can provide a quick answer and preserve the data on legal hold.

As legal and IT begin to collaborate and discuss policies and information governance strategies, they will find that much of the data they are spending significant money to store and maintain is of no value.

On-demand webinar: Managing ESI to control risk and liability

To control risk and liability within email communications and other documents, you need to understand what information exists.

Join eDiscovery Journal analyst Greg Buckles and Index Engines Vice President Jim McGann as they explore how unstructured data profiling technology is revolutionizing the way we look at and manage ESI.

Uncover how you can take the mystery out of unknown data to protect your organization and your clients.

In less than 60 minutes, you’ll:
• Explore how data profiling works to mitigate risks and control liability,
• Discover how others are solving complex compliance and regulatory problems,
• Evolve your information governance and data policies immediately

[embedplusvideo height="298" width="480" standard="http://www.youtube.com/v/4Q8KblI8TZg?fs=1" vars="ytid=4Q8KblI8TZg&width=480&height=298&start=&stop=&rs=w&hd=0&autoplay=0&react=1&chapters=&notes=" id="ep4023" /]