Index Engines’ data risk mitigation capabilities featured in the Wall Street Journal

Index Engines was interviewed for a story in the Wall Street Journal’s Risk and Compliance section. The article looked at how data growth relates to compliance and regulatory issues, and how Index Engines’ data profiling tools help mitigate the associated risks.

Read the Wall Street Journal Article here

Also, for more on Index Engines’ data profiling capabilities, go here


Webinar: Achieving Information Governance through Profiling

Records managers face a seemingly never-ending challenge. They must understand data to classify it, classify data to enforce policy, and enforce policy by managing data.

Join Index Engines Vice President Jim McGann as he shows you how records managers are leveraging the latest technology to understand, classify and govern data, and how they are automating much of the process too.


Topic: Achieving Information Governance through Profiling

Speaker: Jim McGann, Index Engines vice president

Date: Thursday, May 23 at 2:00 pm ET

Duration: 60 minutes

Cost: Free


By the end of the webinar, you’ll have gained valuable insight into your data environment and will walk away with practical strategies you can incorporate into your data policies immediately, including:

  • Understanding what data exists, where, for how long & who owns it,
  • Defining data so it can be classified for regulatory and business needs, and
  • Setting retention policies that mitigate risk and reduce storage consumption.

Don’t miss out on how you can achieve true information governance through data profiling.

Where there’s smoke there’s fire: Cutting off the oxygen to big data

Discover how to reclaim your data center and storage budget while mitigating risk
To gain control of your data center, you need to understand what data exists and develop policies around that data.

Join Jim McGann, Index Engines vice president, and Lisa J. Berry-Tayman, Esq., Information Consulting founder, as they walk through best practices for uncovering unstructured data and creating sound policies to support your data center.

Topic: Cutting off the oxygen to big data

Speakers: Jim McGann, Index Engines Vice President, and Lisa J. Berry-Tayman, Esq., Information Consulting founder

Date: Tuesday, May 14, 2013 at 11:30 AM ET

Duration: 60 minutes


In less than 60 minutes, you’ll have the knowledge you need to develop:

• A comprehensive understanding of what data exists, where it lives and what risks it poses, so decisions on its disposition can be made,
• The ability to reclaim storage capacity by uncovering duplicate content, employee-owned multimedia files and other sources of wasted storage, and
• Policies to mitigate regulatory and compliance risks by uncovering highly sensitive documents so they can be properly archived.

As an added bonus, attendees are eligible to receive a sample report of their unstructured data and an introductory consultation on what it means for their organization, compliments of Index Engines and Information Consulting.

Register now to take advantage of this exclusive offer.

New data profiling engine released: now the conversation between legal and IT can start

Today Index Engines released its Catalyst Unstructured Data Profiling Engine. You can find the press release here.

In short, the Catalyst Data Profiling Engine processes all forms of unstructured storage, email and document types, creating a searchable index of what exists, where it is located, who owns it, when it was last accessed and what key terms it contains. Through this process, unknown (dark) or lost data is found, and decisions can be made on its disposition.

But what this really does is provide knowledge of what data exists and give different departments a chance to have a balanced discussion about their data. Before data profiling, it was nearly impossible to understand what exists, where, and for how long.

Data profiling allows a conversation to take place between IT and legal, and these conversations determine how data is disposed of. Aged data that has no business value and has not been accessed in more than a decade is easily classified and purged. Sensitive email in PST files hidden on the network can be uncovered and monitored to determine the best course of action. PII can be searched for and encrypted before a breach happens. Systems can be audited for compliance.

Legal can now view and profile data and collaborate with IT to determine the next step. When the next eDiscovery event occurs, legal can simply ask IT where “John Doe’s” email is, and IT can provide a quick answer and preserve the data on legal hold.

As legal and IT begin to collaborate and discuss policies and information governance strategies, they will find that much of the data they are spending significant money to store and maintain is of no value.

On-demand webinar: Managing ESI to control risk and liability

To control risk and liability within email communications and other documents, you need to understand what information exists.

Join eDiscovery Journal analyst Greg Buckles and Index Engines Vice President Jim McGann as they explore how unstructured data profiling technology is revolutionizing the way we look at and manage ESI.

Uncover how you can take the mystery out of unknown data to protect your organization and your clients.

In less than 60 minutes, you’ll:
• Explore how data profiling works to mitigate risks and control liability,
• Discover how others are solving complex compliance and regulatory problems,
• Evolve your information governance and data policies immediately.


Leveraging Data Profiling For Achievable Projects

More than ever before, organizations want to know what kinds of information are stored within their IT infrastructure. Why? Because this information bogs down critical production systems like email and collaborative document management, costs money to store, and presents massive risk if not managed correctly. Few organizations truly understand the makeup of their digital landfills, but that is soon to change. According to a recent eDiscoveryJournal survey, more than 50% of organizations plan shared drive migration and clean-up projects, more than any other named information governance project.

These projects aim to defensibly delete unnecessary, outdated, or duplicative information while keeping valuable knowledge and any content that is on Legal Hold. This is not just a nice corporate “housekeeping” idea; it is now a necessity given the high growth rates forecast by business analysts. The McKinsey Global Institute, for example, projects data to grow at 40% per year, making it virtually impossible to store and manage organizational information effectively and economically without some form of culling.
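To see why culling becomes unavoidable, here is the compound-growth arithmetic behind the 40% figure, assuming (for illustration only) that the rate stays constant from year to year, with $V_0$ as today’s data volume and $V_n$ the volume after $n$ years:

$$
V_n = V_0\,(1.4)^n, \qquad V_5 = V_0\,(1.4)^5 \approx 5.38\,V_0, \qquad n_{\text{double}} = \frac{\ln 2}{\ln 1.4} \approx \frac{0.693}{0.336} \approx 2.06\ \text{years}
$$

At that rate, a hypothetical organization storing 100 TB today would be storing roughly 538 TB five years from now, doubling its footprint about every two years.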

To do this, organizations need to profile data efficiently. Insight into what information exists can help them get past analysis paralysis and clear their digital landfills. In this webinar, eDJ Analyst Greg Buckles and Jim McGann will examine practical approaches to data profiling and how to set organizational goals. We will explore approaches to information governance, such as managing data in place, dealing with Legal Holds, selecting targets for profiling, and classifying information. We will also walk through case studies focused on PST audits and profiling for disposition. Finally, this webinar will offer pragmatic advice on how to use data profiling to achieve immediate results while building out a larger information governance strategy and plan.

Space is limited. Register now.

7 signs a data breach could be looming

Data breaches have made headlines far too often lately, leaving many IT, legal and compliance departments wondering how they would react to a breach.

But instead of reacting, you can proactively assess your risk of a data breach and address any vulnerable areas through a self-audit. Check whether any of these red flags exist in your data environment.

  1. Mystery data. Do you know what type of data is located on every server and backup tape, and even in hidden email files such as PSTs? Different custodians within the organization create and maintain different types of data at different levels of sensitivity. Not knowing who created what, and where it is, leaves the door open for files to get lost and fall into the wrong hands.
  2. Poor archiving. Do you practice value-based archiving or an archive-everything strategy? The latter leaves your important, sensitive data lost in a network of junk. Data gets lost and forgotten about until it is leaked.
  3. Duplicates. How do you manage your duplicate data and do you know where your duplicates are? It doesn’t make much sense to protect one document when hundreds of copies of it exist in the enterprise. Understand and manage duplicate data.
  4. Personally Identifiable Information. Does your sales or service team routinely handle credit cards, Social Security numbers or other PII? Could any of that information have been sent over email by someone who does not understand the risks? Audit your system for PII.
  5. Un-interpretable data. Data that belonged to a former employee and was created years ago likely has little business value, but it is a compliance risk because it can no longer be properly interpreted in its original context. Jokes can be crimes. Misunderstandings can become lawsuits. How much turnover does your business have?
  6. PSTs. These sensitive little email files don’t live with the rest of the email, and they often become copies or mini archives that go unmanaged. Where do they live, who owns them and when were they last accessed?
  7. Executive data. The former CEO’s email and last summer’s intern’s email should be handled very differently. Are they held in an archive under retention policies with set expiration dates, or are they still on the computers they used?

You likely recognized at least one flag that exists in your data center and if you found four or five, you’re with the majority of large companies. There’s help out there. Email for more information or visit:

Offshore data breach has dirty laundry flying

The hottest story of the morning, and likely until the media takes North Korea a bit more seriously, is the exposure of secret files from offshore bank accounts held by some of the richest and most controversial people on the planet… and some ordinary Joes with a little extra cash, too.

Basically 2.5 million files were leaked from more than 120,000 offshore companies and trusts, exposing a lot of dirty laundry. The International Consortium of Investigative Journalists along with 38 other media partners collaborated to sort through this mess of cash transfers, incorporation dates and links between companies and individuals.

The whole thing leaves me with very mixed emotions. Data breaches are preventable and shouldn’t happen, and this one raises a real concern: if it can happen to highly sensitive accounts backed by tens and hundreds of millions of dollars, where else can it happen? (More on that later.)

There’s also sympathy for the doctors, dentists, investors and other hard workers who were just trying to earn a better interest rate, avoid paying even higher taxes, or protect their money from being taken by their government through no fault of their own. After seeing the going interest rates for money market accounts, my sympathy is even higher.

As for the celebrities and big-name politicians, I’m a little less sympathetic and a little less concerned. Blame it on the Kardashians.

Then there’s the sense that cheaters/liars/thieves/crooks never prosper. The consortium allegedly uncovered laundering, organized crime and other financial indiscretions. According to the story, studies have estimated that cross-border flows of global proceeds of financial crimes total between $1 trillion and $1.6 trillion a year.

Now that we covered all the major facets of this particular leak, let’s get back to the concept of data breaches. What went wrong here?

Were documents not properly encrypted? Was this primarily older data that was stored away and forgotten about? Could employees have let the information slip? How did this all happen?

Having seen a few data breaches in my lifetime, I’ve found they usually result from one of a few things:

  • Data that is not properly secured behind the firewall, is not encrypted, is not kept where it’s supposed to be, or is a duplicate that should not exist is easily leaked by people out to access other people’s information for personal gain.
  • Data has become old and forgotten. As other servers are upgraded, the one with information from five years ago remains untouched and becomes vulnerable. Sadly, this is quite preventable, as long as you either protect the data or set a retention policy that retires old data.
  • Data is being accessed by people in the company who should not have access to it. The data storage lacks proper permissions and records of who accessed what and when. That access can be too tempting for some.
  • Archives meant to hold such documents contain everything, just in case. In doing that, data gets lost and forgotten about until leaked.

The good news: all of this can be properly managed with knowledge of what exists, strong information governance policies and a tool to make it all possible.

Discover how to keep your name from appearing in headlines like this. Download Achieving Effective Information Governance through Data Profiling

Unmanaged, unstructured emails are a fire waiting to start

Over time, email piles up in massive servers, archives and even users’ desktops, and it becomes like a matchbook underneath a child’s bed. On its own, it poses no threat and just sits there, waiting. It can go years, even a lifetime, without ever causing a problem.

No one would leave a matchbook underneath a child’s bed; it’s unthinkable. Yet few think twice about their email servers.

But why such a visceral reaction to leaving a matchbook in a kid’s room? The matches are not going to burst into flames, they won’t just spark old comic books and baseball cards, and matches are not the easiest thing to strike, even for an adult. We take precautions because of what could happen if those matches got into the wrong little hands.

So why do we hoard email on servers, desktops and even legacy backup tapes when there are harmful matches among them? Within those millions of emails are Social Security numbers, contracts, legal documents, regulatory compliance papers and messages that can no longer be properly interpreted. Like the matchbook, this dark data just sits there. It doesn’t expose itself, it doesn’t jump through firewalls and it isn’t going to send itself.

Yet all it takes is one set of wrong hands for a fire to develop quickly. Thieves search for personally identifiable information, which can lead to lost customers, FTC interference and identity theft. Legal and regulatory documents can’t be found or end up in the wrong hands, causing fines and penalties. Plus, don’t forget all the money needed to repair and upgrade firewalls and to pay the legal fees associated with breaches.

Just as a parent sets the rules, compliance, legal, IT, records managers or another guardian needs to set policies for email. Retention policies, covering both archiving and deletion, should be in place to govern data. One leading analyst group recently estimated that fewer than one percent of companies have and actively enforce an information governance policy.

Much of this goes back to the tools: how do you set policy around data when you don’t know what exists or where? It’s nearly impossible to understand unstructured data and uncover all those pesky, hidden PST files. But now the technology exists in the form of unstructured data profiling.

Data profiling, sometimes called file analysis, is a process in which all forms of unstructured files and email are analyzed and the user is given a searchable ‘map’ and comprehensive summary reports of the metadata, including what type of information exists, where it is located, who owns it, whether it is redundant, and when it was last accessed.
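As a rough illustration of that metadata ‘map’, here is a minimal Python sketch using only the standard library. It is not how Catalyst or any commercial profiler works; the names `profile_tree`, `summarize` and `FileRecord` are hypothetical, ownership is reported as a numeric Unix user ID rather than a custodian name, redundancy means exact duplicates by SHA-256 hash, and “aged” means not accessed in ten years, echoing the decade-old example earlier.

```python
from __future__ import annotations

import hashlib
import os
import time
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Metadata captured for one file (hypothetical schema for illustration)."""
    path: str
    file_type: str        # lowercased extension, e.g. ".pst" or ".docx"
    size: int             # size in bytes
    owner_uid: int        # numeric owner ID from os.stat (Unix semantics)
    last_accessed: float  # st_atime, seconds since the epoch
    sha256: str           # content hash used to spot exact duplicates


def _hash_file(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's bytes, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def profile_tree(root: str) -> list[FileRecord]:
    """Walk a directory tree and record metadata for every regular file."""
    records: list[FileRecord] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip broken symlinks, sockets and other non-files
            st = os.stat(path)  # stat before hashing, since reading may bump atime
            records.append(FileRecord(
                path=path,
                file_type=os.path.splitext(name)[1].lower() or "(none)",
                size=st.st_size,
                owner_uid=st.st_uid,
                last_accessed=st.st_atime,
                sha256=_hash_file(path),
            ))
    return records


def summarize(records: list[FileRecord], max_age_days: int = 10 * 365,
              now: float | None = None) -> dict:
    """Summary report: file counts by type, duplicate groups, and aged files."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    by_hash: dict[str, list[str]] = defaultdict(list)
    for r in records:
        by_hash[r.sha256].append(r.path)
    return {
        "count_by_type": Counter(r.file_type for r in records),
        "duplicates": [sorted(paths) for paths in by_hash.values() if len(paths) > 1],
        "aged": sorted(r.path for r in records if r.last_accessed < cutoff),
    }
```

A typical call would be `summarize(profile_tree("/mnt/share"))`, where the path is a placeholder. Many filesystems are mounted with `relatime` or `noatime`, so access times are only approximate, and hashing reads every file in full, which is slow on large shares and can itself update access times; that is why the sketch calls `os.stat` before hashing.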

Optionally, data profiling can look beyond metadata and go deep within documents and email to index content, supporting eDiscovery keyword searches or even personally identifiable information (PII) audits for sensitive content such as Social Security or credit card numbers.
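A content-level PII audit can be sketched with regular expressions plus a checksum. This Python sketch assumes the plain text has already been extracted from documents and email (the hard part, not shown). The function names are hypothetical, SSNs are matched only in the dashed form, and card numbers are 13 to 16 digit runs that pass the Luhn check, so expect both false positives and misses.

```python
import re

# Deliberately simple, illustrative patterns; not production-grade PII detection.
# SSN: area not 000/666/9xx, group not 00, serial not 0000 (dashed form only).
SSN_RE = re.compile(r"\b(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
# Card candidates: 13-16 digits, optionally separated by single spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(number: str) -> bool:
    """Luhn checksum, used as the check digit on payment card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text_for_pii(text: str) -> dict:
    """Return dashed SSN candidates and Luhn-valid card numbers found in text."""
    cards = [m.group() for m in CARD_RE.finditer(text) if luhn_valid(m.group())]
    return {"ssn": SSN_RE.findall(text), "credit_card": cards}
```

The Luhn filter is what separates a plausible card number from an arbitrary long digit string: the widely used test number 4111 1111 1111 1111 passes it, while changing its last digit makes it fail.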

Not only does the technology exist, but it exists at a price point that makes it affordable to deploy, leaving no excuse for leaving the matches in the email server and hoping the wrong pair of hands doesn’t find them. Even for those who don’t want to throw out or move the matches, it’s imperative to at least know the matches are there so they aren’t left next to the comic books.

Unfortunately, many won’t find the motivation to find, expose and isolate the matches until after a breach. Those who see the importance of proactively knowing what data is being stored can visit or contact

Data profiling webinar: Accelerating Time to Data

Discover how to make eDiscovery time- and cost-effective. The work of identifying, culling and collecting online and offline ESI has grown exponentially as the volume of data has exploded.

But eDiscovery does not have to be a long, labor-intensive, expensive process: technology and streamlined workflows can accelerate time to data.

Discover more on Wednesday, April 3, from 1 pm to 2 pm EST during an exclusive webinar focused on answering your most pressing eDiscovery and legal hold issues, including:

• Increasing defensibility while reducing the time to search and cull ESI
• Making legal hold archives flexible for multiple litigation events as queries and legal requests change
• Reducing ESI identification time and costs through data profiling

Litigation support and archiving professionals have struggled to meet tight deadlines and even tighter budgets for far too long.

Register now for this free webinar, brought to you by Index Engines and ACEDS, and learn how to keep your ESI collection and management costs in check while accelerating time to data.

Your presenter: Jim McGann. Jim is the Vice President of Marketing at Index Engines. Jim has extensive experience with eDiscovery and Information Management in the Fortune 2000 sector. He writes and speaks frequently on big data management, backup tape remediation, eDiscovery, litigation readiness, records management and data profiling.