Technology Changes the Cost of eDiscovery

Kahn Consulting’s Electronic Discovery Blog recently summarized the case of Major Tours, Inc. v. Colorel, 2009 U.S. Dist. LEXIS 97554 (D.N.J. Oct. 20, 2009). What both Kahn’s blog entry and the parties in this case miss is that discovering data from 2,500 tapes for $1.5M is grossly overpriced.

The estimate rests on a lot of assumptions. It assumes all 2,500 tapes contain email. It assumes all email is unique and no duplicates exist on these backup tapes. It assumes all 2,500 tapes contain relevant custodian mailboxes. And it assumes you will pay $600 per tape. Grant all of these assumptions and yes – this would be a $1.5 million collection project requiring weeks of manual restoration work.

Enter new technology, and all of these assumptions go out the window. Technology can identify which tapes contain email, which messages are duplicates, and what sits in a custodian mailbox – all without ever restoring a single byte, and at far less than $600 per tape. In fact, Index Engines partners are processing jobs of this scale for roughly 90% less than $1.5M. The dynamics of backup tape discovery and collection have definitely changed.
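To make the arithmetic concrete, here is a back-of-the-envelope sketch using only the figures cited above (2,500 tapes, $600 per tape, and the roughly-90% reduction; nothing else is assumed):

```python
# Back-of-the-envelope cost arithmetic for the tape discovery job.
# All figures come from the post itself: 2,500 tapes, $600/tape, and
# an index-based approach running roughly 90% below $1.5M.

tapes = 2_500
restore_cost_per_tape = 600                 # traditional restore price

baseline = tapes * restore_cost_per_tape
print(f"Worst-case restore estimate: ${baseline:,}")         # $1,500,000

index_based = baseline * (1 - 0.90)         # ~90% less, per the post
print(f"Index-based estimate:        ${index_based:,.0f}")   # ~$150,000
print(f"Effective cost per tape:     ${index_based / tapes:,.0f}")  # ~$60
```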

Legal teams try to paint worst-case scenarios so they won’t have to produce data from backup tapes – that’s their job. However, the courts are becoming more educated about technology, such as Index Engines’, that makes getting responsive data from tape far less costly and burdensome. The burden argument is becoming less and less successful as time goes on, and decisions such as this one will be few and far between as the industry becomes more educated about tape indexing technology.

Ideal Early Case Assessment

Early Case Assessment (ECA) is a hot topic at events (the Masters Conference) and on blogs. In a recent Network Computing article, Christine Taylor recommends that ECA move up in the eDiscovery process so that the involved parties can understand what they have as quickly as possible. The goal is to control costs and “grasp the merits of a case from the beginning”.

Some eDiscovery tools have attempted to support ECA using a sampling process. Why sampling? Because without a specific list of files and email requested by the legal teams (resulting from the meet-and-confer process), you would need to sort through large volumes of data to truly understand what you have. Most discovery tools have limited processing capability. They are not designed for large-scale collection and review, so sampling small subsets of the data to “get a general idea of what exists” is the best they can do – the short sketch below illustrates why a sample yields only an estimate.

Index Engines, on the other hand, scales to perform cost-effective and efficient processing of all ESI early in the process. To satisfy Christine’s recommendation, new tools need to be deployed; however, the goal of these new tools should be to streamline the process and save significant dollars when litigation comes knocking.
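Here is that sampling sketch. The corpus, responsiveness rate, and sample size are hypothetical stand-ins, not any vendor’s actual workflow:

```python
import random

# Sampling-based ECA vs. a full scan over a hypothetical corpus.
# The 12% responsiveness rate and 1,000-document sample are made up
# purely for illustration.
random.seed(7)
corpus = [{"id": i, "responsive": random.random() < 0.12}
          for i in range(1_000_000)]

# Full scan: exact ground truth, but every document gets processed.
true_hits = sum(doc["responsive"] for doc in corpus)

# Sampling: cheap, but the result is an estimate with sampling error,
# i.e. only "a general idea of what exists."
sample = random.sample(corpus, 1_000)
estimated = sum(doc["responsive"] for doc in sample) / len(sample) * len(corpus)

print(f"True responsive docs: {true_hits:,}")
print(f"Sampled estimate:     {estimated:,.0f}")
```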

Once a case is introduced, Index Engines’ customers are able to scan backup tapes and online file systems to get a true understanding of what exists. To be clear – Index Engines does not make a copy of the data, so the expense and pain of moving large volumes of data into a review platform is not a factor. Index Engines generates a searchable index of large volumes of ESI (both network and tape data), and that data can be easily reviewed (ECA) without moving it off the existing storage platform (offline or online). When the lawyers determine what is required for the case, based on their comprehensive knowledge of what exists, they can easily collect it from the indexed ESI. Ideal ECA solutions push only the relevant data to the legal review platforms, and only when the time is right. Index Engines clients are doing this today, saving significant cost and time.
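Conceptually, the workflow reduces to the sketch below. The data structures and steps are hypothetical illustrations of the index-in-place idea, not Index Engines’ actual API:

```python
# Index-in-place ECA, sketched with hypothetical stand-in data.

documents = [  # ESI sitting on tape or network storage, never copied
    {"path": "/nas/hr/memo.doc",  "custodian": "smith", "text": "merger terms"},
    {"path": "/tape07/inbox.pst", "custodian": "jones", "text": "lunch friday"},
    {"path": "/tape12/sent.pst",  "custodian": "smith", "text": "merger schedule"},
]

# Step 1: build a searchable index; the source data stays in place.
index = [{"path": d["path"], "custodian": d["custodian"], "text": d["text"]}
         for d in documents]

# Step 2: ECA review runs against the index alone.
hits = [e for e in index
        if "merger" in e["text"] and e["custodian"] == "smith"]

# Step 3: only responsive items are pushed to the review platform.
for e in hits:
    print("collect for review:", e["path"])
```

The point of the sketch is step 3: nothing moves to a review platform until review against the index has decided it is responsive.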

The Buzz from ARMA International

Index Engines is exhibiting at ARMA International in Orlando, Oct. 15-17. The talk on the show floor is largely about being overwhelmed with data – both current and historical. Records managers are experiencing firsthand the vast amounts of information being generated: data is being created faster than it can be processed. Index Engines’ processing speed of up to 1 TB/hr gives ARMA show-goers a way to keep up with data production, and to catch up on the mountains of unprocessed historical data, Index Engines’ ability to index and search backup formats is key. Records managers agree that understanding what they have is Step One in creating a solid strategy to manage the data. Index Engines’ speed, scale, and breadth of supported formats and containers will allow RIM managers to harness their enterprise’s data and move beyond this first step.
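As a worked example of what that rate means for a backlog (only the 1 TB/hr figure comes from this post; the backlog size and node count are hypothetical):

```python
# What 1 TB/hr of indexing throughput means for a historical backlog.
# The rate is from the post; the 200 TB backlog and 2-node deployment
# are hypothetical examples.

rate_tb_per_hr = 1.0
backlog_tb = 200
nodes = 2

hours = backlog_tb / (rate_tb_per_hr * nodes)
print(f"{backlog_tb} TB / ({rate_tb_per_hr} TB/hr x {nodes} nodes) "
      f"= {hours:.0f} hours (~{hours / 24:.1f} days)")
```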

Masters Conference Commentary

Index Engines is sponsoring the 2009 Masters Conference in Washington, DC, Oct. 13-14. The first-day buzz from the event floor is all about early case assessment (ECA). The counsel in attendance are scoping out new approaches for in-house data review, and the knee-jerk process of reactive eDiscovery is facing scrutiny. Rather than continuing to pay high fees for rushed, third-party collection, Masters attendees are deploying in-house data collection solutions for proactive discovery. This methodology enables ECA, reduces downstream review, and empowers legal teams to make informed decisions when faced with a litigation event. With processing speeds up to 1 TB/hr and the capability to index backup data without restoring it, Index Engines is uniquely positioned to facilitate in-house discovery and make ECA an achievable milestone.

Storage Switzerland Reviews Index Engines 3.0

Storage Switzerland’s senior analyst, George Crump, recently reviewed the Index Engines 3.0 platform. He was impressed by both the range of data and the variety of containers the product indexes, and by how fast the processing occurs (1 TB/hr). In his write-up, Index Engines 3.0 – Data Discovery for IT, Crump recognizes that the speeds and feeds of the Index Engines platform take eDiscovery to a whole new level. Data discovery across the enterprise is feasible when an appliance can handle a billion objects at up to 1 TB/hr. Index Engines’ core technology has been built for just that: proactive data discovery and management – or what Crump terms IT Discovery.

Backup into an Archive

Recently Matthew Lodge from Symantec wrote an article outlining why data archives, not backup tapes, should be the source for electronic data discovery. In his article, Archiving Is For E-discovery; Backup Is For Recovery, he suggests that IT, HR, Legal, and Records Management all work together to design and implement a corporate archive policy and system. This is all good advice, but what about the years of data sitting on backup tapes? Lodge cites huge costs for restoring and discovering backup data, and traditionally this was the case. But the Index Engines Tape Discovery solution changes the situation.

New technology allows backup tapes to be scanned, indexed, searched, and the pertinent data extracted into an archive for a fraction of the cost of a traditional restore. Lodge’s idea of building an archive to prepare for eDiscovery is excellent, but without the historical data stored on backup tapes, the archive will tell only part of the story. For a truly valuable resource, index your backup tapes proactively, populate the archive, and be informed the next time a legal event occurs.
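A minimal sketch of that scan-index-search-extract flow (the catalog format and function are hypothetical illustrations, not any product’s actual interface):

```python
# Populating an archive from backup tapes without a full restore.
# The tape index below stands in for a catalog built by scanning
# tapes; all names and fields are hypothetical.

tape_index = [
    {"tape": "TAPE-0001", "item": "exchange/ceo.edb", "keywords": {"contract"}},
    {"tape": "TAPE-0002", "item": "fileserver/tmp.zip", "keywords": {"lunch"}},
    {"tape": "TAPE-0003", "item": "exchange/cfo.edb", "keywords": {"contract", "audit"}},
]

def search(index, term):
    """Search the index of tape contents; no tape is restored."""
    return [rec for rec in index if term in rec["keywords"]]

# Only pertinent items are extracted and moved into the archive.
archive = [rec["item"] for rec in search(tape_index, "contract")]
print("archived:", archive)
```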

Index Engines at ILTA

Index Engines will be attending ILTA 09 next week, taking place outside Washington, DC. As an eDiscovery market influencer, Index Engines will be at the conference as a consultant rather than a vendor, meeting with law firms, service providers, and end users to educate them about why the discovery of backup data is becoming more and more necessary – especially with the passage of the new California eDiscovery Act. Check back for an event recap after our time at ILTA next week.

Fast LAN Indexing Speed Explained – 1 TB/Hr/Node

Some say Index Engines LAN indexing is faster than a speeding bullet. This week Index Engines announced that the new 3.0 platform performs full content and metadata indexing of LAN data at sustained rates of up to 1 terabyte per hour using a single indexing node. The same platform also indexes up to a billion files and email messages in one system. This speed was validated by BlueArc on their Titan network storage platform. Read the press release here.

These unprecedented metrics demonstrate the power of Index Engines’ purpose-built operating system for indexing enterprise-class data environments versus traditional data processing solutions. Already implemented by corporate clients across the U.S. and Canada, the Index Engines LAN indexing platform is changing the way enterprise data is managed. Find out how these blazing-fast speeds are achieved in Index Engines’ new technology overview.

Energy Giant Chooses Index Engines LAN Solution

A global energy corporation was faced with complying with federal data retention regulations while also accommodating data discovery requests to support its legal team. The data in question was 30 terabytes of unstructured backup data stored on network-attached storage. After trialing well-known indexing and data management platforms and watching those products fail, the company implemented Index Engines technology. The other indexing solutions it evaluated required seven or more nodes and over 30 days to index the 30 terabytes; one Index Engines node was installed, and the job was complete in a week. Read about this implementation and the cost savings and efficiency this IT team is experiencing in the full case study.
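Working backward from those numbers (30 TB, seven-plus nodes and 30-plus days versus one node and about a week; round-the-clock processing is my simplifying assumption):

```python
# Effective throughput implied by the case-study figures.
# Assumes 24 hr/day wall-clock processing for both solutions.

corpus_tb = 30

competitor_rate = corpus_tb / (30 * 24)    # TB/hr, aggregate of 7+ nodes
competitor_per_node = competitor_rate / 7  # TB/hr per competitor node
ie_node_rate = corpus_tb / (7 * 24)        # TB/hr on one Index Engines node

print(f"Competitor per-node rate: {competitor_per_node:.4f} TB/hr")
print(f"Index Engines node rate:  {ie_node_rate:.3f} TB/hr")
print(f"Per-node speedup:         ~{ie_node_rate / competitor_per_node:.0f}x")
```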

E-Discovery Rescue?

Earlier this month, E-Discovery Bytes published an article suggesting that Google may come to the rescue of information management as it applies to eDiscovery. We hope readers aren’t facing a true emergency if they attempt to follow the article’s advice: there are some key pieces of information that author Anthony Chan overlooks.

It is true that Google has solved search over the public Internet. However, applying the same technology to the enterprise is a different story. In order to index the Internet, Google has built massive data centers to process the data, and in doing so it makes a cached copy of every web page (perform a Google search and you will see a “Cached” link next to each result). This works for the web, where the cost of those data centers is spread across billions of users. It does not work for the enterprise: caching (replicating) exabytes of enterprise data is simply not practical. It would require doubling the entire enterprise storage environment – something no company would undertake.

In order to make exabytes of ESI discoverable, you need an affordable solution that is designed to be efficient (small index footprint), scalable (billions of objects per server), and fast (1 TB/hr per node processing speed). This is not what Google has delivered. It is a fine point solution for smaller projects, but when you are talking terabytes or exabytes, you need to look elsewhere.
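The storage math makes the contrast plain. A quick sketch (the 1 PB corpus and 5% index footprint are hypothetical figures; only the storage-doubling point comes from the argument above):

```python
# Extra storage required: cache-everything (the web approach) vs. a
# compact index. Corpus size and the 5% footprint are hypothetical.

corpus_tb = 1_000                      # a 1 PB enterprise data environment

cache_overhead_tb = corpus_tb * 1.00   # full cached replica: storage doubles
index_overhead_tb = corpus_tb * 0.05   # small-footprint index (assumed 5%)

print(f"Cache approach adds: {cache_overhead_tb:,.0f} TB of storage")
print(f"Index approach adds: {index_overhead_tb:,.0f} TB of storage")
```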

Mr. Chan should take a close look at Index Engines’ approach to enterprise discovery before he attempts his next rescue mission. Index Engines is all that Google is not: efficient, scalable, and fast. View our discovery solutions here.