Index Engines will be attending ILTA 09 next week, taking place outside Washington, DC. As an eDiscovery market influencer, Index Engines will attend the conference as a consultant rather than a vendor. We’ll be meeting with law firms, service providers, and end users, educating them about how the discovery of backup data is becoming increasingly necessary, especially with the passage of the new California eDiscovery Act. Check back for an event recap after our time at ILTA next week.
Some say Index Engines’ LAN indexing is faster than a speeding bullet. This week Index Engines announced that the new 3.0 platform performs full content and metadata indexing of LAN data at sustained rates of up to 1 terabyte per hour using a single indexing node. The same platform also indexes up to a billion files and emails in one system. These speeds were validated by BlueArc on their Titan network storage platform. Read the press release here.
These unprecedented metrics demonstrate the power of Index Engines’ purpose-built operating system for indexing enterprise-class data environments versus traditional data processing solutions. Already implemented by corporate clients across the U.S. and Canada, Index Engines’ LAN indexing platform is changing the way enterprise data is managed. Find out how these blazing-fast speeds are achieved in Index Engines’ new technology overview.
A global energy corporation was faced with complying with federal regulations around data retention while also accommodating data discovery requests to support its legal team. The data in question was 30 terabytes of unstructured backup data stored on network attached storage. After well-known indexing and data management platforms failed in trials, Index Engines technology was implemented. The other indexing solutions they evaluated required seven or more nodes and over 30 days to index the 30 terabytes. One Index Engines node was installed, and the job was complete in a week. Read about this implementation, and the cost savings and efficiency this IT team is experiencing, in the full case study.
Earlier this month E-Discovery Bytes wrote an article suggesting that Google may come to the rescue for information management as it applies to E-Discovery. We hope that readers aren’t facing a true emergency if they attempt to follow this article’s advice. There are some key pieces of information that author Anthony Chan overlooks.
It is true that Google has solved search over the public Internet. However, applying the same technology to the enterprise is a different story. To index the Internet, Google has built massive data centers to process the data. In processing this data, they make a cached copy of every web page (perform a Google search and you will see a “Cached” link next to each search result). This model works for the web, where the cost of the data centers can be spread among billions of users. It does not work for the enterprise. Caching (replicating) exabytes of enterprise data is simply not practical: it would require doubling the entire enterprise storage environment – something no company would undertake.
To make exabytes of ESI discoverable, you need an affordable solution that is designed to be efficient (small index footprint), scalable (billions of objects per server), and fast (1 TB/hour/node processing speed). This is not what Google has delivered. It is a fine point solution for smaller projects, but when you are talking terabytes or exabytes, you need to look elsewhere.
Mr. Chan should take a close look at Index Engines’ approach to enterprise discovery before he attempts his next rescue mission. Index Engines is all that Google is not: efficient, scalable, and fast. View our discovery solutions here.
A recent article in Law.com highlights the differences between the new California Electronic Discovery Act and FRCP. The article states “The new California legislation, by contrast (to FRCP), assumes that all electronically stored information is accessible. Rather than requiring the requesting party to bring a motion to compel in the first instance, as under the federal rules, it instead provides that the responding party may bring a motion for a protective order”.
With the cost of ESI discovery dropping, and previously inaccessible ESI (backup tapes) now easy to access via new technology from Index Engines, it is no surprise that these regulations are entering the market.
Here is a link to the full article.
The court just ordered production of disaster recovery backup tapes, despite the defendant’s argument that ESI on the tapes is not reasonably accessible. This is happening more and more as technology makes tape discovery less painful. Index Engines’ automated approach saves significant time and expense when dealing with ESI from tape. No longer does the burden argument hold. This article from Law.com summarizes the case:
Preservation of Disaster Recovery Backup Tapes?
Do you need to preserve disaster recovery backup tapes that contain relevant ESI? Guidance from commentators and case law is mixed. The Federal Rules of Civil Procedure are silent on whether disaster recovery backup tapes need to be preserved when implementing a litigation hold. What we know, however, is that all relevant ESI must be preserved. Relevant ESI can be contained on backup tapes that a party deems not reasonably accessible. See FRCP 26(b)(2)(B). Assuming backup tapes are preserved and identified as not reasonably accessible, will the tapes ever be subject to discovery? In short, yes, as demonstrated by Kilpatrick v. Breg, Inc., 2009 WL 1764829 (S.D. Fla. June 22, 2009).
In Kilpatrick, the court ordered production of disaster recovery backup tapes, despite the defendant’s argument that ESI on the tapes is not reasonably accessible. While the case does not address the question of preservation directly, it stands as a warning. The defendant repeatedly represented that active ESI met its discovery burden. The defendant also advised that additional relevant ESI might be contained on backup tapes, designated as not reasonably accessible because they were maintained for disaster recovery purposes only. The plaintiff was not buying it and moved to compel production of the backup tapes. The court agreed that the ESI produced so far seemed to have some holes and compelled limited production from the backup tapes.
Full article here.
Index Engines exhibited at LegalTech West in Los Angeles last week, and the buzz on the show floor was largely about Assembly Bill 5 – better known as AB 5. AB 5 is a bill sitting on Governor Schwarzenegger’s desk awaiting signature that takes a much firmer approach to the historical data that must be made available to support legal proceedings.
FRCP started the ball rolling by requiring that electronic evidence be made discoverable unless an undue burden could be claimed. AB 5 takes a much harder line, effectively overruling Zubulake’s position that data contained in disaster recovery systems was simply inaccessible. AB 5, when passed into law, will not let the responding party object to data discovery simply based on its location. Disaster recovery data will now be presumed fair game for discovery. Backup tapes contain the large majority of data stored for disaster recovery. As such, the Index Engines booth at LegalTech was hopping.
Index Engines is the only commercially available technology that allows fast, cost-effective, forensically sound access to files and email stored on backup tapes. Enterprise legal teams and high-profile law firms are well aware of the high cost of traditional tape restoration services. These prohibitive costs are what spurred the Zubulake ruling initially. AB 5 will force the issue: tapes will become part of routine eDiscovery, and Index Engines will be there to help. We’re already helping enterprise clients process their legacy disaster recovery tapes. Our white paper, The Anatomy of Tape, outlines how large-scale tape discovery and remediation projects can be made manageable.
Update July 1, 2009: AB 5 – now officially called Chapter 5, Statutes of 2009 – was signed last night. It is now official. See this post on e-Discovery Insights (Perry L. Segal, Esq.) blog. Perry has been closely tracking AB 5.
Many eDiscovery projects are based only on the collection of individual custodian mailboxes. This raises some important preservation and process questions. Does ESI collection focused on user mailboxes preserve everything that is required? Will searching only those mailboxes find everything that the courts may eventually request?
If a user is acting covertly, they might not leave critical evidence in their mailbox. They might be smart enough to remove it or even hide it. In that case, a collection that is narrow in focus will not be enough to ensure all appropriate ESI is preserved. There is an alternate approach to preserving and searching email that exists outside the limited scope of a user’s mailbox.
The most obvious location for messages that are no longer in the user’s mailbox is deleted email. When a user deletes an email, it temporarily resides in the user’s Deleted Items folder, which is typically set to purge after a short period of time. After this period, the email moves to the Exchange dumpster, a component independent of the user’s mailbox, where it lives for a set number of days before being permanently purged. Any email a user deletes is therefore accessible in the Deleted Items folder for only a short time; after that, it moves outside their mailbox, and outside their control, to the dumpster. ESI collection that is focused only on user mailboxes will never see content residing in the dumpster.
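The lifecycle above can be sketched as a simple timeline model. This is a minimal illustration, not Exchange code; the retention windows here are hypothetical values, since the actual windows are configured by each organization's Exchange administrator.

```python
from datetime import date

# Hypothetical retention windows; real values are set by the
# Exchange administrator and vary by organization.
DELETED_ITEMS_PURGE_DAYS = 14   # days an email stays in Deleted Items
DUMPSTER_RETENTION_DAYS = 30    # additional days in the Exchange dumpster

def where_is_deleted_email(deleted_on: date, today: date) -> str:
    """Return where a deleted email resides on a given day."""
    age = (today - deleted_on).days
    if age < DELETED_ITEMS_PURGE_DAYS:
        # Still inside the custodian's mailbox, so a mailbox-only
        # collection would capture it.
        return "deleted items folder"
    if age < DELETED_ITEMS_PURGE_DAYS + DUMPSTER_RETENTION_DAYS:
        # Outside the mailbox: a mailbox-only collection misses it,
        # even though the content is still recoverable.
        return "dumpster"
    return "purged"

print(where_is_deleted_email(date(2009, 6, 1), date(2009, 6, 10)))  # deleted items folder
print(where_is_deleted_email(date(2009, 6, 1), date(2009, 7, 1)))   # dumpster
print(where_is_deleted_email(date(2009, 6, 1), date(2009, 9, 1)))   # purged
```

The middle window is the key point: for weeks after an email leaves the Deleted Items folder, it still exists on the server but is invisible to a mailbox-scoped collection.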
Another valuable source of email information is the Exchange Server transaction logs. These can be quite convoluted because they often contain internal references to the specific EDB for which transactions are logged. By carefully parsing a full EDB and the subsequent log files, it is possible to recreate all the emails that came in or out of the Exchange Server. Simple collection and preservation of mailboxes via a tool that parses just the EDB will always miss this important secondary source of emails. Most importantly, the user has no ability to influence the content of the Exchange Server logs; even in an environment where a user is somehow bypassing the dumpster, the logs will still contain many of the emails.
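The idea of replaying a log to recover messages a mailbox snapshot no longer shows can be sketched as follows. The record format here is invented purely for illustration; real Exchange transaction logs are a binary, EDB-specific format that requires careful parsing, as noted above.

```python
# Illustrative transaction records: (operation, message id, body).
# The format is hypothetical; real Exchange logs are binary.
log = [
    ("deliver", "msg-001", "budget figures"),
    ("deliver", "msg-002", "lunch plans"),
    ("delete",  "msg-001", None),
]

mailbox: dict[str, str] = {}   # what a mailbox-only collection would see
all_seen: dict[str, str] = {}  # everything the log shows passed through

for op, msg_id, body in log:
    if op == "deliver":
        mailbox[msg_id] = body
        all_seen[msg_id] = body
    elif op == "delete":
        # The message leaves the mailbox, but its delivery record
        # remains in the log - the user cannot rewrite history.
        mailbox.pop(msg_id, None)

print(sorted(mailbox))   # ['msg-002']
print(sorted(all_seen))  # ['msg-001', 'msg-002']
```

The contrast between the two dictionaries is the point of the paragraph above: the mailbox view loses msg-001, while the log replay still shows it was delivered.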
Other valuable content that would not be captured in a custodian’s mailbox is email exchanged with other users. For example, user A, who is under investigation, initiates an email thread that is relevant to the case at hand. The email is sent to user B, who is not under investigation. User A then deletes the email and purges it from their Deleted Items folder, and over time it is also deleted from the dumpster. From the perspective of user A’s mailbox, the email no longer exists. It could, however, still reside in user B’s mailbox if B did not delete it. When only user A’s mailbox is collected, the “smoking gun” email is left behind even though it still exists on the mail server. This scenario is common in the world of Exchange, and it exposes another major flaw in relying on custodian mailboxes for collection, preservation, and searching.
Index Engines technology allows access to a fully indexed Exchange image for deeper discovery and a more cost-effective approach. Not only are full user mailboxes searchable, but complete conversations that reside in other users’ mailboxes, and not with the custodian in question, can be uncovered. Additionally, the entire dumpster can be accessed and searched, making email available that was thought to be long gone. What’s more, Index Engines technology gives access to this full set of Exchange data at a rate up to ten times faster than traditional email discovery approaches.
Late last week DFI News published an article co-authored by Index Engines and Norcross Group, outlining a new approach to eDiscovery of Exchange data. The article, entitled Forensically Sound Preservation and Processing of Exchange Data, takes the reader through the challenges of accessing a live Exchange database: it’s large, active, and complex. The authors also examine a popular tool used for capturing data from a live Exchange environment, ExMerge. Built on MAPI (Microsoft’s Messaging API), ExMerge was never intended for intensive data discovery.
The alternate approach for Exchange discovery outlined in the article is a Forensic Scanning process. Norcross Group has elected to use Index Engines technology to implement this approach. By creating a snapshot of the Exchange database and scanning this image with Index Engines technology, Norcross is able to provide more thorough, cost-effective, and forensically sound discovery services to its clients. The article provides a real-world case study of how this Forensic Scanning approach has been implemented and what the actual results have been.
Yesterday PC World featured an article by IDC’s Stephen Lawson entitled Why IT Should Start Throwing Data Away. Lawson discusses the pitfalls of keeping data dating back decades. Even though storage itself is cheap, the rate at which data is accumulating makes these seemingly inconsequential costs add up fast. He also points out that enterprises must be concerned with the intangible risks of having to plow through all that legacy data for eDiscovery.
The challenge of overcoming the data stockpile is two-fold: defining what needs to be saved, and then finding the valuable material among the junk. The first step, building a retention policy, is up to the organization. Each enterprise has different types, amounts, and uses for its data. Although expert consultants can help, an internal records management group will need to own the definition and implementation of how the company’s data is sorted and then saved or disposed of. And as Lawson’s article points out, technology can help a RIM team understand how much and what types of data they have, but it can’t interpret what this means to the company or determine how best to save (or not save) the data.
Once the policy is written, the next hurdle is to collect the relevant information from the vast stores of old data. Where data resides can be broken down into three storage types: backup data, network data, and desktop data. Backup data, including tape and disk, contains tons of duplicate and useless data. Stores of network data are often tremendously large and ever-changing. Desktop data is fragmented and difficult to trace. Index Engines offers a collection platform that not only handles the specific challenges of accessing each type of data container, but also allows for a unified view and deduplication effort. By unlocking backup formats, Index Engines makes quick work of indexing and searching data on tape or disk. Using NDMP to index network data, Index Engines can keep pace with the ever-changing volume of enterprise production data. User data can also be harnessed, indexed, and searched with the Index Engines platform. This technology links the indexes built from the various data stores and allows a search for unique content to be performed against them all – simultaneously. Then the true subset of valuable data can be extracted and archived. And the rest, as Lawson suggests, can be thrown away.
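The cross-source deduplication idea can be sketched with content hashing: any document seen in more than one store is kept only once. This is a minimal illustration, not Index Engines' actual implementation; the source names and sample contents are invented, and a real system would index metadata and full text, not just raw bytes.

```python
import hashlib

# Hypothetical sample data spanning the three storage types named above.
sources = {
    "backup_tape": [b"quarterly report", b"holiday party memo"],
    "network_share": [b"quarterly report", b"contract draft v2"],
    "desktop": [b"contract draft v2", b"personal notes"],
}

seen: dict[str, str] = {}             # content hash -> first source it appeared in
unique: list[tuple[str, bytes]] = []  # deduplicated set worth archiving

for source, files in sources.items():
    for content in files:
        # Hash the content so identical documents match across stores,
        # regardless of where or under what name they were kept.
        digest = hashlib.sha1(content).hexdigest()
        if digest not in seen:
            seen[digest] = source
            unique.append((source, content))

# Six stored copies collapse to four unique documents:
# the duplicated report and contract draft are each kept once.
print(len(unique))  # 4
```

Hashing gives a single unified view across backup, network, and desktop stores, which is what makes the "extract the unique subset, discard the rest" step tractable at scale.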