Pension Benefit Guaranty Corporation Addresses Offline Tape Liability with Index Engines Enterprise eDiscovery Platform

Leading Pension Benefits Company Selects Index Engines to Deliver Rapid Access to Content on 45,000 Offline Tapes to Support Discovery Search Requests

Holmdel, NJ - December 4, 2007 - Index Engines, the leader in enterprise discovery solutions, today announced that Pension Benefit Guaranty Corporation (PBGC) has selected the Index Engines Enterprise eDiscovery Platform to perform comprehensive data indexing for 45,000 tapes of offline data retained on up to 10 years of backup media. Index Engines, with electronic discovery services provided by ONSITE3, will enable PBGC to speed the electronic discovery process by rapidly indexing archived data without requiring the time-consuming process of tape restoration. As a result, PBGC will need to restore only the data it needs, rather than the entire tape contents, saving significant time and money.

PBGC is a federal corporation created by the Employee Retirement Income Security Act of 1974. It currently protects the pensions of nearly 44 million American workers and retirees in 30,330 private single-employer and multi-employer defined benefit pension plans. PBGC receives no funds from general tax revenues. Operations are financed by insurance premiums set by Congress and paid by sponsors of defined benefit plans, investment income, assets from pension plans trusteed by PBGC, and recoveries from the companies formerly responsible for the plans.

PBGC has more than 45,000 offline backup tapes in its data archive which, over the past 10 years, have been written in both CA ARCserve and Symantec Backup Exec backup software formats. ONSITE3 will use the Index Engines solution to provide next-generation enterprise search technology that will reduce PBGC’s data discovery process from months to days. The Index Engines solution will index the full content contained in PBGC’s document files and emails, including nested files, as well as metadata information. The resulting indexes can be easily searched by PBGC paralegals, researchers or corporate counsel representatives, and data can be retrieved as needed for ad hoc discovery requests. Ultimately, with the Index Engines solution, PBGC can search its tape archives, regardless of format, without first restoring the data to find relevant files and emails.

PBGC will begin its discovery-readiness project this quarter by indexing 20 percent of its offline tapes, using representative samples from each historic quarter over the past 10 years. PBGC will subsequently begin to index remaining offline tapes in similar increments until its data is completely indexed.
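The phased plan above can be sketched as a per-quarter sample. This is only an illustration: the even split of 45,000 tapes across 40 quarters, the tape labels, and the function name `sample_per_quarter` are assumptions, not details from PBGC or Index Engines.

```python
import random


def sample_per_quarter(tapes_by_quarter, fraction, seed=0):
    """Pick the same fraction of tapes from every quarter (a stratified sample)."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    picked = {}
    for quarter, tapes in tapes_by_quarter.items():
        count = round(len(tapes) * fraction)
        picked[quarter] = rng.sample(tapes, count)
    return picked


# Hypothetical archive: 40 quarters (10 years) with 1,125 tapes each = 45,000 tapes.
archive = {
    f"Q{q:02d}": [f"Q{q:02d}-tape-{n:04d}" for n in range(1125)]
    for q in range(40)
}

first_pass = sample_per_quarter(archive, 0.20)
tapes_in_first_pass = sum(len(tapes) for tapes in first_pass.values())
print(tapes_in_first_pass)  # 20 percent of 45,000
```

Under these assumptions the first pass covers 9,000 tapes, and four more passes of the same size would finish the archive.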

PBGC selected the Index Engines Enterprise eDiscovery Platform because it offers the following benefits:

* The only solution to perform direct indexing of offline tapes without having to restore the contents.

* Enables rapid risk assessment prior to legal proceedings with an enterprise-wide index that can be used to determine exposure based on content of files and emails.

* Ensures accuracy of discovery results by searching against an electronic repository using full text and metadata.

* Delivers fast response to legal requests by automatically and continually indexing data to ensure comprehensive search results.

* Provides a long-term eDiscovery platform that can be used over and over again with a reusable searchable repository of files and emails that can save time and money versus collecting and reviewing data for each unique project.

The Index Engines Enterprise eDiscovery Platform

Index Engines’ Enterprise Discovery Edition is an appliance-based offering that collects and prepares data in support of electronic discovery faster and more accurately than any existing solution on the market. Index Engines’ patented technologies automate information access and retrieval by streamlining online discovery and eliminating the need to restore offline tape content in order to retrieve actionable data. This approach dramatically compresses the electronic discovery process. The industry’s most extensive and powerful indexing platform, the Index Engines Enterprise eDiscovery Edition is the only solution capable of indexing all online and offline content efficiently, allowing instant access and retrieval of all corporate data.

Index Engines Announces Affordable End-to-End Automation of the EDiscovery Process

New Index Engines eDiscovery Edition Adds Tape Sorting and Object Extraction Capabilities That Eliminate Time-Consuming and Expensive Discovery Processes

Holmdel, NJ - November 20, 2007 - Index Engines, the leader in enterprise discovery solutions, today announced the eDiscovery Edition of its Tape Engine, which includes new features to automate the eDiscovery process for offline tape content. The eDiscovery Edition makes enterprise litigation readiness a reality by automating tape data sorting and object extraction – eliminating the time-consuming and expensive process of restoring tapes in order to begin discovery. Using patent-pending technology, the eDiscovery Edition directly indexes data on offline tapes without ever restoring it. As a result, corporate legal teams can now instantly search the metadata as well as the full text content of difficult-to-access data archives and extract the relevant files and email in order to respond quickly to litigation requests and to mitigate risk.

“Many companies have volumes and volumes of unsearchable data locked in proprietary tape backup formats that they have accumulated over time or through mergers and acquisitions,” said Jeffrey Fehrman, president, Electronic Evidence Labs, a division of ONSITE3. “The tremendous risk of the unknown contents of this data has prompted many organizations to wisely implement proactive litigation readiness processes. However, the discovery of this archived data on offline tapes is extremely costly and time consuming. The Index Engines eDiscovery solution unlocks this data without the time and expense of restoring each backup tape to search it. As a result, companies can now ensure litigation readiness through an automated approach that is both cost effective and fast.”

Index Engines Enterprise eDiscovery Edition is the only solution on the market that can directly index offline tape content and make it fully searchable without having to restore the tapes. This platform understands common tape backup formats (ArcServe, TSM, NetBackup, Backup Exec, and NetWorker) and directly indexes unstructured files and email, even back five to 10 years. Once this data is indexed, it is immediately searchable in order to find relevant content enabling companies to quickly find “smoking guns” in minutes or hours rather than days, weeks or even months.

The Index Engines Enterprise eDiscovery Edition includes the following new features to automate the offline tape discovery process:

* Tape Library Support: The use of libraries in litigation support is critical due to the large volumes of tapes that have been generated over time, or inherited through mergers and acquisitions. The use of individual tape drives to support discovery projects is not practical. The Index Engines solution supports all common tape libraries for the ingestion of large volumes of offline tape data. Tape libraries are connected to the Index Engines appliance via a SCSI or Fibre Channel connection, and an auto-configure utility recognizes the specifications of the library and its internal tape management software in order to automate the cataloging and indexing of tape cartridges.

* Tape Management Utility: Offline tapes are typically not well organized or even labeled, so it is difficult to know the proper order of tapes when inserting them into a tape library. The Index Engines platform contains a new tape management module that automatically generates a catalog of the tapes loaded in a library. Once a catalog is generated, indexing will occur across all tapes in the library in a logical fashion, or on backup sets selected from the catalog.

* Automated Extraction Module: Retrieving relevant content from tape requires that the contents first be restored using the original backup software used to generate the tape. In many cases, the backup software may no longer be available, making this a complicated process. Email adds another layer of complexity because full mailboxes or databases must be extracted before having access to relevant emails. The Index Engines platform automates the process of restoring relevant tape data. Following a simple metadata and/or content search to determine the relevant content, files and/or email can be selected and extracted from tape without using the original backup software. The Index Engines solution eliminates the need for the original backup software application and enables companies to only restore relevant content versus significant volumes of useless data.

“With this new version our enterprise clients can now proactively address the liability contained in their offline tapes without having to spend $1,800 per gigabyte to process tapes,” said Jim McGann, vice president of marketing, Index Engines. “Automation of the complete process, from managing tapes to ripping the relevant data off these tapes, is now practical.”

Reducing Costly and Time Consuming Discovery Steps

Collecting evidence from offline tape is normally a very lengthy and expensive multi-step process. Using traditional methods, eDiscovery includes seven key phases: 1) organize tapes; 2) prepare the software and hardware environment for tape restores; 3) restore the tape data; 4) index the data contents; 5) clean the contents by eliminating duplicate information; 6) search the contents to find relevant data; and 7) extract the relevant content to deliver to legal counsel.

With the Index Engines Enterprise eDiscovery Edition, three of the most costly and time-consuming steps have been eliminated. This makes discovery time more predictable and ensures rapid litigation readiness. Using the Index Engines Enterprise eDiscovery Platform, collecting evidence requires only the following steps: 1) automate tape organization; 2) directly index the data on tape without restoring it; 3) search the content to find relevant data; 4) extract the relevant content to deliver to legal counsel.
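Comparing the two step lists makes explicit which three phases drop out. The short step labels below are paraphrases, and matching "automate tape organization" and "directly index" to their traditional counterparts is an assumed reading of the two lists, not a statement from the release.

```python
# Traditional seven-phase workflow, paraphrased from the list above.
TRADITIONAL_STEPS = [
    "organize tapes",
    "prepare restore environment",
    "restore tape data",
    "index contents",
    "remove duplicates",
    "search contents",
    "extract relevant content",
]

# Four-step workflow, using the same labels for steps that still exist
# (tape organization is automated and indexing happens directly on tape).
DIRECT_INDEX_STEPS = [
    "organize tapes",
    "index contents",
    "search contents",
    "extract relevant content",
]

# Steps present in the traditional workflow but absent from the new one.
eliminated = [step for step in TRADITIONAL_STEPS if step not in DIRECT_INDEX_STEPS]
print(eliminated)
```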

Pricing and Availability

The Index Engines Enterprise eDiscovery Platform is available now with pricing starting at $50,000. The new Extraction Module is available as an add-on to the core product at a price of $25,000.

Index Engines Welcomes the LDM Group as Authorized Litigation Ready Partner for Enterprise Discovery Solution

Index Engines’ Information Discovery Capacity Jumps to 14,000 Backup Tapes a Month

Dallas, TX and Holmdel, NJ - October 9, 2007 - The LDM Group, a leading electronic discovery and litigation support company serving major corporations and their outside counsel during litigation discovery, and Index Engines, the leader in enterprise discovery solutions, today announced that The LDM Group has become an authorized partner in the Index Engines Litigation Ready™ Partner Program.

As an authorized reseller of the Index Engines Enterprise Discovery Platform, The LDM Group will allow customers to eliminate the need to restore data from offline tapes in the search for actionable information by improving efficiencies in the online eDiscovery process and automating access and retrieval of data. With the addition of the LDM Group to the Index Engines network of Litigation Ready Partners, Index Engines now has the capacity to help customers perform information discovery on up to 14,000 backup tapes a month.

“The Index Engines Enterprise Discovery Platform demonstrates true technological innovation,” said Anil Keswani, CTO of The LDM Group, LLC. “This solution gives us the ability to provide our clients with a high level of Electronic Discovery capability, allowing us to stay ahead of the curve in the legal arena when it comes to leveraging technology to solve the many challenges presented by electronic discovery.”

“The LDM Group offers its clients top-notch litigation support services, and this partner program will give them the technological edge to surpass their clients’ expectations,” said Jim McGann, vice president of marketing, Index Engines. “Index Engines’ Enterprise Discovery Platform and The LDM Group’s litigation support services together make a powerful combination.”

The Index Engines Litigation Ready™ Partner Program provides superior technology and resources for partners such as The LDM Group, giving them the ability to take advantage of continually growing opportunities in the high-demand market for eDiscovery. The program delivers compelling sales and marketing initiatives for lead generation, technology training, sales education and tools to assist during customer engagements. For more information please visit

About The LDM Group

The LDM Group is comprised of industry veterans whose knowledge of discovery and vast project experience coupled with their technology expertise provide corporations, law firms and governmental agencies with a powerful advantage during matters involving electronically stored information (ESI). The LDM Group is committed to creating an alliance with each client, fostering a collaborative partnership for efficient and effective litigation support. The LDM Group’s mission is to be the best resource for information and support services during litigation discovery and governmental investigations. The LDM Group provides advisory and support in the areas of: eDiscovery, Native File Review, Web-Hosted Review, Computer Forensics & Collection, and Database Design and Repair. For more information on The LDM Group, please visit the company’s website at

Norcross Group Joins the Index Engines Litigation Ready™ Partner Program for Enterprise Discovery

Index Engines Platform Allows Norcross Group to Offer Enterprise Wide Discovery of Data in Support of Electronically Stored Information, Litigation Triage, E-Discovery, and Investigations

Norcross, Ga. and Holmdel, N.J. October 2, 2007 — The Norcross Group, the trusted resource for locating critical information that supports lawsuits, subpoena compliance, and internal investigations; and Index Engines, the leader in enterprise discovery solutions, today announced that the Norcross Group has joined the Index Engines Litigation Ready Partner Program as an authorized reseller of the Index Engines Platform. Norcross Group will offer the Index Engines Enterprise Discovery Platform to deliver comprehensive discovery of online and offline data across the enterprise to support initiatives where detailed access to metadata and full text content is critical.

This partnership will enable the Norcross Group to offer the Index Engines Enterprise Discovery Platform to its customers to automate information access and retrieval, providing more efficient online discovery and completely eliminating the need to restore offline tape content in order to uncover actionable data.

“Integrating Index Engines’ comprehensive discovery solution into our consulting practice will allow us to deliver world class data intelligence solutions to our clients,” said Vickie Hill, president of the Norcross Group. “No other discovery platform has the ability to integrate into both online and offline environments the way that the Index Engines solution does. We have been looking for this type of solution for some time. Now that we are a member of the Index Engines Litigation Ready Partner Program we now have an innovative edge in the e-discovery space.”

“The Norcross Group is a trusted and dynamic firm in the litigation and compliance world, well positioned to take advantage of our technology on behalf of its growing list of clients,” said Jim McGann, vice president of marketing, Index Engines. “Those clients will benefit from the dynamic combination of the Norcross Group’s impressive expertise in litigation readiness, investigations and compliance supported by Index Engines’ Enterprise Discovery Platform.”

About the Index Engines Litigation Ready Partner Program
The Index Engines Litigation Ready Partner Program has been designed to provide superior technology and resources for partners, including the Norcross Group, as they embrace the growing revenue opportunities in the high-demand market for e-discovery. The program delivers compelling sales and marketing initiatives for lead generation, technology training, sales education and tools to assist during customer engagements. For more information, please visit

About the Norcross Group
Norcross Group provides a full range of electronic discovery services for litigation support and subpoena compliance, including complex digital forensics, all in accord with the recent Electronically Stored Information (ESI) amendment to the Federal Rules of Civil Procedure. The firm’s deep knowledge of digital investigation and discovery helps organizations simplify and streamline the retrieval and retention of critical information, whether in paper or digital format, including misplaced, erased, or damaged data. For more information on the Norcross Group, please visit:

About Index Engines
Founded in 2003, Index Engines is the leader in enterprise discovery solutions. The company’s mission is to organize enterprise data assets and make them immediately accessible and easily manageable. Businesses rely on Index Engines’ solutions for comprehensive insight into their data in order to streamline the discovery, classification and management of enterprise assets. The Index Engines discovery platform is the only solution on the market that understands storage protocols, enabling high-speed, efficient indexing of backup formats including IBM TSM, EMC NetWorker, Symantec NetBackup, and CA ArcServe. The Index Engines family of products includes a SAN Engine for ingestion of data in flight on the storage network, a LAN Engine for ingestion of data from NAS filers, and a Tape Engine for direct indexing of offline tape content without the need to restore files and email.

Index Engines is privately funded and headquartered in Holmdel, New Jersey. Its products are sold and serviced worldwide, directly and through Index Engines’ channel partners. For more information on Index Engines, please visit the company’s website at

Index Engines and Litigation Ready are trademarks or registered trademarks of Index Engines, Inc. All rights reserved. All product names mentioned are or may be trademarks or registered trademarks of their respective organizations.

Index Engines and BlueArc Partner to Drive EDiscovery Performance, Efficiency and Scalability

Index Engines, a leader in enterprise discovery solutions, today announced that it has partnered with BlueArc® Corporation, a leader in scalable, high-performance network storage, to develop an optimized version of the Index Engines LAN Engine to support BlueArc’s high-performance Titan 2000 unified network storage systems for fast, economical search and discovery of unstructured online data. The new Index Engines/BlueArc solution was developed as part of BlueArc’s participation in the Index Engines Litigation Ready™ Partner Program. The resulting product reduces overall eDiscovery costs and turnaround time while giving companies the dynamic interaction with data that enables them to perform on-demand decision making for legal and regulatory compliance.

“Gordian Knot of IT Complexity”

Taneja Group just wrote a research paper on us. “Every so often, we find a vendor that cuts through a Gordian knot of IT complexity with one well conceived reframing of the problem. The Index Engines Appliance does precisely this, and we believe customers will agree. For enterprises exploring the complex world of classification and indexing tools, we recommend an evaluation of the Index Engines Appliance.”

Get a free copy of the full report on our website – here is a link.

Efficient programming on multi-core systems

My last blog entry talked about multiple cores being the most cost-effective way to increase the number of instructions processed per clock cycle. With future processors having many cores as standard, it is worthwhile to take a look at what application writers can do to maximize the performance of multi-threaded code.

First, locks are very expensive to execute in that they stall pipelines and lock memory cache lines. A typical multi-threaded program with locks and no lock contention can run as much as 30% slower than a single-threaded version. This slowdown is the overhead of executing the exclusive memory access instructions, which significantly reduces the instructions executed per clock. Also, mutexes consume a lot of space. On 64-bit x86 Linux with glibc, a POSIX pthread_mutex_t is 40 bytes.

One way to avoid this locking overhead is to use locks as little as possible. Take advantage of some of the core guaranteed atomic operations of the processor. All modern processors guarantee that simple loads and stores to naturally aligned memory addresses are atomic. The Linux atomic_t and atomic64_t types support a set of simple increments and decrements that will perform much better than a lock/unlock pair. Use algorithms that are based on atomic fetch_and_add principles to resolve conflicts. There are many published algorithms for list management and other common tasks that take advantage of these primitives. Do a quick search on the Web and keep those cores busy.
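The advice to lessen the use of locks can be illustrated with a small sketch. Python has no user-level fetch_and_add primitive, so this example swaps in a different technique: each thread accumulates privately and takes the lock once to publish its result. The function names are hypothetical, and Python's global interpreter lock means this shows the structure of the idea, not the speedup you would see in C.

```python
import threading


def _run(worker, n_threads):
    """Start n_threads copies of worker and wait for them all to finish."""
    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def count_lock_per_increment(n_threads, n_items):
    """Every increment takes the lock: n_threads * n_items acquisitions."""
    state = {"total": 0, "acquisitions": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(n_items):
            with lock:
                state["total"] += 1
                state["acquisitions"] += 1

    _run(worker, n_threads)
    return state["total"], state["acquisitions"]


def count_lock_per_thread(n_threads, n_items):
    """Each thread counts privately, then takes the lock once to merge."""
    state = {"total": 0, "acquisitions": 0}
    lock = threading.Lock()

    def worker():
        local = 0  # private to this thread, so no lock is needed
        for _ in range(n_items):
            local += 1
        with lock:
            state["total"] += local
            state["acquisitions"] += 1

    _run(worker, n_threads)
    return state["total"], state["acquisitions"]


print(count_lock_per_increment(4, 1000))
print(count_lock_per_thread(4, 1000))
```

Both versions reach the same total, but the second acquires the lock once per thread instead of once per item. In C, the shared counter itself could instead be updated with an atomic fetch-and-add such as C11's `atomic_fetch_add`.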

Pushing Processing

In my earlier entry I spoke about the challenges of indexing the enterprise. The biggest challenge is the speed at which indexing has to occur. Enterprises are creating data faster than traditional indexing methods can index it. In order to process billions of files at high speeds, we needed to implement a new approach to scraping words. This method, which utilizes our advanced text scanning algorithms, works best on CPU architectures with very high memory bandwidth and low latency to that memory. After much analysis we found that the AMD Opteron CPU was the best fit. Because of Opteron’s Direct Connect Architecture, the latency for accessing random data from main memory has been minimized. This problem has also benefited from the gaming market, which has driven down the CAS latency of DDR400 memory to 2 clock cycles.

What can we expect in the future? It is clear that the future of processors is multiple cores. Thermal issues have put a damper on increasing clock speeds, so any newly available die area is being used to add more cores, which effectively increases the number of instructions executed per clock cycle. Existing CPU-intensive applications will need to be modified to take advantage of these new architectures. Even though multi-threading has been around for a long time, it is still worth examining on multi-core systems, and I will address this issue in my next blog entry.

What would I love to see in a future processor? Like everyone else, our application would benefit from more L1 and L2 cache. This is obviously important to AMD too, as they recently licensed the Z-RAM high-density memory IP from Innovative Silicon. Hopefully some of that technology will significantly increase cache space. We could also use a simple built-in hash instruction for hashing strings. The best public domain hash functions take about 20 operations per word. I would guess a 10 to 20 fold speedup for a silicon approach.
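To make the cost of software string hashing concrete, here is a minimal sketch of one well-known public domain hash, 32-bit FNV-1a. The post does not say which hash function Index Engines uses, so the choice of FNV-1a and the `bucket` helper are assumptions for illustration only.

```python
FNV_OFFSET_BASIS_32 = 0x811C9DC5  # standard 32-bit FNV offset basis
FNV_PRIME_32 = 0x01000193         # standard 32-bit FNV prime


def fnv1a_32(word: str) -> int:
    """Hash a word with 32-bit FNV-1a: one XOR and one multiply per byte."""
    h = FNV_OFFSET_BASIS_32
    for byte in word.encode("utf-8"):
        h ^= byte
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF  # keep the result in 32 bits
    return h


def bucket(word: str, n_buckets: int) -> int:
    """Map a word to a hash-table slot (hypothetical helper)."""
    return fnv1a_32(word) % n_buckets


print(hex(fnv1a_32("a")))  # 0xe40c292c
print(bucket("discovery", 1024))
```

Each byte costs an XOR, a multiply and a mask on top of the load and loop overhead, so the number of operations grows with word length. That per-byte work is what a dedicated hash instruction would absorb.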

Overall I look forward to many more cores and integrated DDR2 memory controllers. Just keep in mind that your code has to be tuned to take advantage of the new CPUs; otherwise you will have a lot of idle cores.

Is Indexing the Enterprise Possible?

At first glance this seems like an impossible task. When Google last published its stats, it had indexed approximately 8 billion web pages as of the first quarter of 2005 (it has since stopped publishing a number; speculation has it well above 10 billion). That sounds like a lot, until you consider that most of the Global 500 can easily exceed this number with historical files and email alone (one investment bank told me they have well over 2 billion active emails, not counting their archives). Therefore, a large enterprise would be burdened with the same level of indexing as Google faces every day. No wonder it is commonly considered an impossible task.

What does it take to index a billion objects? Well, if you apply technology similar to the internet, then you will need 20-80,000 computers to process the data and store the index. This amount of compute resources is not practical for an enterprise to apply to any problem. The enterprise indexing problem is so daunting that most search companies recommend enterprises first decide which information is important and searchable and which information is not in order to reduce the problem set.

Recently I read the transcript of an Interop speech from a vendor in this space who was quoted as saying that “before you attempt indexing of enterprise data you first need to determine what is important”. Of course, this is not practical as you need to get an understanding of all content before any segmentation can be done. The problem with indexing large volumes of enterprise data is that it needs to be approached much more efficiently as compared to traditional indexing, which works well on the Internet.

Index Engines attacked the two major inhibitors to enterprise-wide indexing. The first challenge is the speed of indexing and the second is the size of the index. We found that traditional document scraping tools were too slow, at around 2 MB per second, and often required that a copy of the data be created for dedicated processing. We developed word-scraping technology that scrapes words at line speeds (approximately 200 MB/sec). Why is this important? Because an enterprise creates data constantly, much faster than users on the Internet create web pages, and the indexing of this data needs to occur quickly in order to maintain currency and accuracy without burdening the IT infrastructure with extensive processing requirements.

The second, and most challenging, inhibitor to enterprise-wide indexing is the size of the index itself. Suppose you have 100 TB of email and unstructured files. The index storage requirements of enterprise indexing solutions can range from 40% to well over 100% of the original document size. This results in terabytes, even petabytes, of storage for the index! The Index Engines database can process 4 million words per second with a resulting index of only 8% of the size of the content. Using the above example of 100 TB of data, this results in approximately 8 TB of storage, which is a far more realistic number and easier for a CIO to accept.
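The figures quoted above can be worked through in a few lines. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 1,000,000 MB) and sustained throughput on a single indexing stream; the function names are illustrative.

```python
MB_PER_TB = 10**6          # decimal units, an assumption
SECONDS_PER_DAY = 86_400


def index_size_tb(content_tb, index_ratio):
    """Index storage needed when the index is index_ratio times the content."""
    return content_tb * index_ratio


def scan_days(content_tb, mb_per_sec):
    """Days needed to scan the content once at a sustained throughput."""
    return content_tb * MB_PER_TB / mb_per_sec / SECONDS_PER_DAY


content_tb = 100  # the 100 TB example from the text

print(index_size_tb(content_tb, 0.40))   # low end of the traditional range
print(index_size_tb(content_tb, 1.00))   # index as large as the content
print(index_size_tb(content_tb, 0.08))   # the 8% figure quoted above

print(round(scan_days(content_tb, 2), 1))    # about 578.7 days at 2 MB/sec
print(round(scan_days(content_tb, 200), 1))  # about 5.8 days at 200 MB/sec
```

At 2 MB per second a single pass over 100 TB would take more than a year and a half, while at 200 MB per second it takes under a week. That gap is why scraping speed and index size have to be solved together.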

Indexing the enterprise must be done with speed and efficiency. Any IT manager will tell you they don’t want to solve one problem by creating new problems. Traditional indexing approaches may provide a solution for data discovery, but they also result in challenges related to processing power and storage that most firms will not accept. This is why a new approach was required, one that now makes enterprise wide indexing possible.