Norcross Group Joins the Index Engines Litigation Ready™ Partner Program for Enterprise Discovery

Index Engines Platform Allows Norcross Group to Offer Enterprise Wide Discovery of Data in Support of Electronically Stored Information, Litigation Triage, E-Discovery, and Investigations

Norcross, Ga. and Holmdel, N.J. October 2, 2007 — The Norcross Group, the trusted resource for locating critical information that supports lawsuits, subpoena compliance, and internal investigations; and Index Engines, the leader in enterprise discovery solutions, today announced that the Norcross Group has joined the Index Engines Litigation Ready Partner Program as an authorized reseller of the Index Engines Platform. Norcross Group will offer the Index Engines Enterprise Discovery Platform to deliver comprehensive discovery of online and offline data across the enterprise to support initiatives where detailed access to metadata and full text content is critical.

This partnership will enable the Norcross Group to offer the Index Engines Enterprise Discovery Platform to its customers to automate information access and retrieval, providing more efficient online discovery and completely eliminating the need to restore offline tape content in order to uncover actionable data.

“Integrating Index Engines’ comprehensive discovery solution into our consulting practice will allow us to deliver world class data intelligence solutions to our clients,” said Vickie Hill, president of the Norcross Group. “No other discovery platform has the ability to integrate into both online and offline environments the way that the Index Engines solution does. We have been looking for this type of solution for some time. Now that we are a member of the Index Engines Litigation Ready Partner Program we now have an innovative edge in the e-discovery space.”

“The Norcross Group is a trusted and dynamic firm in the litigation and compliance world, well positioned to take advantage of our technology on behalf of its growing list of clients,” said Jim McGann, vice president of marketing, Index Engines. “Those clients will benefit from the dynamic combination of the Norcross Group’s impressive expertise in litigation readiness, investigations and compliance supported by Index Engines’ Enterprise Discovery Platform.”

About the Index Engines Litigation Ready Partner Program
The Index Engines Litigation Ready Partner Program has been designed to provide superior technology and resources for partners, including the Norcross Group, as they embrace the growing revenue opportunities in the high-demand market for e-discovery. The program delivers compelling sales and marketing initiatives for lead generation, technology training, sales education and tools to assist during customer engagements. For more information, please visit

About the Norcross Group
Norcross Group provides a full range of electronic discovery services for litigation support and subpoena compliance, including complex digital forensics, all in accord with the recent Electronically Stored Information (ESI) amendment to the Federal Rules of Civil Procedure. The firm’s deep knowledge of digital investigation and discovery helps organizations simplify and streamline the retrieval and retention of critical information, whether in paper or digital format, including misplaced, erased, or damaged data. For more information on the Norcross Group, please visit:

About Index Engines
Founded in 2003, Index Engines is the leader in enterprise discovery solutions. The company’s mission is to organize enterprise data assets and make them immediately accessible and easily manageable. Businesses rely on Index Engines’ solutions for comprehensive insight into their data in order to streamline the discovery, classification and management of enterprise assets. The Index Engines discovery platform is the only solution on the market that understands storage protocols, enabling high-speed, efficient indexing of backup formats including IBM TSM, EMC NetWorker, Symantec NetBackup, and CA ArcServe. The Index Engines family of products includes a SAN Engine for ingestion of data in flight on the storage network, a LAN Engine for ingestion of data from NAS filers, and a Tape Engine for direct indexing of offline tape content without the need to restore files and email.

Index Engines is privately funded and headquartered in Holmdel, New Jersey. Its products are sold and serviced worldwide, directly and through Index Engines’ channel partners. For more information on Index Engines, please visit the company’s website at

Index Engines and Litigation Ready are trademarks or registered trademarks of Index Engines, Inc. All rights reserved. All product names mentioned are or may be trademarks or registered trademarks of their respective organizations.

Index Engines and BlueArc Partner to Drive eDiscovery Performance, Efficiency and Scalability

Index Engines, a leader in enterprise discovery solutions, today announced that it has partnered with BlueArc® Corporation, a leader in scalable, high-performance network storage, to develop an optimized version of the Index Engines LAN Engine to support BlueArc’s high-performance Titan 2000 unified network storage systems for fast, economical search and discovery of unstructured online data. The new Index Engines/BlueArc solution was developed as part of BlueArc’s participation in the Index Engines Litigation Ready™ Partner Program. The resulting product reduces overall eDiscovery costs and turnaround time while giving companies the dynamic interaction with data that enables them to perform on-demand decision making for legal and regulatory compliance. Read more >>

“Gordian Knot of IT Complexity”

Taneja Group just wrote a research paper on us. “Every so often, we find a vendor that cuts through a Gordian knot of IT complexity with one well conceived reframing of the problem. The Index Engines Appliance does precisely this, and we believe customers will agree. For enterprises exploring the complex world of classification and indexing tools, we recommend an evaluation of the Index Engines Appliance.”

Get a free copy of the full report on our website – here is a link.

Efficient programming on multi-core systems

My last blog entry talked about multiple cores being the most cost-effective way to improve the number of instructions processed per clock cycle. With future processors having many cores as standard, it is worthwhile to look at what application writers can do to maximize the performance of multi-threaded code.

First, locks are very expensive to execute: they stall pipelines and lock memory cache lines. A typical multi-threaded program with locks, even with no lock contention, can run as much as 30% slower than a non-multi-threaded version. This slowdown is the overhead of executing the exclusive memory access instructions, which significantly reduces the instructions executed per clock. Mutexes also consume a lot of space: in Linux, a POSIX pthread_mutex is 40 bytes.

One way to avoid this locking overhead is to lessen the use of locks as much as possible. Take advantage of the processor’s guaranteed atomic operations: all modern processors guarantee that simple loads and stores to aligned memory addresses are atomic. The Linux atomic_t and atomic64_t types support a set of simple increments and decrements that perform much better than a lock/unlock pair. Use algorithms based on atomic fetch-and-add primitives to resolve conflicts. There are many public algorithms for list management and other common tasks that take advantage of these primitives. Do a quick search on the Web and keep those cores busy.

Pushing Processing

In my earlier entry I spoke about the challenges of indexing the enterprise. The biggest challenge is the speed at which indexing has to occur: enterprises are creating data faster than traditional indexing methods can index it. In order to process billions of files at high speed we needed to implement a new approach to scraping words. This method, which utilizes our advanced text scanning algorithms, works best on CPU architectures with very high memory bandwidth and low latency to that memory. After much analysis we found that the AMD Opteron CPU was the best fit: because of Opteron’s Direct Connect Architecture, the latency for accessing random data from main memory is minimized. This problem has also benefited from the gaming market, which has driven the CAS latency of DDR400 memory down to 2 clock cycles.

What can we expect in the future? It is clear that the future of processors is multiple cores. Thermal issues have put a damper on increasing clock speeds, so any new available real estate is being used to add more cores, which effectively increases the number of instructions executed per clock cycle. Existing CPU-intensive applications will need to be modified to take advantage of these new architectures. Even though multi-threading has been around for a long time, it is still worth examining on multi-core systems, and I will address it in my next blog entry.

What would I love to see in a future processor? Like everyone else’s, our application would benefit from more L1 and L2 cache. This is obviously important to AMD as well, as it recently licensed the Z-RAM high-density memory IP from Innovative Silicon; hopefully some of that technology will significantly increase cache space. We could also use a simple built-in hash instruction for hashing strings. The best public-domain hash functions take about 20 operations per word; I would guess a 10- to 20-fold speedup for a silicon approach.
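For a sense of what such an instruction would replace, here is FNV-1a, one well-known public-domain string hash; the XOR-and-multiply pair per input byte is exactly the kind of short ALU loop counted above, which a hypothetical hash instruction could collapse into a single opcode. The function name is mine, but the constants are the standard FNV-1a offset basis and prime.

```c
#include <stdint.h>

/* FNV-1a 32-bit string hash: one XOR and one multiply per byte,
 * plus load and loop overhead -- a handful of operations per word. */
uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;   /* FNV offset basis */
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;         /* FNV prime */
    }
    return h;
}
```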

Overall I look forward to many more cores and integrated DDR2 memory controllers. Just keep in mind that your code has to be tuned to take advantage of the new CPUs, otherwise you will have a lot of idle cores.

Is Indexing the Enterprise Possible?

At first glance this seems like an impossible task. When Google last published its stats, it had indexed approximately 8 billion web pages as of the first quarter of 2005 (it has since stopped publishing a number, though speculation puts it well above 10 billion). That sounds like a lot, until you consider that most of the Global 500 can easily exceed this number with historical files and email alone (one investment bank told me it has well over 2 billion active emails, not counting archives). A large enterprise is therefore burdened with the same level of indexing that Google faces every day. No wonder it is commonly considered an impossible task.

What does it take to index a billion objects? Well, if you apply technology similar to the internet, then you will need 20-80,000 computers to process the data and store the index. This amount of compute resources is not practical for an enterprise to apply to any problem. The enterprise indexing problem is so daunting that most search companies recommend enterprises first decide which information is important and searchable and which information is not in order to reduce the problem set.

Recently I read the transcript of an Interop speech from a vendor in this space who was quoted as saying that “before you attempt indexing of enterprise data you first need to determine what is important.” Of course, this is not practical, as you need an understanding of all content before any segmentation can be done. The problem with indexing large volumes of enterprise data is that it must be approached far more efficiently than traditional indexing, which works well on the Internet.

Index Engines attacked the two major inhibitors to enterprise-wide indexing. The first challenge is the speed of indexing; the second is the size of the index. We found that traditional document scraping tools were too slow, at around 2 MB per second, and often required that a copy of the data be created for dedicated processing. We developed word scraping technology that scrapes words at line speeds (approximately 200 MB/sec). Why is this important? Because an enterprise creates data constantly, much faster than users on the Internet create web pages, and the indexing of this data needs to occur quickly in order to maintain currency and accuracy without burdening the IT infrastructure with extensive processing requirements.

The second, and most challenging, inhibitor to enterprise-wide indexing is the size of the index itself. Suppose you have 100 TB of email and unstructured files: the index storage requirements of typical enterprise indexing solutions range from 40% to well over 100% of the original document size. That means terabytes, even petabytes, just to store the index! The Index Engines database can process 4 million words per second with a resulting index only 8% of the size of the content. Using the example of 100 TB of data, this comes to approximately 8 TB of index storage, a far more realistic number and easier for a CIO to accept.

Indexing the enterprise must be done with speed and efficiency. Any IT manager will tell you they don’t want to solve one problem by creating new problems. Traditional indexing approaches may provide a solution for data discovery, but they also result in challenges related to processing power and storage that most firms will not accept. This is why a new approach was required, one that now makes enterprise wide indexing possible.