Direct Extraction vs. Cached Extraction – Your Choice

There has been talk lately in the eDiscovery space about two different approaches to extracting responsive data from backup tapes. One approach, referred to as “multi-pass” or “direct extraction”, is unique to Index Engines and may require more than one pass over the tapes. The other, termed “single-pass” or “cached extraction”, extracts data with only one pass. Cached extraction is also available from Index Engines.

Let’s outline what’s involved in these processes. Assume a tape job of 100 tapes, each storing 200GB of data on average. In a cached extraction approach, the tapes are spun once and an image of each tape is extracted and cached to disk. The backup format must then be interpreted, and the tape images cataloged, indexed and deduped. Depending on how sophisticated the process is, this can be anything from a completely manual, labor-intensive effort to a completely automated one. Only after this work is complete can the search for responsive data begin. Storing 20TB (100 x 200GB) of data requires a storage infrastructure capable of handling that load. Of course, data compression techniques can be applied to the cached images to reduce the storage needed. Nevertheless, the expense of both buying and administering enough storage to hold all the tape images adds significantly to the total cost of this approach. And if the process isn’t fully automated, the storage costs will be dwarfed by the additional expense of the man-hours required to catalog, index, dedupe and search the full 20TB tape data set.

With the direct extraction approach, the process is quite different. The 100 tapes from our example above are first automatically cataloged and indexed. Typically our service partners find that only 1–5% of the data contained on a tape is actually responsive. Only this relevant data is then extracted and stored for analysis. It is important to note that although this approach accesses the tapes several times, each of these “touches” is far smaller than a full read of the tape. If the single-pass or caching approach gathers a full tape image and equates to 100% access, Index Engines’ direct extraction approach would rate at approximately 106%. This is because each touch by Index Engines is not a deep, full read of the tape. We access only the information needed at each stage of our process to enable the user to make intelligent choices.
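To make the storage difference concrete, here is a rough back-of-the-envelope sketch in Python using only the figures from the example above: 100 tapes, 200GB per tape, and a 1–5% responsive rate. The function names are purely illustrative, and the sketch assumes decimal units (1TB = 1000GB) and ignores compression, dedupe and index overhead, so treat the output as an estimate rather than a sizing guide.

```python
GB_PER_TB = 1000  # assumption: decimal units, so 100 x 200GB = 20TB as in the example


def cached_storage_gb(tapes, gb_per_tape):
    """Storage needed to cache a full image of every tape (no compression)."""
    return tapes * gb_per_tape


def responsive_gb(total_gb, responsive_percent):
    """Storage needed if only the responsive share of the data is extracted."""
    return total_gb * responsive_percent / 100


total_gb = cached_storage_gb(tapes=100, gb_per_tape=200)
low_gb = responsive_gb(total_gb, 1)   # 1% of the data is responsive
high_gb = responsive_gb(total_gb, 5)  # 5% of the data is responsive

print(f"Cached extraction: {total_gb / GB_PER_TB:.1f}TB")
print(f"Direct extraction: {low_gb / GB_PER_TB:.1f}TB to {high_gb / GB_PER_TB:.1f}TB")
```

Under these assumptions, caching has to hold the full 20TB, while direct extraction stores somewhere between 200GB and 1TB of responsive data.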

For those of you who are concerned about the age or fragility of your tapes, rest assured that the Index Engines solution does support a fully automated caching process for tape discovery. However, because of the increased infrastructure and administrative costs required to support this method, we recommend it only for the most delicate of tape jobs, since on average it mitigates only 6% of the risk. Index Engines’ fully automated cached extraction approach is more cost-effective than other, less automated caching processes. It can also be faster than a direct extraction, because an operator is required to load the tapes only once. But the downside of this type of single-pass approach remains: it requires storage and examination of largely useless data. In a worst-case scenario, where there is no responsive data at all on the tapes, the difference in processing costs can be substantial.

Index Engines allows you to choose: our unique direct extraction method or a caching approach – whichever fits your needs and budget. Your choice is really very simple.