Finding What to Keep

Yesterday PC World featured an article by IDC’s Stephen Lawson titled “Why IT Should Start Throwing Data Away.” Lawson discusses the pitfalls of keeping data that dates back decades. Even though storage is cheap, the rate at which data is accumulating will make these seemingly inconsequential costs add up fast. He also points out that enterprises must be concerned with the intangible risk of having to plow through all that legacy data for eDiscovery.

The challenge of overcoming the data stockpile is twofold: defining what needs to be saved, and then finding the valuable data among the junk. The first step, building a retention policy, is up to the organization. Each enterprise has different types, amounts and uses of data. Although expert consultants can help, an internal records management group will need to own the definition and implementation of how the company’s data is sorted and then saved or disposed of. And as Lawson’s article points out, technology can help a RIM team understand how much data it has and of what types, but it can’t interpret what this means to the company or determine how best to save (or not save) the data.

Once the policy is written, the next hurdle is to collect the relevant information from the vast stores of old data. Where data resides can be broken down into three storage types: backup data, network data, and desktop data. Backup data, whether on tape or disk, contains a great deal of duplicate and useless data. Stores of network data are often tremendously large and ever changing. Desktop data is fragmented and difficult to trace. Index Engines offers a collection platform that not only handles the specific challenges of accessing each type of data container, but also allows for a unified view and a single deduplication effort. By unlocking backup formats, Index Engines makes quick work of indexing and searching data on tape or disk. Using NDMP to index network data, Index Engines can keep pace with the ever-changing volume of enterprise production data. User data can also be harnessed, indexed and searched with the Index Engines platform. This technology links the indexes built from the various data stores and allows the search for unique content to be performed against all of them simultaneously. Then the true subset of valuable data can be extracted and archived. And the rest, as Lawson suggests, can be thrown away.
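
To make the idea of a unified, deduplicated view concrete, here is a minimal sketch in Python. It is not how Index Engines is implemented; it only illustrates one common approach, which is to key every document by a hash of its content so that copies found in different stores collapse into one entry. The store names, file paths, and function names are all hypothetical, and the in-memory dictionaries stand in for real backup, network, and desktop sources.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies a document by its content."""
    return hashlib.sha256(data).hexdigest()


def build_unified_index(stores):
    """Map each content digest to every (store, path) where that content appears.

    `stores` is a hypothetical structure: {store_name: {path: file_bytes}}.
    """
    index = defaultdict(list)
    for store_name, documents in stores.items():
        for path, data in documents.items():
            index[content_digest(data)].append((store_name, path))
    return index


def unique_documents(index):
    """Keep one representative location per distinct piece of content."""
    return {digest: locations[0] for digest, locations in index.items()}


# Made-up sample data standing in for the three storage types.
stores = {
    "backup": {
        "tape01/contract.doc": b"signed contract",
        "tape01/memo.txt": b"lunch memo",
    },
    "network": {
        "share/legal/contract.doc": b"signed contract",
    },
    "desktop": {
        "C:/Users/a/memo-copy.txt": b"lunch memo",
        "C:/Users/a/notes.txt": b"draft notes",
    },
}

index = build_unified_index(stores)
unique = unique_documents(index)

for digest, (store_name, path) in unique.items():
    copies = len(index[digest])
    print(f"{store_name}:{path} ({copies} copies across all stores)")
```

In this toy data set, five files reduce to three distinct documents. Deciding which of those three are worth archiving is still a retention policy question, not something the index can answer on its own.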