Accurate metadata is critical to reliable classification of unstructured data. Many tools on the market corrupt metadata, which makes the content nearly impossible to manage.
Organizations are learning that once metadata becomes unreliable, it is difficult to make decisions about the data, and the data ends up lost and abandoned.
Take, for example, tools that crawl networks and servers to index metadata and content. To do this they open each document, which changes its last accessed time. Some tools reset this time afterward; others do not.
As a result, a key date field becomes inaccurate. Data that has been languishing on the network for a decade, untouched and long forgotten, can suddenly be indexed, and its last accessed date becomes current. Data center and records managers would then treat this content as valuable information rather than classifying it as the outdated, trivial data it is.
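The reset behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular indexing product works. The function name `read_preserving_times` is hypothetical, and the sketch assumes the process has permission to set timestamps on the file.

```python
import os

def read_preserving_times(path):
    """Read a file's bytes, then put back the timestamps recorded before the read."""
    before = os.stat(path)  # capture access and modification times first
    with open(path, "rb") as fh:
        data = fh.read()    # this read may update the last accessed time
    # Restore the original access and modification times (nanosecond resolution).
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return data
```

Two caveats apply. On POSIX systems the restore step itself updates the file's change time, and another process could touch the file between the read and the restore, so this approach reduces the damage rather than guaranteeing an untouched record.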
Another, more common example is data center tools that change the owner of a document from a specific user to “administrator” or “admin.” These tools are widespread, and they change one of the most critical metadata fields, one that is required for classifying content by department and business unit.
As these tools scan the network, they can change the ownership of thousands of documents to the useless “admin” owner, and the documents lose context and importance. One financial services company found that 50 percent of its unstructured data belonged to “admin” after being corrupted during consolidations and migrations. Last accessed dates also changed during this time, making the data useless and unmanageable.
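A rough way to measure this kind of ownership loss is to count how many files report a generic owner. The Python sketch below is illustrative only. The function `generic_owner_share`, the `owner_of` parameter, and the `GENERIC_OWNERS` set are all hypothetical names, and the default owner lookup relies on `pathlib.Path.owner()`, which is not available on every platform.

```python
import os
from pathlib import Path

# Assumed list of generic account names; adjust for the environment.
GENERIC_OWNERS = {"admin", "administrator"}

def generic_owner_share(root, owner_of=lambda p: Path(p).owner(),
                        generic=GENERIC_OWNERS):
    """Return the fraction of files under root whose owner is a generic account."""
    total = flagged = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                owner = owner_of(path)
            except (OSError, KeyError, NotImplementedError):
                continue  # skip files whose owner cannot be resolved
            total += 1
            if owner.lower() in generic:
                flagged += 1
    return flagged / total if total else 0.0
```

The walk only queries file metadata and never opens file contents, so the audit avoids the last accessed problem on the files it inspects. Passing a custom `owner_of` callable allows the same count to be run against an exported inventory or a platform-specific lookup.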
Metadata is key to managing content and determining its disposition. As long as organizations continue to use tools that corrupt metadata and cause its spoliation, content that has value or is sensitive will become lost within a complex infrastructure.
Understanding how a tool or platform extracts metadata and indexes content is important for ensuring long-term metadata accuracy and for confidence that user data will remain reliable.
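One practical way to learn how a tool treats metadata is to record key fields on a test copy of the data, run the tool, and compare. The following Python sketch assumes that approach. The names `snapshot` and `changed_paths` are hypothetical, and the fields recorded (access time, modification time, owner ID) are an illustrative selection rather than a complete list.

```python
import os

def snapshot(root):
    """Map each file path under root to (atime_ns, mtime_ns, uid) using stat only."""
    result = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.stat(path)  # metadata query; file contents are not read
            result[path] = (st.st_atime_ns, st.st_mtime_ns, st.st_uid)
    return result

def changed_paths(before, after):
    """Return paths present in both snapshots whose recorded metadata differs."""
    return sorted(p for p in before if p in after and before[p] != after[p])
```

Taking one snapshot before a trial run and one after, then passing both to `changed_paths`, gives a list of files whose access time, modification time, or owner the tool altered.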