Eating the Elephant. How to Get Started with Data Governance

by Tim Williams

Memory is malleable. Not only does our ability to recall the past degrade over time, but we often find ourselves remembering things that never happened. Our memory can even be altered by outside influences. According to memory scholar Elizabeth Loftus, it’s possible to produce false memories in a person either intentionally through overt manipulation, or unintentionally through prompting with misleading cues.

I was reminded of this while reading Exterro Content Marketing Manager Jim Gill’s blog post 4 Data Mapping Challenges and How to Overcome Them. Data maps, high-level functional reports of a company’s data under management, are essential prerequisites for building a data compliance program. Gill’s focus is on data maps used to respond to litigation, but the principles apply to any other governance initiative. Gill warns that many companies start data mapping projects only to abandon them before completion. Nevertheless, he writes that systematic interviews with data stewards are the most efficient way to collect information for a data map.

That’s not my experience. The people who manage the storage have zero insight into the content of the data under management, and frequently weren’t even employed when the data store was implemented. The people who created the data have long ago lost track of most of it, and may not even work there anymore. Trying to build a data map by relying on the memory of either group will generate highly inaccurate results. And when you add Gill’s four challenges (too time consuming to build, impossible to keep up to date, incomplete, and not comprehensive), it is understandable why most data mapping projects are considered failures.

How to get started? Well, start with the assumption that memory is at best a rough approximation, and that the best way to help someone recall the truth is to provide them with detailed and reliable cues grounded in facts. Start with the data. Organize it, classify it, and present summary and detailed reports of it to your data stewards and data owners. Get them working directly with it, discovering what’s really there, and building the governance rules based upon what actually exists, rather than what they remember.

But don’t make the mistake of trying to do it all at once. Contrary to popular wisdom, the best way to eat an elephant is not one bite at a time. Really big problems resist solutions that involve breaking them up into small pieces and tackling each piece one by one. As Mike Martel warns, pretty soon you are going to get really sick of the taste of elephant and give up.

Start with a high-level outline, and fill in the details iteratively. Let technology do the heavy lifting. Leverage a petabyte-class indexing and classification platform that can scale to meet the needs of massive data centers, one that can zoom in and out on the data the way a camera does on the world, from wide-angle landscapes to high-resolution detail shots.

The real problem with taking it step by step is that most people lose interest and end up quitting – Mike Martel

Your first pass should be focused on remembering: getting a rough idea of what types of data are stored where. Classify the data based upon the file system metadata alone. You can get an estimate of file types from their names, a sense of the amount of storage they consume from their sizes, an estimate of the storage wasted from metadata deduplication, and a sense of what’s valuable from last access and modification times. Share that information with the data stewards during your first interview with them and you will be surprised at how eye-opening the conversation will be.
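
To make that first pass concrete, here is a minimal sketch in Python of what a metadata-only summary might look like. It is an illustration, not a description of any particular product: the function name first_pass_summary, the one-year staleness threshold, and the choice to treat files with the same name and size as likely duplicates are all assumptions made for this example. A real platform would use stronger signals and would need to scale far beyond a single directory walk.

```python
import os
import time
from collections import defaultdict
from pathlib import Path


def first_pass_summary(root, stale_after_days=365, now=None):
    """Summarize a directory tree from file system metadata only.

    Hypothetical helper: no file content is read. Duplicates are guessed
    from (file name, size), which is an assumption, not a guarantee.
    """
    now = time.time() if now is None else now
    stale_cutoff = now - stale_after_days * 86400

    by_type = defaultdict(lambda: {"files": 0, "bytes": 0})
    seen = defaultdict(list)  # (lowercased name, size) -> list of paths
    stale_bytes = 0

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                continue  # unreadable or vanished file; skip it

            # File type estimated from the name's extension.
            ext = path.suffix.lower() or "(none)"
            by_type[ext]["files"] += 1
            by_type[ext]["bytes"] += st.st_size

            # Same name and same size: a likely (not certain) duplicate.
            seen[(name.lower(), st.st_size)].append(str(path))

            # Neither accessed nor modified since the cutoff: likely stale.
            if max(st.st_atime, st.st_mtime) < stale_cutoff:
                stale_bytes += st.st_size

    # Bytes that could be reclaimed if every extra copy were removed.
    duplicate_bytes = sum(
        size * (len(paths) - 1)
        for (_name, size), paths in seen.items()
        if len(paths) > 1
    )
    return {
        "by_type": dict(by_type),
        "duplicate_bytes": duplicate_bytes,
        "stale_bytes": stale_bytes,
    }
```

Calling first_pass_summary on a share and printing the by_type totals sorted by bytes gives you a one-page report to bring to that first interview. Keep in mind that last access times are unreliable on volumes mounted with access-time updates disabled, so treat the stale figure as a conversation starter rather than a verdict.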

Your second pass should be focused on reorganization. Go deeper and index the content. Identify the redundant, trivial and outdated data that can be deleted; the responsive, sensitive or personal data that needs to be protected; the hidden corporate intellectual property and historically valuable content that needs to be made more accessible; and the archival-class data consuming primary storage that needs to be moved to cheaper long-term storage.

Your third pass should be focused on risk. You should know where your key content is at this point. Support your legal and compliance teams with classified data that identifies what should be on legal hold in support of eDiscovery, content that is sensitive and should be secured and preserved, email that must be retained to meet regulatory requirements, even content that contains personally identifiable information (PII) that should be managed according to corporate governance policies. Your legal team will spend less time trying to find data, and more time protecting the organization from harm.

After each pass, show the results to your client. Help them get a better understanding of their data. Find out what they want to learn in the next pass. At the end, they will be able to develop informed governance policies derived from their actual data experience. And as the data changes over time, the data map you’ve created is just as easily updated.

When I was younger I could remember anything, whether it happened or not; but my faculties are decaying now, and soon I shall be so I cannot remember any but the latter – Mark Twain

