The Big Bang Theory of Bad Data

By Rado Kotorov | April 26, 2013

According to the standard big bang theory, prior to the beginning there is simply nothing. Then a single event triggers a chain reaction and a dramatic sequence of events. And so it is in the world of data. Data flows through all business processes and keeps organizations functioning normally. But every now and then a small piece of bad data brings the entire system to a stop, causing losses, liabilities, and other undesired consequences.

An incorrect setting can cause a blackout at a highly anticipated sporting event, for example, resulting in serious disappointment among the fans, an expensive investigation, and protracted litigation. An incorrect conversion table can bring a multimillion-dollar space exploration mission to a dead end. There are many cases where a small piece of bad data causes a big bang chain reaction with serious adverse effects.
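To make the conversion-table example concrete, here is a minimal Python sketch, assuming a hypothetical lookup table and function names of my own invention, that shows how a single wrong table entry silently corrupts an otherwise correct calculation; none of the values are taken from a real incident report.

```python
# Hypothetical sketch: one wrong conversion factor silently corrupts a result.
# The table and field names are illustrative assumptions; only the
# "lbf_s_to_N_s" entry is wrong (the correct factor is about 4.44822).

CONVERSIONS = {
    "lbf_s_to_N_s": 1.0,    # BAD DATA: should be ~4.44822 N·s per lbf·s
    "km_to_m": 1000.0,      # correct
}

def impulse_in_newton_seconds(impulse_lbf_s: float) -> float:
    """Convert an impulse reported in pound-force seconds to newton-seconds."""
    return impulse_lbf_s * CONVERSIONS["lbf_s_to_N_s"]

if __name__ == "__main__":
    reported = 150.0  # impulse in lbf·s, as supplied by an upstream system
    converted = impulse_in_newton_seconds(reported)
    expected = 150.0 * 4.44822
    print(f"converted: {converted:.1f} N·s, expected: {expected:.1f} N·s")
    # The downstream model receives a value roughly 4.4x too small, yet no
    # exception is raised -- the bad datum looks like any other number.
```

The point of the sketch is that nothing in the code path fails: the error only becomes visible far downstream, in the physical outcome.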

But this raises the question: why do so many organizations see the consequences of bad data only in hindsight? Why not assess the risk realistically and deploy data quality programs proactively rather than reactively? Why do managers read about big bang bad data disasters and still fail to initiate data quality programs within their own organizations?

In my opinion, this is precisely because big problems are caused by small pieces of bad data. The perceived risk within the organization is therefore minimal, and managers attribute such occurrences to randomness rather than to systematic causes. What we fail to see is that while any single accident can be traced to one small piece of bad data, there are many ways in which bad data gets into the system. So it is not a matter of one piece, but of which piece of bad data will make the problem obvious. It is therefore not a question of whether such an event will occur, but when. The perceived rarity of these events blinds us to the need for a systematic approach.

Avoiding big bang disasters requires assurance that can only be achieved by institutionalizing a robust data quality and governance program.
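As a purely illustrative sketch of what such a program can institutionalize at the lowest level, the hypothetical Python snippet below runs a few automated quality rules over incoming records before they reach downstream systems; the field names, units, and thresholds are assumptions, not a prescription.

```python
# Hypothetical sketch: ingestion-time data quality rules.
# Field names, units, and thresholds are illustrative assumptions.

from typing import Callable, Dict, List

Record = dict
Rule = Callable[[Record], bool]

RULES: Dict[str, Rule] = {
    "impulse is present": lambda r: "impulse_N_s" in r,
    "impulse is positive": lambda r: r.get("impulse_N_s", -1) > 0,
    "impulse is within plausible range": lambda r: 0 < r.get("impulse_N_s", -1) < 10_000,
}

def validate(record: Record) -> List[str]:
    """Return the names of every rule the record violates."""
    return [name for name, rule in RULES.items() if not rule(record)]

if __name__ == "__main__":
    good = {"impulse_N_s": 667.2}
    plausible_but_wrong = {"impulse_N_s": 150.0}
    suspicious = {"impulse_N_s": -3.0}

    for rec in (good, plausible_but_wrong, suspicious):
        failures = validate(rec)
        print(rec, "->", failures or "OK")
    # Range checks catch gross errors (the negative value) but not every
    # plausible-looking wrong number, which is why governance also has to
    # cover how reference data such as conversion tables is maintained.
```

Simple rules like these do not eliminate risk, but embedding them in every data flow turns bad data from a silent big bang trigger into an event that is detected, logged, and owned.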