Whether you can be data driven depends largely on whether you can trust your data. While we've previously covered how Numbers Lie and how Dirty Data can cause problems, one of the most common requests we get is for a walkthrough of exactly how to audit your data and identify any problems in it. This week we'll do just that.
A data audit is a process where you verify and validate that your data is correct and complete, so that you can rely on it. In most audits, you are evaluated against a set of standardized criteria. However, in a data audit you typically have to provide your own criteria because your data (and how you use it) is unique. This can be tricky, as you’ll need to decide how much error you are willing to accept. No data is perfect, so how imperfect can it be?
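To make the idea of self-defined criteria concrete, here is a minimal sketch. It assumes your data is a list of records (Python dicts) and that each criterion is a per-record check paired with the maximum error rate you are willing to accept. The `Criterion` class, the `audit` function, the field names, and the thresholds are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A record is one row of data, e.g. one order (hypothetical shape).
Record = Dict[str, object]


@dataclass
class Criterion:
    """One self-defined audit rule (hypothetical): a check plus its error tolerance."""
    name: str
    is_valid: Callable[[Record], bool]  # returns True if a record passes the check
    max_error_rate: float               # fraction of records allowed to fail, e.g. 0.01 = 1%


def audit(records: List[Record], criteria: List[Criterion]) -> Dict[str, Tuple[float, bool]]:
    """Return {criterion name: (observed error rate, passed?)} for each criterion."""
    results: Dict[str, Tuple[float, bool]] = {}
    for criterion in criteria:
        failures = sum(1 for record in records if not criterion.is_valid(record))
        # An empty dataset has no failing records, so its error rate is 0.
        error_rate = failures / len(records) if records else 0.0
        results[criterion.name] = (error_rate, error_rate <= criterion.max_error_rate)
    return results


# Hypothetical sample data: one negative amount and one missing email.
orders: List[Record] = [
    {"order_id": 1, "amount": 25.0, "email": "a@example.com"},
    {"order_id": 2, "amount": -5.0, "email": "b@example.com"},
    {"order_id": 3, "amount": 40.0, "email": None},
    {"order_id": 4, "amount": 12.5, "email": "d@example.com"},
]

criteria = [
    # Zero tolerance: a negative order amount is never acceptable here.
    Criterion("amount is non-negative", lambda r: r["amount"] >= 0, 0.0),
    # Looser tolerance: up to 30% of orders may lack an email address.
    Criterion("email is present", lambda r: r["email"] is not None, 0.30),
]

for name, (rate, passed) in audit(orders, criteria).items():
    print(f"{name}: error rate {rate:.0%} -> {'PASS' if passed else 'FAIL'}")
```

Both checks fail on one record in four (a 25% error rate), yet only one of them fails the audit. That is the point of giving each criterion its own tolerance: how much imperfection is acceptable depends on the field and on how you use it.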
Even after you've established your audit criteria, testing whether your data meets them is just as tricky. To audit your data effectively, you'll need to tell the difference between anomalous changes in your business and real data quality problems, even though both can present with the same symptoms!
We'll cover the basics of a data audit and give you a framework you can use to run one in your own company. Specifically, we will cover:
- Part 2 – Setting Your Audit Criteria
- Part 3 – Establishing the Baseline
- Part 4 – Identifying Quality Problems
- Part 5 – Validating Data Definitions
Tomorrow we’ll get started with the first step in any data audit, which is establishing a baseline of your data so you know what to expect.
Quote of the Day: “Quality means doing it right when no one is looking.” ― Henry Ford