Data Exploration: Validating Insights After Finding Them

This is part 4 of a four-part series on Data Exploration.

The best part of data exploration is when you find insights hiding in your data. It’s a lot like finding a treasure chest hiding in the sand, and the feelings behind it can be a powerful emotional rush.

Unfortunately, many people get carried away by that emotion and report insights without first validating that they are real. The validation step is critical to avoid misleading people about what the data is truly saying. Even the experts are vulnerable to this kind of oversight. A group that attempted to reproduce the results of 100 published studies in psychological science was able to reproduce only 39 of them [1].

How do you avoid this trap? Let us count the ways:

1. Look for similar insights in the past.

If you find a shift in coupon redemptions that you think is the cause of a drop in revenue, look for similar shifts in the past. Has it happened before? If so, did it result in a drop in revenue every time? If not, it might be a red herring, or maybe just part of the answer. Many insights you find in your data happen more often than you think; you just didn’t know to look for them in the past.
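As a rough illustration of that check, here is a minimal Python sketch. The function name, the 20% threshold, and the weekly figures are all hypothetical, invented for this example. It also assumes the shift you found was an increase in redemptions, so it only looks for past increases of a similar size.

```python
def past_shift_outcomes(redemptions, revenue, min_shift=0.20):
    """Find past periods where redemptions rose by at least min_shift
    (as a fraction of the prior period) and note whether revenue fell
    in that same period. Returns a list of (period_index, revenue_fell)."""
    outcomes = []
    for i in range(1, len(redemptions)):
        prev = redemptions[i - 1]
        if prev == 0:
            continue  # relative change is undefined when the prior value is zero
        change = (redemptions[i] - prev) / prev
        if change >= min_shift:
            outcomes.append((i, revenue[i] < revenue[i - 1]))
    return outcomes


# Hypothetical weekly figures, made up purely for illustration.
weekly_redemptions = [100, 102, 130, 128, 95, 97, 125, 124]
weekly_revenue = [50, 51, 46, 47, 52, 51, 53, 52]

outcomes = past_shift_outcomes(weekly_redemptions, weekly_revenue)
drops = sum(1 for _, revenue_fell in outcomes if revenue_fell)
print(f"{drops} of {len(outcomes)} similar past shifts coincided with a revenue drop")
```

If only some of the past shifts line up with a revenue drop, that is a hint the shift is a red herring or only part of the story. Comparing each period only to the one before it is a simplification; a real analysis would likely need to account for seasonality and lagged effects.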

2. Consider the Null Hypothesis

The null hypothesis is the assumption that there is no relationship between the things you are measuring. Your job is to disprove the null hypothesis by finding evidence that the changes you observe are unlikely to have happened by mere chance. Also ensure that whatever insights you find are not the result of data quality problems or analysis errors. You would be surprised at how often a simple Excel formula can lead you to vastly incorrect insights. [2]
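One common way to ask "could this have happened by chance?" is a permutation test. The sketch below is only one option among many and is not tied to any particular tool. The function name and the daily revenue figures are hypothetical. It shuffles the "before" and "after" labels many times and counts how often a random split produces a gap in averages at least as large as the one you observed.

```python
import random
from statistics import mean


def permutation_p_value(before, after, n_permutations=10_000, seed=0):
    """Estimate the share of random relabelings whose difference in means
    is at least as large as the observed difference (two-sided)."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(mean(after) - mean(before))
    pooled = list(before) + list(after)
    n_before = len(before)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n_before:]) - mean(pooled[:n_before]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical daily revenue before and after the change you suspect matters.
revenue_before = [52, 50, 51, 53, 49, 52, 50, 51]
revenue_after = [46, 47, 45, 48, 46, 47, 44, 46]

p_value = permutation_p_value(revenue_before, revenue_after)
print(f"Estimated p-value: {p_value:.4f}")
```

A small value means random shuffling rarely produces a gap this large, which is evidence against the null hypothesis. It says nothing about whether the underlying data or formulas are correct, so the data quality checks still apply.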

3. Use your judgement

If you find an insight that indicates that changing the color of the chairs in your conference room led to a drop in revenue, you would be right to be skeptical. The data itself cannot tell you the difference between correlation and causation, so you will need to decide for yourself whether an insight truly represents a cause-and-effect relationship.

Next Week: Even with the methods we’ve reviewed this week, you need to know what you are looking for in order to succeed. We will cover the common forms of Data Insights which are going to be the building blocks of the findings you will produce from your exploration.

[1] This is known as the Reproducibility Project and was started due to a crisis in the reproducibility of studies in many fields. The pressures of academic research have created a strong incentive to skip validation, since a two-year-long data exploration that reveals no validated insights can be a career-ending event.

[2] One of the most influential studies in economics was later found to have significant errors in the Excel spreadsheet used to analyze the data. It’s unlikely any of your errors will cost billions of dollars, but you never know.

Quote of the Day: “Man cannot discover new oceans unless he has the courage to lose sight of the shore.” ― André Gide
