One simple truth makes data exploration easier: there is a root cause of every change. Our goal in data exploration is to find those causes that underlie the changes we care about.
Another simple truth that helps us: if everything remains the same, nothing changes. This requires us to establish what is “normal” in our data, giving us a framework from which to look for new and irregular changes. Combining these truths gives us a straightforward algorithm to find and triage our insights.
1. Establish what is “normal”
While I’m sure your intuition is very good, you need a mathematical model of what is normal for each metric to identify changes objectively. There are many ways to do this, ranging from simple averages to more advanced techniques like linear regression.
Whatever you choose, be sure to look across several time ranges (day over day, week over week, year over year) so that you account for seasonality.
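As a minimal sketch of the "simple averages" end of that range, the snippet below builds a baseline that respects a weekly cycle by averaging each day of the week separately. The function name `seasonal_baseline`, the `period` parameter, and the sample numbers are all made up for illustration; they are not from any particular tool.

```python
from statistics import mean, stdev

def seasonal_baseline(values, period=7):
    """Return one (mean, standard deviation) pair per position in the cycle."""
    baseline = []
    for offset in range(period):
        # Every period-th observation shares the same position in the cycle,
        # for example all Mondays when period=7 on daily data.
        same_slot = values[offset::period]
        spread = stdev(same_slot) if len(same_slot) > 1 else 0.0
        baseline.append((mean(same_slot), spread))
    return baseline

# Hypothetical daily metric: three weeks, each ending in a weekend dip.
history = [100, 102, 98, 101, 99, 60, 58,
           101, 103, 97, 100, 100, 62, 57,
           99, 101, 99, 102, 98, 61, 59]

baseline = seasonal_baseline(history)
```

Comparing a Saturday against other Saturdays, rather than against the overall average, keeps the regular weekend dip from looking like a change.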
2. Detect what is not normal
Once you have a model that describes “normal”, it should be straightforward to identify everything that is not normal. Depending on your definition of “normal”, this may produce a large number of possible findings. Tuning your model to the right sensitivity will take some trial and error, but it is worth the effort.
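One common way to do this, shown here only as a sketch, is to flag any value that sits more than a chosen number of standard deviations from the historical mean. The `threshold` parameter is the sensitivity knob mentioned above. The function name `find_anomalies` and the data are hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(history, recent, threshold=3.0):
    """Flag recent values that are far from the historical mean."""
    center = mean(history)
    spread = stdev(history)
    findings = []
    for index, value in enumerate(recent):
        if spread == 0:
            # A perfectly flat history: any deviation at all is notable.
            score = 0.0 if value == center else float("inf")
        else:
            score = (value - center) / spread
        if abs(score) > threshold:
            findings.append((index, value, score))
    return findings

# Hypothetical data: a stable history, then two unusual days.
history = [100, 102, 98, 101, 99, 100, 103, 97]
recent = [101, 99, 120, 100, 85]

loose = find_anomalies(history, recent, threshold=2.0)
strict = find_anomalies(history, recent, threshold=8.0)
```

With the loose threshold both the spike (120) and the drop (85) are reported; with the strict one only the spike survives. That difference is the trial and error in practice.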
3. Group them together
Not all findings will be independent. If you see shifts in revenue among all 50 states in the US, there is probably one change affecting the entire country rather than 50 separate ones. One of the best ways to group findings is through clustering, although you can also group them using your knowledge of your business and the traits that findings have in common.
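Clustering proper is beyond a short snippet, so the sketch below uses the simpler "common traits" route instead: if most segments moved in the same direction, it reports one shared finding plus the exceptions. The function `roll_up`, the `min_share` cutoff of 80%, and the region names are assumptions chosen for illustration.

```python
def roll_up(changes, total_segments, min_share=0.8):
    """Collapse per-segment findings into one when most segments moved together."""
    groups = {"up": [], "down": []}
    for name, change in changes.items():
        if change > 0:
            groups["up"].append(name)
        elif change < 0:
            groups["down"].append(name)

    for direction, members in groups.items():
        if len(members) / total_segments >= min_share:
            # Most segments share this shift, so treat it as a single finding
            # and list only the segments that moved the other way.
            other = "down" if direction == "up" else "up"
            return {"shared_direction": direction,
                    "exceptions": sorted(groups[other])}

    # No dominant direction: every finding stays separate.
    return {"shared_direction": None,
            "exceptions": sorted(groups["up"] + groups["down"])}

# Hypothetical fractional revenue changes for five regions.
changes = {"north": -0.12, "south": -0.10, "east": -0.11,
           "west": -0.09, "central": 0.04}
grouped = roll_up(changes, total_segments=5)

# Only two of five regions moved, and in opposite directions.
mixed = roll_up({"north": -0.12, "south": 0.08}, total_segments=5)
```

Here four of five regions fell, so the result is one "down" finding with `central` called out as the exception worth a closer look.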
After detecting and grouping, you will likely still have a large set of findings. You will need to evaluate which are the most interesting insights and pull those out, as it’s not possible to share everything you found with everyone at your company. If possible, it is useful to keep a record of all the insights you find as it might save you time in the future.
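How you decide what is "most interesting" depends on your business, but as one possible sketch you can rank findings by how far the observed value landed from the expected one, share the top few, and log the rest. This assumes every finding is measured in the same units (say, dollars of revenue); the `triage` function and the sample findings are hypothetical.

```python
import json

def triage(findings, limit=2):
    """Rank findings by absolute gap from expected; return (shortlist, log)."""
    ranked = sorted(
        findings,
        key=lambda f: abs(f["observed"] - f["expected"]),
        reverse=True,
    )
    # Keep the complete ranked list as a record, even though only the
    # top few findings get shared.
    log = json.dumps(ranked)
    return ranked[:limit], log

# Hypothetical revenue findings, all in the same currency.
findings = [
    {"segment": "texas", "expected": 1000, "observed": 700},
    {"segment": "ohio", "expected": 200, "observed": 260},
    {"segment": "california", "expected": 5000, "observed": 5200},
    {"segment": "maine", "expected": 50, "observed": 55},
]

shortlist, log = triage(findings)
```

The shortlist is what goes to the rest of the company; the log is the record you can search the next time a similar change shows up.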
Tomorrow we’ll talk about how to make sure the insights you found are true insights and not just mirages, when we cover validation.
Quote of the Day: “One of these things is not like the others / One of these things just doesn’t belong / Can you tell which thing is not like the others / By the time I finish my song?” – Big Bird on Sesame Street, proving that even kindergarten can be data driven