This is part 1 of a 5 part series on Numbers Lie.
What is Metrics Bias?
Numbers can lie to you. If you don’t believe me, keep in mind that some of the most data driven organizations in the world make the biggest mistakes. For example: Canarsie Capital, a hedge fund, lost $60M in only three weeks! Flint, Michigan relied on faulty sampling and data when making disastrous decisions regarding the city’s water.
Numbers lie not when they are clearly wrong (that is obvious), but when they are subtly incorrect and it is hard to know they are wrong. You can use good judgement and make the logical choice from bad data, and it will still turn out to be a bad decision. There are a variety of causes for these subtle data problems and I refer to them all as Metrics Bias, because the metrics are telling you a biased story instead of an objective truth.
Sometimes your metrics are correct, but you read a biased story from them based on your misinterpretation of the data. I also consider this Metrics Bias because the effect is the same! It doesn’t matter whether the speedometer on your car is broken or if you confuse miles with kilometers – in either case you can get a speeding ticket without realizing what you were doing.
It’s hard to be data driven if you can’t trust your data, so this week we are going to focus on how to identify and eliminate bias in your metrics.
Tomorrow we’ll get started by talking about Collection Bias, which happens when you think you have all the data you need and some of it is missing!
Numbers Lie: Collection Bias
This is part 2 of a 5 part series on Numbers Lie.
What is missing?
The most common way numbers will lie to you is through lies of omission. Your metrics might look great, but if there is missing data behind them then they are not telling you the whole story! If your metrics tools don’t collect all the necessary data to help you make informed decisions, I call it Collection Bias.
Lest you think that Collection Bias could never happen to you, consider these common examples:
- Website metrics. If you are tracking visitors to your website using popular web analytics tools, chances are you are undercounting your visitors and usage. Why? Many common ad blockers block analytics services as well as ads! One site’s analysis showed that anywhere from 5-25% of visitors might be invisible to hosted web analytics tools due to ad blockers.
- Email Opens. If you send emails to your customers, you are likely tracking how many people open your emails (Email Open rate). This tracking is done by inserting a small pixel image into the email and recording every time that image is loaded (the email opened). However, many email clients will block or inhibit these tracking pixels! Gmail, for example, will only load the pixel once even if the user opens the email many times, which can lead to drastic undercounting.
There are many more examples ranging from implementation bugs to database integrity problems. The reality is that you very likely have some kind of collection issue in your metrics systems.
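To get a feel for how much an undercount like the ad blocker example can matter, here is a minimal sketch of the arithmetic. The function name and the observed visitor count are hypothetical; only the 5-25% range comes from the example above, and it assumes blocked visitors are simply missing from the reported total.

```python
def estimate_true_visitors(observed: int, blocked_rate: float) -> float:
    """Estimate real visitors when a fraction of them is invisible to analytics."""
    if not 0 <= blocked_rate < 1:
        raise ValueError("blocked_rate must be in [0, 1)")
    # observed = true * (1 - blocked_rate), so divide to recover the true count
    return observed / (1 - blocked_rate)


observed = 10_000  # hypothetical count reported by your analytics tool
low = estimate_true_visitors(observed, 0.05)   # if 5% of visitors are invisible
high = estimate_true_visitors(observed, 0.25)  # if 25% of visitors are invisible
print(f"True visitors are likely between {low:,.0f} and {high:,.0f}")
```

The point is not the exact number but the width of the range: a single reported figure can hide an uncertainty of thousands of visitors.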
Wow, that is scary
Yes, I agree. The good news is that if you assume that Collection Bias exists you can proactively avoid it using a few simple steps:
- Understand your tools. All analytics and metrics tools will disclose collection issues in their documentation. If they don’t, email their support and ask them about known data collection issues so that you are aware of them now.
- Test your data like your product. Most companies will test new features of their products and services whenever they make changes. But what about your data? You should get into the habit of testing your metrics and analytics just like you test your features, whenever anything changes.
- Avoid collecting everything. The more data you collect, the harder it will be for you to be sure it’s correct. While it is tempting to track everything, as soon as you find a collection bias problem in one metric your team will view all of your metrics with suspicion! This undermines your efforts to be data driven, so limit your metrics to those whose accuracy you can be sure of.
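One way to "test your data like your product" is to compare what your analytics tool reports against a source you trust more, such as your own database. The sketch below is only an illustration: the function name, the 5% tolerance and the sample numbers are all assumptions, not a standard.

```python
def check_collection_gap(tracked: int, source_of_truth: int, tolerance: float = 0.05) -> bool:
    """Return True when the tracked count is within `tolerance` of the trusted count."""
    if source_of_truth == 0:
        return tracked == 0
    # Relative gap between what analytics saw and what really happened
    gap = abs(source_of_truth - tracked) / source_of_truth
    return gap <= tolerance


# Hypothetical daily numbers: signups seen by analytics vs. rows in the accounts database
print(check_collection_gap(tracked=97, source_of_truth=100))  # small gap, passes
print(check_collection_gap(tracked=80, source_of_truth=100))  # 20% missing, fails
```

A check like this could run whenever your product or tracking code changes, the same way your feature tests do.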
Numbers Lie: Interpretation Bias
This is part 3 of a 5 part series on Numbers Lie.
One of the great challenges of making data driven decisions is letting the data shape your opinion. It is human nature to seek out data that reinforces our existing ideas and conclusions, something psychologists call Confirmation Bias. But using data to justify existing decisions is NOT data driven, because the decision was already made!
When you look at data and see what you want to see, I call it Interpretation Bias and it’s easy to do when your data sets have no clear message. Let’s assume the following chart is your corporate revenue and your boss asks you to determine if the business is growing:
This data is very inconclusive, as is most real world data. However, because it is inconclusive you can read into it whatever you want! In fact, I’ve recorded a short video (3:51) that shows how easy it is to misinterpret this chart and goes into Interpretation Bias in more detail. Give it a watch if you’d like to learn more.
To avoid Interpretation Bias, there are a few simple steps you can take:
- Self Awareness. Before even looking at data, admit to yourself what you want the data to say. Then do your best to see the opposite in the data, taking a devil’s advocate approach. This forces you to look at the data from the other side first.
- Peer Review. Have someone else look at your conclusions and verify they reflect the data. If possible, make sure that person has no direct incentive related to the conclusion!
- Triangulate. While a single source of data might easily fall victim to Interpretation Bias, it is much harder when you combine multiple data sources together.
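To see how easily inconclusive data supports whatever story you want, here is a small sketch using made-up monthly revenue figures (they are hypothetical and not the numbers from the chart above). The same series shows strong growth or a sharp decline depending only on which start and end points you pick.

```python
revenue = [100, 120, 90, 110, 95, 115, 105, 98]  # hypothetical monthly revenue, in $K


def growth(series, start, end):
    """Percent change between two points in the series."""
    return (series[end] - series[start]) / series[start] * 100


# The optimist starts at the lowest month; the pessimist starts at the highest
optimist = growth(revenue, 2, 5)   # 90 -> 115
pessimist = growth(revenue, 1, 7)  # 120 -> 98
full = growth(revenue, 0, 7)       # 100 -> 98, the whole period

print(f"Optimist: {optimist:+.1f}%  Pessimist: {pessimist:+.1f}%  Full period: {full:+.1f}%")
```

Playing devil’s advocate with your own window choices like this is a quick way to find out whether the data really has a clear message.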
Numbers Lie: Focus Bias
This is part 4 of a 5 part series on Numbers Lie.
Everyone hates bad news
Let’s all admit that we hate getting bad news. It is a lot easier to handle good news, especially first thing in the morning.
Bad news is not always as simple as “revenue is down”. Sometimes bad news is that you have yet another task to complete today when your todo list is already overflowing. There is only so much we can do in a given day, so many decisions we can make. This is why many organizations will focus on a few KPIs, metrics or customer segments that are important to them and largely ignore the rest. There are only so many numbers you can absorb and process.
However, this focus may exclude important data for decision making so I refer to it as Focus Bias (others call it “Tunnel Vision”). Focus Bias is dangerous for a few reasons:
- Too high level. KPIs are often very high level metrics and can hide changes and shifts going on in your business that are indicators of future problems.
- Businesses change. The assumptions and beliefs in your business that drive your focus one year may not hold true the following year. Businesses change over time and your focus may not keep up.
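The "too high level" problem is easy to demonstrate with a sketch. The segment names and revenue figures below are invented for illustration: the top-line KPI does not move at all, while the mix underneath it shifts dramatically.

```python
# Hypothetical revenue by customer segment for two quarters, in $K
q1 = {"enterprise": 500, "small_business": 300, "consumer": 200}
q2 = {"enterprise": 380, "small_business": 310, "consumer": 310}

# The high level KPI: total revenue change
total_change = sum(q2.values()) - sum(q1.values())

# The detail the KPI hides: percent change per segment
segment_change = {seg: (q2[seg] - q1[seg]) / q1[seg] * 100 for seg in q1}

print(f"Total revenue change: {total_change}")
for seg, pct in segment_change.items():
    print(f"{seg}: {pct:+.0f}%")
```

A dashboard showing only the total would report a flat quarter, even though the largest segment in this example lost roughly a quarter of its revenue.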
How do you avoid Focus Bias? The best way is to try to get outside of your metrics comfort zone as often as possible. Here are some tips:
- Every week… Make it a weekly habit to browse through data on customer segments and behaviors that are not on any of your dashboards. Data exploration is fun and can provide you with hints of hidden opportunities and problems.
- Every quarter… Ask yourself if your KPIs and metrics are still the best measurements of your business. Don’t keep them just for historical reasons, switch them out if there are better metrics!
- Every year… Ask yourself what you wish you had known the year before and do your best to fill any similar gaps in the coming year. You cannot make perfect decisions but you can always be better than you were.
You cannot look at every dimension of every metric of your business every day without going insane. However, focusing on only a few metrics is just as dangerous!
Numbers Lie: Hidden Bias
This is part 5 of a 5 part series on Numbers Lie.
Just because you can’t see it, doesn’t mean it’s not there
We have covered a number of different kinds of Metrics Bias this week including collection, interpretation and focus. However, what about the other bias that is hiding in your data, a bias that is specific to your business and the assumptions you have made? How do you find those hidden biases that might color your decisions?
Obviously, it is not healthy to be paranoid about your data and your decisions all the time. However, it’s also impossible to avoid every possible kind of bias in your data. The good news is that to make data driven decisions you do not need certainty, you just need a preponderance of evidence.
For example, let us decide whether a hypothetical company called Tony’s Crabshack should drop any items from their menu to save some money. Below is all their data on purchases from the past week:
| Item | Total Sales | # Sold | Total Profit |
If we assume our data is fully accurate and complete we would immediately drop the Fried Clams and Fried Shrimp! They make the least profit for us.
However, if we assume we might have a hidden bias in our data then we might look a little further at the Fried Clams just to be sure it’s worth dropping. We might find that our cashiers were mistakenly recording an order of clams for an order of fries! Or we might find out that 90% of the people who order clams also order a lobster roll, our highest profit item. Or we might find out that customers who buy clams come back every day and are our most loyal customers. Any of those factors might be hiding in the data since the signal is not clear.
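One of those checks, whether clam buyers also order lobster rolls, could be sketched like this. The order data is entirely made up for illustration and is not Tony’s actual sales data; `attach_rate` is a hypothetical helper name.

```python
# Hypothetical orders, each one the set of items on a single receipt
orders = [
    {"Fried Clams", "Lobster Roll"},
    {"Fried Clams", "Lobster Roll", "Fries"},
    {"Fried Shrimp"},
    {"Lobster Roll"},
    {"Fried Clams"},
    {"Fries", "Fried Shrimp"},
]


def attach_rate(orders, item, companion):
    """Share of orders containing `item` that also contain `companion`."""
    with_item = [o for o in orders if item in o]
    if not with_item:
        return 0.0
    return sum(companion in o for o in with_item) / len(with_item)


clams = attach_rate(orders, "Fried Clams", "Lobster Roll")
shrimp = attach_rate(orders, "Fried Shrimp", "Lobster Roll")
print(f"Clam orders with a lobster roll: {clams:.0%}")
print(f"Shrimp orders with a lobster roll: {shrimp:.0%}")
```

If a low profit item turns out to ride along with your highest profit item, dropping it may cost more than the profit column suggests.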
Handling hidden bias is as simple as verifying your conclusions, looking for strong signals in the data and double checking your work. By assuming there is bias in your data you’ll be more likely to find it or at least compensate for it when making decisions!
Those shrimp, though, have to go.
Next week we’re going to tackle a topic requested by a reader! How do I determine my customer acquisition cost, the price I pay to gain new customers? It can be a harder question to answer than it seems.