Trendlines: Detecting Real Trends
When Not to Use Trendlines
While you can, in theory, add a trendline to almost any data, it is not always a good idea. The techniques we have reviewed this week are simply mathematics, and mathematics can be applied to any group of numbers. Take, for example, the following random data:
Does a trendline make sense for this data? Absolutely not; it’s random! Can I still add a trendline? Sure, why not. Let’s add a second-degree polynomial regression:
If you can add a nice trendline like this to random data, the potential for using trendlines to mislead should be apparent. Using data to make decisions relies on clear communication of reliable results (see our series on Data Storytelling), so since the mathematics won’t stop you from adding a trendline, you will need to use your judgement.
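To see how little the math objects, here is a minimal Python sketch that fits a second-degree polynomial to pure noise. It assumes NumPy is available; the seed, the 50 points, and the 0 to 100 range are illustrative choices, not the data behind the chart above.

```python
import numpy as np

# Illustrative random data (assumed values, not the data from the chart)
rng = np.random.default_rng(seed=0)
x = np.arange(50)
y = rng.uniform(0, 100, size=x.size)  # y has no relationship to x

# Least-squares fit of a 2nd degree polynomial: y ~ a*x^2 + b*x + c
coeffs = np.polyfit(x, y, deg=2)

# Evaluate the fitted curve at each x to get the "trendline"
trend = np.polyval(coeffs, x)

print("Fitted coefficients (a, b, c):", coeffs)
```

The fit runs without complaint and returns a smooth curve, even though there is nothing in the data for it to describe.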
So, how do you decide if adding a trendline makes sense? Here are a few questions to ask:
- Is there really a trend? If you can’t see any trend without the trendline, chances are there is no real trend there at all. Trendlines are better at helping you understand trends that are already visible than at finding trends hiding in noisy data.
- Does the math check out? R-squared (aka the Coefficient of Determination) is a measure of how well a line fits the actual data. For a least-squares trendline it ranges from 0 to 1, and the lower the value, the worse the fit. The trendline in the above random data had an R-squared value of 0.03, which means it explains almost none of the variation in the data, so adding a trendline doesn’t make sense. Note, however, that a low R-squared value does not automatically discredit a model; it really depends on what you are studying.
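If your charting tool doesn’t report R-squared, you can compute it yourself as 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is one way to do that in Python with NumPy; the helper names r_squared and poly_fit_r2 and both sample datasets are made up for illustration.

```python
import numpy as np

def r_squared(y, y_fit):
    """R-squared = 1 - SS_res / SS_tot for observed y and fitted y_fit."""
    y = np.asarray(y, dtype=float)
    y_fit = np.asarray(y_fit, dtype=float)
    ss_res = np.sum((y - y_fit) ** 2)     # variation the fit leaves unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

def poly_fit_r2(x, y, deg):
    """Fit a polynomial trendline of the given degree and return its R-squared."""
    coeffs = np.polyfit(x, y, deg)
    return r_squared(y, np.polyval(coeffs, x))

# Hypothetical data: one series of pure noise, one with a real upward trend
rng = np.random.default_rng(seed=1)
x = np.arange(100)
random_y = rng.uniform(0, 100, size=x.size)
trended_y = 2.0 * x + rng.normal(0, 10, size=x.size)

r2_random = poly_fit_r2(x, random_y, deg=2)
r2_trended = poly_fit_r2(x, trended_y, deg=2)

print(f"R-squared, random data:  {r2_random:.2f}")
print(f"R-squared, trended data: {r2_trended:.2f}")
```

The random series should come back with a value close to 0 and the trended series with a value close to 1, which is the same signal the 0.03 above is sending.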
Adding a trendline to your data is, in the end, a judgement call that you make as a data user. Choose carefully.
In Review: Trendlines are a powerful tool for understanding data, enabling you to do things ranging from anomaly detection to predicting the future. There are a number of different ways to calculate trendlines, and you will need to use your judgement to decide which method to use and whether a trendline is appropriate for your data at all.
Enjoy your 4th of July! Next Tuesday is the 4th of July holiday here in the United States, so we are going to take a break from sending you a daily email. Look for your next Data Driven Daily on Monday, July 10. Happy 4th!