Trendlines: Regressions

This is part 3 of our series on Trendlines; previous segments are available in our archives.

Linear and Polynomial Regression Trendlines

Linear regression is the most common approach to adding trendlines to data: it is very flexible, and the fitted line can be extended past the last data point to estimate what comes next. We’ve covered linear regressions previously, so I won’t review how to calculate them here. Instead, we’ll discuss how to use them effectively for trendlines.

While polynomial regression requires that more coefficients be fit to the input set of values, it can be very effective in estimating trends. The biggest challenge is deciding what degree of polynomial is the right fit for your data. A general rule of thumb is that every degree you add to your polynomial allows another bend in the curve of your trendline (a polynomial of degree n can have at most n - 1 turning points). The following is our raw data with a 2nd degree polynomial trendline.

And here is the same data with a 6th degree polynomial:

You can see the 6th degree version has significantly more curves in it. Whatever software you use to create a polynomial regression trendline will ask you to choose the degree, and it’s important to choose wisely! Notice that the first example (2nd degree) shows the trend increasing at the end, while the second (6th degree) shows it decreasing at the end. Depending on the degree you choose, the trendline can tell a different story about what’s likely to happen next!
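If you want to try this comparison yourself, here is a minimal sketch in Python. It assumes NumPy is installed, and the data values are made up for illustration (they are not the data from the charts above). It fits a 2nd and a 6th degree polynomial to the same series and reports the slope of each trendline at the last data point, which is what decides whether the line appears to be heading up or down.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical data: 20 evenly spaced observations, made up for illustration.
y = np.array([12, 15, 14, 18, 21, 19, 23, 26, 24, 22,
              25, 29, 31, 28, 27, 30, 34, 33, 31, 35], dtype=float)
x = np.arange(len(y), dtype=float)


def fit_trend(x, y, degree):
    """Least-squares polynomial trendline of the given degree."""
    # Polynomial.fit rescales x internally, which keeps higher degrees
    # numerically stable.
    return Polynomial.fit(x, y, degree)


def sse(trend):
    """Sum of squared errors between the data and the trendline."""
    return float(np.sum((y - trend(x)) ** 2))


trend2 = fit_trend(x, y, 2)
trend6 = fit_trend(x, y, 6)

# Slope of each trendline at the final observation. The sign of the slope
# says whether the trendline ends on an upswing or a downswing.
slope2 = float(trend2.deriv()(x[-1]))
slope6 = float(trend6.deriv()(x[-1]))

print(f"degree 2: SSE = {sse(trend2):.2f}, slope at end = {slope2:+.3f}")
print(f"degree 6: SSE = {sse(trend6):.2f}, slope at end = {slope6:+.3f}")
```

The higher degree will always fit the existing points at least as closely, so a smaller error on its own is not a reason to prefer it. The slope at the end is the more telling number when the two degrees disagree.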

Speaking of which, let’s cover the pros and cons of this approach to trendlines.

Pros

  • Linear and polynomial regressions emphasize the trend in the trendline, allowing you to see the overall movement of the data. If the data is increasing or decreasing, it should become clear.
  • The output of a linear or polynomial regression is a formula that describes your data, so you can easily predict future values using that same formula. This means your trendline can double as a short-term prediction!
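To make the "formula" point concrete, here is a small sketch in plain Python for the linear case. The function names (fit_line, predict) and the monthly values are my own, made up for illustration. The fit produces a slope and an intercept, and plugging the next period into that same formula gives the estimate.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the trendline formula y = slope * x + intercept."""
    return slope * x + intercept


# Hypothetical monthly values, made up for illustration.
periods = [0, 1, 2, 3, 4, 5]
values = [10.0, 12.5, 13.0, 15.5, 17.0, 18.5]

slope, intercept = fit_line(periods, values)
next_period = periods[-1] + 1

print(f"trendline: y = {slope:.2f} * x + {intercept:.2f}")
print(f"estimate for period {next_period}: {predict(slope, intercept, next_period):.2f}")
```

A polynomial trendline works the same way, just with more terms in the formula. In either case, the further past the last data point you go, the less the estimate should be trusted.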

Cons

  • As we mentioned, choosing the degree of the polynomial in your regression is critical. Too high and you will over-fit your data: the trendline starts following every small wiggle and is no better than a moving average. Too low and it might not accurately reflect the movement of the data.
  • If there are significant shifts in the middle of your data, such as changes in data definitions or collection practices, the regression model will have trouble adjusting. As a result, a regression needs fairly clean, consistently defined data to give good results.
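One way to sanity-check your choice of degree (a common technique, not the only one) is to hold back the last few points, fit the trendline on the rest, and see how well each degree estimates the points it never saw. The sketch below assumes NumPy is installed; the function name holdout_errors, the holdout size of 4, and the data are all made up for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical series, made up for illustration.
y = np.array([12, 15, 14, 18, 21, 19, 23, 26, 24, 22,
              25, 29, 31, 28, 27, 30, 34, 33, 31, 35], dtype=float)
x = np.arange(len(y), dtype=float)


def rmse(actual, estimated):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((actual - estimated) ** 2)))


def holdout_errors(x, y, degrees, n_holdout=4):
    """Fit each degree on the early points and score it on the last n_holdout.

    Returns {degree: (training error, holdout error)}.
    """
    x_train, y_train = x[:-n_holdout], y[:-n_holdout]
    x_test, y_test = x[-n_holdout:], y[-n_holdout:]
    errors = {}
    for degree in degrees:
        trend = Polynomial.fit(x_train, y_train, degree)
        errors[degree] = (rmse(y_train, trend(x_train)),
                          rmse(y_test, trend(x_test)))
    return errors


errors = holdout_errors(x, y, degrees=[1, 2, 6])
for degree, (train_err, test_err) in errors.items():
    print(f"degree {degree}: training error = {train_err:.2f}, "
          f"holdout error = {test_err:.2f}")
```

The training error can only shrink as the degree goes up, so it tells you little. A degree whose holdout error is much larger than its training error is a sign of over-fitting.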

Tomorrow we’ll cover an advanced way to add trendlines to data which combines both the moving average and the linear regression: the auto-regressive integrated moving average (ARIMA).

Quote of the Day: “Once you make a decision, the universe conspires to make it happen.” – Ralph Waldo Emerson