Seasonality: Identifying Seasonality

This is part 2 of our series on Seasonality; previous segments are available in our archives.

The hardest part of identifying seasonality is separating it from the myriad other influences that affect your business metrics. Seasonality is, as I mentioned yesterday, changes in customer behavior based on the time of year. Unfortunately, this means your seasonality is a side effect of some larger changes and can be hard to spot. Even if it is easy to spot, it can be hard to quantify!

Let’s look at an example data set that has some obvious seasonality:

[Chart: raw data]

Without doing any analysis, it’s clear that the data is fairly consistent except for large spikes around Christmas (December 25th). We can analyze this data to detect that seasonality and quantify it, which will help you do the same even if the pattern is not as obvious as it is here.

Seasonality is really just a form of periodicity, meaning a repeating cycle based on time. You might have consistent behavior changes over the course of the day, the week, or the month, which are easy to see in your data. Detecting seasonality applies that same thinking to much longer time cycles.

If we apply a regression (trendline) to the data and look at the residuals of the data (deviations from the regression), we can see where and when the data deviates significantly from the overall trend and behavior. Here is that same data with a simple moving average trendline applied:

[Chart: data with simple moving average trendline]

The residuals are the differences between the actual values and the estimated values from the trendline. There is a lot of noise, so you will want to filter out residuals that are only minor deviations from the regression. In this case I used a simple threshold filter that removes any residuals that differ from the trendline by less than 7%, and you can see the remaining residuals charted below:

[Chart: filtered residuals]

Here the seasonality is obvious, as the biggest spikes happen every year at Christmas like clockwork! There are also a number of smaller bumps, not obvious in the original data, that look like smaller cycles around July 4th.
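The trendline, residual, and 7% filter steps described above can be sketched in a few lines of Python. This is only an illustration on made-up data, not the analysis behind the charts: the function names, the 91-day centered window, and the synthetic series (a slowly growing baseline with a spike every December 25th) are all assumptions chosen for the example.

```python
from datetime import date, timedelta

def moving_average(values, window):
    """Centered simple moving average; the window shrinks near the edges."""
    half = window // 2
    trend = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        trend.append(sum(values[lo:hi]) / (hi - lo))
    return trend

def significant_residuals(values, window=91, threshold=0.07):
    """Return (index, relative residual) pairs for points that differ
    from the trendline by at least `threshold` (7% by default)."""
    trend = moving_average(values, window)
    flagged = []
    for i, (actual, estimate) in enumerate(zip(values, trend)):
        relative = (actual - estimate) / estimate
        if abs(relative) >= threshold:
            flagged.append((i, relative))
    return flagged

# Synthetic data (an assumption for illustration): three years of daily
# values with slow growth and a large spike every December 25th.
start = date(2010, 1, 1)
days = [start + timedelta(days=n) for n in range(3 * 365)]
series = [100.0 + 0.01 * n + (50.0 if (d.month, d.day) == (12, 25) else 0.0)
          for n, d in enumerate(days)]

flagged = significant_residuals(series)
flagged_dates = [days[i] for i, _ in flagged]
for d in flagged_dates:
    print(d.isoformat())
```

On this synthetic series the only dates that survive the filter are the three Christmas days. With real data you would likely need to experiment with both the window length and the threshold.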

By looking at the residuals against the trendline, you are removing overall trends and patterns, which makes it easier to identify seasonality even in data where there is a lot of growth or change. Be careful not to choose a regression that overfits your data, however, or else the seasonality itself will become part of the regression and not show up as a residual.

A more advanced technique is to compute the power spectrum for your data, a method used in signal processing to identify the component cycles that contribute to a given time series. If there is any periodicity in your data, and seasonality in particular, it will stand out in the power spectrum as cycles whose length matches the time between the seasons. So, for example, if you have big spikes on Christmas day every year, then you would see cycles of 365 days (one year), which is the length of time between one Christmas day and the next. I will leave it to you to experiment with this technique if you are interested; it is extremely useful in analyzing any kind of time series data.
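As a starting point for that experiment, here is a minimal sketch of a power spectrum computed with a naive discrete Fourier transform, using only the Python standard library. The function name and the synthetic series (a smooth yearly cycle peaking around December 25th) are assumptions for illustration. For real work you would more likely reach for a fast Fourier transform routine such as NumPy's `numpy.fft.rfft`.

```python
import cmath
import math

def power_spectrum(values):
    """Naive discrete Fourier transform. Returns (period, power) pairs,
    one for each whole number of cycles that fits in the series."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]  # drop the constant term
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((n / k, abs(coeff) ** 2))
    return spectrum

# Synthetic data (an assumption for illustration): three years of daily
# values with a smooth yearly cycle that peaks on day 358 of each year.
n_days = 3 * 365
series = [100.0 + 20.0 * math.cos(2 * math.pi * (t - 358) / 365)
          for t in range(n_days)]

spectrum = power_spectrum(series)
best_period, best_power = max(spectrum, key=lambda pair: pair[1])
print(round(best_period))  # length in days of the dominant cycle
```

Note that sharp one-day spikes, unlike the smooth cycle used here, also put power into the harmonics of the yearly cycle (365/2 days, 365/3 days, and so on), so with spiky data look for the longest period among the peaks.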

 

Quote of the Day: “There is nothing more deceptive than an obvious fact.” ― Arthur Conan Doyle, The Boscombe Valley Mystery