Outlier is automated data insights for your entire business.
Book a Demo



This is part 1 of a 5 part series on Seasonality.

The holiday season is upon us and regardless of what holidays you celebrate the season is likely to affect your business. In the US, some e-commerce companies will do more than half of their annual business in the fourth quarter. For companies that work in the ski industry, their entire annual revenue is earned between November and March!

Seasonality refers to the systemic changes in your business based on the time of year. You need to take it into account every time you analyze and interpret your metrics, since it can be a hidden driver of change. It is easy to overlook when there are more immediate and apparent causes, such as competitive actions, product updates and advertising shifts. However, seasonal shifts are like the tides and can have a huge impact.

Note that the season itself is not why your metrics change; they change because customer behavior changes with the season. This means that the way seasonality affects your business depends on who your customers are and how their behavior changes. Companies that sell to students see their seasonality tied to the school year calendar, while tourism businesses follow the weather.

This week we’ll discuss ways to account for seasonality in your data, and how to make effective predictions taking seasonality into account.

While I won’t be able to tell you what kind of seasonality affects your business, by the end of the week I’m hopeful you’ll be able to figure that out by yourself!

“Winter is coming.”

Seasonality: Identifying Seasonality

This is part 2 of a 5 part series on Seasonality.

The hardest part of identifying seasonality is separating it from the myriad other influences that affect your business metrics. Seasonality is, as I mentioned yesterday, changes in customer behavior based on the time of year. Unfortunately, this means your seasonality is a side effect of some larger changes and can be hard to spot. Even if it is easy to spot, it can be hard to quantify!

Let’s look at an example data set that has some obvious seasonality:

Without doing any analysis, it’s clear that the data is fairly consistent except for large spikes around Christmas (December 25th). We can analyze this data to detect that seasonality and quantify it, which will help you do the same even if the pattern is not as obvious as it is here.

Seasonality is really just a form of periodicity, meaning a repeating cycle based on time. You might have consistent behavior changes over the course of the day, the week or the month which are easy to see in your data. Detecting seasonality is just applying that same thinking with much longer time cycles.

If we apply a regression (trendline) to the data and look at the residuals of the data (deviations from the regression), we can see where and when the data deviates significantly from the overall trend and behavior. Here is that same data with a simple moving average trendline applied:

The residuals are the differences between the actual values and the estimated values from the trendline. There is a lot of noise, so you will want to filter out residuals that are only minor deviations from the regression. In this case I used a threshold filter that removes any residuals that are not at least 7% different from the trendline, and you can see the remaining residuals charted below:
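If you want to try this on your own data, here is a minimal sketch of the residual approach described above. The function names, the 31-day centered window and the synthetic series are my own assumptions for illustration; only the 7% cutoff comes from the example in this post.

```python
from datetime import date, timedelta

def moving_average(values, window):
    """Centered moving average; the window shrinks near the edges of the series."""
    half = window // 2
    trend = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        trend.append(sum(values[lo:hi]) / (hi - lo))
    return trend

def significant_residuals(values, window=31, threshold=0.07):
    """Return {index: relative residual} for points at least `threshold` away from the trend."""
    trend = moving_average(values, window)
    flagged = {}
    for i, (actual, expected) in enumerate(zip(values, trend)):
        residual = (actual - expected) / expected
        if abs(residual) >= threshold:
            flagged[i] = residual
    return flagged

# Synthetic daily series (an assumption, not real data):
# flat at 100 with a spike to 130 every December 25th.
start = date(2012, 1, 1)
days = [start + timedelta(days=n) for n in range(4 * 365 + 1)]
series = [130.0 if (d.month, d.day) == (12, 25) else 100.0 for d in days]

flagged = significant_residuals(series)
spike_dates = [days[i] for i in flagged]
print(spike_dates)
```

With real data you would likely need to tune both the window and the threshold, since a window that is too short will absorb the seasonal spike into the trendline itself.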

Here the seasonality is obvious, as the biggest spikes happen every year at Christmas like clockwork! There are a number of other, smaller bumps that were not obvious in the original data that look like smaller cycles around July 4th.

By looking at the residuals against the trendline, you are removing overall trends and patterns, which makes it easy to identify seasonality even in data where there is a lot of growth or change. Be careful not to choose a regression that overfits your data, however, or else the seasonality itself will become part of the regression and not show up as a residual.

A more advanced technique is to compute the power spectrum for your data, which is a technique used in signal processing to identify the component cycles that contribute to a given time series. If there is any periodicity in your data, and seasonality in particular, it would stand out in your power spectrum analysis as cycles of the length between the seasons. So, for example, if you have big spikes on Christmas day every year then you would see cycles of 365 days (one year), which is the length of time between one Christmas day and the next. I will leave it to you to experiment with this technique if you are interested; it is extremely useful in analyzing any kind of time series data.
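As a starting point for that experiment, here is a rough sketch of a power spectrum computed with a naive discrete Fourier transform in plain Python. The function names and the synthetic weekly series are assumptions for illustration, and for real data you would probably reach for an FFT library, since this version is slow on long series.

```python
import cmath
import math

def power_spectrum(values):
    """Naive DFT; returns {k: power} where k is the number of cycles over the whole series."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]  # remove the constant offset
    powers = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        powers[k] = abs(coeff) ** 2
    return powers

def dominant_period(values):
    """Length, in samples, of the cycle carrying the most power."""
    powers = power_spectrum(values)
    k = max(powers, key=powers.get)
    return len(values) / k

# Synthetic weekly series (an assumption, not real data):
# four years of 52 weeks with a smooth annual cycle.
weekly = [100 + 20 * math.sin(2 * math.pi * t / 52) for t in range(4 * 52)]

print(dominant_period(weekly))  # cycle length in weeks
```

One caveat: sharp spikes, like the Christmas example, also put power into shorter harmonics (half a year, a third of a year and so on), so it is usually the longest strong cycle that tells you the seasonal period.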

“There is nothing more deceptive than an obvious fact.”

Seasonality: Predicting Seasonality (from History)

This is part 3 of a 5 part series on Seasonality.

Once you understand the seasonality of your data, it can be a useful tool in predicting the future. Taking into account changes that happen at different times of year will make your predictions more accurate than if you just extrapolate from the data itself.

For example, let's return to the same highly seasonal data below:

If we were to try to predict the future using a simple regression, it would fail to capture those regular, seasonal spikes. Below is a 6-degree polynomial regression on the above data:

As you can see, this regression would be very effective at predicting the value for almost any day of the year except for our big spikes around Christmas! This means that you need to use different approaches to take seasonality into account.

From what we covered yesterday, I will assume you have identified when you experience seasonality. On those dates, you need to calculate the average deviation from the regression which will become your expected change in the future. For our data above the deviations for the past few years break down as:

Date                 Deviation from regression
December 25, 2012    20.7%
December 25, 2013    20.8%
December 25, 2014    21.6%
December 25, 2015    22.0%
Mean Deviation       21.3%

We can then use this factor (+21.3% or 1.213) to predict the value for any upcoming Christmas by using a simple regression and multiplying the value for Christmas by our deviation factor. If our regression prediction for Christmas this year is 100, then we would estimate 100 * 1.213 = 121.3.
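The arithmetic above can be sketched in a few lines. The function names are my own, and the baseline of 100 is the same illustrative regression prediction used in the text, not a real forecast.

```python
from statistics import mean

# Percentage deviations from the regression on each past Christmas (listed above).
christmas_deviations = {2012: 20.7, 2013: 20.8, 2014: 21.6, 2015: 22.0}

def seasonal_factor(deviations_pct):
    """Turn historical percentage deviations into a multiplicative factor."""
    return 1 + mean(deviations_pct) / 100

def seasonal_forecast(baseline, deviations_pct):
    """Scale the regression's baseline prediction by the seasonal factor."""
    return baseline * seasonal_factor(deviations_pct)

deviations = list(christmas_deviations.values())
factor = seasonal_factor(deviations)
forecast = seasonal_forecast(100, deviations)

# The unrounded mean is 21.275%, which the text rounds to 21.3%.
print(f"factor: {factor:.5f}, forecast: {forecast:.3f}")
```

Note that the deviations drift upward from year to year in this data, so a plain mean may understate the next Christmas slightly. Whether that matters depends on how precise your forecast needs to be.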

This works well when you have a few years of history to use, but that won’t be true of new products and services! We’ll cover what to do in those cases tomorrow.

“In – five hundred twenty-five thousand / Six hundred minutes / How do you measure / A year in the life”

Seasonality: Predicting Seasonality (without Data)

This is part 4 of a 5 part series on Seasonality.

Yesterday we covered how to take seasonality into account with your predictions, but it relied heavily on having a few years' worth of data. What if you just launched a new product or service and don’t have nearly that much history to fall back on?

You can still account for seasonality, but with significantly less accuracy. The key is to find market data that gives you some insight into the magnitude of customer behavior changes that you can use to replace the deviation factor we were able to calculate with historic data. Here are some examples of market data sources you might use:

  • Google Trends. Google Trends allows you to see how the use of search terms varies over time historically. If you sell shirts online, you can see how customer searches for “shirts” vary by time of year as a proxy for how interested they are in buying. This works very well if most of your traffic comes through SEO or SEM.
  • Government Data. Many governments publish monthly data on the consumption and price of various goods. The US Bureau of Labor Statistics publishes the Consumer Price Index, which includes data on consumption and pricing of many different kinds of goods. It can be easy to see seasonal changes in goods consumption depending on your category.
  • Customer Fiscal Cycles. If you sell to businesses, you can ask your customers about their fiscal planning cycles which should tell you a lot about their buying patterns. Many organizations will buy at the beginning of the fiscal year (if they have allocated budget) or at the end of the year (if they have budget remaining) so you can build a calendar of customer behavior over the course of the year.

Your goal should be to determine how much your metrics will deviate from the common trend during specific days / weeks / months during the year. While it will not be as accurate as the predictions from actual data, it should allow you to prepare for changes. The good news is that after the first year you will have at least one year of historical data to use!
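To make that concrete, here is a small sketch of turning a market proxy into seasonal factors. The monthly index values below are entirely hypothetical (they are not real Google Trends or government figures), and the function names are my own.

```python
def monthly_factors(index_by_month):
    """Express each month's proxy value relative to the average month (1.0 = typical)."""
    average = sum(index_by_month.values()) / len(index_by_month)
    return {month: value / average for month, value in index_by_month.items()}

def adjust(baseline, month, factors):
    """Scale a baseline (trend-only) estimate by the month's seasonal factor."""
    return baseline * factors[month]

# HYPOTHETICAL search-interest index for a product category, for illustration only.
proxy_index = {
    "Jan": 80, "Feb": 75, "Mar": 85, "Apr": 90, "May": 95, "Jun": 100,
    "Jul": 95, "Aug": 100, "Sep": 95, "Oct": 105, "Nov": 150, "Dec": 130,
}

factors = monthly_factors(proxy_index)
november_estimate = adjust(2000, "Nov", factors)  # 2000 is a made-up baseline
print(factors["Nov"], november_estimate)
```

This assumes your own metric moves in proportion to the proxy, which is a strong assumption. Treat the result as a rough planning number until you have your own history to replace it.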

Tomorrow we’ll cover what you should do if you can’t trust your data completely.

“We demand rigidly defined areas of doubt and uncertainty!”

Seasonality: Bad Data

This is part 5 of a 5 part series on Seasonality.

One of the great challenges of seasonal analysis is that it, by definition, spans years of data. Over those years the type of data you collect, how you collect it and where you store it will likely change. This means you need to deal with changes in the nature of your data in addition to the analysis you are trying to perform.

This kind of data pollution can make it appear as if you have seasonality in your data even when you do not. For example, if you have an annual database migration that happens on the same day every year and takes down your service for most of the day, you might see an annual dip in your business metrics that is not seasonality but your own process! This can also be true of financial audits that freeze accounting or company warehouse closures.

So, how do you separate real seasonal changes from bad data?

  • First, you should eliminate other possible causes before attributing metric changes to seasonality.
  • Second, you should validate your findings about seasonality using market data (such as the data we discussed previously). While you might see seasonality unique to your business, it’s more likely that you share the same seasonality as similar businesses.
  • Finally, always store your metric definitions along with your data. That way, when you migrate to new systems and need to revisit older data in old systems you can easily remember the definitions relevant to a specific data store. Without this Rosetta stone it may be hard to translate between different years of data.
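The first step in that list, eliminating other causes, can be as simple as checking each anomaly against a calendar of your own internal events. Here is a minimal sketch; the event calendar, the anomaly dates and the function name are all hypothetical.

```python
from datetime import date

# HYPOTHETICAL calendar of internal events that distort metrics (not real data).
known_events = {
    date(2014, 3, 1): "annual database migration",
    date(2015, 3, 1): "annual database migration",
}

# HYPOTHETICAL dates flagged by an earlier residual analysis.
anomalies = [
    date(2014, 3, 1), date(2014, 12, 25),
    date(2015, 3, 1), date(2015, 12, 25),
]

def split_anomalies(anomaly_dates, events):
    """Separate anomalies explained by internal events from candidates for seasonality."""
    explained = {d: events[d] for d in anomaly_dates if d in events}
    candidates = [d for d in anomaly_dates if d not in events]
    return explained, candidates

explained, candidates = split_anomalies(anomalies, known_events)
print(candidates)
```

Only the dates left in `candidates` are worth validating against market data as possible seasonality.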

Beware, your data can include noise and pollution that might not be obvious at the surface. These can include tracking bugs and data definition errors that introduce systematic bias into your metrics. Those are more difficult to account for and require you to scrub the data before using it for modeling, predictions and decisions. Speaking of which…

Next Week: Dirty Data can make being data driven very difficult! If your data is obviously wrong you can use that as an input, but what if your data is just littered with small inaccuracies and inconsistencies? Next week we’ll cover data scrubbing, which will help you clean up your data for better analysis!

“On two occasions I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ … I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.”
Charles Babbage, Passages from the Life of a Philosopher