Seasonality: Bad Data

This is part 5 of our series on Seasonality. Previous segments are available in our archives.

One of the great challenges of seasonal analysis is that it, by definition, spans years of data. Over those years the type of data you collect, how you collect it and where you store it will likely change. This means you need to deal with changes in the nature of your data in addition to the analysis you are trying to perform.

This kind of data pollution can make it appear as if you have seasonality in your data even when you do not. For example, if you have an annual database migration that happens on the same day every year and takes down your service for most of the day, you might see an annual dip in your business metrics that is not seasonality but your own process! This can also be true of financial audits that freeze accounting or company warehouse closures.
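One way to check for this kind of self-inflicted dip is to compare the dates of your lowest readings against a calendar of known operational events. The sketch below is a minimal illustration in Python, not a definitive method. The names `daily_orders`, `known_events` and `flag_process_dips`, the sample figures, and the 50% threshold are all assumptions made up for the example.

```python
from datetime import date
from statistics import median

def flag_process_dips(daily_metric, known_events, threshold=0.5):
    """Return {date: event_name} for days whose value falls below
    threshold * median AND that coincide with a known operational event."""
    baseline = median(daily_metric.values())
    flagged = {}
    for day, value in daily_metric.items():
        if value < threshold * baseline and day in known_events:
            flagged[day] = known_events[day]
    return flagged

# Hypothetical data: orders per day, with a migration on the same day each year.
daily_orders = {
    date(2022, 3, 14): 100,
    date(2022, 3, 15): 12,   # migration day
    date(2022, 3, 16): 104,
    date(2023, 3, 14): 110,
    date(2023, 3, 15): 9,    # migration day
    date(2023, 3, 16): 108,
}
# Hypothetical calendar of known operational events.
known_events = {
    date(2022, 3, 15): "database migration",
    date(2023, 3, 15): "database migration",
}

process_dips = flag_process_dips(daily_orders, known_events)

# Set the flagged days aside before looking for seasonal patterns.
clean_orders = {d: v for d, v in daily_orders.items() if d not in process_dips}
```

A dip that recurs every year and lines up with an entry in your own event calendar is a strong hint that you are looking at process, not seasonality. Dips that remain after the flagged days are set aside are the ones worth investigating further.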

So, how do you separate real seasonal changes from bad data?

  • First, you should eliminate other possible causes before attributing metric changes to seasonality.
  • Second, you should validate your findings about seasonality using market data (such as the data we discussed previously). While you might see seasonality unique to your business, it’s more likely that you share the same seasonality as similar businesses.
  • Finally, always store your metric definitions along with your data. That way, when you migrate to new systems and need to revisit older data in old systems, you can easily look up the definitions that applied to a specific data store. Without this Rosetta Stone it may be hard to translate between different years of data.
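The last point, keeping definitions with the data, can be sketched in a few lines of Python. This is only one possible approach, assuming a simple JSON export. The function name `bundle_with_definitions`, the `active_users` metric, and the sample records are hypothetical.

```python
import json

def bundle_with_definitions(records, definitions):
    """Package metric records together with the definitions in force
    when they were collected, so the two always travel together."""
    # Every metric column (everything except "date") must have a definition.
    metric_names = {name for row in records for name in row if name != "date"}
    missing = metric_names - set(definitions)
    if missing:
        raise ValueError(f"No definition stored for: {sorted(missing)}")
    return json.dumps({"definitions": definitions, "records": records}, indent=2)

# Hypothetical definition and data for one year's data store.
definitions_2019 = {
    "active_users": "Accounts with at least one login in the calendar month",
}
records_2019 = [
    {"date": "2019-01-31", "active_users": 1200},
    {"date": "2019-02-28", "active_users": 1150},
]

bundle = bundle_with_definitions(records_2019, definitions_2019)

# Years later, the definitions come back with the numbers.
restored = json.loads(bundle)
```

Refusing to export a metric that has no stored definition is a deliberate choice here: it forces the definition to be written down while someone still remembers it.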

Beware: your data can include noise and pollution that might not be obvious on the surface. These can include tracking bugs and data definition errors that introduce systematic bias into your metrics. Such problems are more difficult to account for and require you to scrub the data before using it for modeling, predictions and decisions. Speaking of which…

Next Week: Dirty data can make being data-driven very difficult! If your data is obviously wrong, you can use that as an input, but what if your data is just littered with small inaccuracies and inconsistencies? Next week we’ll cover data scrubbing, which will help you clean up your data for better analysis!


Quote of the Day: “On two occasions I have been asked, ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ … I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.” — Charles Babbage, Passages from the Life of a Philosopher