Data Insights: What Are Anomalies and Why Do They Matter?

This is part 2 of our series on Data Insights; previous segments are available in our archives.

The simplest kind of insight is an anomaly: a single point that does not fit the normal pattern of its own historical data. [1] Consider the following metric over time:

Raw data that contains hard to see anomalies

There is an obvious anomaly in May, where the value drops significantly. However, it is not always clear from the raw data what counts as “normal.” To help with this, we can fit a simple model, such as a linear regression or a moving average, that shows the general trend over time. Below is the same data with a moving-average trendline (green line) and confidence interval (light green band) added:

Data with modeling applied to help find anomalies
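
For readers who want to try this on their own data, here is a minimal sketch of a trendline with a band around it. Everything specific in it is an assumption for illustration, not the method behind the chart above: the function name `moving_average_band`, the sample values, the four-point window, averaging only the points before the current one (so an outlier cannot pull its own baseline toward itself), and drawing the band at three standard deviations of that window.

```python
from statistics import mean, stdev

def moving_average_band(values, window=4, k=3.0):
    """Return (trend, lower, upper) lists aligned with `values`.

    Each entry is built from the `window` points BEFORE the current one.
    Entries are None until enough earlier points exist.
    """
    trend, lower, upper = [], [], []
    for i in range(len(values)):
        if i < window:
            trend.append(None)
            lower.append(None)
            upper.append(None)
            continue
        past = values[i - window:i]      # trailing window, current point excluded
        center = mean(past)              # the trendline value
        spread = stdev(past)             # sample standard deviation of the window
        trend.append(center)
        lower.append(center - k * spread)
        upper.append(center + k * spread)
    return trend, lower, upper

# Made-up monthly values (Jan..Dec) with a sharp drop in May.
metric = [100, 102, 101, 103, 60, 104, 105, 104, 106, 107, 106, 108]
trend, lower, upper = moving_average_band(metric)
```

A longer window gives a smoother trendline but reacts more slowly to real shifts, so the window size is worth tuning to how noisy the metric is.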

That makes the anomaly easy to see, but how can you detect it automatically? The easiest way is to calculate the residual for each point. The residual is simply the difference between the actual value and the trendline value at the same point in time. Below are the residuals for each point:

Residuals that make the anomalies easy to see

Looking at the data this way, the anomaly in May is easy to detect: its residual is far larger in magnitude than that of any other point (highlighted in orange).
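
The residual check can be automated in a few lines. The sketch below is one possible approach under stated assumptions, not the exact calculation behind the charts: it uses a trailing moving average as the trendline, and it flags a point when its residual is more than three standard deviations of the preceding window away from zero. The function name `find_anomalies`, the sample values, the window size, and the threshold `k` are all hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(values, window=4, k=3.0):
    """Return (index, residual) pairs for points far from the trailing average."""
    flagged = []
    for i in range(window, len(values)):
        past = values[i - window:i]          # the points just before this one
        residual = values[i] - mean(past)    # actual value minus trendline value
        if abs(residual) > k * stdev(past):  # large relative to recent variation
            flagged.append((i, residual))
    return flagged

# Made-up monthly values with a sharp drop in May.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
metric = [100, 102, 101, 103, 60, 104, 105, 104, 106, 107, 106, 108]

anomalies = find_anomalies(metric)
for index, residual in anomalies:
    print(months[index], round(residual, 1))  # prints: May -41.5
```

Comparing against the spread of the same window has a useful side effect: once the May value enters the window, it inflates that window's standard deviation, so the months right after it are not flagged just because their baseline was dragged down.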

Anomalies are a clear indication that something has changed: there must be some cause for the metric to shift so far from its previous pattern.

Tomorrow we’ll talk about a more interesting insight that involves more than one data point: developing trends.


[1] We covered anomaly detection before, in our series on Ad Campaign Optimization, so if this is too advanced you are welcome to use the simpler approach we covered there.


Quote of the Day: “Any fool can know. The point is to understand.” ― Albert Einstein