One of the more advanced, generalized methods for analyzing trends in time series data is the ARIMA model, an acronym for AutoRegressive Integrated Moving Average. These models are particularly helpful for forecasting data over time while accounting for cyclical, seasonal, and potentially irregular changes in the data. Let’s start by breaking this model down into its three components to understand what it is doing, then look at an example, and finally talk about the pros and cons.
Beginning in the middle, the term integrated (I) refers to the number of times the model takes a difference of data points. A difference is calculated by subtracting the previous value from each value, so you are left with the change from one period to the next. This is done to strip out the trend, so that the changes a simple trend cannot explain stand out.
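To make the differencing step concrete, here is a minimal sketch in plain Python. The `difference` helper and the sales numbers are made up for illustration; they are not from any particular library or data set.

```python
def difference(series, order=1):
    """Difference a series `order` times: each value minus the value before it."""
    result = list(series)
    for _ in range(order):
        # Pair every value with its predecessor and keep the change between them.
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


# Hypothetical daily sales with a steady upward trend.
sales = [100, 103, 107, 110, 114, 117]

print(difference(sales, order=1))  # day-to-day changes: [3, 4, 3, 4, 3]
print(difference(sales, order=2))  # changes of the changes: [1, -1, 1, -1]
```

Notice that the upward trend disappears after one difference, and that each round of differencing shortens the series by one point.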
Next, the moving average (MA) model (which is actually different from the moving average calculation we discussed previously) attempts to explain the values that result from differencing as movements around an average value. Each movement is modeled as a weighted combination of the current random shock and the shocks from recent periods, and those weights are estimated much like the coefficients in a regression.
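As a rough illustration of the moving average idea, the sketch below simulates the simplest case, where each value is an average level plus today's shock plus a fraction of yesterday's shock. The function name `simulate_ma1`, the weight `theta`, and the normally distributed shocks are all assumptions chosen for the example.

```python
import random


def simulate_ma1(n, mean=0.0, theta=0.6, seed=0):
    """Simulate n values of an MA(1) process: mean + shock_t + theta * shock_(t-1)."""
    rng = random.Random(seed)
    # One extra shock so the first value also has a "previous" shock.
    shocks = [rng.gauss(0, 1) for _ in range(n + 1)]
    return [mean + shocks[t] + theta * shocks[t - 1] for t in range(1, n + 1)]


values = simulate_ma1(10, mean=5.0)
print([round(v, 2) for v in values])
```

Because each shock only carries over for one period here, its effect is gone after two days. A higher-order MA model would let shocks linger longer.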
Finally, the autoregressive (AR) model is one where the values each day are assumed to be linked together. Its structure is very similar to the typical linear regression we talked about yesterday: in an autoregressive model, the predicted value today depends on the value on the previous day(s), plus an error term to account for shocks to the system.
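The link to ordinary regression can be sketched in a few lines: simulate a series where today depends on yesterday, then recover that dependence by regressing each value on the one before it. The helper names, the coefficient `phi`, and the use of simple least squares are assumptions for this illustration, not the estimation method a statistics package would necessarily use.

```python
import random


def simulate_ar1(n, phi=0.7, const=0.0, seed=0):
    """Simulate an AR(1) series: value_t = const + phi * value_(t-1) + shock_t."""
    rng = random.Random(seed)
    values = [0.0]
    for _ in range(n - 1):
        values.append(const + phi * values[-1] + rng.gauss(0, 1))
    return values


def estimate_ar1(values):
    """Estimate (const, phi) by least squares of each value on the previous one."""
    x, y = values[:-1], values[1:]
    x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)
    covariance = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    variance = sum((a - x_mean) ** 2 for a in x)
    phi = covariance / variance
    return y_mean - phi * x_mean, phi


series = simulate_ar1(5000, phi=0.7)
const, phi = estimate_ar1(series)
print(round(const, 3), round(phi, 3))  # phi should land close to 0.7
```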
So, now that we know what each part of an ARIMA model does, let’s think about an example to make it all clearer. Suppose you had daily sales data for a particular product. This might be a good candidate for an ARIMA model because product sales are generally consistent from day to day, but have weekly cyclicality and seasonal changes. Plus, there are often shocks to the system that cause abnormal changes that persist for a short period of time, e.g., coupon offers or advertising campaigns.
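To show how the pieces fit together on sales data, here is a hand-rolled sketch of one very small ARIMA model: difference once, then run an autoregression with one lag on the differences, with no moving average part. The sales figures are invented, the function names are hypothetical, and the least squares fit is a simplification; in practice you would likely use a statistics package rather than code like this.

```python
def fit_arima_110(series):
    """Fit a simple ARIMA(1,1,0): least squares AR(1) on the first differences."""
    diffs = [curr - prev for prev, curr in zip(series, series[1:])]
    x, y = diffs[:-1], diffs[1:]
    x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)
    variance = sum((a - x_mean) ** 2 for a in x)
    phi = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / variance
    return y_mean - phi * x_mean, phi


def forecast_next(series, const, phi):
    """Predict the next change from the last change, then add it to the last value."""
    last_change = series[-1] - series[-2]
    return series[-1] + const + phi * last_change


# Hypothetical daily unit sales (made-up numbers for illustration).
daily_sales = [120, 123, 121, 126, 130, 128, 133, 137, 134, 140, 145, 143]

const, phi = fit_arima_110(daily_sales)
print(round(forecast_next(daily_sales, const, phi), 1))
```

The forecast is built on the differenced scale and then added back to the last observed value, which is the "integrated" step run in reverse.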
Pros:
- ARIMA models are extremely flexible and can be used in situations where your data has cyclical and seasonal changes, and the potential for irregular shocks.
Cons:
- You need to decide how many time periods to include for each component. For example, do the sales today depend on just the sales yesterday, or both yesterday and the day before, or further back in time?
- These models only work on time series data.
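One common starting point for deciding how far back to look is to check how strongly the series is correlated with earlier copies of itself. The sketch below computes that autocorrelation at a given lag; the `autocorrelation` helper and the repeating weekly sales pattern are assumptions made up for the example, and this is only a rough guide rather than a full procedure for choosing the model.

```python
def autocorrelation(series, lag):
    """Correlation between the series and itself shifted back by `lag` periods."""
    n = len(series)
    mean = sum(series) / n
    total_variation = sum((v - mean) ** 2 for v in series)
    shared_variation = sum(
        (series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n)
    )
    return shared_variation / total_variation


# Hypothetical sales that repeat the same weekly pattern for eight weeks.
weekly_sales = [10, 12, 11, 13, 15, 20, 18] * 8

for lag in (1, 2, 7):
    print(lag, round(autocorrelation(weekly_sales, lag), 3))
```

A lag with a large autocorrelation, such as 7 for a weekly pattern, suggests that values that far back carry useful information for the model.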
Tomorrow we’ll wrap up our conversation on trendlines by talking about when not to use them.
Quote of the Day: “If it be right, do it boldly; if it be wrong, leave it undone.” – Bernard Gilpin