With your list of potential factors in hand, it's time to start whittling them down to the most likely causes of the change. It might not be possible to evaluate hundreds of potential factors, but if we can narrow them down to a few dozen, the problem becomes manageable. The best place to start is the metric that signaled the change in the first place, and the segments that compose it.
Let us return to our example of Sean's Snowshoes, a retail store chain that saw a drop in revenue. Revenue itself is easy to understand: it is the total amount of cash we earn from our sales. However, revenue can be broken down along many different dimensions, such as:
- Store Location
- Customer Location
- Day of the Week
By looking at revenue broken down by every dimension (and every combination of dimensions), it should become clear which segments contributed to the drop. Was it a particular store? Some specific products? A specific day or time? You want to identify the segments that changed at the same time as the overall change and, among those, the ones that account for the largest share of the total.
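To make the breakdown concrete, here is a minimal Python sketch that rolls revenue up by every dimension and every pair of dimensions, compares a period before the drop with a period after it, keeps only the segments that moved in the same direction as the total, and lists the largest ones first. The row layout (dicts with a `period` of "before" or "after" and a `revenue` value) and the function name `segment_changes` are hypothetical choices for illustration, not a prescribed schema.

```python
from collections import defaultdict
from itertools import combinations


def segment_changes(rows, dimensions, max_depth=2):
    """Roll revenue up by every combination of up to `max_depth` dimensions.

    rows: iterable of dicts holding each dimension, a "period" key
          ("before" or "after" the drop) and a numeric "revenue" key.
          This schema is a hypothetical one chosen for the example.
    Returns (segment, share_of_total_before, change) tuples for the
    segments that moved in the same direction as total revenue,
    largest share first.
    """
    totals = {"before": 0.0, "after": 0.0}
    segments = defaultdict(lambda: {"before": 0.0, "after": 0.0})
    for row in rows:
        period, revenue = row["period"], row["revenue"]
        totals[period] += revenue
        # Credit the row to every segment it belongs to, e.g.
        # (("store", "#456"),) and (("store", "#456"), ("day", "Mon")).
        for depth in range(1, max_depth + 1):
            for dims in combinations(dimensions, depth):
                key = tuple((d, row[d]) for d in dims)
                segments[key][period] += revenue

    total_change = totals["after"] - totals["before"]
    results = []
    for key, seg in segments.items():
        change = seg["after"] - seg["before"]
        if change * total_change <= 0:
            continue  # flat, or moved against the overall change
        share = seg["before"] / totals["before"] if totals["before"] else 0.0
        results.append((key, share, change))
    # Biggest segments first; among equal shares, the biggest move first.
    results.sort(key=lambda r: (-r[1], -abs(r[2])))
    return results
```

With real data, `dimensions` would be fields such as store location, customer location and day of the week. Raising `max_depth` adds deeper combinations, but the number of segments grows quickly, which is exactly the problem discussed further down.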
The following is a chart of a few selected segments of revenue for Sean's Snowshoes: revenue in California, revenue from selling scarves, and revenue from Store #456.
This plot helps us identify which segments were drivers of the change and which were likely side effects:
- Total Revenue (blue) dips significantly on the right. Our goal is to find the root cause of this change.
- Revenue in California (yellow) is clearly a significant part of total revenue, and it does dip at the same time as Total Revenue. However, it does not drop nearly as much as total revenue, and it recovers quickly, so it doesn't look like the source of the drop.
- Revenue from Scarves (purple) dropped significantly at the same time as total revenue, but it's a small fraction of total revenue. A segment that small is more likely to be a side effect than a root cause: even a steep drop in it cannot account for much of the overall decline.
- Revenue from Store #456 (green) is a significant portion of overall revenue and dropped significantly when total revenue dropped. This is a clear candidate for the source of the drop.
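One way to put numbers on this reasoning is to ask what fraction of the overall change each segment explains. The sketch below is my own simplification rather than a standard formula: it divides a segment's change by the total change and labels segments above a hypothetical `threshold` as candidate drivers. A large segment that barely moved and a small segment that moved sharply both end up explaining little of the total.

```python
def contribution_to_change(seg_before, seg_after, total_before, total_after):
    """Fraction of the overall change explained by a single segment."""
    total_change = total_after - total_before
    if total_change == 0:
        return 0.0
    return (seg_after - seg_before) / total_change


def label_segments(segments, total_before, total_after, threshold=0.3):
    """Label segments as candidate drivers or likely side effects.

    segments: dict of segment name -> (revenue before, revenue after).
    threshold: hypothetical cut-off; a segment must explain at least this
               fraction of the overall change to count as a candidate driver.
    """
    labels = {}
    for name, (before, after) in segments.items():
        explained = contribution_to_change(before, after, total_before, total_after)
        labels[name] = "candidate driver" if explained >= threshold else "likely side effect"
    return labels
```

This only measures how much of the drop each segment accounts for, not its timing; a segment that dips and then recovers quickly, like California above, still needs to be checked against the time series.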
Once you have the top segments that seem related to the change, you can use them to select the factors most likely to have affected those segments. If we revisit our list of potential factors from yesterday, we can eliminate the ones that could not have affected Store #456. Specifically, changes that would affect all stores are less likely to be the cause than factors that might have affected this store in particular:
- January 21st: We started a new marketing campaign. (Internal)
- January 20th: Construction started at some of our largest stores. (External)
- January 20th: One of our major marketing campaigns ended. (Internal)
- January 19th: We started a new online coupon promotion. (Internal)
- January 18th: A massive snowstorm hits all of our locations. (External)
- January 17th: New managers hired at Stores #421, #439 and #456. (Internal)
- January 10th: The competition lowered their prices in select locations. (External)
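The same filtering can be expressed in a few lines of code. In this minimal sketch, each hypothetical `Factor` records which stores it could have touched, with `None` standing for "every store". Factors that name the store under investigation are ranked ahead of chain-wide ones, and factors scoped only to other stores are dropped. The scopes are assumptions you would have to fill in yourself: for entries like the construction or the competitor's price cuts, you would first need to find out which stores were involved.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Factor:
    date: str
    description: str
    # Stores the factor could have touched; None means every store.
    # These scopes are assumptions you fill in, not data from the list above.
    stores: Optional[FrozenSet[str]] = None


def rank_factors_for_store(factors: List[Factor], store: str) -> List[Factor]:
    """Drop factors that could not affect `store`; list store-specific ones first."""
    specific = [f for f in factors if f.stores is not None and store in f.stores]
    chain_wide = [f for f in factors if f.stores is None]
    return specific + chain_wide
```

For example, the January 17th manager hires would be a `Factor` with `stores=frozenset({"#421", "#439", "#456"})` and the snowstorm one with `stores=None`; ranking for `"#456"` puts the new managers ahead of the snowstorm.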
Of course, in a real analysis you will have hundreds or thousands of dimensions and combinations to examine. Plotting them as I did here would be infeasible, because the number of dimensions related to each metric would be too large. To simplify the problem, you can use a technique from our series on clustering called Hierarchical Clustering. Hierarchical clustering summarizes your data as a hierarchy of its most significant clusters, which is a great way to highlight which of your dimensions are likely related to a change. As long as you cluster by a combination of percentage of total and magnitude of change, the hierarchy should surface the same segments we picked out by eye above.
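For a feel of how the hierarchy is built, here is a minimal sketch of bottom-up (agglomerative) hierarchical clustering with average linkage, written in plain Python so it has no dependencies; it may differ from the variant covered in the clustering series. Each segment is described by two features, for instance its share of total revenue and its relative change in revenue. The function repeatedly merges the two closest clusters until `n_clusters` remain, recording each merge so you can inspect the hierarchy. The function name and feature choice are illustrative assumptions.

```python
import math


def agglomerative_clusters(points, n_clusters):
    """Average-linkage agglomerative clustering.

    points: dict of segment name -> feature tuple, e.g.
            (share of total revenue, relative change in revenue).
    Returns (clusters, merges): the remaining clusters as frozensets of
    segment names, and the merge history as (cluster_a, cluster_b, distance).
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [frozenset([name]) for name in points]
    merges = []

    def average_distance(a, b):
        # Average linkage: mean Euclidean distance over all cross-cluster pairs.
        total = sum(math.dist(points[x], points[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] | clusters[j]
        del clusters[j]  # j > i, so index i is still valid
    return clusters, merges
```

Because the two features live on different scales, you may want to standardize them before clustering. For larger data sets, a library implementation such as SciPy's `scipy.cluster.hierarchy.linkage` avoids the naive pairwise search this sketch repeats on every merge.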
Tomorrow we will cover how to take the handful of high-likelihood factors we have identified and pinpoint the root cause among them.
If you are interested in learning more, Procter & Gamble published an interesting paper on the use of clustering in root cause analysis.
Quote of the Day: “The search for a scapegoat is the easiest of all hunting expeditions.” ― Dwight D. Eisenhower