The Case of the Disappearing Users
Today we’ll investigate median metrics, which are more robust than mean metrics because they are less sensitive to outliers and skew in the values they summarize (see our post on means and medians for more information). However, they are complex in other ways, as will become clear when we start solving this case. We’ll start with the following evidence, which is a chart of the Median Session Length (in seconds) of users on our website:
Clearly something happened on March 4th that caused users’ median time on our site to drop by roughly 25%, from around 160s to under 120s. Where shall we start? When we looked at mean metrics, the most useful view was of the largest and smallest population components, which we can view below for the Source dimension (where the users came from). In this case, our largest source was Google and our smallest was Twitter:
Hmmm… that is not as useful as it was for mean metrics. In fact, neither of those moved in ways even remotely like the overall metric! This is because medians are not calculated the same way as means, and in fact aren’t really calculated at all. The median is simply the middle value after sorting a set of values:
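To make "the middle value after sorting" concrete, here is a minimal Python sketch. The function name `median_by_sorting` and the sample session lengths are made up for illustration and are not taken from our data. It also handles the even-count case by averaging the two middle values.

```python
def median_by_sorting(values):
    """Return the middle value of `values` after sorting them."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("cannot take the median of an empty list")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one middle value.
        return ordered[mid]
    # Even count: mean of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical session lengths in seconds.
session_lengths = [95, 240, 160, 30, 410]
print(median_by_sorting(session_lengths))  # 160
```

Notice that no arithmetic touches the 30 or the 410. Only their position relative to the middle matters.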
Since medians are only concerned with the order of values, the top or bottom values can change significantly without affecting the median, as long as no value crosses the middle. For example, Google might account for a large portion of users, but if their values all sit in the top quartile, those values can rise or fall a lot and still stay above the middle value, so the median does not move. With medians, we need to focus on the composition of our users first and the metrics second.
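A quick sketch of that effect, using invented numbers rather than our real data: the Google sessions below all sit well above the middle, so cutting every one of them by 25% moves the mean but leaves the median where it was. The variable names and values are assumptions for illustration only.

```python
from statistics import mean, median

# Hypothetical session lengths in seconds (not real data).
other_sessions = [40, 80, 120, 150, 155, 160, 165]
google_before = [300, 340, 380, 420]
# Every Google session gets 25% shorter, but all stay above the middle value.
google_after = [g * 0.75 for g in google_before]

before = other_sessions + google_before
after = other_sessions + google_after

print(median(before), median(after))  # 160 160
print(mean(after) < mean(before))     # True
```

If the Google sessions had dropped far enough to fall below 160, the ordering around the middle would change and the median would move too.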
Let’s look at how the population of sources (share of users) has changed around the time of the drop in Median Session Length:
|Source||Share as of March 3rd||Share as of March 4th|
Ah, this gives us more insight. Between March 3rd and 4th there was a big shift in the share of users between Facebook and Pinterest. Let’s look at those two sources compared to the overall metric (which is in faint blue so you can see the overlap):
Ah-ha! It looks like the Median Session Length shifted from the Facebook metric to the Pinterest metric on March 4th because Pinterest users became the median users instead of Facebook users. The population analysis was critical since median metrics are so dependent on populations.
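Here is a toy reconstruction of how a shift in shares alone can move the median. All of the numbers are hypothetical: each source is collapsed to a single session length to keep the arithmetic easy, the `other_short` and `other_long` groups are stand-ins for everyone else, and the user counts are invented, not our actual shares.

```python
from statistics import median

# Hypothetical session length (seconds) shared by every user of a source.
SESSION_SECONDS = {
    "other_short": 40,
    "pinterest": 115,
    "facebook": 160,
    "other_long": 300,
}

# Hypothetical user counts per source, 100 users per day.
MARCH_3 = {"other_short": 30, "pinterest": 10, "facebook": 30, "other_long": 30}
MARCH_4 = {"other_short": 30, "pinterest": 30, "facebook": 10, "other_long": 30}


def overall_median(user_counts):
    """Median session length across all users, given users per source."""
    sessions = []
    for source, count in user_counts.items():
        sessions.extend([SESSION_SECONDS[source]] * count)
    return median(sessions)


print(overall_median(MARCH_3))  # 160.0
print(overall_median(MARCH_4))  # 115.0
```

No source's own session length changed between the two days. Only the mix did, which pushed Pinterest users into the middle of the sorted list in place of Facebook users.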
Case closed! These kinds of mysteries can be fun, but only if you find the right answer. Finding that right answer gets harder as our metrics get more complex, so tomorrow we’ll cover how to do this kind of breakdown with even more complex metrics.
 If the data has an even number of values, then the median is computed as the mean of the two middle values.
Quote of the Day: “Think of how stupid the average person is, and realize half of them are stupider than that.” ― George Carlin