A/B Testing: Population Sampling

This is part 2 of our series on A/B Testing. Previous segments are available in our archives.

When running an A/B Test, the most important question to answer is how big your groups need to be so that the result is reliable. In other words, how many observations do the control and experimental groups each need so that you can trust the result?

Spoiler Alert: Generally speaking, the answer is a lot more than you think!

Statisticians have developed a numerical technique, called Power Analysis (sometimes loosely called a sensitivity analysis), to determine the number of observations needed. Power Analysis relies on two inputs: the size of the change you expect to measure and the confidence you want in your results. For example, you might want to detect a 5% change in customer behavior with 95% confidence. I won’t go into the mathematics of how Power Analysis works (there is a great example here) because you should use software to calculate it for you.
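
To give a feel for what that software does, here is a minimal sketch in Python using only the standard library. It uses the common normal-approximation formula for comparing two proportions, which is one way to run a Power Analysis and not necessarily what your tool of choice does. The function name, the 20% baseline rate, the reading of "5% change" as a relative lift, and the 80% power setting are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate, relative_lift, confidence=0.95, power=0.80):
    """Approximate observations needed in EACH group for a two-sided
    comparison of two proportions (normal approximation)."""
    p1 = baseline_rate                        # control group rate
    p2 = baseline_rate * (1 + relative_lift)  # rate we hope the experiment reaches
    alpha = 1 - confidence                    # chance of a false positive

    # z-scores for the requested confidence and power
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)

    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


# Hypothetical scenario: 20% baseline rate, 5% relative lift, 95% confidence
n = sample_size_per_group(baseline_rate=0.20, relative_lift=0.05)
print(f"Observations needed per group: {n:,}")
```

Under these assumptions the answer lands in the tens of thousands per group, and a dedicated statistics package may give a slightly different number because it uses a more exact method.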

One of the sobering parts of Power Analysis is that you quickly learn that you need a LARGE sample to have trustworthy results. Much larger than you would have guessed! This is one reason that email providers recommend that, when testing email subjects, your control and experiment groups each contain at least 5,000 customers! That means, to run a single A/B Test, you need at least 10,000 total email recipients.

What do you do if you don’t have 10,000 customers? You need to relax one of your constraints. You can either look for bigger changes (20% instead of 5%) or lower your confidence requirement (80% instead of 95%). Either choice makes your results less reliable, but it may be the best you can do with the audience you have.
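
The trade-off is easy to see by running the numbers for each combination. The sketch below is a rough illustration using the same normal-approximation formula for two proportions. The function name, the 20% baseline rate, and the fixed 80% power are assumptions chosen for the example, not figures from any email provider.

```python
import math
from statistics import NormalDist


def group_size(p_control, p_variant, confidence, power=0.80):
    """Approximate size of EACH group (two-sided test, normal approximation)."""
    z = NormalDist().inv_cdf
    z_total = z(1 - (1 - confidence) / 2) + z(power)
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    return math.ceil(z_total ** 2 * variance / (p_variant - p_control) ** 2)


BASELINE = 0.20  # hypothetical control rate, assumed for illustration

# Try every mix of small/large change and strict/relaxed confidence
scenarios = {}
for lift in (0.05, 0.20):
    for confidence in (0.95, 0.80):
        variant_rate = BASELINE * (1 + lift)
        scenarios[(lift, confidence)] = group_size(BASELINE, variant_rate, confidence)

for (lift, confidence), n in scenarios.items():
    print(f"{lift:.0%} change at {confidence:.0%} confidence: {n:,} per group")
```

With these assumed inputs, looking for a 20% change instead of a 5% change shrinks the required group size far more than dropping the confidence requirement does, so if you have to relax one constraint, the size of the change is usually the bigger lever.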


Quote of the Day: “Now everybody’s sampling.” – Missy Elliott