A/B Testing: Multi-armed Bandit Tests

This is part 3 of a 5-part series on A/B Testing.

One of the weaknesses of A/B Testing is the time it takes, since each test compares only two options. If you have many options to choose from, testing each one against the control, one at a time, can take a long time. Why not test all of them at the same time?

You can, using a Multi-armed Bandit Test. Like A/B Tests, these tests compare options against each other, but they are designed to test many options at the same time and to shift quickly toward the most effective one. They do this by dividing the problem into two parts:

Explore: This phase tests the possible options to see which performs best.

Exploit: This phase uses the best-performing option found so far to get the best performance.

There are a few different ways to run Multi-armed Bandit Tests, but here I will focus on a common method known as Epsilon-Greedy, which runs Explore and Exploit at the same time! It does this by dividing your customers into two groups, Explore and Exploit. A common split sends 20% of your activity to the Explore group (this share is the "epsilon" in the name), where all available options are tested side by side. The Exploit group gets the other 80% of your activity and uses whichever option is performing best in the Explore tests.
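The Epsilon-Greedy rule can be sketched in a few lines of Python. This is a minimal sketch rather than a production implementation: the class name `EpsilonGreedy`, its methods, and the success rates in the usage example are hypothetical, and it assumes each customer's outcome is a simple success or failure (such as a click). It also updates its estimates from every customer it sees, Explore and Exploit alike, which is the usual way Epsilon-Greedy is formulated.

```python
import random


class EpsilonGreedy:
    """Minimal epsilon-greedy bandit (hypothetical helper, for illustration only)."""

    def __init__(self, options, epsilon=0.2, rng=None):
        self.options = list(options)
        self.epsilon = epsilon  # share of traffic sent to the Explore group
        self.rng = rng if rng is not None else random.Random()
        self.shown = {o: 0 for o in self.options}      # times each option was shown
        self.successes = {o: 0 for o in self.options}  # e.g. clicks or conversions

    def rate(self, option):
        """Observed success rate; 0.0 for an option that has not been shown yet."""
        n = self.shown[option]
        return self.successes[option] / n if n else 0.0

    def choose(self):
        # Explore: while some option has no data yet, or with probability epsilon.
        untested = [o for o in self.options if self.shown[o] == 0]
        if untested or self.rng.random() < self.epsilon:
            return self.rng.choice(self.options)
        # Exploit: show the option with the best observed rate so far.
        return max(self.options, key=self.rate)

    def update(self, option, success):
        """Record the outcome of showing `option` to one customer."""
        self.shown[option] += 1
        self.successes[option] += int(success)


# Usage: simulate customers against hypothetical true success rates.
rng = random.Random(42)
true_rates = {"A": 0.05, "B": 0.12, "C": 0.04}  # hypothetical, unknown in practice
bandit = EpsilonGreedy(true_rates, epsilon=0.2, rng=rng)
for _ in range(10_000):
    option = bandit.choose()
    bandit.update(option, rng.random() < true_rates[option])
print(bandit.shown)  # with these rates, B typically ends up with most of the traffic
```

Because exploration never stops, every option keeps getting a small stream of customers, so the test can still notice if a different option turns out to be better later.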

Of course, at first there is no best option, so all activity is used for Explore. However, as soon as one option pulls ahead, it is used for both Explore and Exploit: it receives the whole Exploit group plus its share of the Explore group.

For example, a test of three options starts by trying all three equally, but as soon as it's clear that Option B is the best, it moves most customers to that option.
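The traffic split once an option leads can be worked out directly. A quick sketch, assuming the 20/80 Explore/Exploit split above and three options labelled A, B and C (the function name `expected_shares` is hypothetical):

```python
def expected_shares(options, best, epsilon=0.2):
    """Expected traffic share per option once `best` leads an epsilon-greedy test.

    Explore traffic (epsilon) is split evenly across all options;
    Exploit traffic (1 - epsilon) goes entirely to the current best option.
    """
    explore_slice = epsilon / len(options)
    return {o: explore_slice + ((1 - epsilon) if o == best else 0.0) for o in options}


shares = expected_shares(["A", "B", "C"], best="B", epsilon=0.2)
print({o: round(s, 3) for o, s in shares.items()})
# {'A': 0.067, 'B': 0.867, 'C': 0.067}
```

With 20% exploration spread over three options, the leading option gets about 87% of customers (80% from Exploit plus about 7% from Explore), and each of the others gets about 7%.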

The advantage of Multi-armed Bandit Tests is that you can settle on an option more quickly and spend less traffic on sub-optimal options, because the test identifies and favors the best option on its own. However, this technique needs options that are completely interchangeable! That makes it a good fit for testing email subject lines and colors on a website, but hard to apply to things like pricing and product features.
