
A/B Testing With the Multi-Armed Bandit Approach: What It Is and Why You Should Be Using It

Jennifer Wong

Have you ever run an A/B test where you realized early on that one of your variations was underperforming — yet had to helplessly stand by, knowing you were losing traffic to that variation until the test was over? If so, today’s blog post is going to show you a superior way of testing so you can get more traffic to the best variations earlier (and make money doing so).

Multi-Armed Bandit Testing: What It Is and How It Works

Multi-armed bandit testing is a form of A/B testing that allows you to find the best version by:

  • Running multiple variations (i.e. arms) simultaneously
  • Potentially including different expected conversion rates for your different arms
  • Updating how much traffic is allocated to the arms during the test based on performance
  • Exploiting arms that have performed well in the past
  • Exploring new or seemingly inferior arms in case they might perform even better
    (e.g. via Thompson sampling, which balances exploring against exploiting)
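
To make the explore/exploit idea concrete, here is a minimal sketch of Thompson sampling for a single visitor. It is an illustration under stated assumptions, not any vendor's actual implementation: the function name choose_arm, the flat Beta(1, 1) starting point, and the conversion counts are all hypothetical.

```python
import random

def choose_arm(successes, failures, rng=random):
    """Pick an arm for one visitor by Thompson sampling.

    Assumes a flat Beta(1, 1) prior on every arm (an illustrative choice).
    """
    samples = [
        rng.betavariate(1 + s, 1 + f)  # one plausible conversion rate per arm
        for s, f in zip(successes, failures)
    ]
    # Exploit: the arm with the highest sampled rate gets this visitor.
    # Explore: a weak arm still wins whenever its sample happens to come out on top.
    return max(range(len(samples)), key=samples.__getitem__)

# Hypothetical counts: arm 0 converted 5 of 100 visitors, arm 1 converted 30 of 100.
rng = random.Random(0)
picks = [choose_arm([5, 30], [95, 70], rng) for _ in range(1000)]
share_arm_1 = picks.count(1) / len(picks)
```

With counts this lopsided, nearly every visitor goes to arm 1. With closer counts, the weaker arm would keep receiving a meaningful share, which is the exploring part.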

Here’s how it works. When you start your A/B experiment:

  • Interactions (e.g. views, clicks, and conversions) are measured for all A/B variations
  • Twice per day, variations are assessed to see how they have performed up until then
  • Based on the performance of each variation, the amount of traffic sent to that variation is adjusted going forward. Variations that are doing well receive more traffic, while underperforming variations receive less.
  • The amount of traffic continues to adjust until one variation reaches a 95% probability of being the winner. At this point, that variation is declared the winner and you can be confident in using it as your best option.
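
The assessment step above can be sketched roughly as follows. This is one plausible way to compute a "probability to win", assuming Beta(1, 1) starting points and a simple simulation; the function names, the counts, and the idea of reusing the probabilities directly as traffic weights are assumptions made for illustration.

```python
import random

def probability_to_win(successes, failures, draws=10_000, seed=0):
    """Estimate each arm's chance of being the best by simulation."""
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        # Draw one plausible conversion rate per arm from its Beta posterior.
        samples = [rng.betavariate(1 + s, 1 + f)
                   for s, f in zip(successes, failures)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

def find_winner(probabilities, threshold=0.95):
    """Return the index of the winning arm, or None if no arm is there yet."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return best if probabilities[best] >= threshold else None

# Hypothetical counts at one assessment: 40 vs. 60 conversions per 1,000 visitors.
weights = probability_to_win([40, 60], [960, 940])
winner = find_winner(weights)  # None means: keep testing with the new weights
```

Each assessment would rerun probability_to_win on the latest counts, split upcoming traffic in proportion to the result, and stop once find_winner returns an arm.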

Why Multi-Armed Bandit Testing Is Your Best Option

With classical statistical hypothesis testing, you split traffic evenly for the entire test, in effect treating every variation as if it had an equal chance of performing well. However, by now you’ve likely run enough A/B tests or read enough industry materials to know that certain variations are likely to perform better than others.

The multi-armed bandit approach takes into account that previous experience has likely given you a hunch that some variations will perform better than others, and it allows you to specify a prior expectation for each of your A/B variations.
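
One common way to express such a hunch is as a Beta prior, which amounts to giving a variation a head start of "pseudo" visitors before the test begins. The sketch below assumes that representation; the helper names beta_prior and posterior_mean, the 5% hunch, and its strength of 20 visitors are all hypothetical.

```python
def beta_prior(expected_rate, strength):
    """Turn a hunch into Beta(alpha, beta) pseudo-counts.

    expected_rate: the conversion rate you expect (e.g. 0.05 for 5%)
    strength: how many visitors' worth of evidence the hunch counts for
    """
    return expected_rate * strength, (1 - expected_rate) * strength

def posterior_mean(prior, conversions, visitors):
    """Blend the prior with observed data to get the current best estimate."""
    alpha, beta = prior
    return (alpha + conversions) / (alpha + beta + visitors)

# Hypothetical hunch: roughly 5% conversion, worth about 20 visitors of evidence.
prior = beta_prior(0.05, 20)
early = posterior_mean(prior, 3, 10)       # little data: the hunch still matters
late = posterior_mean(prior, 300, 1000)    # lots of data: observations dominate
```

The design point is that a prior only nudges the early allocation. As real traffic accumulates, the observed conversion rate takes over.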

While both methods are valid, with multi-armed bandit testing, you can:

  • Increase your efficiency: Traffic is re-allocated toward winning variations over time, so you don’t have to wait until the end of the experiment for the final answer.
  • Have a shorter testing duration: Traffic that would have gone to underperforming variations in a classical test can be redirected to high-performing variations, which helps you separate the good variations from the bad ones faster.
  • Decrease your cost: If you have an underperforming arm in your experiment, a classical test costs you potential clicks and/or new customers because that arm keeps receiving an equal share of traffic. With multi-armed bandit testing, you can avoid this costly situation by continually funneling traffic toward winning variations.
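
The cost argument comes down to simple arithmetic, shown here as a rough sketch. The conversion rates, the traffic shares, and the visitor count are made-up numbers for illustration only, and the function name is hypothetical.

```python
def expected_conversions(rates, shares, visitors):
    """Expected conversions when `visitors` are split across arms by `shares`."""
    return sum(rate * share * visitors for rate, share in zip(rates, shares))

# Hypothetical true conversion rates: arm 0 at 2%, arm 1 at 5%.
rates = [0.02, 0.05]
even = expected_conversions(rates, [0.5, 0.5], 10_000)    # classical 50/50 split
skewed = expected_conversions(rates, [0.1, 0.9], 10_000)  # most traffic to the better arm
extra = skewed - even                                     # conversions gained by shifting traffic
```

With these made-up numbers, the even split yields about 350 expected conversions and the skewed split about 470. A real bandit test would only reach a skewed split gradually, so the actual gain depends on how quickly the better arm pulls ahead.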


Author
Jennifer Wong

Jennifer is the VP of Marketing at TUNE. Weekdays, she's all about helping marketers better measure their mobile campaigns. Weekends include brunch and blogging.