Introduction to A/B and MVT: Optimization Testing 101
Part 1 provides a brief overview of what site testing is and describes the differences between usability testing, A/B split testing and multivariate methods.
What is optimization testing?
First let’s talk about what it’s not.
Optimization testing is not the same as “user testing” or “usability testing.” With usability testing, you ask a small sample of Web users from your target market to perform various tasks under observation in a “lab” environment. The goal is to use qualitative research to uncover usability issues on your site, and to understand how people navigate your website.
The insight you can glean from user tests is very valuable. Often test subjects talk out loud as they perform tasks, so you can discover not just what people get stuck on but why they get stuck. For example, a comment like “I hate when sites ask for my email address in the checkout. I don’t want to receive spam” may prompt you to add a brief explanation of why you ask for an email address (to send a confirmation email and receipt, rather than promotional messages).
The shortcomings of usability testing, however, are many. User testing requires time to select test participants, write questionnaires, conduct tests, analyze them and compile meaningful data. You may also need to compensate test subjects monetarily. Users are also under observation performing prescribed tasks, which may not be how your users really behave in the wild. And working with a small sample, you cannot collect statistically significant data. Quantifying the impact of a design on conversion rate, and more importantly, revenue, is impossible with usability testing.
Enter A/B and MVT
A/B split and MVT (multivariate testing) allow you to quantitatively test the impact of changes to your website on your key performance metrics. Rather than implement changes based on your gut feel, the design bias of an agency or the opinions of a HiPPO (highest paid person in the organization), you can test your ideas on your real customers and use hard numbers to prove or disprove your hypotheses.
A classic A/B test sends 50% of traffic to a “control” version (existing page, element or process on your site), and 50% to a “treatment” or test version.
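As a rough illustration of how traffic might be divided, here is a minimal sketch that hashes a visitor ID into a bucket. The function name assign_variant, the visitor and experiment identifiers, and the choice of SHA-256 are all assumptions made for this example; real testing tools handle assignment in their own way.

```python
import hashlib


def assign_variant(visitor_id: str, experiment: str, control_share: float = 0.5) -> str:
    """Assign a visitor to 'control' or 'treatment' (hypothetical helper)."""
    # Hash the experiment name and visitor ID together so the same visitor
    # always lands in the same version of a given experiment.
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    # Turn the first 8 hex digits into a number in the range [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    # Visitors whose bucket falls below the control share see the control.
    return "control" if bucket < control_share else "treatment"


# Classic 50/50 split for a made-up visitor and experiment name.
print(assign_variant("visitor-123", "homepage-headline"))
```

Because the assignment depends only on the IDs, a returning visitor keeps seeing the same version without you having to store anything extra.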
If you want to be conservative, you might show the control to 80% of visitors and the treatment to 20% to limit any negative impact on your success metrics. If the experiment is a disaster, you’ve only exposed 20% of your visitors to it. However, when you veer away from a 50/50 split, the smaller group collects data more slowly. For the same amount of traffic your results will not be as precise, and your test will need to run longer to achieve statistical significance.
You may choose to include only a percentage of visitors in your test, or restrict your test to a certain user segment (e.g. only new visitors who are less likely to have seen your existing version).
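To get a rough feel for how much longer an uneven split takes, here is a back-of-the-envelope sketch. It assumes both versions convert at roughly similar rates, so that the variance of the measured difference scales with 1/n_control + 1/n_treatment. The function name relative_traffic_needed is made up for illustration.

```python
def relative_traffic_needed(control_share: float) -> float:
    """Total traffic needed relative to a 50/50 split for equal precision.

    Assumes similar conversion rates in both groups, so the variance of the
    difference is proportional to 1 / (share * (1 - share)) for a fixed total.
    """
    if not 0.0 < control_share < 1.0:
        raise ValueError("control_share must be strictly between 0 and 1")
    # A 50/50 split gives share * (1 - share) = 0.25, the best possible value.
    return 0.25 / (control_share * (1.0 - control_share))


for share in (0.5, 0.8, 0.9):
    print(f"{share:.0%} control: {relative_traffic_needed(share):.2f}x traffic")
```

Under that assumption, an 80/20 split needs about 1.56 times the total traffic of a 50/50 split to measure the difference with the same precision.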
A/B testing is not limited to version A and version B. You can test a control against up to 9 different versions. So, an A/B/C/D/E/F/G/H/I/J test is still an A/B test. The expert mathematicians have determined this maximum (I just choose to take their word for it)!
A univariate test involves multiple versions of one variable, such as a headline or shopping cart button. A univariate test is still an A/B split test, even though you are testing multiple versions of that variable.
When multiple variables are tested at once (e.g. the combination of thumbnail image, size, and color of cart button), it’s called a multivariate test.
The number of test versions depends on A) the number of variables and B) the number of versions of each variable. For example:
Variables = thumbnail image, cart button size, cart button color
Thumbnail images = 2 (one on model, one flat image)
Button size = 2 (small, large)
Button color = 4 (red, orange, green, blue)
Total test versions = 2 x 2 x 4 = 16
V1: Model image, small, red button
V2: Model image, small, orange button
V3: Model image, small, green button
V4: Model image, small, blue button
V5: Model image, large, red button
V6: Model image, large, orange button
V7: Model image, large, green button
V8: Model image, large, blue button
V9: Flat image, small, red button
V10: Flat image, small, orange button
V11: Flat image, small, green button
V12: Flat image, small, blue button
V13: Flat image, large, red button
V14: Flat image, large, orange button
V15: Flat image, large, green button
V16: Flat image, large, blue button
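The list above can be generated rather than typed out by hand. Here is a minimal sketch using Python's itertools.product; the dictionary keys and the full_factorial helper are names invented for this example.

```python
from itertools import product

# Each variable and the versions of it under test (from the example above).
variables = {
    "thumbnail": ["Model image", "Flat image"],
    "button_size": ["small", "large"],
    "button_color": ["red", "orange", "green", "blue"],
}


def full_factorial(variables):
    """Return every combination of variable versions as a list of dicts."""
    names = list(variables)
    return [dict(zip(names, combo)) for combo in product(*variables.values())]


versions = full_factorial(variables)
for number, v in enumerate(versions, start=1):
    print(f"V{number}: {v['thumbnail']}, {v['button_size']}, {v['button_color']} button")
```

Adding one more version of any variable multiplies the total, which is why multivariate tests can demand a lot of traffic very quickly.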
Using every possible combination in your test is called “full-factorial” testing. Otherwise, you are using a “fractional factorial” design. This includes the Taguchi method and others. Fractional factorial tests may save time, as it’s quicker to reach statistical significance, but they are also not as reliable. Any version you exclude from your test is possibly the best performing. Fractional factorial testing methods were developed for the manufacturing industry, where prototypes were expensive to develop. That’s not the case with websites.
A “radical redesign” tests multiple variables at once, but is not to be confused with multivariate testing. A radical redesign tests one look-and-feel against another (or another and another and another…) and is therefore an A/B test.
For example, you might test your existing site design against a proposed new design with a completely different navigation menu, search box, home page merchandising, calls to action and home page content.
It’s recommended that you begin with a radical redesign test before you start tweaking individual elements with univariate or multivariate testing. Your goal is to further optimize a top-performing design, rather than invest time in a sub-optimal one. It’s also a good idea to split test redesigns over time rather than just flip the switch on your customers, like Amazon did with its latest major makeover.
Now you know the differences between user testing, A/B split and multivariate testing. Join us next time for Part 2: Why Test? Discover how site testing could be the single most profitable marketing activity you could invest in.