A/B Test Case Study: Can Split Test Results Be Trusted?
Testing the Product List Page
The image below is the control version:
We reviewed our previous tests, and after several hypotheses and investigations we produced two alternative variations with the following changes:
A) Introduced a vertical menu showing all subcategories, for easier access to other products
B) Added color thumbnails to products that are available in alternative colors
C) Used a recommendation engine (cross-sell) to show the most popular products from that category, to increase revenue from top-selling items (a naive sketch of this kind of popularity ranking follows this list)
D) Gave the filtered navigation a minor face-lift to improve usability
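To make change C concrete, here is a minimal sketch of a popularity-based cross-sell: rank products within a category by units sold and surface the top few. This is only an illustration; the actual recommendation engine and its data model were not described here, and all field names below are hypothetical.

```python
from collections import Counter

def top_sellers(order_lines, category, k=5):
    """Rank products in one category by units sold (a naive stand-in
    for whatever the real cross-sell engine computed)."""
    counts = Counter(
        line["product_id"]
        for line in order_lines
        if line["category"] == category
    )
    return [product_id for product_id, _ in counts.most_common(k)]

# Hypothetical order data; field names are illustrative.
orders = [
    {"product_id": "sku-1", "category": "shoes"},
    {"product_id": "sku-2", "category": "shoes"},
    {"product_id": "sku-1", "category": "shoes"},
]
print(top_sellers(orders, "shoes"))  # ['sku-1', 'sku-2']
```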
What We Learned
During the experiment, 100% of traffic was included and split evenly across the control and the variations. This was another tough experiment: even 2272 transactions over 10 days did not produce a statistically significant winner. Still, we gathered just enough visitor and ecommerce data to make a decision.
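For readers who want to check significance themselves, here is a minimal two-proportion z-test sketch. The visitor counts below are hypothetical (they were not reported), and the 2272 transactions are split between just two arms for illustration, whereas the real experiment had several:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)               # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided
    return z, p_value

# Hypothetical: 2272 total transactions split across two arms of
# 25,000 visitors each, with roughly the lift we observed.
z, p = two_proportion_z_test(conv_a=1094, n_a=25_000, conv_b=1178, n_b=25_000)
print(f"z = {z:.2f}, p = {p:.3f}")  # p ≈ 0.07 > 0.05: no significant winner
```

Even a lift that looks healthy in the report can fail to clear the conventional p < 0.05 bar at this sample size, which is exactly the situation we were in.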
Variation B was chosen because, according to Google Website Optimizer (GWO), it converted 7.74% better than the control.
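Note that GWO reports a relative improvement, not a difference in percentage points. With hypothetical rates chosen to roughly match the reported figure:

```python
# Relative lift, as GWO reports it; both rates below are hypothetical.
rate_control, rate_b = 0.0440, 0.0474
lift = (rate_b - rate_control) / rate_control
print(f"relative lift: {lift:.2%}")  # ~7.7%
```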
What Surprised Us: Control vs. Control
Additionally, we wanted to run a little test on GWO itself. We created another variation that was an exact copy of the control. Maybe it was a "statistical coincidence," but this "exact" duplicate performed 4.97% better! We didn't repeat this check in any other test, so we can't confirm whether there is a pattern to this behavior. A quick simulation (below) suggests a gap this size is not even surprising at our sample size. So it's up for discussion: have you tried a similar A/A test and found similar results?
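A gap like this in an A/A test is easy to reproduce with a Monte Carlo simulation: give two arms the exact same true conversion rate and count how often sampling noise alone produces an apparent lift of 4.97% or more. The rate and sample size below are hypothetical, picked to roughly match the scale of our experiment:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_aa(rate=0.045, n=25_000, trials=10_000):
    """Fraction of A/A tests whose noise-only 'lift' is >= 4.97%.

    Both arms draw from the SAME true conversion rate, so any
    observed difference is pure sampling noise.
    """
    conv_a = rng.binomial(n, rate, size=trials)
    conv_b = rng.binomial(n, rate, size=trials)
    lift = np.abs(conv_b - conv_a) / conv_a
    return (lift >= 0.0497).mean()

print(f"P(apparent lift >= 4.97% in an A/A test) ≈ {simulate_aa():.0%}")
```

With these assumed numbers the simulation lands near 23%, i.e. roughly one in four identical-versus-identical comparisons would show a gap at least as large as ours, which suggests the result says more about sample size than about GWO.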