New Metrics for Binary Outcome AB Tests

June 15th 2016

As an engineer designing and building a system for AB tests, or as a user (Product Manager, Analyst, HiPPO) of one, you have confidence in the performance of your serving engine as well as its validity: its ability to track and eliminate events that could invalidate the outcome of a test. In spite of this technical confidence, do you have the nagging feeling that, though "A is statistically better than B," the difference isn't large enough to matter? Do you have doubts about the business impact of your test outcomes?

Are you, in addition, fed up with receiving answers like, "We are 90% confident that A is 11.1% better than B"? Wouldn't you rather set a single-valued business or revenue goal (e.g., that A be at least 10% better than B) and receive a definite recommendation? Do you feel overwhelmed by tech's "test everything" approach?

If you answered yes to any of these questions, then read more on the Analysis of A/B Test Results for Experiments with Binary Outcomes.

Note that the article focuses on the analysis of the results, specifically on generating a clear recommendation (of A over B, or not) based on observing an effect, big enough to justify a business action, in a metric that drives net revenue.

Naturally, the first step is for the concerned parties to establish such a business metric (which may not be simply related to the measured metric) and a minimum acceptable difference between the variants under test.

The key element in our work is the implementation of a Bayesian approach to the analysis, one that is general enough to handle non-zero differences in a broad class of metrics and that remains applicable even when the number of successful trials is small.
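To make that concrete, here is a minimal sketch of one common Bayesian treatment of binary outcomes, a Beta-Binomial model with Monte Carlo sampling. It is an illustration under our own assumptions (uniform priors; the function name and the counts below are hypothetical), not NerdWallet's actual implementation.

```python
import numpy as np

def prob_of_min_lift(successes_a, trials_a, successes_b, trials_b,
                     min_relative_lift=0.10, n_samples=100_000, seed=0):
    """Estimate P(rate_B >= (1 + min_relative_lift) * rate_A).

    A uniform Beta(1, 1) prior gives each variant the posterior
    Beta(1 + successes, 1 + failures), which stays well behaved
    even when the number of successful trials is small.
    """
    rng = np.random.default_rng(seed)
    rate_a = rng.beta(1 + successes_a, 1 + trials_a - successes_a, n_samples)
    rate_b = rng.beta(1 + successes_b, 1 + trials_b - successes_b, n_samples)
    return np.mean(rate_b >= (1.0 + min_relative_lift) * rate_a)

# Illustrative numbers: 48 vs. 61 conversions out of 5,000 sessions each.
print(prob_of_min_lift(48, 5_000, 61, 5_000))
```

Because the full posterior is available, the same samples answer "is B at least 10% better than A?" directly, rather than only "is B different from A?".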

Finally, the business goals are recast as a minimum acceptable return, which is compared with the maximum expected return from the experimental observations to yield a yes-or-no recommendation.
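One simple reading of that last step, sketched below with illustrative inputs of our own (the dollar conversion and thresholds are assumptions, not quantities from the article): convert posterior samples of the lift into an expected return and compare it against the minimum acceptable return to produce the definite answer.

```python
import numpy as np

def recommend(lift_samples, revenue_per_unit_lift, min_acceptable_return):
    """Turn posterior lift samples into a definite recommendation.

    lift_samples: Monte Carlo draws of (rate_B - rate_A), e.g. taken from
    the Beta posteriors sketched above. revenue_per_unit_lift converts a
    unit of lift into revenue; it and min_acceptable_return are
    illustrative business inputs.
    """
    expected_return = revenue_per_unit_lift * np.mean(lift_samples)
    return "choose B" if expected_return >= min_acceptable_return else "stay with A"

# Example usage with synthetic posterior samples:
rng = np.random.default_rng(0)
rate_a = rng.beta(1 + 48, 1 + 5_000 - 48, 100_000)
rate_b = rng.beta(1 + 61, 1 + 5_000 - 61, 100_000)
print(recommend(rate_b - rate_a, revenue_per_unit_lift=1_000_000,
                min_acceptable_return=2_000))
```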

At NerdWallet, we are using this approach to decide between the variants tested based on minimum increases in the average page visits per session, rather than on the measured "bounce rate" of the site.
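As an illustration of how a measured binary outcome can feed a different business metric (the geometric session model here is our assumption, not necessarily the one used at NerdWallet): if every page view independently ends the session with probability p, the bounce rate equals p and the average page visits per session is 1/p, so posterior samples of the measured metric translate directly into posterior samples of the business metric.

```python
import numpy as np

rng = np.random.default_rng(0)
# Posterior for the bounce probability after 2,100 bounces in 5,000 sessions
# (illustrative counts), with a uniform Beta(1, 1) prior.
bounce_rate = rng.beta(1 + 2_100, 1 + 5_000 - 2_100, 100_000)
# Under the geometric-session assumption, visits per session = 1 / bounce rate.
visits_per_session = 1.0 / bounce_rate
print(visits_per_session.mean())  # posterior mean of the business metric
```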