A/A Test
An A/A test (also known as a uniformity trial) is an experiment in which two populations are exposed to the same experience, in contrast to an A/B test, where the groups are exposed to different experiences. The technique is primarily used to validate the testing infrastructure and cohort assignment for A/B testing.
Running A/A tests is a critical part of establishing trust in an experimentation platform. The idea is so useful because the tests fail many times in practice, which leads to re-evaluating assumptions and identifying bugs.
– Ron Kohavi, Diane Tang and Ya Xu, Trustworthy Online Controlled Experiments
How to perform an A/A test?
Running an A/A test is straightforward if you already have the capability to run A/B tests: use the exact same infrastructure and tooling that you use for A/B testing, but do not introduce any changes into your product.
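As a minimal sketch (the hashing scheme, experiment name and arm labels below are illustrative assumptions, not from the source), an A/A test can reuse the same deterministic bucketing an A/B test would use, with both arms serving the unchanged product:

```python
import hashlib

def assign_arm(user_id: str, experiment_name: str = "aa_test") -> str:
    """Deterministically assign a user to one of two identical arms using
    the same hash-based bucketing an A/B test would use."""
    digest = hashlib.sha256(f"{experiment_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # 100 equal-sized buckets
    return "A1" if bucket < 50 else "A2"

# Both arms receive the identical, unchanged product experience; only the
# assignment, logging and analysis pipeline is exercised.
print(assign_arm("user-42"))
```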
Unlike an A/B test, which generally runs only once, A/A tests require many successive re-runs. A single test might not catch intermittent or subtle issues, and an individual failure might not point to any infrastructure problem. Expect A/A tests to fail regularly: if you evaluate tests at a 95% confidence level, around 5% of A/A tests should fail purely by chance.
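A small simulation (hypothetical numbers, not from the source) illustrates this: running many comparisons on data drawn from the same distribution and evaluating them with a t-test should flag roughly 5% of them as significant at a 0.05 threshold.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
n_tests, n_users = 1_000, 2_000

significant = 0
for _ in range(n_tests):
    # Both "arms" are drawn from the same distribution: a true A/A comparison.
    arm_a = rng.normal(loc=10.0, scale=3.0, size=n_users)
    arm_b = rng.normal(loc=10.0, scale=3.0, size=n_users)
    if stats.ttest_ind(arm_a, arm_b).pvalue < 0.05:
        significant += 1

print(f"Significant at alpha=0.05: {significant / n_tests:.1%}")  # expect ~5%
```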
To establish the trustworthiness of the A/B testing system, Kohavi, Tang and Xu suggest running an A/A test many times (ideally a thousand), then plotting the distribution of p-values and checking whether the distribution is close to uniform.
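A rough sketch of that check, again with simulated data rather than real experiment results, collects the p-values from repeated A/A runs and compares their distribution against the uniform distribution, for example with a Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# p-values from 1,000 simulated A/A t-tests (same setup as the sketch above).
p_values = [
    stats.ttest_ind(rng.normal(10, 3, 2_000), rng.normal(10, 3, 2_000)).pvalue
    for _ in range(1_000)
]

# Under a healthy setup the p-values should be roughly uniform on [0, 1];
# a Kolmogorov-Smirnov test against the uniform distribution quantifies this.
ks_stat, ks_p = stats.kstest(p_values, "uniform")
print(f"KS statistic = {ks_stat:.3f}, p-value = {ks_p:.3f}")

# A flat histogram of p-values is another quick visual check.
counts, _ = np.histogram(p_values, bins=10, range=(0.0, 1.0))
print("p-value counts per decile:", counts.tolist())
```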
When to use A/A tests?
The primary goal of an A/A test is to ensure that the testing infrastructure, such as cohort assignment and data collection methods, works properly before proceeding with an A/B test. Although A/A tests are not necessary to start with experimentation, they are an important diagnostic tool for the testing infrastructure and allow A/B tests to run with more confidence. Adding the capability to run A/A tests is critical for scaling experimentation capabilities. In the Experimentation Growth Model, A/A tests are typically introduced in the “Walk” phase.
Periodically running A/A tests can ensure that the testing and experimentation environment is still operational, and that experimental data is consistent with other data sources. For example, running an A/A test and comparing the total number of visitors in the test to the main product's visitor analytics can reveal problems with experiment result capture.
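As an illustrative sketch (the counts below are made up), such a periodic check might compare the visitor total logged by the experimentation system against the product analytics figure, and also verify the 50/50 split between the two identical arms with a chi-square test:

```python
from scipy import stats

# Hypothetical daily totals; in practice these come from the experimentation
# logs and the product analytics pipeline respectively.
experiment_visitors = {"A1": 50_210, "A2": 49_790}
analytics_visitors = 100_450

logged_total = sum(experiment_visitors.values())
print(f"Experiment captured {logged_total / analytics_visitors:.1%} of analytics visitors")

# Sample ratio check: the two identical arms should split roughly 50/50.
observed = [experiment_visitors["A1"], experiment_visitors["A2"]]
expected = [logged_total / 2, logged_total / 2]
chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"Sample ratio check: chi2 = {chi2:.2f}, p = {p:.3f}")
```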
A/A tests are also useful for checking assumptions about test variables and experiment evaluation, so they are worth running every time a new type of variable is introduced into testing. If A/A tests fail at an unexpected rate, the test evaluation calculations might be incorrect, or some variables under test might not follow the expected distribution.
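One way to judge whether A/A tests fail at an unexpected rate (a sketch with made-up counts) is a binomial test: out of N runs evaluated at a 95% confidence level, roughly 5% should fail, and a large deviation points at the evaluation or the variable's distribution.

```python
from scipy import stats

n_runs = 200       # hypothetical number of A/A runs with the new variable type
n_failures = 23    # hypothetical number of runs that reached significance

# Under a correct setup, each run should fail with probability ~0.05.
result = stats.binomtest(n_failures, n=n_runs, p=0.05)
print(f"Observed failure rate: {n_failures / n_runs:.1%}")
print(f"Binomial test p-value: {result.pvalue:.4f}")
# A small p-value means the failure rate deviates from the expected 5%,
# suggesting incorrect evaluation calculations or distributional assumptions.
```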
A/A tests can also be useful for detecting long-term biases or external effects. For example, in Focusing on the Long-term: It’s Good for Users and Business, Henning Hohnhold, Deirdre O’Brien and Diane Tang suggest running A/A tests before and after an A/B test to check for User Learning Effects.
In Trustworthy Online Controlled Experiments, Kohavi, Tang and Xu suggest continuously running A/A tests in parallel with other experiments, to monitor and check for “distribution mismatches and platform anomalies”.
Learn more about A/A Testing
- Focusing on the Long-term: It’s Good for Users and Business, from the Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, by Henning Hohnhold, Deirdre O’Brien and Diane Tang (2015)
- Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing, ISBN 978-1108724265, by Ron Kohavi, Diane Tang and Ya Xu (2020)