A/A Test
An A/A test (also known as a uniformity trial) is an experiment in which two populations are exposed to the same experience, in contrast to an A/B test, where the groups are exposed to different experiences. The technique is primarily used to validate the testing infrastructure and cohort assignment for A/B testing.
Running A/A tests is a critical part of establishing trust in an experimentation platform. The idea is so useful because the tests fail many times in practice, which leads to re-evaluating assumptions and identifying bugs.
– Ron Kohavi, Diane Tang and Ya Xu, Trustworthy Online Controlled Experiments
How to perform an A/A test?
Running an A/A test is straightforward if you already have the capability to run A/B tests: use the exact same infrastructure and tooling that you use for A/B testing, but do not introduce any changes into your product.
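As a minimal sketch (the hashing scheme, experiment name and arm labels below are illustrative assumptions, not from the source), an A/A test can reuse the same deterministic bucketing an A/B test would use, with both arms serving the unchanged product:

```python
import hashlib

def assign_arm(user_id: str, experiment_name: str = "aa_test") -> str:
    """Deterministically assign a user to one of two identical arms using
    the same hash-based bucketing an A/B test would use."""
    digest = hashlib.sha256(f"{experiment_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # 100 equal-sized buckets
    return "A1" if bucket < 50 else "A2"

# Both arms receive the identical, unchanged product experience; only the
# assignment, logging and analysis pipeline is exercised.
print(assign_arm("user-42"))
```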
Unlike an A/B test, which generally runs only once, A/A tests require many successive re-runs. A single test might not catch intermittent or subtle issues, and an individual failure might not point to any infrastructure problem. Expect A/A tests to fail regularly: if you evaluate tests at a 95% confidence level, around 5% of A/A tests should fail purely by chance.
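A small simulation (hypothetical numbers, not from the source) illustrates this: running many comparisons on data drawn from the same distribution and evaluating them with a t-test should flag roughly 5% of them as significant at a 0.05 threshold.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
n_tests, n_users = 1_000, 2_000

significant = 0
for _ in range(n_tests):
    # Both "arms" are drawn from the same distribution: a true A/A comparison.
    arm_a = rng.normal(loc=10.0, scale=3.0, size=n_users)
    arm_b = rng.normal(loc=10.0, scale=3.0, size=n_users)
    if stats.ttest_ind(arm_a, arm_b).pvalue < 0.05:
        significant += 1

print(f"Significant at alpha=0.05: {significant / n_tests:.1%}")  # expect ~5%
```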
To establish the trustworthiness of the A/B testing system, Kohavi, Tang and Xu suggest running an A/A test many times (ideally a thousand), then plotting the distribution of p-values and checking whether the distribution is close to uniform.
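A rough sketch of that check, again with simulated data rather than real experiment results, collects the p-values from repeated A/A runs and compares their distribution against the uniform distribution, for example with a Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# p-values from 1,000 simulated A/A t-tests (same setup as the sketch above).
p_values = [
    stats.ttest_ind(rng.normal(10, 3, 2_000), rng.normal(10, 3, 2_000)).pvalue
    for _ in range(1_000)
]

# Under a healthy setup the p-values should be roughly uniform on [0, 1];
# a Kolmogorov-Smirnov test against the uniform distribution quantifies this.
ks_stat, ks_p = stats.kstest(p_values, "uniform")
print(f"KS statistic = {ks_stat:.3f}, p-value = {ks_p:.3f}")

# A flat histogram of p-values is another quick visual check.
counts, _ = np.histogram(p_values, bins=10, range=(0.0, 1.0))
print("p-value counts per decile:", counts.tolist())
```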
When to use A/A tests?
The primary goal of an A/A test is to ensure that the testing infrastructure, such as cohort assignment and data collection methods, works properly before proceeding with an A/B test. Although A/A tests are not necessary to start with experimentation, they are an important diagnostic tool for the testing infrastructure and allow A/B tests to run with more confidence. Adding the capability to run A/A tests is critical for scaling experimentation capabilities. In the Experimentation Growth Model, A/A tests are typically introduced in the “Walk” phase.
Periodically running A/A tests can ensure that the testing and experimentation environment is still operational, and that experimental data is consistent with other data sources. For example, running an A/A test and comparing the total number of visitors in the test to the main product's visitor analytics can reveal problems with experiment result capture.
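As an illustrative sketch (the counts below are made up), such a periodic check might compare the visitor total logged by the experimentation system against the product analytics figure, and also verify the 50/50 split between the two identical arms with a chi-square test:

```python
from scipy import stats

# Hypothetical daily totals; in practice these come from the experimentation
# logs and the product analytics pipeline respectively.
experiment_visitors = {"A1": 50_210, "A2": 49_790}
analytics_visitors = 100_450

logged_total = sum(experiment_visitors.values())
print(f"Experiment captured {logged_total / analytics_visitors:.1%} of analytics visitors")

# Sample ratio check: the two identical arms should split roughly 50/50.
observed = [experiment_visitors["A1"], experiment_visitors["A2"]]
expected = [logged_total / 2, logged_total / 2]
chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"Sample ratio check: chi2 = {chi2:.2f}, p = {p:.3f}")
```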
A/A tests are also useful for checking assumptions about test variables and experiment evaluation, so they are worth running every time a new type of variable is introduced into testing. If A/A tests fail at an unexpected rate, the test evaluation calculations might be incorrect, or some variables under test might not follow the expected distribution.
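One way to judge whether A/A tests fail at an unexpected rate (a sketch with made-up counts) is a binomial test: out of N runs evaluated at a 95% confidence level, roughly 5% should fail, and a large deviation points at the evaluation or the variable's distribution.

```python
from scipy import stats

n_runs = 200       # hypothetical number of A/A runs with the new variable type
n_failures = 23    # hypothetical number of runs that reached significance

# Under a correct setup, each run should fail with probability ~0.05.
result = stats.binomtest(n_failures, n=n_runs, p=0.05)
print(f"Observed failure rate: {n_failures / n_runs:.1%}")
print(f"Binomial test p-value: {result.pvalue:.4f}")
# A small p-value means the failure rate deviates from the expected 5%,
# suggesting incorrect evaluation calculations or distributional assumptions.
```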
A/A tests can also be useful for detecting long-term biases or external effects. For example, in Focusing on the Long-term: It’s Good for Users and Business, Henning Hohnhold, Deirdre O’Brien and Diane Tang suggest running A/A tests before and after an A/B test to check for User Learning Effects.
In Trustworthy Online Controlled Experiments, Kohavi, Tang and Xu suggest continuously running A/A tests in parallel with other experiments, to monitor and check for “distribution mismatches and platform anomalies”.
Learn more about A/A Testing
- Focusing on the Long-term: It’s Good for Users and Business, from the Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, by Henning Hohnhold, Deirdre O’Brien and Diane Tang (2015)
- Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing, ISBN 978-1108724265, by Ron Kohavi, Diane Tang and Ya Xu (2020)