A/A Test

An A/A test (also known as a uniformity trial) is an experiment in which two populations are exposed to the same experience (in contrast to an A/B test, where the groups are exposed to different experiences). The technique is primarily useful for validating the testing infrastructure and cohort assignment used for A/B testing.

Running A/A tests is a critical part of establishing trust in an experimentation platform. The idea is so useful because the tests fail many times in practice, which leads to re-evaluating assumptions and identifying bugs.

– Ron Kohavi, Diane Tang and Ya Xu, Trustworthy Online Controlled Experiments

How to perform an A/A test?

Running an A/A test is straightforward if you have the capability to run A/B tests. To run an A/A test, use the exact same infrastructure and tooling that you use for A/B testing, but do not introduce any changes into your product.
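As a rough illustration, the sketch below shows what an A/A assignment might look like with a hypothetical hash-based bucketing scheme; the experiment name, function names, and 50/50 split are assumptions, not any particular platform's API. Both cohorts receive the identical, unchanged experience, while exposures are still logged through the normal pipeline.

```python
# A minimal sketch of A/A cohort assignment, assuming hash-based bucketing
# and a 50/50 split. Names and identifiers here are illustrative only.
import hashlib

EXPERIMENT_NAME = "aa_test_2024"  # hypothetical experiment identifier

def assign_cohort(user_id: str) -> str:
    """Deterministically bucket a user into cohort 'A1' or 'A2'."""
    digest = hashlib.sha256(f"{EXPERIMENT_NAME}:{user_id}".encode()).hexdigest()
    return "A1" if int(digest, 16) % 2 == 0 else "A2"

def render_page(user_id: str) -> str:
    """Both cohorts get the identical, unchanged experience."""
    cohort = assign_cohort(user_id)
    # Exposure is still logged through the normal A/B pipeline so the
    # downstream analysis can be validated end to end.
    print(f"exposure user={user_id} cohort={cohort}")
    return "<existing page, no changes>"

if __name__ == "__main__":
    for uid in ("user-1", "user-2", "user-3"):
        render_page(uid)
```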

Unlike an A/B test, which generally runs only once, A/A tests require many successive re-runs. A single test might not catch intermittent or subtle issues, and an individual failure might not point to any infrastructure problem. Expect an A/A test to fail regularly: if you evaluate tests at a 95% confidence level, around 5% of A/A tests should show a statistically significant difference purely by chance.
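The snippet below is a hedged sketch of a single simulated A/A evaluation on a conversion metric; the conversion rate, sample sizes, and the use of Welch's t-test are assumptions for illustration. Because both cohorts are drawn from the same distribution, any significant result at alpha = 0.05 is a false positive, and over many runs roughly 5% of tests end up that way.

```python
# Evaluate one simulated A/A test: both cohorts share the same true
# conversion rate, so a "significant" result is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
true_rate = 0.10          # same conversion rate for both cohorts (assumption)
n_per_cohort = 10_000

conversions_a1 = rng.binomial(1, true_rate, n_per_cohort)
conversions_a2 = rng.binomial(1, true_rate, n_per_cohort)

# Welch's t-test on per-user conversion indicators.
t_stat, p_value = stats.ttest_ind(conversions_a1, conversions_a2, equal_var=False)
verdict = "fails (false positive)" if p_value < 0.05 else "passes"
print(f"p-value = {p_value:.3f} -> A/A test {verdict}")
```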

To establish the trustworthiness of the A/B testing system, Kohavi, Tang and Xu suggest running an A/A test many times (ideally a thousand), then plotting the distribution of p-values and checking whether it is close to uniform.
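A minimal simulation of that procedure might look like the sketch below, assuming a continuous metric, Welch's t-test for evaluation, and a Kolmogorov-Smirnov test as a stand-in for visually inspecting the p-value histogram; all of these choices are illustrative.

```python
# Simulate 1,000 A/A tests on a continuous metric and check that the
# resulting p-values are roughly uniform on [0, 1]. Sample sizes and the
# metric's distribution are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
n_per_cohort, n_replications = 10_000, 1_000

p_values = []
for _ in range(n_replications):
    a1 = rng.normal(loc=100.0, scale=15.0, size=n_per_cohort)
    a2 = rng.normal(loc=100.0, scale=15.0, size=n_per_cohort)  # same distribution
    p_values.append(stats.ttest_ind(a1, a2, equal_var=False).pvalue)

p_values = np.array(p_values)
print(f"fraction with p < 0.05: {(p_values < 0.05).mean():.3f}")  # expect ~0.05
ks_stat, ks_p = stats.kstest(p_values, "uniform")                 # uniformity check
print(f"KS test against uniform(0, 1): p = {ks_p:.3f}")
```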

When to use A/A tests?

The primary goal of an A/A test is to ensure that the testing infrastructure, such as cohort assignment and data collection methods, works properly before proceeding with an A/B test. Although A/A tests are not necessary to start experimenting, they are an important diagnostic tool for the testing infrastructure and allow A/B tests to be run with more confidence. Adding the capability to run A/A tests is critical for scaling experimentation. In the Experimentation Growth Model, A/A tests are typically introduced in the “Walk” phase.

Periodically running A/A tests can ensure that the testing and experimentation environment is still operational, and that the experimental data corresponds to other data sources. For example, running an A/A test and comparing the total number of visitors in a test to the main product visitor analytics can show potential problems with experiment result capture.
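One way to automate such a cross-check is sketched below, assuming the experiment logs exposure counts per cohort and the analytics system reports total visitors for the same period; the counts, the 50/50 design split, and the chi-square sample-ratio check are illustrative assumptions.

```python
# Cross-check experiment exposure counts against the design split and
# against an external analytics total. All numbers are made up.
from scipy import stats

exposures = {"A1": 50_412, "A2": 49_588}   # from the experiment logs
analytics_total = 101_950                   # from the main analytics system

# Sample ratio check: are observed cohort counts consistent with a 50/50 split?
observed = [exposures["A1"], exposures["A2"]]
expected = [sum(observed) / 2] * 2
chi2, p = stats.chisquare(observed, f_exp=expected)
print(f"sample ratio check: p = {p:.3f}")

# Coverage check: does the experiment capture roughly all visitors analytics saw?
coverage = sum(observed) / analytics_total
print(f"experiment captured {coverage:.1%} of visitors seen by analytics")
```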

A/A tests are also useful to check assumptions about test variables and experiment evaluation, so they are worth running every time a new type of variable is introduced into testing. If A/A tests fail at an unexpected rate, then the test evaluation calculations might be incorrect, or some variables under test do not fall into the expected distribution.
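If you keep a tally of historical A/A runs, a simple way to judge whether the failure rate is "unexpected" is a binomial test against the nominal 5% rate, as in the hedged sketch below (the run counts are made up).

```python
# Check whether the observed A/A failure rate is consistent with the
# expected 5% rate, assuming tests are evaluated at alpha = 0.05.
from scipy import stats

n_runs, n_failures, alpha = 200, 18, 0.05   # hypothetical history of A/A runs
result = stats.binomtest(n_failures, n_runs, alpha)
print(f"observed failure rate: {n_failures / n_runs:.1%}")
print(f"p-value vs expected {alpha:.0%} rate: {result.pvalue:.3f}")
# A small p-value here suggests the evaluation method or metric assumptions
# deserve a closer look, rather than a problem with any single test.
```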

A/A tests can also be useful for detecting long-term biases or external effects. For example, in Focusing on the Long-term: It’s Good for Users and Business, Henning Hohnhold, Deirdre O’Brien and Diane Tang suggest running A/A tests before and after an A/B test to check for User Learning Effects.

In Trustworthy Online Controlled Experiments, Kohavi, Tang and Xu suggest continuously running A/A tests in parallel with other experiments, to monitor and check for “distribution mismatches and platform anomalies”.

Learn more about A/A Testing