Novelty Effect
A Novelty Effect is a short-term effect that biases A/B test results when measuring the impact of a new feature. If the feature is noticeable, users are curious to try it out, so early usage data may not be representative of long-term behaviour. The A/B test variant containing the feature might seem to perform well at first, but the effect quickly declines over time.
Ron Kohavi and co-authors describe the novelty effect in Trustworthy Online Controlled Experiments as one of two key threats to the external validity of a controlled experiment (the other being its opposite, the Primacy Effect, which causes bias because existing users are accustomed to how things already work).
Examples of Novelty Effects
A common cause of novelty effects is introducing features borrowed from competing applications with different demographics (for example, a business-oriented social media platform adding a “story” feature typical of consumer social media platforms). During the first few days, eager users experiment with the new story format, boosting engagement metrics. After the initial exploration phase, however, many users return to their usual habits because the story format is a poor fit for a business audience, and the engagement boost disappears. A/B test results taken too early might incorrectly suggest the feature has a positive impact.
Another common cause of the novelty effect is a significant, intrusive visual change, such as a new product recommendation layout. Users might find the change fresh and exciting, leading to a short-term increase in clicks and purchases. However, as users get accustomed to the new layout, the excitement subsides and the metrics return to their previous levels. Interpreting A/B test results early in such cases may overstate the success of the visual update, causing the business to misjudge its value.
Gamified features and gimmicks often trigger a novelty effect. A fitness app, for example, might add a point-based reward system for daily activities. Initially, users may engage more frequently, eager to collect points and explore the new mechanic. As the system becomes familiar, particularly if it provides no intrinsic reward, the novelty wears off and users start to ignore the feature, causing a significant drop in engagement. Relying on early A/B test data in this scenario can give the false impression that the feature is a long-term driver of engagement when it is actually a temporary, novelty-driven surge.
Detecting Novelty Effects
A good way to spot novelty-effect bias is to check whether usage of the new feature is increasing or decreasing over time. If the difference between the variant and the control shrinks over time, a likely explanation is that users tried the feature out of curiosity but did not find it useful enough to keep engaging: the novelty effect is wearing off. A minimal sketch of such a trend check follows.
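For illustration only, here is one way such a check could look in Python, assuming a pandas DataFrame with hypothetical columns `date` (datetime), `variant` (`"control"` or `"treatment"`), and `engaged` (a per-user daily engagement metric); none of these names come from the source.

```python
# Minimal sketch of a daily-lift trend check (assumed schema, not from the
# source): a pandas DataFrame with columns "date", "variant" ("control" or
# "treatment"), and "engaged" (per-user engagement metric for that day).
import numpy as np
import pandas as pd

def daily_lift_slope(df: pd.DataFrame) -> float:
    """Slope of the treatment-vs-control lift across days.

    A clearly negative slope means the early lift is shrinking, which is
    one symptom of a novelty effect wearing off.
    """
    # Average the metric per day and per variant, then pivot variants into
    # columns so each row holds one day's control/treatment means.
    daily = df.groupby(["date", "variant"])["engaged"].mean().unstack("variant")
    lift = (daily["treatment"] - daily["control"]).dropna()
    # Fit a straight line to the lift over the day index; np.polyfit
    # returns the highest-degree coefficient first, i.e. the slope.
    days = np.arange(len(lift))
    slope, _ = np.polyfit(days, lift.to_numpy(), deg=1)
    return slope
```

In practice you would also plot the daily lift with confidence intervals rather than rely on a single fitted slope, but a consistently shrinking lift is the red flag to look for.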
Another way to detect possible novelty-effect bias is to separately track the variant-vs-control difference for the users who appeared during the first few days of the test, and visualise it over time. These early users are likely to be affected by the novelty effect more than users who join later; a cohort-style sketch follows.
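A cohort version of the same idea might look like the following sketch, under the same assumed schema plus a `user_id` column; `early_days` is an illustrative cutoff, not a figure from the source.

```python
# Cohort sketch under the same assumed schema, plus a "user_id" column.
# "early_days" is an illustrative cutoff, not a recommendation from the text.
import pandas as pd

def early_cohort_lift(df: pd.DataFrame, early_days: int = 3) -> pd.Series:
    """Daily treatment-vs-control lift restricted to the earliest users."""
    # Users whose first appearance falls within the first `early_days` days.
    first_seen = df.groupby("user_id")["date"].min()
    cutoff = df["date"].min() + pd.Timedelta(days=early_days)
    early_users = first_seen[first_seen <= cutoff].index
    cohort = df[df["user_id"].isin(early_users)]
    # Daily lift for that cohort only; plotting this series shows whether
    # the early adopters' lift decays while later cohorts stay flat.
    daily = cohort.groupby(["date", "variant"])["engaged"].mean().unstack("variant")
    return daily["treatment"] - daily["control"]
```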
Running a hold-out experiment for a longer period of time (usually months) can confirm (or refute) that the initial test conclusions were biased. It is an expensive but reliable way to check for time-based effects and biases.
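One hedged way to quantify what such a long-running hold-out reveals is to compare the lift measured in an early window against the lift over the full period, again under the assumed schema above.

```python
# Hold-out comparison sketch (same assumed schema): lift over an early
# window versus lift over the full hold-out period.
import pandas as pd

def lift(df: pd.DataFrame) -> float:
    """Difference in mean engagement between treatment and control."""
    means = df.groupby("variant")["engaged"].mean()
    return means["treatment"] - means["control"]

def early_vs_holdout(df: pd.DataFrame, early_days: int = 14) -> tuple[float, float]:
    """Return (early-window lift, full-period lift).

    If the early lift is much larger than the long-run lift, the initial
    conclusions were likely inflated by novelty.
    """
    cutoff = df["date"].min() + pd.Timedelta(days=early_days)
    return lift(df[df["date"] <= cutoff]), lift(df)
```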
Learn more about the Novelty Effect
- Ron Kohavi, Diane Tang, Ya Xu (2020). Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing. Cambridge University Press. ISBN 978-1108724265.