Experimentation Growth Model
The Experimentation Growth Model (also called the Experimentation Maturity Model or the Experimentation Evolution Model) describes the typical phases that product teams and organizations go through as they develop the technical and organizational capabilities for experimentation (such as A/B testing), particularly in the context of online software products. It can serve teams and organizations both as a self-assessment tool and as a set of guidelines for improving their experimentation capabilities.
The model was developed by Aleksander Fabijan, Pavel Dmitriev, Helena Holmström Olsson and Jan Bosch, based on studies of product teams at Microsoft, Intuit, Booking.com and several other companies, and was presented at the 2017 International Conference on Software Engineering in Buenos Aires, Argentina. Fabijan’s 2018 doctoral thesis expands on the model, in particular by presenting research from a wider range of companies. Fabijan and Dmitriev also published an online self-assessment survey that helps teams rate themselves against the model. Ron Kohavi, Diane Tang and Ya Xu popularized the model in the book Trustworthy Online Controlled Experiments.
Four phases of experimentation maturity
The Experimentation Growth Model defines four phases, each representing a higher level of maturity in a team’s or organization’s experimentation capabilities:
- Crawl: In the crawl phase, experiments are mostly built ad hoc, and results are gathered by logging a few signals. Experiments are typically limited to design choices. Teams work on basic instrumentation of the product to support experimentation, and need the assistance of external experts to evaluate the results. The logs gradually evolve from an ad-hoc format into something that can be used more systematically for data-driven development. The overall evaluation criterion (OEC) is initially defined using a few key signals.
- Walk: In the walk phase, teams approach experimentation more systematically. Experiments are usually run through an experimentation platform rather than coded by hand. The platform starts to support power analysis and A/A tests (see the first sketch after this list). Product managers can set up experiments themselves and use the platform for pre-experiment and result analysis. External experts are no longer critical for evaluating the results, though they may act in a supervisory role. Experiments expand to more types of changes, such as performance or infrastructure improvements. Product teams start to define standard metrics for their experiments, and the frequency of experiments increases to about one per week. The overall evaluation criterion evolves from a few key signals into a structured set of metrics.
- Run: In the run phase, product teams can run experiments at scale and establish comprehensive metrics. Most new features and changes are evaluated through experimentation, and the frequency of experiments increases to almost daily. Teams often develop their own experimentation platforms, extending them with alerting and monitoring for bad experiments, long-term effect monitoring and support for iterative experiments, and introducing different types of ramp-up and shut-down criteria. The teams focus on codifying the overall evaluation criterion, usually as a single metric that captures trade-offs between multiple metrics (see the second sketch after this list).
- Fly: In the fly phase, A/B tests become the norm and are applied to almost every change. Teams can analyze most results themselves without data scientists. Experimentation platforms typically evolve towards automated experiment control, such as shutting down harmful experiments (see the third sketch after this list). Teams build up institutional knowledge, learning from past experiments, sharing surprising results with other teams and building a culture of experimentation across the organization. This phase usually involves running dozens of experiments daily. The overall evaluation criterion becomes stable and changes only periodically.
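To make the walk-phase capabilities concrete, the first sketch below shows a minimal power analysis (sizing an experiment before it starts) and a simulated A/A test (a sanity check in which both variants receive the same experience). This is an illustrative sketch, not part of the model; the baseline rate, lift and thresholds are assumed values.

```python
# Minimal sketch of two walk-phase capabilities: a power analysis to size an
# experiment, and a simulated A/A test as a sanity check on the pipeline.
# The baseline rate and relative lift below are hypothetical examples.
from statistics import NormalDist
import random

def sample_size_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate sample size per variant for a two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    pooled = (p1 + p2) / 2
    numerator = (z_alpha * (2 * pooled * (1 - pooled)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return int(numerator / (p2 - p1) ** 2) + 1

def aa_test_p_value(n, rate, seed=0):
    """Simulate an A/A test: both variants get the same treatment, so the
    difference in conversion rates should rarely be statistically significant."""
    rng = random.Random(seed)
    a = sum(rng.random() < rate for _ in range(n))
    b = sum(rng.random() < rate for _ in range(n))
    pooled = (a + b) / (2 * n)
    se = (2 * pooled * (1 - pooled) / n) ** 0.5
    z = (a / n - b / n) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

n = sample_size_per_variant(baseline_rate=0.05, relative_lift=0.10)
print(f"Users needed per variant to detect a 10% relative lift: {n}")
print(f"A/A test p-value (should usually exceed 0.05): {aa_test_p_value(n, 0.05):.3f}")
```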
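The second sketch illustrates the run-phase idea of a single overall evaluation criterion that trades off several metrics. The metric names and weights are assumptions made for illustration; in practice they would be agreed on by the product team.

```python
# Hypothetical sketch of an overall evaluation criterion (OEC) that combines
# several per-experiment metrics into a single number. Metric names and
# weights are illustrative, not prescribed by the model.
OEC_WEIGHTS = {
    "sessions_per_user": 0.5,     # engagement
    "revenue_per_user": 0.4,      # monetization
    "page_load_seconds": -0.1,    # performance regressions count against the OEC
}

def oec(treatment: dict, control: dict) -> float:
    """Weighted sum of relative deltas between treatment and control."""
    score = 0.0
    for metric, weight in OEC_WEIGHTS.items():
        relative_delta = (treatment[metric] - control[metric]) / control[metric]
        score += weight * relative_delta
    return score

control = {"sessions_per_user": 4.0, "revenue_per_user": 1.20, "page_load_seconds": 1.8}
treatment = {"sessions_per_user": 4.1, "revenue_per_user": 1.18, "page_load_seconds": 1.9}
print(f"OEC score: {oec(treatment, control):+.4f}")  # positive favors shipping
```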
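The third sketch shows one way the fly-phase automation for shutting down harmful experiments can work: guardrail checks evaluated periodically against live results. The guardrail metrics, thresholds and significance level are assumed for illustration.

```python
# Illustrative guardrail check of the kind a mature experimentation platform
# might run on a schedule: if a live experiment degrades a guardrail metric
# beyond a threshold with sufficient confidence, the experiment is ramped down.
# Metric names, thresholds and significance level are hypothetical.
from dataclasses import dataclass
from typing import List

@dataclass
class GuardrailResult:
    metric: str
    relative_change: float   # treatment vs. control
    p_value: float

def should_shut_down(results: List[GuardrailResult],
                     max_degradation: float = -0.02,
                     alpha: float = 0.01) -> bool:
    """Stop the experiment if any guardrail metric drops more than 2%
    relative to control and the drop is statistically significant."""
    return any(r.relative_change < max_degradation and r.p_value < alpha
               for r in results)

live_results = [
    GuardrailResult("crash_free_sessions", -0.031, 0.002),  # significant regression
    GuardrailResult("revenue_per_user", 0.004, 0.40),
]
if should_shut_down(live_results):
    print("Guardrail violated: ramping experiment down to 0% of traffic")
```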
Alternatives to the Experimentation Growth Model
Brooks Bell presented a five-level model:
- Level 1 involves using a testing tool, typically running fewer than 10 tests per year, without a dedicated experimentation team. The tests are mostly simple.
- Level 2 is reached when experimentation gains some cultural traction and teams run 10-50 tests per year. The process and strategy for testing are not yet standardized or centralized, and experimentation is usually driven by one person.
- Level 3 involves setting up a center of excellence for experimentation, and standardizing process and strategy for testing. The teams are capable of running 50-200 tests per year, and there is a high internal demand for experimentation.
- Level 4 involves making the experimentation strategy more sophisticated, and experimentation starts to gain a high level of visibility in the company. Experiments are partially decentralized, and the team is usually running 200-1000 tests per year.
- Level 5 involves fully decentralized testing built around internal platforms, with individual teams running thousands of tests per year. Teams have “data in their DNA”.
Learn more about the Experimentation Growth Model
- The Evolution of Continuous Experimentation in Software Product Development, International Conference on Software Engineering (ICSE), Buenos Aires, Argentina, by Aleksander Fabijan, Pavel Dmitriev, Helena Holmström Olsson and Jan Bosch (2017)
- Data-Driven Software Development at Large Scale: From Ad-Hoc Data Collection to Trustworthy Experimentation, doctoral thesis, Malmö University, Faculty of Technology and Society, by Aleksander Fabijan (2018)
- Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing, ISBN 978-1108724265, by Ron Kohavi, Diane Tang and Ya Xu (2020)
- Experimentation Growth Survey, online self-assessment by Aleksander Fabijan and Pavel Dmitriev