How We Chose Growthbook as Our New Experimentation Platform
Oct 25, 2023
Travis White and Julian Castor, Engineers
For the past two years, Alto has relied on Optimizely — along with an internal feature flag tool — to run experiments and control rollouts. While testing features across our mobile and web apps, backend services, and internal tooling, we discovered key pain points with our current experimentation solutions:
Optimizely centers on users, rather than generic resources. To roll out a feature to a specific facility or clinic, we relied on our internal feature flag tool instead, which meant there wasn’t a single source of truth for both experimentation and feature flagging.
Optimizely focuses on simple conversion metrics, such as “Did the user click the button?” However, many of the metrics important to Alto are more nuanced, requiring additional platforms like Amplitude and Looker for metric analysis.
Protecting PHI and maintaining HIPAA compliance are fundamental to Alto’s mission. Because Optimizely does not provide sensitive data handling as a feature, we had to develop custom solutions to prevent sensitive data from being sent to Optimizely.
Given these pain points, we determined that our ideal framework would provide:
An easy interface for developers to set up experiments and feature flags across platforms
An intuitive UI for test configuration and analysis
The necessary security to protect PHI
Since Optimizely could no longer accommodate our needs as an organization, we set out to find a better fit.
To start, a group of engineers from across Alto’s product teams researched top experimentation frameworks and their implementation costs. For each platform, one engineer investigated the following questions:
What is the platform’s estimated cost, based on the size of our product and technology organization?
What is the estimated effort of integrating into our backend services and frontend clients?
How easily can it integrate into our current data pipeline?
Is it an easy tool to use? How quickly can a new developer set up an experiment and start monitoring results?
Does it only support A/B test bucketing? Or does it also include experimentation, metrics, and evaluation?
How scalable is the platform? What failsafes does it have against client network connection failure, stale results caching, and other risks?
How secure is the platform with PHI? Does the company offer a business associate agreement (BAA)?
After consulting with platform representatives and reading documentation and developer reviews, we built a matrix to compare each option against our most important needs.
We quickly eliminated several options that did not meet our requirements and moved on to local prototyping with the remaining candidates. Our top candidates to replace Optimizely included Amplitude Experiment, Split, LaunchDarkly, and Growthbook. Each engineer investigated a platform with three objectives:
Writing a draft pull request with a test experiment
Taking notes on the developer experience
Documenting their opinion on the platform’s effectiveness as a replacement
Throughout these trials, we actively communicated with representatives from each platform to unblock testing as questions arose.
When we reconvened a few weeks later, each engineer presented their findings and made a case for or against their assigned platform. Their arguments were based on both the objective data gathered during the initial exploration and their subjective experience of implementation. From the presentations, two clear frontrunners emerged: Amplitude Experiment and Growthbook.
Strengths of Amplitude Experiment:
Our product and engineering organization already uses Amplitude Analytics.
Experiment and analytics are integrated and access the same data to create one all-encompassing tool.
Alto and Amplitude already have a BAA in place.
Strengths of Growthbook:
As a self-hosted open source tool, it is cost-efficient and presents the lowest possible risk to data security.
Our engineering team is already comfortable supporting similar open source tools.
Trials on production
There is no substitute for a test in prod — and fortunately, both Growthbook and Amplitude Experiment support trials with live production data. We knew building a proof of concept for each platform would be time-consuming: We needed collaboration from security, data, infrastructure, and platform engineering teams; time to implement; and at least two weeks to run a useful sample experiment. In particular, Growthbook required us to set up a self-hosted instance of the application, as well as a new Snowflake data warehouse with metrics data.
However, the month of trialing was well worth the firsthand understanding of how each platform would integrate with Alto’s existing architecture and what a migration would take. Ultimately, we aligned on Growthbook as the experimentation team’s preferred A/B testing and feature flag platform for a few reasons:
Other than the self-hosted application, it would not cost anything to run Growthbook. (The application itself is bundled into a single Docker image.)
Given the flow of data, Growthbook itself does not store any sensitive patient information, and all PHI remains in Alto’s existing Snowflake warehouses.
Using an open source technology also aligns with the engineering organization’s broader movement toward self-hosted open source tooling. We benefited from our infrastructure and platform team’s recent migration from third-party vendors to self-hosted Grafana tooling.
At key points during our decision-making process, we presented our findings to Alto’s wider engineering organization, so that every team had an opportunity to weigh in. As a result of the feedback we received, for example, we decided to add an abstraction layer on top of the Growthbook implementation. If Alto needs to move to a different A/B testing platform at any point in the future, the abstraction layer will allow us to easily swap out the underlying implementation without changing all the call sites. As long as any future implementation uses the same interface, even another migration of this scale would have a negligible impact on product engineering.
With our final proposal in hand, we revisited feedback and made sure all questions and concerns were addressed. We were able to align the engineering organization around our preferred choice, a self-hosted Growthbook instance with a frontend and backend abstraction layer.
On the backend, we’d spin up a new Experimentation Rails Engine — a miniature application that provides functionality to the host application. On the frontend, we’d spin up a new TypeScript library to be used in all our React and React Native applications. The engine and library would declare abstract interfaces that any implementing class would need to adhere to, establishing a contract for functionality and naming conventions.
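To make the contract idea concrete, here is a minimal TypeScript sketch of what such an abstraction layer could look like. All names here (`ExperimentClient`, `GrowthbookClient`, the example flag and experiment keys) are illustrative assumptions, not Alto’s actual API, and the real implementation would wrap the Growthbook SDK rather than in-memory maps:

```typescript
// Hypothetical sketch of a frontend experimentation abstraction layer.
// The interface is the contract; product code depends only on it.
interface ExperimentClient {
  isFeatureEnabled(key: string): boolean;
  getVariant(experimentKey: string): string;
}

// One possible provider-backed implementation. A real version would
// delegate to the Growthbook SDK; maps stand in for that here.
class GrowthbookClient implements ExperimentClient {
  private flags: Map<string, boolean>;
  private variants: Map<string, string>;

  constructor(flags: Record<string, boolean>, variants: Record<string, string>) {
    this.flags = new Map(Object.entries(flags));
    this.variants = new Map(Object.entries(variants));
  }

  isFeatureEnabled(key: string): boolean {
    // Unknown flags default to off, a common failsafe choice.
    return this.flags.get(key) ?? false;
  }

  getVariant(experimentKey: string): string {
    // Fall back to the control variant when no assignment exists.
    return this.variants.get(experimentKey) ?? "control";
  }
}

// A call site written against the interface: swapping Growthbook for
// another platform later means changing only the injected client.
function checkoutButtonColor(client: ExperimentClient): string {
  return client.getVariant("checkout-button-color") === "treatment"
    ? "green"
    : "blue";
}
```

Because call sites receive an `ExperimentClient` rather than a concrete class, a future migration only needs a new class implementing the same interface — the property L41 describes as limiting migration impact on product engineering.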
Today and beyond
We now have our self-hosted Growthbook application up and running with engineering teams adding their own feature flags and experiments. We also have a healthy backlog of exciting changes, including:
Annotations on our Grafana dashboards based on experiment changes
Slack webhooks to notify channels when it’s time to clean up an experiment
Improved metrics to measure experiment success
With the current setup and abstraction layer, we can easily add, modify, and remove other experimentation platforms if needed. As our engineering organization becomes more familiar with the platform, the foundation of a strong, experiment-first culture and tailored tools will enable our teams to iterate and learn even faster.