CUPED stands for Controlled-experiment Using Pre-Experiment Data. It's a technique developed at Microsoft that reduces the variance in your test metric by removing noise that pre-experiment behavior already explains — making your A/B tests more sensitive without requiring more traffic.
The core insight: much of the variation in user behavior during an experiment is predictable from how those same users behaved before the experiment started. If you can measure and remove that pre-existing noise, you end up with a tighter, cleaner signal from the experiment itself.
## Standard A/B vs. CUPED
| | Standard A/B Test | CUPED |
|---|---|---|
| Variance reduction | None | 20–60% typical |
| Sample size impact | Baseline | 20–60% smaller |
| Data required | Experiment period only | Pre-experiment period + experiment |
| Implementation complexity | Low | Medium |
| Works best for | Any metric | High-autocorrelation metrics |
## How It Works

1. Collect pre-experiment data. Before the test starts, record the same metric you plan to measure, typically over the past 1–4 weeks, for each user who will enter the experiment.
2. Compute the adjustment coefficient. Fit a regression of the in-experiment outcome on the pre-experiment covariate. The slope is θ = Cov(outcome, pre_metric) / Var(pre_metric), which captures how much of the outcome is predicted by prior behavior.
3. Adjust each user's outcome. For each user in the experiment, compute
   `adjusted_outcome = outcome - θ × (pre_metric - mean(pre_metric))`
   where θ is the regression coefficient from step 2; it scales how much weight to give the pre-experiment correction.
4. Run your test on adjusted outcomes. Use the adjusted values exactly as you would the raw metric: compute means per variant, run a t-test or z-test, report confidence intervals. The sketch below walks through all four steps.
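Here is a minimal sketch of the four steps in Python, using NumPy and SciPy on simulated data. All names and numbers are illustrative, not from a real experiment:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulate 10,000 users whose in-experiment metric is partly
# explained by their pre-experiment metric, plus a small lift.
n = 10_000
pre = rng.gamma(2.0, 10.0, size=n)                       # pre-experiment metric
treat = rng.integers(0, 2, size=n)                       # 0 = control, 1 = treatment
y = 0.8 * pre + rng.normal(0, 8, size=n) + 0.5 * treat   # in-experiment metric

# Step 2: theta is the OLS slope of the outcome on the covariate.
theta = np.cov(y, pre)[0, 1] / np.var(pre, ddof=1)

# Step 3: subtract the part of y that pre-experiment behavior explains.
y_cuped = y - theta * (pre - pre.mean())

# Step 4: run the identical t-test on raw vs. adjusted outcomes.
for label, metric in (("raw", y), ("CUPED", y_cuped)):
    t, p = stats.ttest_ind(metric[treat == 1], metric[treat == 0])
    diff = metric[treat == 1].mean() - metric[treat == 0].mean()
    print(f"{label:>5}: diff={diff:+.3f}  t={t:+.2f}  p={p:.4f}")
```

Fitting θ pooled across both variants is the common choice; because the covariate was measured before random assignment, it is independent of treatment, so the adjustment cannot bias the comparison.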
The adjustment doesn't change the expected value of the metric; it only reduces the variance. This means your point estimates stay the same, but your confidence intervals shrink — which is equivalent to needing less data to reach the same level of confidence.
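To see why, note that the correction term is centered, so its expectation is zero, and with θ fit as in step 2 the adjustment removes exactly the share of variance that the covariate explains:

`E[adjusted_outcome] = E[outcome]`
`Var(adjusted_outcome) = (1 − ρ²) × Var(outcome)`

where ρ is the correlation between the pre-experiment and in-experiment metric. A correlation of 0.7, for instance, cuts variance by roughly 49%, consistent with the 20–60% range in the table above.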
## When to Use CUPED
CUPED is most valuable when:
- A stable pre-experiment period exists (users have prior sessions or transaction history)
- Your metric has high autocorrelation, meaning past behavior predicts future behavior. Revenue per user and session engagement typically qualify; one-time signup events typically don't (the check after this list makes this concrete)
- You're traffic-constrained and need to run faster experiments
- Your testing platform supports it (Statsig, Eppo, and the in-house experimentation platforms at large companies often have it built in)
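You can check the autocorrelation point before committing. A sketch, under the assumption that you can pull the same metric for the same users over two adjacent pre-launch windows (the array names are hypothetical):

```python
import numpy as np

def expected_variance_reduction(window_1: np.ndarray, window_2: np.ndarray) -> float:
    """Fraction of variance CUPED can remove: the squared correlation
    of the metric across two adjacent time windows for the same users."""
    rho = np.corrcoef(window_1, window_2)[0, 1]
    return rho ** 2
```

A result of 0.5 means roughly half the variance, and therefore roughly half the required sample size, would go away; a result near zero means CUPED isn't worth the plumbing for that metric.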
## Limitations
- Requires historical data. New users or new products without prior behavioral data can't benefit from CUPED. You need a lookback window long enough to be meaningful
- Adds implementation complexity. You need to join pre-experiment data to experiment assignments, which requires an engineering step beyond standard A/B test setup
- Metric-dependent gains. If your outcome metric has low autocorrelation, meaning past behavior doesn't predict the metric well, CUPED provides little benefit. Test it empirically, for example with the correlation check above, before relying on it
- Not a substitute for sample size. CUPED reduces the sample you need, but doesn't eliminate the need for sufficient traffic. At very low volumes, you still need to combine it with other strategies like Bayesian testing or qualitative research
For teams running experiments at the margins of statistical viability, CUPED is one of the highest-ROI investments available — especially when paired with sequential testing for valid continuous monitoring.