Regression to the Mean

Regression to the mean is the statistical tendency for extreme measurements to move closer to the average on subsequent observations, which can mislead CRO teams into attributing natural variation to a test change.

Regression to the mean describes a simple but frequently misunderstood phenomenon: if you measure something at an extreme value, the next measurement will likely be closer to the long-run average — not because anything changed, but because the first measurement was partly luck.

In CRO, this creates a trap. Pages with unusually low conversion rates tend to recover on their own. If you run a test during the low period, the natural recovery can look like your change caused a lift.

A Concrete Example

Your checkout page has a 3.2% average conversion rate. Last week it dropped to 2.1% — probably due to random variation, a seasonal dip, or an unusual traffic mix. You launch a test with a new CTA button.

Week 2: the conversion rate climbs to 2.9%. Measured against last week's 2.1%, that is a 38% relative lift ((2.9 - 2.1) / 2.1). Significant?

Not necessarily. A recovery from 2.1% toward the 3.2% mean was likely regardless of your change, so a before/after comparison cannot separate regression to the mean from a causal effect.
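To see how a recovery can appear with no change at all, here is a minimal simulation sketch. It assumes a fixed true conversion rate of 3.2% and a hypothetical 500 visitors per week; the traffic figure, the 10% cutoff, and the function names are illustrative assumptions, not part of the example above. It picks the weakest weeks and compares them with the week that followed each one.

```python
import random

def simulate_weekly_rates(true_rate, visitors_per_week, n_weeks, seed=0):
    """Weekly conversion rates for a page whose true rate never changes."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_weeks):
        conversions = sum(rng.random() < true_rate for _ in range(visitors_per_week))
        rates.append(conversions / visitors_per_week)
    return rates

def low_weeks_vs_following(rates, fraction=0.10):
    """Mean rate of the lowest `fraction` of weeks, and of the week after each."""
    candidates = range(len(rates) - 1)  # the last week has no follow-up week
    n_low = max(1, int(len(candidates) * fraction))
    low_idx = sorted(candidates, key=lambda i: rates[i])[:n_low]
    low_mean = sum(rates[i] for i in low_idx) / n_low
    next_mean = sum(rates[i + 1] for i in low_idx) / n_low
    return low_mean, next_mean

# Hypothetical traffic: 500 visitors per week, 1000 simulated weeks.
rates = simulate_weekly_rates(true_rate=0.032, visitors_per_week=500, n_weeks=1000)
low_mean, next_mean = low_weeks_vs_following(rates)
print(f"low weeks: {low_mean:.2%}, following weeks: {next_mean:.2%}")
```

Because nothing in the simulation ever changes the page, any gap between the two printed numbers comes purely from selecting weeks that happened to be unlucky.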

Why It Matters for CRO

Scenario                         | What it looks like          | What's actually happening
---------------------------------|-----------------------------|------------------------------
Low-traffic week triggers a test | Variant "wins"              | Traffic normalizes naturally
Worst-performing pages targeted  | All variants improve        | Base rates regress to average
Seasonal low followed by test    | Lift attributed to variant  | Seasonal recovery
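The "worst-performing pages" scenario can be sketched the same way. The example below is an illustration under stated assumptions: a hypothetical site with 200 pages, true conversion rates spread between 2% and 4%, and 500 visitors per page per period. It selects the 20 pages that looked worst in the first period and measures them again with nothing changed.

```python
import random

def observe(true_rates, visitors, rng):
    """One period of observed conversion rates, one value per page."""
    return [
        sum(rng.random() < p for _ in range(visitors)) / visitors
        for p in true_rates
    ]

rng = random.Random(1)
# Hypothetical site: 200 pages whose true rates vary between 2% and 4%.
true_rates = [rng.uniform(0.02, 0.04) for _ in range(200)]
period_1 = observe(true_rates, visitors=500, rng=rng)
period_2 = observe(true_rates, visitors=500, rng=rng)  # no page was changed

# Target the 20 pages that looked worst in period 1.
worst = sorted(range(len(true_rates)), key=lambda i: period_1[i])[:20]
before = sum(period_1[i] for i in worst) / len(worst)
after = sum(period_2[i] for i in worst) / len(worst)
print(f"worst pages, period 1: {before:.2%}, period 2: {after:.2%}")
```

Pages land in the bottom group partly because their true rate is low and partly because they had an unlucky period, and only the first part persists into the second measurement.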

How to Guard Against It

  • Randomize test timing — Don't launch tests specifically because a metric just hit a low point
  • Use pre-test periods — Check whether the metric was already trending back up before your test started
  • Require a stable pre-test baseline — Run the test only when the control metric has been stable for at least one week
  • Trust the A/A test — Running control vs. control shows you how much variation is normal, helping calibrate whether observed lifts are meaningful
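As a rough illustration of the A/A idea in the last point, the sketch below simulates many control-vs-control splits and reports how large an apparent "lift" appears by chance alone. The 3.2% rate is taken from the earlier example; the 2,000 visitors per arm, the number of runs, and the function name are hypothetical values chosen for illustration.

```python
import random

def aa_relative_lifts(rate, visitors_per_arm, n_runs, seed=0):
    """Relative 'lift' of arm B over arm A when both arms are identical."""
    rng = random.Random(seed)
    lifts = []
    for _ in range(n_runs):
        a = sum(rng.random() < rate for _ in range(visitors_per_arm))
        b = sum(rng.random() < rate for _ in range(visitors_per_arm))
        if a > 0:  # skip the (very rare) run where arm A has no conversions
            lifts.append((b - a) / a)
    return lifts

lifts = aa_relative_lifts(rate=0.032, visitors_per_arm=2000, n_runs=400)
abs_lifts = sorted(abs(x) for x in lifts)
# Value at or below which 95% of the simulated A/A runs fall.
noise_95 = abs_lifts[int(0.95 * len(abs_lifts)) - 1]
print(f"95% of A/A runs show an absolute relative lift of at most {noise_95:.0%}")
```

An observed lift smaller than this noise band is hard to distinguish from ordinary variation at that traffic level. A real A/A test on live traffic would replace the simulated arms here.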

Regression to the mean is one reason CRO programs should use controlled experiments rather than simple before/after comparisons.
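A small simulation can make the contrast concrete. This is a sketch under assumptions, not a model of any real test: the underlying rates (2.1% during the dip, 2.9% after the recovery) mirror the example above, the variant is given no real effect, and the 100,000 visitors per group are a hypothetical figure chosen to keep sampling noise small.

```python
import random

def conversion_rate(true_rate, visitors, rng):
    """Observed conversion rate for one group of visitors."""
    return sum(rng.random() < true_rate for _ in range(visitors)) / visitors

rng = random.Random(42)
visitors = 100_000  # hypothetical; large so that sampling noise is small

# Assumed underlying rates: a dip in week 1, a natural recovery in week 2.
week_1 = conversion_rate(0.021, visitors, rng)
# Week 2: traffic is split at random and the variant has NO real effect.
control = conversion_rate(0.029, visitors, rng)
variant = conversion_rate(0.029, visitors, rng)

before_after_lift = (variant - week_1) / week_1    # variant vs. last week
controlled_lift = (variant - control) / control    # variant vs. concurrent control
print(f"before/after lift: {before_after_lift:.1%}")
print(f"controlled lift:   {controlled_lift:.1%}")
```

The before/after comparison credits the variant with the whole recovery, while the concurrent control experiences the same recovery and cancels it out.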