The peeking problem occurs when you check a test's p-value repeatedly while the experiment is running and stop it as soon as the value crosses the significance threshold. This habit, common in practice but wrong in theory, dramatically inflates your false positive rate.
A fixed-horizon test run to completion at α = 0.05 has a 5% chance of a false positive. The same test checked daily and stopped at the first significant result can have a false positive rate of 25–30%, even though the p-value displayed at the moment you stop still reads 0.05.
Why Peeking Inflates False Positives
Statistical significance thresholds assume you collect all the data, then run one test. Each time you peek and compute a p-value, you're running another test on the same accumulating data, and as with any multiple-comparisons problem, the chance that at least one look turns up an extreme result purely by chance grows with every test.
Think of it like flipping a coin 100 times and stopping the moment you see 6 heads in a row: the 100 flips contain many overlapping chances for that streak, so you'll stop and declare the coin biased far more often than the 1/64 probability of any single 6-flip window suggests.
How Often Peeking Happens
The false positive rate scales with how often you peek (the simulation sketch after the table reproduces the effect):
| Peeks at intermediate points | True false positive rate (nominal α = 0.05) |
|---|---|
| 0 (run to completion) | 5% |
| 5 | ~14% |
| 10 | ~19% |
| 20+ | ~25%+ |
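These figures are straightforward to sanity-check with a simulation. The sketch below runs repeated A/A tests (two identical variants, so every significant result is a false positive), applies a two-proportion z-test at evenly spaced peeks, and stops at the first significant result. The sample size, simulation count, and peek schedule are illustrative assumptions, not values fixed by the table, so exact rates will vary with the schedule and seed.

```python
import numpy as np

rng = np.random.default_rng(0)

def false_positive_rate(n_peeks, n_total=10_000, n_sims=2_000):
    """Fraction of A/A experiments (no true effect) in which at least one
    interim two-proportion z-test crosses the significance threshold."""
    z_crit = 1.96  # two-sided critical value for alpha = 0.05
    # Evenly spaced interim peeks, plus the final analysis.
    checkpoints = np.linspace(n_total / (n_peeks + 1), n_total, n_peeks + 1).astype(int)
    hits = 0
    for _ in range(n_sims):
        a = rng.binomial(1, 0.5, n_total).cumsum()  # cumulative conversions, arm A
        b = rng.binomial(1, 0.5, n_total).cumsum()  # cumulative conversions, arm B
        for n in checkpoints:
            p1, p2 = a[n - 1] / n, b[n - 1] / n
            pooled = (a[n - 1] + b[n - 1]) / (2 * n)
            se = np.sqrt(2 * pooled * (1 - pooled) / n)
            if se > 0 and abs(p1 - p2) / se > z_crit:
                hits += 1  # stopped early on a spurious "win"
                break
    return hits / n_sims

for peeks in [0, 5, 10, 20]:
    print(f"{peeks:>2} interim peeks -> false positive rate ~ {false_positive_rate(peeks):.3f}")
```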
Solutions
Sequential testing / always-valid p-values: methods like the mSPRT (mixture Sequential Probability Ratio Test) or e-values are designed to be checked at any time without inflating error rates. Several modern experimentation platforms use them by default.
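To make "always-valid" concrete, here is a minimal sketch of an mSPRT for a stream of treatment-minus-control differences, assuming they are normal with known variance and mixing the unknown mean over N(0, tau2); the closed-form likelihood ratio below follows from that normal/normal pairing. The tuning value tau2 and the synthetic data are illustrative assumptions, and production systems generalize this setup considerably.

```python
import numpy as np

def msprt_pvalues(diffs, sigma2, tau2=1.0):
    """Always-valid p-values from a mixture SPRT: H0 is mean = 0, the variance
    sigma2 is known, and the unknown mean is mixed over N(0, tau2).
    `diffs` is the stream of per-unit treatment-minus-control differences."""
    n = np.arange(1, len(diffs) + 1)
    s = np.cumsum(diffs)  # running sum S_n
    # Closed-form mixture likelihood ratio Lambda_n for the normal/normal pair.
    lam = np.sqrt(sigma2 / (sigma2 + n * tau2)) * np.exp(
        tau2 * s**2 / (2 * sigma2 * (sigma2 + n * tau2))
    )
    # p_n = min over k <= n of 1 / Lambda_k: monotone decreasing, valid at every peek.
    return np.minimum.accumulate(np.minimum(1.0, 1.0 / lam))

rng = np.random.default_rng(1)
p = msprt_pvalues(rng.normal(0.2, 1.0, 5_000), sigma2=1.0)  # true lift of 0.2
hit = int(np.argmax(p < 0.05)) if (p < 0.05).any() else None
print("first n where the always-valid p-value drops below 0.05:", hit)
```

Because p_n can only decrease and is valid at every n, checking it after each new observation spends no extra Type I error.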
Pre-commitment: decide your sample size and runtime upfront. Write them down. Don't stop the test before the planned endpoint, regardless of interim results.
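Pre-commitment starts with a sample-size calculation. Here is a minimal sketch using the standard normal-approximation formula for a two-proportion z-test; the 5% baseline rate and one-point absolute lift are made-up example inputs.

```python
import math
from scipy.stats import norm

def sample_size_per_arm(p_base, mde_abs, alpha=0.05, power=0.80):
    """Fixed-horizon sample size per arm for a two-proportion z-test,
    via the standard normal-approximation formula (unpooled variances)."""
    p_var = p_base + mde_abs       # variant rate at the minimum detectable effect
    z_a = norm.ppf(1 - alpha / 2)  # significance requirement (two-sided)
    z_b = norm.ppf(power)          # power requirement
    var = p_base * (1 - p_base) + p_var * (1 - p_var)
    return math.ceil((z_a + z_b) ** 2 * var / mde_abs**2)

# 5% baseline conversion, detect a 1-point absolute lift: ~8,155 users per arm
print(sample_size_per_arm(0.05, 0.01))
```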
Bayesian A/B testing: Bayesian methods produce probabilities (e.g., "85% chance the variant beats the control") that can be interpreted at any point without the same peeking inflation, though they carry their own assumptions.
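A common concrete setup is the Beta-Binomial model: put a Beta prior on each conversion rate, update it with the observed counts, and estimate P(variant beats control) by sampling both posteriors. The flat Beta(1, 1) priors and the counts below are illustrative assumptions.

```python
import numpy as np

def prob_variant_beats_control(conv_c, n_c, conv_v, n_v, draws=200_000, seed=0):
    """Monte Carlo estimate of P(rate_variant > rate_control) under
    independent Beta(1, 1) priors on the two conversion rates."""
    rng = np.random.default_rng(seed)
    post_c = rng.beta(1 + conv_c, 1 + n_c - conv_c, draws)  # posterior, control
    post_v = rng.beta(1 + conv_v, 1 + n_v - conv_v, draws)  # posterior, variant
    return (post_v > post_c).mean()

# e.g. 480/10,000 control conversions vs 530/10,000 variant conversions
print(f"P(variant beats control) ~ {prob_variant_beats_control(480, 10_000, 530, 10_000):.2f}")
```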
Alpha spending: formal interim-analysis procedures (e.g., Pocock or O'Brien-Fleming boundaries) that "spend" portions of the Type I error budget at each checkpoint, keeping the overall rate at α.
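As a sketch of how the error budget gets spent, the Lan-DeMets O'Brien-Fleming-type spending function below reports the cumulative alpha available at each planned look; note how little it allows early on. The four evenly spaced looks are an illustrative choice, and converting spent alpha into actual stopping boundaries requires the joint distribution of the interim statistics, which dedicated tools (e.g., R's gsDesign package) compute.

```python
from scipy.stats import norm

def obrien_fleming_spend(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type spending function: cumulative Type I
    error allowed once an information fraction t in (0, 1] has accrued."""
    return 2.0 * (1.0 - norm.cdf(norm.ppf(1 - alpha / 2) / t**0.5))

looks = [0.25, 0.50, 0.75, 1.00]  # information fractions at four planned looks
spent = [obrien_fleming_spend(t) for t in looks]
for t, cum, prev in zip(looks, spent, [0.0] + spent[:-1]):
    print(f"look at {t:.0%} of data: cumulative alpha {cum:.4f}, spent this look {cum - prev:.4f}")
```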
The easiest fix is also the most discipline-demanding: set your runtime before you start, and don't look at the results until it's done.