A Type I error occurs when a test declares a winner that isn't actually better. You observed a difference in the data, but that difference was due to random noise — not a real effect of your change.
In A/B testing, a Type I error means you ship a "winning" variant that produces no improvement (or actively hurts performance) in production.
The Significance Level Controls Type I Error Rate
The significance level (α) is the maximum acceptable probability of a Type I error. At α = 0.05, you accept a 5% chance of falsely declaring a winner on any given test.
| Significance level | Type I error rate | Confidence level |
|---|---|---|
| α = 0.10 | 10% | 90% |
| α = 0.05 | 5% | 95% |
| α = 0.01 | 1% | 99% |
The stricter your significance threshold, the larger the sample you need before a real effect can clear it.
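To make the α = 0.05 row concrete, here is a minimal A/A simulation sketch: both arms share the same conversion rate, so every "significant" result is by definition a false positive. The 5% baseline rate, 10,000 visitors per arm, and the pooled two-proportion z-test are illustrative assumptions, not prescribed by this article.

```python
import numpy as np
from scipy import stats

# Illustrative assumptions: 5% conversion in BOTH arms, 10,000 visitors
# per arm, 2,000 simulated A/A tests.
rng = np.random.default_rng(42)
alpha, p_base, n, trials = 0.05, 0.05, 10_000, 2_000

false_positives = 0
for _ in range(trials):
    # Both arms draw from the SAME conversion rate: any "winner" is noise.
    conv_a = rng.binomial(n, p_base)
    conv_b = rng.binomial(n, p_base)
    # Two-sided, pooled two-proportion z-test.
    p_pool = (conv_a + conv_b) / (2 * n)
    se = np.sqrt(2 * p_pool * (1 - p_pool) / n)
    z = (conv_b / n - conv_a / n) / se
    p_value = 2 * stats.norm.sf(abs(z))
    false_positives += p_value < alpha

# With no real effect, roughly alpha (about 5%) of tests still "win".
print(f"False positive rate: {false_positives / trials:.3f}")
```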
What Inflates Type I Errors
Several common practices push the real false positive rate above the nominal α:
- Peeking — Stopping a test early when it looks significant inflates false positives dramatically. A test checked daily at α = 0.05 can have a true false positive rate above 25% (the simulation sketch after this list shows the effect).
- Multiple metrics — Testing 10 metrics simultaneously at α = 0.05 gives roughly a 40% chance of at least one false positive, since 1 − 0.95^10 ≈ 0.40 (use a Bonferroni correction or designate a single primary metric).
- Multiple variants — Each pairwise comparison between variants adds another chance of a false positive (apply a multiplicity correction so the family-wise error rate across all comparisons stays at α).
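Here is a rough sketch of why peeking inflates the error rate, again as an A/A simulation. The 14-day duration, 1,000 visitors per arm per day, and one check per day are illustrative assumptions; the exact inflation depends on how often you look.

```python
import numpy as np
from scipy import stats

# Illustrative assumptions: 14-day test, 1,000 visitors per arm per day,
# 5% conversion in BOTH arms, 2,000 simulated A/A tests.
rng = np.random.default_rng(7)
alpha, p_base, daily_n, days, trials = 0.05, 0.05, 1_000, 14, 2_000

def p_value(conv_a, conv_b, n):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (2 * n)
    se = np.sqrt(2 * p_pool * (1 - p_pool) / n)
    z = (conv_b / n - conv_a / n) / se
    return 2 * stats.norm.sf(abs(z))

peeking_fp = fixed_fp = 0
for _ in range(trials):
    daily_a = rng.binomial(daily_n, p_base, size=days)
    daily_b = rng.binomial(daily_n, p_base, size=days)
    cum_a, cum_b = np.cumsum(daily_a), np.cumsum(daily_b)
    pvals = [p_value(cum_a[d], cum_b[d], (d + 1) * daily_n) for d in range(days)]
    peeking_fp += any(p < alpha for p in pvals)  # stop at the first "significant" peek
    fixed_fp += pvals[-1] < alpha                # look only at the planned end date

print(f"Fixed-horizon false positive rate: {fixed_fp / trials:.3f}")    # ~ alpha
print(f"Daily-peeking false positive rate: {peeking_fp / trials:.3f}")  # well above alpha
```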
Reducing Type I Errors
- Pre-register your primary metric before launching
- Run the test for its planned duration without early stopping
- Use a multiplicity correction when testing several metrics (a sketch of a Bonferroni adjustment follows this list)
- Use a stricter significance level (e.g., α = 0.01) for high-stakes decisions
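As an illustration of the multiplicity point, the sketch below applies a Bonferroni correction with statsmodels to ten made-up metric p-values; the numbers are hypothetical, not from any real test.

```python
from statsmodels.stats.multitest import multipletests

alpha = 0.05
# Hypothetical p-values for ten metrics (made-up numbers for illustration).
p_values = [0.003, 0.020, 0.048, 0.140, 0.260, 0.310, 0.450, 0.520, 0.710, 0.940]

# With 10 uncorrected tests at alpha = 0.05, the chance of at least one
# false positive under the null is 1 - 0.95**10, roughly 40%.
print(f"Family-wise false positive risk, uncorrected: {1 - (1 - alpha) ** 10:.2f}")

# Bonferroni effectively compares each p-value against alpha / m (here 0.005),
# keeping the family-wise error rate at or below alpha.
reject, p_adjusted, _, _ = multipletests(p_values, alpha=alpha, method="bonferroni")
for p, p_adj, r in zip(p_values, p_adjusted, reject):
    print(f"p = {p:.3f}  adjusted = {p_adj:.3f}  significant: {bool(r)}")
```

Note how the nominally significant 0.020 and 0.048 no longer pass once the correction accounts for all ten comparisons.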
Type I errors are the most common mistake in CRO programs. The cure is discipline: commit to a runtime, commit to a primary metric, and don't peek.