Type II Error

A Type II error is a false negative in hypothesis testing: the test fails to detect a real effect, usually because it lacked statistical power (most often because the sample was too small).

A Type II error happens when a test concludes "no significant difference" even though a real improvement existed. You missed it. In conversion rate optimization (CRO), this means discarding a variant that would have genuinely increased conversions if you'd given the test enough data.

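The probability of a Type II error is denoted β. Statistical power is its complement: power = 1 − β. A test with 80% power therefore has a 20% chance of missing an effect of the size it was designed to detect.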
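To make β concrete, here is a minimal sketch that approximates power for a two-sided z-test on two conversion rates, using the normal approximation. The function name, the 5.0% and 5.6% rates, and the 4,000 visitors per arm are illustrative assumptions, not values from any particular tool.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_variant, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two conversion rates."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)

    # Standard error under the null (pooled rate) and under the alternative.
    p_pooled = (p_control + p_variant) / 2
    se_null = sqrt(2 * p_pooled * (1 - p_pooled) / n_per_arm)
    se_alt = sqrt(
        (p_control * (1 - p_control) + p_variant * (1 - p_variant)) / n_per_arm
    )

    delta = abs(p_variant - p_control)
    # Probability that the observed difference lands in either rejection region.
    upper = 1 - std_normal.cdf((z_crit * se_null - delta) / se_alt)
    lower = std_normal.cdf((-z_crit * se_null - delta) / se_alt)
    return upper + lower


# Hypothetical test: 5.0% baseline, 5.6% true variant rate, 4,000 visitors per arm.
power = power_two_proportions(0.050, 0.056, n_per_arm=4000)
beta = 1 - power  # probability of a Type II error for this setup
print(f"power = {power:.2f}, beta = {beta:.2f}")
```

Under these assumed inputs, power comes out at roughly 0.22, so a real 12% relative lift would be missed most of the time.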

Type I vs. Type II Errors

                     | Test declares: no difference   | Test declares: significant difference
No real effect       | Correct (true negative)        | Type I error (false positive)
Real effect exists   | Type II error (false negative) | Correct (true positive)

Both errors have costs. Type I errors waste engineering effort shipping changes that don't help. Type II errors cause you to miss real wins.

Common Causes

  • Underpowered tests: The sample is too small to detect the effect size you care about. This is the most frequent cause.
  • Too short a runtime: The test is stopped before it reaches the required sample size.
  • High baseline variability: Metrics with high variance (like revenue) require larger samples than stable metrics (like click rate).
  • MDE set too high: If you sized the test for a minimum detectable effect (MDE) of a 30% lift but the true effect is 12%, the test will probably miss it.
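The variance and MDE points above follow from how required sample size scales. The sketch below uses the standard normal-approximation formula for a difference in means, n = 2(z_α/2 + z_β)² σ² / δ² per arm. The revenue figures and standard deviations are hypothetical, chosen only to show the scaling.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm_for_means(sigma, delta, alpha=0.05, power=0.80):
    """Approximate visitors per arm for a two-sided z-test on a difference in means."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    # n grows with the variance (sigma squared) and shrinks with the squared effect.
    return ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical revenue-per-visitor metric with a mean of 10.00.
mean_revenue = 10.0
n_low_variance = n_per_arm_for_means(sigma=20.0, delta=0.30 * mean_revenue)
n_high_variance = n_per_arm_for_means(sigma=40.0, delta=0.30 * mean_revenue)
n_smaller_effect = n_per_arm_for_means(sigma=20.0, delta=0.12 * mean_revenue)

print("30% lift, sigma 20:", n_low_variance)
print("30% lift, sigma 40:", n_high_variance)
print("12% lift, sigma 20:", n_smaller_effect)
```

Doubling the standard deviation roughly quadruples the required sample, and shrinking the target lift from 30% to 12% multiplies it by about (30/12)² ≈ 6.25. A test sized for the larger lift is therefore far too small for the smaller one.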

How to Avoid Type II Errors

  1. Use a sample size calculator before launching: Input your baseline rate, desired MDE, significance level (α), and target power (typically 0.80).
  2. Never stop a test early: Let it run to the planned sample size.
  3. Choose a realistic MDE: Base it on historical test results and business thresholds.
  4. Segment high-variance metrics: For revenue-based metrics, consider running on a sub-segment with more stable behavior.
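As a rough sketch of what a sample size calculator does with the inputs from step 1, the function below applies the usual normal-approximation formula for a two-sided two-proportion z-test. The function name, the 5% baseline, the 12% relative MDE, and the 1,500 daily visitors per arm are assumptions for illustration. A production calculator may use a slightly different formula and return somewhat different numbers.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_arm(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors per arm for a two-sided two-proportion z-test."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)  # variant rate if the MDE is real
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)

    p_pooled = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_pooled * (1 - p_pooled))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical inputs: 5% baseline conversion rate, 12% relative MDE.
n = sample_size_per_arm(baseline_rate=0.05, relative_mde=0.12)

# Turn the sample size into a planned runtime (traffic figure is an assumption).
daily_visitors_per_arm = 1500
planned_days = ceil(n / daily_visitors_per_arm)

print(f"visitors per arm: {n}, planned runtime: {planned_days} days")
```

Fixing the runtime before launch, as in the last two lines, is what makes step 2 practical: the stopping point is decided by the plan, not by how the results look partway through.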

The Practical Cost

A CRO program running underpowered tests accumulates Type II errors silently. You see a long list of "no significant results" and conclude the site is optimized — when in fact you've been running tests that couldn't detect the real improvements you were making.
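A back-of-the-envelope calculation shows how these misses add up. The sketch below multiplies the number of tests by the share of variants that truly help and by β. Every input is hypothetical, and in practice the share of variants with a real effect is unknown.

```python
def expected_missed_wins(n_tests, share_with_real_effect, power):
    """Expected number of real improvements that go undetected (Type II errors)."""
    beta = 1 - power
    return n_tests * share_with_real_effect * beta


# Hypothetical program: 40 tests, a quarter of the variants are truly better.
for power in (0.20, 0.50, 0.80):
    missed = expected_missed_wins(n_tests=40, share_with_real_effect=0.25, power=power)
    print(f"power {power:.2f}: about {missed:.0f} of 10 real wins missed")
```

With these assumed numbers, a program running at 20% power would expect to miss about 8 of its 10 real wins, against about 2 at the conventional 80% target.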