Win probability is the estimated chance that a variant is truly better than the control (or other variants) based on the data collected so far. It's expressed as a percentage — a variant with 92% win probability has a 92% chance of being the real winner.
How Win Probability Works
As an experiment runs and collects data, the platform continuously calculates the likelihood that each variant is the best performer. Early in the test, win probabilities fluctuate as data is sparse. As more visitors are included, the estimates stabilize.
For example, in a test with two variants:
- Day 1: Variant A 55%, Variant B 45% — Too early to call
- Day 5: Variant A 78%, Variant B 22% — Variant A is pulling ahead
- Day 12: Variant A 96%, Variant B 4% — High confidence that A is the winner
Win Probability vs. Statistical Significance
Both measure confidence in a result, but they express it differently:
| | Win Probability | Statistical Significance |
|---|---|---|
| What it says | "There's a 94% chance variant A is better" | "We can reject the null hypothesis at 95% confidence" |
| Scale | 0–100% per variant | Binary threshold (significant or not) |
| Updates | Continuously as data arrives | Evaluated at a fixed sample size |
| Intuition | Easy to understand | Requires statistical background |
Win probability is often more intuitive for non-technical stakeholders. Saying "there's a 94% chance this variant wins" is easier to act on than "p = 0.03."
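To make the contrast concrete, here is a minimal sketch in plain Python that evaluates the same data both ways: a frequentist two-proportion z-test and a Bayesian Monte Carlo win probability. The visitor counts and the Beta(1, 1) prior are hypothetical, chosen for illustration; they don't come from any particular platform.

```python
import math
import random

# Hypothetical experiment: A converts 130/1000 visitors, B converts 100/1000.
CONV_A, N_A, CONV_B, N_B = 130, 1000, 100, 1000

def p_value_two_proportion(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (conv_a / n_a - conv_b / n_b) / se
    # Two-sided tail area under the standard normal CDF (via math.erf).
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def win_probability(conv_a, n_a, conv_b, n_b, sims=100_000, seed=0):
    """P(A beats B): sample both conversion rates from Beta posteriors."""
    rng = random.Random(seed)
    draws = (
        (rng.betavariate(1 + conv_a, 1 + n_a - conv_a),
         rng.betavariate(1 + conv_b, 1 + n_b - conv_b))
        for _ in range(sims)
    )
    return sum(a > b for a, b in draws) / sims

print(f"p-value:         {p_value_two_proportion(CONV_A, N_A, CONV_B, N_B):.3f}")
print(f"win probability: {win_probability(CONV_A, N_A, CONV_B, N_B):.0%}")
```

Both numbers describe the same evidence, but the second is the one you can read to a stakeholder as a direct probability.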
How Platforms Calculate It
Most platforms use Bayesian statistics to compute win probability. The process:
- Start with a prior assumption (e.g., both variants are equally likely to win)
- Update the probability as conversion data comes in
- Simulate thousands of possible outcomes based on the observed data
- Report the percentage of simulations where each variant wins
This approach naturally accounts for uncertainty — with little data, probabilities stay near 50/50. With lots of data, they converge toward the true winner.
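The four steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming a uniform Beta(1, 1) prior (both variants equally likely to win before any data) and hypothetical conversion counts; production platforms refine the prior and the simulation details.

```python
import random

def win_probability(conv_a, n_a, conv_b, n_b, sims=100_000, seed=0):
    """Estimate P(variant A beats variant B) by Monte Carlo simulation.

    Starts from a uniform Beta(1, 1) prior, samples each variant's
    plausible conversion rate from its Beta posterior, and reports the
    fraction of simulations in which A comes out ahead.
    """
    rng = random.Random(seed)
    wins_a = 0
    for _ in range(sims):
        # Posterior for a conversion rate under a Beta(1, 1) prior:
        # Beta(1 + conversions, 1 + non-conversions).
        rate_a = rng.betavariate(1 + conv_a, 1 + (n_a - conv_a))
        rate_b = rng.betavariate(1 + conv_b, 1 + (n_b - conv_b))
        if rate_a > rate_b:
            wins_a += 1
    return wins_a / sims

# Hypothetical counts: A converts 120 of 1,000 visitors, B 100 of 1,000.
print(f"Win probability for A: {win_probability(120, 1000, 100, 1000):.0%}")
```

With small counts the Beta posteriors are wide and overlap heavily, so the result stays near 50%; as the counts grow, the posteriors narrow and the estimate converges, which is exactly the stabilizing behavior described earlier.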
When to Act on Win Probability
There's no universal threshold, but common guidelines:
- Below 80% — Too early. Keep collecting data.
- 80–95% — Directionally confident. Consider acting for low-risk changes.
- Above 95% — High confidence. Safe to ship for most decisions.
The right threshold depends on the stakes. A minor copy change might be worth shipping at 85% win probability. A major pricing overhaul should probably wait for 95%+.
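This stakes-based logic can be encoded in a small helper. The function name and the 85%/95% cutoffs below are illustrative, taken from the examples in this section rather than from any platform's API.

```python
def recommendation(win_prob, high_stakes=False):
    """Map a win probability to an action using the guideline bands above.

    High-stakes changes (e.g. a pricing overhaul) get the stricter 95%
    bar; low-stakes changes (e.g. minor copy) can ship from 85%.
    """
    ship_at = 0.95 if high_stakes else 0.85
    if win_prob >= ship_at:
        return "ship"
    if win_prob >= 0.80:
        return "directionally confident: ship only if the change is low-risk"
    return "too early: keep collecting data"

print(recommendation(0.92))                    # minor copy change
print(recommendation(0.92, high_stakes=True))  # major pricing overhaul
```

The same 92% win probability yields different recommendations depending on the stakes, which is the point: the threshold is a business decision, not a statistical constant.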