Bayesian A/B Testing | Surface AI Hub

Bayesian A/B testing is an alternative to the classical (frequentist) approach. Instead of asking "is there enough evidence to reject the null hypothesis?", it asks "given what we've observed, what is the probability that variant B is better than control A?"

The output is an intuitive probability: "There is an 87% chance variant B beats control."
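Formally, this headline number is the posterior probability that the variant's true conversion rate exceeds control's. The notation $p_A$ and $p_B$ for the two true rates is assumed here. If the two arms have independent priors and independent data, the quantity is:

$$
P(p_B > p_A \mid \text{data}) = \int_0^1 \int_0^1 \mathbf{1}[p_B > p_A]\, \pi(p_A \mid \text{data})\, \pi(p_B \mid \text{data})\, dp_A\, dp_B
$$

In practice this integral is usually estimated by drawing samples from the two posteriors and counting how often $p_B > p_A$.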

Frequentist vs. Bayesian

Aspect	Frequentist A/B Testing	Bayesian A/B Testing
Primary output	p-value, confidence interval	Probability B > A, credible interval
Null hypothesis	Required	Not required
Prior knowledge	Not formally incorporated	Incorporated as a prior distribution
Peeking	Inflates false positives (unless a sequential design is used)	Less sensitive to continuous monitoring, though not immune
Interpretability	Often misread (a p-value is not the probability that B is better)	Directly interpretable as a probability
Sample size	Fixed upfront (in standard fixed-horizon tests)	Can be flexible

How It Works

Set a prior: encode your belief about the baseline conversion rate before the test starts (often a weakly informative or flat prior)
Collect data: record visitors and conversions for control and variant
Update the posterior: use Bayes' theorem to combine the prior with the observed data, giving a probability distribution over plausible conversion rates for each variant
Read the result: from the posteriors, compute the probability that the variant beats control, an expected lift, and a credible interval for the lift
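The four steps above can be sketched for conversion rates using a Beta prior. Because the Beta distribution is conjugate to binomial data, the posterior is Beta(alpha + conversions, beta + visitors - conversions) in closed form. The probability that B beats A, the expected relative lift, and a 95% credible interval are then estimated by Monte Carlo sampling. This is a minimal sketch, not any platform's method. It assumes a uniform Beta(1, 1) prior by default, independent arms, and relative lift defined as (p_B - p_A) / p_A. The function names `posterior_params` and `analyze` and the example data are hypothetical.

```python
import random


def posterior_params(conversions, visitors, prior_alpha=1.0, prior_beta=1.0):
    """Conjugate update: Beta(alpha, beta) prior + binomial data -> Beta posterior."""
    if not 0 <= conversions <= visitors:
        raise ValueError("need 0 <= conversions <= visitors")
    return prior_alpha + conversions, prior_beta + (visitors - conversions)


def analyze(conv_a, n_a, conv_b, n_b, prior=(1.0, 1.0), draws=100_000, seed=0):
    """Monte Carlo summary of the two Beta posteriors (A = control, B = variant)."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    alpha_a, beta_a = posterior_params(conv_a, n_a, *prior)
    alpha_b, beta_b = posterior_params(conv_b, n_b, *prior)

    wins = 0
    lifts = []  # relative lift (p_b - p_a) / p_a for each posterior draw
    for _ in range(draws):
        p_a = rng.betavariate(alpha_a, beta_a)
        p_b = rng.betavariate(alpha_b, beta_b)
        if p_b > p_a:
            wins += 1
        lifts.append((p_b - p_a) / p_a)

    lifts.sort()
    return {
        "prob_b_beats_a": wins / draws,
        "expected_lift": sum(lifts) / draws,
        # central 95% credible interval for the relative lift (empirical quantiles)
        "lift_95_interval": (lifts[int(0.025 * draws)], lifts[int(0.975 * draws) - 1]),
    }


if __name__ == "__main__":
    # Hypothetical data: control 200/2000 (10%), variant 260/2000 (13%)
    result = analyze(200, 2000, 260, 2000)
    print(f"P(B > A)      = {result['prob_b_beats_a']:.3f}")
    print(f"Expected lift = {result['expected_lift']:.1%}")
    low, high = result["lift_95_interval"]
    print(f"95% interval  = [{low:.1%}, {high:.1%}]")
```

With this hypothetical data the probability that B beats A comes out very close to 1. The credible interval for the lift is still wide, which is why the probability alone should not be read as the size of the improvement. Sampling is used here because it extends easily to lift and intervals; the probability itself also has exact closed-form expressions for Beta posteriors.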

When to Use Bayesian Methods

Bayesian testing works well when:

You need to make decisions continuously rather than at a fixed endpoint
You want intuitive outputs stakeholders can act on without statistics training
You have a prior sense of the baseline rate and want to incorporate it
You're running many related tests and want results from past experiments to inform the priors of new ones
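When you have a prior sense of the baseline rate, one common way to encode it is a Beta prior whose mean equals the historical rate. Its total alpha + beta acts as a number of "pseudo-visitors" that controls how strongly the prior resists new data. This is a minimal sketch under that assumption. The helper names `prior_from_baseline` and `posterior_mean`, the `strength` parameter, and the example numbers are all hypothetical.

```python
def prior_from_baseline(baseline_rate, strength):
    """Beta(alpha, beta) prior whose mean is baseline_rate.

    strength = alpha + beta acts like a number of pseudo-visitors: the larger
    it is, the more observed data it takes to move the posterior away from
    the baseline.
    """
    if not 0.0 < baseline_rate < 1.0:
        raise ValueError("baseline_rate must be strictly between 0 and 1")
    if strength <= 0:
        raise ValueError("strength must be positive")
    return baseline_rate * strength, (1.0 - baseline_rate) * strength


def posterior_mean(prior, conversions, visitors):
    """Posterior mean conversion rate after a conjugate Beta-binomial update."""
    alpha, beta = prior
    return (alpha + conversions) / (alpha + beta + visitors)


if __name__ == "__main__":
    # Hypothetical history: about 5% of visitors convert on this page.
    prior = prior_from_baseline(0.05, strength=200)  # roughly Beta(10, 190)
    # Early test data: 12 conversions out of 150 visitors (8% observed).
    print(f"prior = Beta{prior}")
    print(f"posterior mean = {posterior_mean(prior, 12, 150):.4f}")
```

Here the posterior mean is (10 + 12) / (200 + 150), about 6.3%: the 200 pseudo-visitors pull the early 8% estimate toward the 5% baseline. A smaller `strength` lets the data dominate sooner.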

Limitations

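Results depend on the choice of prior: a strong prior centered on the wrong value can dominate the data, especially when samples are small
Harder to audit and reproduce than a simple p-value calculation: the prior must be documented, and Monte Carlo estimates vary slightly between runs unless the random seed is fixed
"Probability B > A" is not the same as "B will beat A in production": it is conditional on the model and prior, and even a 90% probability leaves a 10% chance that B is worse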
Not universally accepted as a replacement for frequentist methods in regulated industries
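The prior-sensitivity limitation can be made concrete by applying several priors to the same observed data and comparing posterior means. This is a minimal sketch assuming a Beta prior with binomial data. The three priors, the `compare_priors` helper, and the data (an observed 10% conversion rate) are hypothetical and chosen only for illustration.

```python
def posterior_mean(conversions, visitors, prior_alpha, prior_beta):
    """Mean of the Beta posterior after observing binomial data."""
    return (prior_alpha + conversions) / (prior_alpha + prior_beta + visitors)


# Hypothetical priors for illustration; the observed rate in this example is 10%.
PRIORS = {
    "uniform Beta(1, 1)": (1, 1),
    "weak, centered near 10%: Beta(2, 18)": (2, 18),
    "strong, centered near 30%: Beta(300, 700)": (300, 700),
}


def compare_priors(conversions, visitors):
    """Posterior mean under each prior in PRIORS, for the same data."""
    return {
        name: posterior_mean(conversions, visitors, a, b)
        for name, (a, b) in PRIORS.items()
    }


if __name__ == "__main__":
    for visitors in (300, 30_000):
        conversions = visitors // 10  # observed 10% conversion rate
        print(f"{conversions}/{visitors} observed:")
        for name, mean in compare_priors(conversions, visitors).items():
            print(f"  {name}: posterior mean = {mean:.3f}")
```

With 300 visitors, the strong but wrong prior gives a posterior mean of about 0.254 even though the observed rate is 0.100. With 30,000 visitors it falls to about 0.106. A misleading prior matters most when data is scarce, and its influence shrinks as data accumulates.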

Many modern experimentation platforms (Optimizely, VWO, Google Optimize successors) offer Bayesian modes alongside frequentist tests.