Thompson Sampling | Surface AI Hub

Thompson sampling is a strategy for solving the multi-armed bandit problem: given several variants with unknown conversion rates, how do you allocate traffic so that you maximize total conversions while still learning which variant is best?

Instead of splitting traffic evenly (as an A/B test does) or always serving the current leader (which risks locking in a false winner), Thompson sampling serves each variant in proportion to the probability that it is the best one. As evidence accumulates, that probability shifts, and traffic follows.
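To make "probability that it is the best one" concrete, here is a minimal sketch that estimates it by simulation. It assumes binary conversions and a uniform Beta(1, 1) prior per variant, a common choice that this page does not itself specify. The function name prob_best and the example counts are hypothetical.

```python
import random

def prob_best(successes, failures, draws=10_000, seed=0):
    """Monte Carlo estimate of P(variant i has the highest true rate)."""
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        # One plausible conversion rate per variant, drawn from its
        # Beta(1 + conversions, 1 + non-conversions) posterior (assumed model).
        samples = [rng.betavariate(1 + s, 1 + f)
                   for s, f in zip(successes, failures)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]

# Hypothetical data: A converted 12 of 200 visitors, B converted 20 of 200.
shares = prob_best(successes=[12, 20], failures=[188, 180])
print(shares)
```

Under these assumptions, the printed shares approximate the long-run fraction of traffic Thompson sampling would send to each variant if it stopped learning at this point.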

How It Works

Thompson sampling is a Bayesian method. For each variant it maintains a probability distribution over the variant's true conversion rate, then repeats a simple loop for every visitor:

Sample — Draw one random conversion rate from each variant's current distribution.
Serve — Show the visitor the variant with the highest sampled value.
Observe — Record whether the visitor converted.
Update — Revise that variant's distribution using the new observation.
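The four steps above can be sketched as a short simulation. This is an illustration under stated assumptions rather than a reference implementation: conversions are binary, each variant starts from a Beta(1, 1) prior, and visitors are simulated from made-up true rates that a real system would never know. The name thompson_run and all parameters are hypothetical.

```python
import random

def thompson_run(true_rates, visitors, seed=0):
    rng = random.Random(seed)
    k = len(true_rates)
    alpha = [1] * k   # 1 + conversions seen per variant (Beta prior assumed)
    beta = [1] * k    # 1 + non-conversions seen per variant
    served = [0] * k
    for _ in range(visitors):
        # Sample: one plausible conversion rate per variant.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Serve: the variant with the highest sampled value.
        arm = samples.index(max(samples))
        served[arm] += 1
        # Observe: simulated here; in production this is the real outcome.
        converted = rng.random() < true_rates[arm]
        # Update: only the served variant's distribution changes.
        if converted:
            alpha[arm] += 1
        else:
            beta[arm] += 1
    return served, alpha, beta

served, alpha, beta = thompson_run([0.03, 0.05, 0.12], visitors=5000)
print(served)
```

With rates this far apart, most of the 5,000 simulated visitors should end up on the third variant, while the other two still receive some early traffic.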

Early on, the distributions are wide and uncertain, so sampling produces a lot of variety — that is the exploration. As data accumulates the distributions sharpen, the best variant gets drawn most often, and traffic concentrates on it — that is the exploitation. The balance is automatic; there is no exploration rate to tune, unlike epsilon-greedy.
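The narrowing described above can be checked numerically. The sketch below assumes a Beta posterior with a uniform prior (an assumption, not something this page specifies) and holds the observed conversion rate at 10% while the visitor count grows. The helper beta_std is a hypothetical name for the standard formula for a Beta distribution's standard deviation.

```python
from math import sqrt

def beta_std(a, b):
    """Standard deviation of a Beta(a, b) distribution."""
    return sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

widths = []
for visitors in (10, 100, 1_000, 10_000):
    conversions = visitors // 10          # observed rate fixed at 10%
    a = 1 + conversions                   # Beta(1, 1) prior assumed
    b = 1 + visitors - conversions
    widths.append(beta_std(a, b))
    print(f"{visitors:>6} visitors: posterior std ~ {widths[-1]:.4f}")
```

A narrower posterior means sampled values rarely stray far from the observed rate, so a clearly worse variant stops winning the draw and stops receiving traffic.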

Thompson Sampling vs. Other Allocation Strategies

	A/B Test	Epsilon-Greedy	Thompson Sampling
Traffic allocation	Fixed (even)	Mostly leader + random ε	Proportional to P(best)
Exploration	Uniform for the whole test	Constant rate ε	Adaptive, decays naturally
Tuning required	Sample size	Choose ε	Prior only (a default usually works)
Handles uncertainty	No	Crudely	Explicitly (Bayesian)
Opportunity cost	High	Medium	Low
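For contrast with the Thompson sampling column, here is a minimal sketch of the epsilon-greedy selection rule from the table. The function name, the example counts, and ε = 0.1 are all hypothetical.

```python
import random

def epsilon_greedy_choice(conversions, views, epsilon, rng):
    """Pick a variant index: random with probability epsilon, else the leader."""
    if rng.random() < epsilon:
        return rng.randrange(len(views))          # explore uniformly
    rates = [c / v if v else 0.0 for c, v in zip(conversions, views)]
    return rates.index(max(rates))                # exploit current leader

rng = random.Random(0)
conversions, views = [30, 45, 60], [1000, 1000, 1000]
picks = [epsilon_greedy_choice(conversions, views, 0.1, rng)
         for _ in range(10_000)]
leader_share = picks.count(2) / len(picks)
print(leader_share)
```

The exploration share here is set by ε alone and does not shrink as evidence accumulates, which is the "constant rate" behaviour the table refers to.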

When to Use Thompson Sampling

You want to minimize the cost of testing — fewer visitors are sent to losing variants than in a fixed split.
You are running continuous optimization rather than a one-time experiment with a fixed endpoint.
You have three or more variants and don't want to manually manage exploration.
You value an approach with few knobs to tune — it works well out of the box.

Limitations

Less interpretable than a clean A/B test. Because allocation adapts to the data, each variant's sample size is unequal and itself data-dependent, so standard fixed-sample confidence intervals and p-values do not apply directly at the end.
Sensitive to non-stationarity. If conversion rates drift (seasonality, a campaign ending), a naïve implementation can over-commit to a variant that was best in the past. Production systems add decay or windowing to stay adaptive.
Needs reasonable volume. With very little traffic the distributions stay wide and allocation stays noisy.
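One way to add the decay mentioned under non-stationarity is to discount old evidence before each update. The sketch below is one such scheme under assumptions, not a description of any particular production system: it assumes Beta posteriors with a Beta(1, 1) prior, and the discount factor gamma = 0.99 is an arbitrary illustrative value.

```python
def discounted_update(alpha, beta, arm, converted, gamma=0.99):
    """Shrink all evidence toward the Beta(1, 1) prior, then add one observation."""
    for i in range(len(alpha)):
        # Only the evidence above the prior is discounted, so the
        # parameters never drop below 1 and stay valid for sampling.
        alpha[i] = 1 + gamma * (alpha[i] - 1)
        beta[i] = 1 + gamma * (beta[i] - 1)
    if converted:
        alpha[arm] += 1
    else:
        beta[arm] += 1

# Hypothetical stream: alternate between two variants for 2,000 visitors.
alpha, beta = [1.0, 1.0], [1.0, 1.0]
for step in range(2000):
    discounted_update(alpha, beta, arm=step % 2, converted=(step % 10 == 0))
evidence = sum(a - 1 for a in alpha) + sum(b - 1 for b in beta)
print(evidence)
```

With this rule the total retained evidence levels off near 1 / (1 - gamma) observations, about 100 here, so the distributions never become so narrow that a variant whose rate has changed cannot be rediscovered. The trade-off is that allocation stays somewhat noisier than without decay.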

Thompson sampling is one of the workhorse algorithms behind autonomous optimization platforms, where the goal is to keep earning conversions while learning, rather than pausing to run discrete experiments.