Running your first A/B test doesn't require a data science team or a specialized testing platform. It requires a clear hypothesis, a measurable goal, and enough traffic to reach a statistically valid result. This guide walks you through the process end to end.
Step 1: Pick the Right Page to Test
Not all pages are equal candidates for testing. Focus your first test on a page where:
- Traffic is high enough — You'll need several hundred conversions per variant to reach statistical significance. Low-traffic pages take too long.
- A conversion event is clearly defined — Sign-up clicks, purchase completions, form submissions. Not pageviews.
- Impact is meaningful — A 1% lift on your pricing page is worth far more than a 10% lift on your blog sidebar.
Good starting points: landing pages, homepage hero sections, pricing pages, and checkout flows.
Step 2: Form a Specific Hypothesis
A good hypothesis has three parts: what you're changing, what outcome you expect, and why. This structure forces clarity and makes it easier to learn from the result — whether you win or lose.
Weak: "Let's try a different headline."
Strong: "Changing the hero headline from 'Grow Your Business' to 'Double Your Conversion Rate in 30 Days' will increase trial sign-ups because it's more specific and speaks to a measurable outcome."
Document your hypothesis before you build anything.
Step 3: Define Your Primary Metric
Pick one primary metric before the test runs. Choosing your metric only after seeing the results — a form of p-hacking — invalidates the experiment.
| Goal | Primary Metric |
|---|---|
| More sign-ups | Sign-up completion rate |
| More demo bookings | Demo form submission rate |
| More purchases | Add-to-cart or checkout rate |
| More engagement | Click-through rate on CTA |
Secondary metrics (time on page, scroll depth) are fine to observe, but don't let them override your primary metric when declaring a winner.
Step 4: Calculate Your Required Sample Size
Running a test for too short a period is the most common A/B testing mistake. Use a sample size calculator — most testing platforms include one — before you start.
Inputs you'll need:
- Baseline conversion rate — Your current rate before the test
- Minimum detectable effect — How big a lift is meaningful? (Usually 10–20% relative improvement)
- Statistical significance threshold — Aim for 95% confidence
- Statistical power — 80% is the conventional default; it's the probability your test detects the lift if it truly exists
A typical result: if your baseline is 3% and you want to detect a 20% relative lift (to 3.6%) at 95% confidence and 80% power, you'll need roughly 14,000 visitors per variant.
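If you want to sanity-check a calculator's output yourself, the standard normal-approximation formula for a two-proportion test can be sketched in a few lines. This is a minimal version assuming a two-sided 95% confidence level and 80% power; real calculators may use slightly different approximations.

```python
from math import sqrt, ceil

def sample_size_per_variant(baseline, relative_lift):
    """Approximate visitors needed per variant for a two-proportion test.

    Uses the normal-approximation formula with z-values hard-coded for
    a two-sided alpha of 0.05 (1.96) and 80% power (0.84).
    """
    z_alpha = 1.96
    z_beta = 0.84
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    p_bar = (p1 + p2) / 2
    delta = p2 - p1
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    return ceil(n)

# 3% baseline, 20% relative lift -> roughly 14,000 visitors per variant
print(sample_size_per_variant(0.03, 0.20))
```

Note how sensitive the result is to the minimum detectable effect: halving the lift you want to detect roughly quadruples the required sample, which is why low-traffic pages make poor first tests.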
Step 5: Build the Variant
Create your B variant with exactly one change from the hypothesis. Testing one variable at a time makes it clear what caused the result. If you change the headline, CTA, and button color simultaneously, you won't know which element made the difference.
If you want to test multiple elements at once, that's multivariate testing — a separate methodology that requires proportionally more traffic.
Step 6: Run the Test
Split incoming traffic randomly between A and B — ideally a 50/50 split. Let the test run until:
- You've reached your pre-calculated sample size, and
- You've run for at least one full business cycle (typically 1–2 weeks) to account for day-of-week variation
Do not stop the test early because it looks like B is winning. Peeking at results and stopping early inflates your false positive rate.
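A sticky 50/50 split can be sketched by hashing a stable user ID, so a returning visitor always lands in the same variant without any stored state. The function and experiment names below are illustrative, not from any particular platform:

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "hero-headline-test") -> str:
    """Deterministically assign a user to variant A or B (50/50 split).

    Hashing (experiment, user_id) makes the assignment sticky across
    visits and independent across experiments, with no database needed.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # uniform bucket in 0..99
    return "A" if bucket < 50 else "B"
```

Including the experiment name in the hash matters: it decorrelates bucketing across tests, so the same users don't always land in the "B" group of every experiment you run.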
Step 7: Analyze and Document
Once you've hit sample size and duration, read the results:
- Winner with 95%+ confidence — Ship the winning variant, document the insight
- No statistically significant difference — The change didn't matter; document that too and move on
- Negative result — The variant hurt conversions; document what you learned about your users
Every result — win, loss, or flat — teaches you something. The teams that compound the fastest are those with the highest-quality testing programs, not the most tests.
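The confidence check described above can be sketched as a standard two-proportion z-test. This is a minimal normal-approximation version for intuition, not a substitute for your testing platform's stats engine:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    conv_* are conversion counts, n_* are visitor counts per variant.
    Returns the z statistic and two-sided p-value.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)  # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

With 300/10,000 conversions on A and 360/10,000 on B, this yields a p-value below 0.05: a significant winner at the 95% threshold. Run it only once, at the pre-planned sample size; re-running it on every peek is exactly the early-stopping problem from Step 6.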
The Fastest Path to Results
Manual A/B testing is valuable, but it's inherently sequential — one test at a time, weeks per experiment. AI-driven platforms like Surface AI run continuous multivariate bandit experiments across your pages, automatically allocating traffic to better-performing variants and compressing months of manual testing into days.
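Surface AI's internals aren't public, but the core bandit idea can be sketched with Thompson sampling: each variant's conversion rate gets a Beta posterior, and traffic is allocated by sampling from those posteriors, so better-performing variants naturally receive more visitors over time.

```python
import random

def thompson_pick(stats):
    """Choose a variant via Thompson sampling.

    stats maps variant name -> (conversions, non_conversions). Each
    variant gets a draw from its Beta(conversions+1, non_conversions+1)
    posterior; the highest draw wins the next visitor.
    """
    draws = {
        variant: random.betavariate(successes + 1, failures + 1)
        for variant, (successes, failures) in stats.items()
    }
    return max(draws, key=draws.get)
```

Unlike a fixed 50/50 split, this trades some statistical cleanliness for speed: weak variants are starved of traffic early instead of receiving half of it for the full test duration.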