Sample Size & MDE Calculator
Figure out how many visitors you actually need before launching your next A/B test
Test Parameters
Results
🔄 Reverse Mode: What MDE Can I Detect?
Already know your traffic and timeline? Find out the smallest effect you can reliably detect.
What Is Sample Size in A/B Testing?
Sample size is the number of visitors each variant of your test needs to see before you can trust the results. Run a test too short, and you might celebrate a “winner” that was just random noise. Run it too long, and you burn time and traffic that could have been spent on the next experiment.
For e-commerce, getting this right is especially important. Most online stores have conversion rates between 1% and 5%, which means you need more data to detect meaningful lifts than, say, an email signup form converting at 30%.
Significance Level (α)
The probability of declaring a winner when there isn’t one. At 95% confidence, α = 5%. In other words, a 1-in-20 chance of a false positive.
Statistical Power (1 − β)
The probability of detecting a real effect when it exists. 80% power means a 20% chance of missing a genuine winner. For high-stakes tests, use 90%.
Minimum Detectable Effect
The smallest relative change worth finding. A 10% MDE on a 3% baseline means you’re looking for a jump from 3.0% to 3.3%. Smaller MDE = bigger sample needed.
Baseline Conversion Rate
Your current rate for the metric under test (purchase CR, add-to-cart rate, signup CR). Lower baselines require proportionally larger samples because the signal-to-noise ratio is weaker.
The Formula Behind the Calculator
This calculator uses the standard two-proportion z-test formula for sample size determination:
n = ( Zα/2 ⋅ √(2p̄(1−p̄)) + Zβ ⋅ √(p1(1−p1) + p2(1−p2)) )² / (p2 − p1)²
Where p1 is the baseline rate, p2 is the expected variant rate (p1 × (1 + MDE)), p̄ is the pooled rate (p1 + p2) / 2, and Zα/2 and Zβ are critical values from the standard normal distribution — Zα/2 ≈ 1.96 for 95% confidence (two-tailed) and Zβ ≈ 0.84 for 80% power.
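As a sanity check, the formula can be implemented in a few lines of Python using only the standard library. The function name and defaults below are illustrative, not the calculator's actual code:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, mde_relative, alpha=0.05, power=0.80):
    """Two-proportion z-test sample size (two-tailed), per variant."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)          # expected variant rate
    p_bar = (p1 + p2) / 2                       # pooled rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# 3% baseline, 10% relative MDE, 95% confidence, 80% power
n = sample_size_per_variant(0.03, 0.10)
print(n)  # ~53,000 visitors per variant
```

Note how a 3% baseline with a 10% relative MDE already demands tens of thousands of visitors per variant, which is why low-traffic stores often test higher-funnel metrics instead.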
E-Commerce Benchmarks
| Metric | Typical Range | Realistic MDE | Why It Matters |
|---|---|---|---|
| Purchase CR | 1%–5% | 5%–15% relative | Primary revenue metric. Low base = large samples. |
| Add-to-Cart Rate | 5%–15% | 3%–10% relative | Higher base rate, easier to detect changes. |
| Checkout Start Rate | 30%–60% | 2%–5% relative | High-volume funnel step — tests run faster. |
| Revenue per Visitor | $1–$10 | 5%–20% relative | High variance. Needs bigger samples than CR tests. |
| Email Signup CR | 2%–8% | 10%–20% relative | Micro-conversion. Good for faster iteration. |
Best Practices

Do:
- Decide sample size before the test starts
- Use your actual conversion rate from the last 30 days
- Account for weekly cycles — run in full-week increments
- Set a realistic MDE (5%–15% relative for e-commerce)
- Use 95% confidence and 80% power as defaults
- Apply a Bonferroni correction for 3+ variants

Don't:
- Stop the test early because results “look significant”
- Use site-wide CR if testing on a specific segment
- Expect to detect 1%–2% relative lifts on purchase CR
- Ignore seasonality — Black Friday traffic ≠ February traffic
- Run tests shorter than 7 days regardless of sample size
- Change the test design (traffic split, goals) mid-flight
Frequently Asked Questions
What’s a realistic MDE for an e-commerce purchase conversion test?

For purchase conversion rate, 10%–15% relative is realistic. That means if your CR is 3%, you’re aiming to detect a jump to ~3.3%–3.45%. Smaller effects (5%) are real but require massive sample sizes. If you have limited traffic, consider testing higher-funnel metrics like add-to-cart rate, where you’ll have more volume.
Should I count visitors or sessions?

A/B tests typically randomize at the visitor (cookie/user ID) level, not the session level. One visitor may create multiple sessions during the test, but they should always see the same variant. Use unique visitors per day for the most accurate duration estimate.
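Visitor-level consistency is usually achieved by hashing a stable visitor ID into a bucket, so the same visitor always lands in the same variant without any server-side state. A minimal sketch — the ID format and experiment name are made up for illustration:

```python
import hashlib

def assign_variant(visitor_id, experiment, variants=("control", "treatment")):
    """Deterministically map a visitor ID to a variant via hashing."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)  # stable bucket per visitor
    return variants[bucket]

# The same visitor gets the same variant on every session:
print(assign_variant("cookie-8f3a", "checkout-cta"))
print(assign_variant("cookie-8f3a", "checkout-cta"))  # identical result
```

Salting the hash with the experiment name keeps bucket assignments independent across concurrent tests.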
Should I use a one-tailed or two-tailed test?

Two-tailed is the industry standard and the safer choice. It detects both positive and negative effects. One-tailed gives you more power (smaller sample) but only looks for improvement — you’d miss it if your variant actually hurts conversion. Use one-tailed only when a negative result would lead to the same decision as no result.
What if my test never reaches significance?

This means the true effect is likely smaller than your MDE — or there’s no effect at all. That’s a valid result. Don’t extend the test hoping for significance (that inflates false positives). Instead, log the result, archive the variant, and move on to a higher-impact hypothesis.
How do multiple variants affect the required sample size?

More variants mean more comparisons and a higher risk of false positives. The calculator applies a Bonferroni correction, dividing your significance level by the number of comparisons. With an A/B/C test evaluated as three pairwise comparisons, the per-comparison alpha drops from 5% to ~1.67%, which increases the required sample per variant by roughly 30–40%.
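The Bonferroni inflation can be checked by recomputing the two-proportion sample size with the adjusted alpha. A sketch assuming three pairwise comparisons (function names are illustrative, not the calculator's code):

```python
import math
from statistics import NormalDist

def n_per_variant(p1, p2, alpha, power=0.80):
    """Two-proportion z-test sample size (two-tailed), per variant."""
    p_bar = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p2 - p1) ** 2)

comparisons = 3                      # A/B/C, all pairwise comparisons
plain = n_per_variant(0.03, 0.033, alpha=0.05)
adjusted = n_per_variant(0.03, 0.033, alpha=0.05 / comparisons)
print(adjusted / plain)  # ≈ 1.33, i.e. about a third more traffic per variant
```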
Can I use this calculator for revenue metrics like ARPU?

This calculator is designed for proportion-based metrics (conversion rates). Revenue metrics have continuous distributions with higher variance, so they typically need 2–5x larger samples. For ARPU tests, use a dedicated continuous-metric calculator or add a 2–3x multiplier to the result here as a rough estimate.
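For continuous metrics, the standard two-sample formula is n = 2(Zα/2 + Zβ)² σ² / δ², where σ is the metric’s standard deviation and δ is the absolute lift you want to detect. A rough sketch — the $12 standard deviation and $0.50 lift are invented numbers for illustration:

```python
import math
from statistics import NormalDist

def n_continuous(sigma, delta, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-sample test on a continuous metric."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

# Detect a $0.50 ARPU lift when per-visitor revenue has sigma of about $12
print(n_continuous(sigma=12, delta=0.50))  # ≈ 9,000 visitors per variant
```

Because per-visitor revenue is heavily skewed (many zeros, a few large orders), σ is often several times the mean, which is where the 2–5x sample inflation over plain conversion tests comes from.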