Sample Size & MDE Calculator

Figure out how many visitors you actually need before launching your next A/B test

Not sure where to start? Load a realistic e-commerce example to see how the calculator works.

Test Parameters

  • Baseline Conversion Rate — your current conversion rate. Check your analytics.
  • Minimum Detectable Effect — smallest relative lift worth detecting. Smaller = more traffic needed.
  • Daily Visitors (optional) — used to estimate test duration. Leave blank to skip.

Results

Sample Size Per Variant
Total Visitors Required
Across all variants (incl. control)
Expected Variant Rate
Absolute Difference

🔄 Reverse Mode: What MDE Can I Detect?

Already know your traffic and timeline? Find out the smallest effect you can reliably detect.

Detectable MDE
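Reverse mode can be sketched by inverting the sample-size formula numerically: since the required sample shrinks as the MDE grows, a simple bisection finds the smallest relative lift your fixed traffic can detect. This is an illustrative sketch, not the calculator's actual implementation; the function name and search bounds are assumptions.

```python
from statistics import NormalDist

def detectable_mde(baseline, n_per_variant, alpha=0.05, power=0.80):
    """Smallest relative lift detectable with n visitors per variant.

    Inverts the two-proportion sample-size formula by bisection
    (illustrative sketch, two-tailed test assumed).
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_b = NormalDist().inv_cdf(power)

    def required_n(mde):
        p1 = baseline
        p2 = baseline * (1 + mde)       # expected variant rate
        p_bar = (p1 + p2) / 2           # pooled rate
        num = (z_a * (2 * p_bar * (1 - p_bar)) ** 0.5
               + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
        return num / (p2 - p1) ** 2

    # Search between a 0.01% lift and either a 10x lift or the point
    # where the variant rate would exceed 100% (assumed bounds).
    lo, hi = 1e-4, min(10.0, 1 / baseline - 1)
    for _ in range(100):                # required_n decreases as mde grows
        mid = (lo + hi) / 2
        if required_n(mid) > n_per_variant:
            lo = mid                    # not enough traffic: need bigger lift
        else:
            hi = mid
    return hi

mde = detectable_mde(0.03, 50000)
print(f"{mde:.1%}")  # roughly a 10% relative lift at a 3% baseline
```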

What Is Sample Size in A/B Testing?

Sample size is the number of visitors each variant of your test needs to see before you can trust the results. Run a test too short, and you might celebrate a “winner” that was just random noise. Run it too long, and you burn time and traffic that could have gone to your next test.

For e-commerce, getting this right is especially important. Most online stores have conversion rates between 1% and 5%, which means you need more data to detect meaningful lifts than, say, an email signup form converting at 30%.

α

Significance Level (α)

The probability of declaring a winner when there isn’t one. At 95% confidence, α = 5%. In other words, a 1-in-20 chance of a false positive.

β

Statistical Power (1 − β)

The probability of detecting a real effect when it exists. 80% power means a 20% chance of missing a genuine winner. For high-stakes tests, use 90%.

Δ

Minimum Detectable Effect

The smallest relative change worth finding. A 10% MDE on a 3% baseline means you’re looking for a jump from 3.0% to 3.3%. Smaller MDE = bigger sample needed.

📊

Baseline Conversion Rate

Your current metric (CR, ARPU, add-to-cart rate). Lower baselines require proportionally larger samples because the signal-to-noise ratio is weaker.

The Formula Behind the Calculator

This calculator uses the standard two-proportion z-test formula for sample size determination:

Sample Size Per Variant
n = ( Zα/2 · √(2p̄(1−p̄)) + Zβ · √(p1(1−p1) + p2(1−p2)) )² / (p2 − p1)²

Where p1 is the baseline rate, p2 = p1 × (1 + MDE) is the expected variant rate, p̄ = (p1 + p2)/2 is the pooled rate, and Zα/2 and Zβ are critical values from the standard normal distribution (α/2 because the test is two-tailed by default).
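The formula above can be sketched in a few lines of Python using the standard library's normal distribution. This is an illustrative sketch of the same two-proportion z-test calculation; the function name and defaults (95% confidence, 80% power) are assumptions matching the calculator's stated defaults.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, mde_relative, alpha=0.05, power=0.80):
    """Two-proportion z-test sample size per variant (two-tailed).

    baseline     -- current conversion rate, e.g. 0.03 for 3%
    mde_relative -- smallest relative lift to detect, e.g. 0.10 for +10%
    """
    p1 = baseline
    p2 = baseline * (1 + mde_relative)             # expected variant rate
    p_bar = (p1 + p2) / 2                          # pooled rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# 3% baseline, 10% relative MDE, default 95% confidence / 80% power:
# roughly 53,000 visitors per variant
print(sample_size_per_variant(0.03, 0.10))
```

Note how the low baseline dominates the result: the same 10% relative MDE on a 30% signup form needs only a small fraction of this sample.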

E-Commerce Benchmarks

Metric | Typical Range | Realistic MDE | Why It Matters
Purchase CR | 1%–5% | 5%–15% relative | Primary revenue metric. Low base = large samples.
Add-to-Cart Rate | 5%–15% | 3%–10% relative | Higher base rate, easier to detect changes.
Checkout Start Rate | 30%–60% | 2%–5% relative | High volume funnel step — tests run faster.
Revenue per Visitor | $1–$10 | 5%–20% relative | High variance. Needs bigger samples than CR tests.
Email Signup CR | 2%–8% | 10%–20% relative | Micro-conversion. Good for faster iteration.

Best Practices

✓ DO
  • Decide sample size before the test starts
  • Use your actual conversion rate from the last 30 days
  • Account for weekly cycles — run in full-week increments
  • Set a realistic MDE (5-15% relative for e-commerce)
  • Use 95% confidence and 80% power as defaults
  • Apply Bonferroni correction for 3+ variants
✕ DON’T
  • Stop the test early because results “look significant”
  • Use site-wide CR if testing on a specific segment
  • Expect to detect 1-2% relative lifts on purchase CR
  • Ignore seasonality — Black Friday traffic ≠ February traffic
  • Run tests shorter than 7 days regardless of sample size
  • Change the test design (traffic split, goals) mid-flight

Frequently Asked Questions

What MDE should I aim for?

For purchase conversion rate, 10–15% relative is realistic. That means if your CR is 3%, you’re aiming to detect a jump to ~3.3%–3.45%. Smaller effects (5%) are real but require massive sample sizes. If you have limited traffic, consider testing higher-funnel metrics like add-to-cart rate, where you’ll have more volume.

Should I count visitors or sessions?

A/B tests typically randomize at the visitor (cookie/user ID) level, not the session level. One visitor may create multiple sessions during the test, but they should always see the same variant. Use unique visitors per day for the most accurate duration estimate.

Should I use a one-tailed or two-tailed test?

Two-tailed is the industry standard and the safer choice. It detects both positive and negative effects. One-tailed gives you more power (smaller sample) but only looks for improvement — you’d miss it if your variant actually hurts conversion. Use one-tailed only when a negative result would lead to the same decision as no result.
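The one-tailed saving can be quantified: sample size scales roughly with (Zα + Zβ)², so comparing the one-tailed and two-tailed critical values gives the reduction directly. An illustrative sketch under that approximation (it treats the variance terms as fixed).

```python
from statistics import NormalDist

def z_alpha(alpha=0.05, two_tailed=True):
    """Critical z-value for the chosen test type."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) if two_tailed else nd.inv_cdf(1 - alpha)

# Sample size scales ~ (z_alpha + z_beta)^2, so the relative saving is:
z_b = NormalDist().inv_cdf(0.80)  # 80% power
saving = 1 - ((z_alpha(two_tailed=False) + z_b) / (z_alpha() + z_b)) ** 2
print(f"{saving:.0%}")  # roughly a 20% smaller sample
```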

This means the true effect is likely smaller than your MDE — or there’s no effect at all. That’s a valid result. Don’t extend the test hoping for significance (that inflates false positives). Instead, log the result, archive the variant, and move on to a higher-impact hypothesis.

How do extra variants (A/B/n) affect sample size?

More variants mean more comparisons and a higher risk of false positives. The calculator applies Bonferroni correction, dividing your significance level by the number of comparisons. With 3 variants (A/B/C), the per-comparison alpha goes from 5% to ~1.67%, which increases the required sample per variant by roughly 30-40%.
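The Bonferroni inflation can be estimated from the critical values alone: splitting α across comparisons raises Zα, and sample size scales roughly with (Zα + Zβ)². A sketch following the 3-comparison example above; the function name is illustrative and the scaling is an approximation.

```python
from statistics import NormalDist

def bonferroni_inflation(alpha=0.05, comparisons=3, power=0.80):
    """Approximate sample-size inflation factor from Bonferroni
    correction, using the (z_alpha + z_beta)^2 scaling."""
    nd = NormalDist()
    z_b = nd.inv_cdf(power)
    z_plain = nd.inv_cdf(1 - alpha / 2)                  # uncorrected, two-tailed
    z_adj = nd.inv_cdf(1 - alpha / (2 * comparisons))    # alpha split across comparisons
    return ((z_adj + z_b) / (z_plain + z_b)) ** 2

# 3 comparisons at 5% alpha, 80% power: roughly a 33% larger sample
print(f"{bonferroni_inflation() - 1:.0%}")
```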

Can I use this calculator for revenue metrics?

This calculator is designed for proportion-based metrics (conversion rates). Revenue metrics have continuous distributions with higher variance, so they typically need 2-5x larger samples. For ARPU tests, use a dedicated continuous-metric calculator or add a 2-3x multiplier to the result here as a rough estimate.