Sample Size & MDE Calculator
Figure out how many visitors you actually need before launching your next A/B test
Test Parameters
Results
🔄 Reverse Mode: What MDE Can I Detect?
Already know your traffic and timeline? Find out the smallest effect you can reliably detect.
What Is Sample Size in A/B Testing?
Sample size is the number of visitors each variant of your test needs to see before you can trust the results. Run a test too short, and you might celebrate a “winner” that was just random noise. Run it too long, and you burn time and traffic that could have been spent on the next experiment.
For e-commerce, getting this right is especially important. Most online stores have conversion rates between 1% and 5%, which means you need more data to detect meaningful lifts than, say, an email signup form converting at 30%.
Significance Level (α)
The probability of declaring a winner when there isn’t one. At 95% confidence, α = 5%. In other words, a 1-in-20 chance of a false positive.
Statistical Power (1 − β)
The probability of detecting a real effect when it exists. 80% power means a 20% chance of missing a genuine winner. For high-stakes tests, use 90%.
Minimum Detectable Effect
The smallest relative change worth finding. A 10% MDE on a 3% baseline means you’re looking for a jump from 3.0% to 3.3%. Smaller MDE = bigger sample needed.
Baseline Conversion Rate
Your current rate for the metric under test (purchase CR, add-to-cart rate, signup CR). Lower baselines require proportionally larger samples because the signal-to-noise ratio is weaker.
The Formula Behind the Calculator
This calculator uses the standard two-proportion z-test formula for sample size determination:
n = ( Zα/2 ⋅ √(2p̄(1−p̄)) + Zβ ⋅ √(p1(1−p1) + p2(1−p2)) )² / (p2 − p1)²
Where p1 is the baseline rate, p2 is the expected variant rate (p1 × (1 + MDE)), p̄ is the pooled rate (p1 + p2) / 2, and Zα/2 and Zβ are critical values from the standard normal distribution — Zα/2 ≈ 1.96 for 95% confidence (two-tailed) and Zβ ≈ 0.84 for 80% power.
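As a sanity check, the formula can be implemented in a few lines of Python using only the standard library. The function name and defaults below are illustrative, not the calculator's actual code:

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, mde_relative, alpha=0.05, power=0.80):
    """Two-proportion z-test sample size (two-tailed), per variant."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)          # expected variant rate
    p_bar = (p1 + p2) / 2                       # pooled rate
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# 3% baseline, 10% relative MDE, 95% confidence, 80% power
n = sample_size_per_variant(0.03, 0.10)
print(n)  # ~53,000 visitors per variant
```

Note how a 3% baseline with a 10% relative MDE already demands tens of thousands of visitors per variant, which is why low-traffic stores often test higher-funnel metrics instead.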
E-Commerce Benchmarks
| Metric | Typical Range | Realistic MDE | Why It Matters |
|---|---|---|---|
| Purchase CR | 1%–5% | 5%–15% relative | Primary revenue metric. Low base = large samples. |
| Add-to-Cart Rate | 5%–15% | 3%–10% relative | Higher base rate, easier to detect changes. |
| Checkout Start Rate | 30%–60% | 2%–5% relative | High-volume funnel step — tests run faster. |
| Revenue per Visitor | $1–$10 | 5%–20% relative | High variance. Needs bigger samples than CR tests. |
| Email Signup CR | 2%–8% | 10%–20% relative | Micro-conversion. Good for faster iteration. |
Best Practices

Do:
- Decide sample size before the test starts
- Use your actual conversion rate from the last 30 days
- Account for weekly cycles — run in full-week increments
- Set a realistic MDE (5%–15% relative for e-commerce)
- Use 95% confidence and 80% power as defaults
- Apply a Bonferroni correction for 3+ variants

Don't:
- Stop the test early because results “look significant”
- Use site-wide CR if testing on a specific segment
- Expect to detect 1%–2% relative lifts on purchase CR
- Ignore seasonality — Black Friday traffic ≠ February traffic
- Run tests shorter than 7 days regardless of sample size
- Change the test design (traffic split, goals) mid-flight
Frequently Asked Questions
What’s a realistic MDE for an e-commerce purchase conversion test?

For purchase conversion rate, 10%–15% relative is realistic. That means if your CR is 3%, you’re aiming to detect a jump to ~3.3%–3.45%. Smaller effects (5%) are real but require massive sample sizes. If you have limited traffic, consider testing higher-funnel metrics like add-to-cart rate, where you’ll have more volume.
Should I count visitors or sessions?

A/B tests typically randomize at the visitor (cookie/user ID) level, not the session level. One visitor may create multiple sessions during the test, but they should always see the same variant. Use unique visitors per day for the most accurate duration estimate.
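Visitor-level consistency is usually achieved by hashing a stable visitor ID into a bucket, so the same visitor always lands in the same variant without any server-side state. A minimal sketch — the ID format and experiment name are made up for illustration:

```python
import hashlib

def assign_variant(visitor_id, experiment, variants=("control", "treatment")):
    """Deterministically map a visitor ID to a variant via hashing."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)  # stable bucket per visitor
    return variants[bucket]

# The same visitor gets the same variant on every session:
print(assign_variant("cookie-8f3a", "checkout-cta"))
print(assign_variant("cookie-8f3a", "checkout-cta"))  # identical result
```

Salting the hash with the experiment name keeps bucket assignments independent across concurrent tests.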
Should I use a one-tailed or two-tailed test?

Two-tailed is the industry standard and the safer choice. It detects both positive and negative effects. One-tailed gives you more power (smaller sample) but only looks for improvement — you’d miss it if your variant actually hurts conversion. Use one-tailed only when a negative result would lead to the same decision as no result.
What if my test never reaches significance?

This means the true effect is likely smaller than your MDE — or there’s no effect at all. That’s a valid result. Don’t extend the test hoping for significance (that inflates false positives). Instead, log the result, archive the variant, and move on to a higher-impact hypothesis.
How do multiple variants affect the required sample size?

More variants mean more comparisons and a higher risk of false positives. The calculator applies a Bonferroni correction, dividing your significance level by the number of comparisons. With an A/B/C test evaluated as three pairwise comparisons, the per-comparison alpha drops from 5% to ~1.67%, which increases the required sample per variant by roughly 30–40%.
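The Bonferroni inflation can be checked by recomputing the two-proportion sample size with the adjusted alpha. A sketch assuming three pairwise comparisons (function names are illustrative, not the calculator's code):

```python
import math
from statistics import NormalDist

def n_per_variant(p1, p2, alpha, power=0.80):
    """Two-proportion z-test sample size (two-tailed), per variant."""
    p_bar = (p1 + p2) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    num = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p2 - p1) ** 2)

comparisons = 3                      # A/B/C, all pairwise comparisons
plain = n_per_variant(0.03, 0.033, alpha=0.05)
adjusted = n_per_variant(0.03, 0.033, alpha=0.05 / comparisons)
print(adjusted / plain)  # ≈ 1.33, i.e. about a third more traffic per variant
```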
Can I use this calculator for revenue metrics like ARPU?

This calculator is designed for proportion-based metrics (conversion rates). Revenue metrics have continuous distributions with higher variance, so they typically need 2–5x larger samples. For ARPU tests, use a dedicated continuous-metric calculator or add a 2–3x multiplier to the result here as a rough estimate.
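For continuous metrics, the standard two-sample formula is n = 2(Zα/2 + Zβ)² σ² / δ², where σ is the metric’s standard deviation and δ is the absolute lift you want to detect. A rough sketch — the $12 standard deviation and $0.50 lift are invented numbers for illustration:

```python
import math
from statistics import NormalDist

def n_continuous(sigma, delta, alpha=0.05, power=0.80):
    """Per-variant sample size for a two-sample test on a continuous metric."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

# Detect a $0.50 ARPU lift when per-visitor revenue has sigma of about $12
print(n_continuous(sigma=12, delta=0.50))  # ≈ 9,000 visitors per variant
```

Because per-visitor revenue is heavily skewed (many zeros, a few large orders), σ is often several times the mean, which is where the 2–5x sample inflation over plain conversion tests comes from.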