
Simpson's Paradox

Simpson’s Paradox is a statistical phenomenon in which a trend appears in several different groups of data but disappears or reverses when those groups are combined. It occurs when a hidden (confounding) variable is distributed unevenly across the groups being compared, so that aggregating the data mixes those groups in different proportions and changes the overall outcome.

Meaning & Context

Simpson’s Paradox often arises in real-world datasets where variables are not independent: for example, when the groups being compared contain different proportions of each segment, or when a third variable affects both the grouping and the outcome. An aggregate rate is a weighted average of the segment rates, with weights set by each segment's share of the sample. If those weights differ between the groups being compared, the confounding variable can distort the overall trend and produce a conclusion that contradicts the subgroup trends.
This phenomenon is especially relevant in product analytics, experimentation, medical research, social science, and any field where decisions rely on segmented vs. aggregated data.
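
To make the weighting concrete, here is a minimal Python sketch. The segment names and counts are invented for illustration, and the function names are hypothetical. It computes one variant's aggregate completion rate two ways, as a pooled ratio and as a share-weighted average of segment rates, to show that they are the same quantity.

```python
# Hypothetical counts for a single variant: segment -> (users, completions).
segments = {
    "new": (100, 30),         # 30% completion rate
    "returning": (900, 630),  # 70% completion rate
}

def aggregate_rate(segments):
    """Pooled completion rate: total completions / total users."""
    total_users = sum(users for users, _ in segments.values())
    total_done = sum(done for _, done in segments.values())
    return total_done / total_users

def weighted_average_rate(segments):
    """The same quantity written as sum(segment share * segment rate)."""
    total_users = sum(users for users, _ in segments.values())
    return sum(
        (users / total_users) * (done / users)
        for users, done in segments.values()
    )

pooled = aggregate_rate(segments)
weighted = weighted_average_rate(segments)
print(f"pooled={pooled:.3f} weighted={weighted:.3f}")
```

Because the weights are the segment shares, changing the mix of users changes the aggregate rate even when every segment rate stays the same.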

Why It Matters

Simpson’s Paradox can lead teams to draw incorrect conclusions from experiments or analyses if they rely solely on aggregated results.
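In A/B testing, it may cause a variant to appear to win overall while losing in every major segment, or vice versa. This can happen when the segment mix differs between variants (for example, because traffic allocation changed during the test) and the segments have different baseline rates.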
Recognizing and checking for Simpson’s Paradox is essential for:

  • Validating experiment integrity
  • Avoiding misleading business decisions
  • Ensuring segment-weight imbalances don’t distort results
  • Understanding why a variant performs differently across user groups

This awareness helps teams distinguish true causal effects from artifacts of data aggregation.
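
One way such a check might look in practice is sketched below in Python. The data layout, the variant labels "A" and "B", and the function names are assumptions for illustration, not part of any particular testing tool. The sketch compares the aggregate winner with the winner in each segment and flags a full reversal. It ignores ties and statistical significance, which a real analysis would need to handle.

```python
def winners(data):
    """Return (aggregate_winner, {segment: segment_winner}).

    data maps segment -> {variant: (users, completions)}.
    """
    per_segment = {}
    totals = {}
    for segment, variants in data.items():
        rates = {}
        for name, (users, done) in variants.items():
            rates[name] = done / users
            running = totals.setdefault(name, [0, 0])
            running[0] += users
            running[1] += done
        per_segment[segment] = max(rates, key=rates.get)
    overall = {name: done / users for name, (users, done) in totals.items()}
    return max(overall, key=overall.get), per_segment

def is_reversal(data):
    """True when the aggregate winner loses in every segment."""
    overall, per_segment = winners(data)
    return all(winner != overall for winner in per_segment.values())

# Hypothetical results: B is ahead in both segments, yet A is ahead overall.
data = {
    "new": {"A": (100, 30), "B": (900, 315)},
    "returning": {"A": (900, 630), "B": (100, 75)},
}
print(winners(data), is_reversal(data))
```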

Example

A product team runs an A/B test on a new onboarding flow.

  • In every individual user segment (for example, new users and returning users), Variant B shows a higher completion rate.
  • But in the aggregated data, Variant A appears to win.

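Upon investigation, the team discovers that one segment with significantly lower completion rates is overrepresented in Variant B because of an imbalance in how users were assigned. Variant B's aggregate rate is therefore pulled down by its segment mix, not by the new flow. Once each variant's segment rates are weighted by the same segment mix, the reversal disappears, confirming that Variant B is actually better.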

This example illustrates how Simpson’s Paradox can obscure true performance unless segmented analysis is performed alongside aggregate metrics.
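
The reweighting step in this example can be sketched in Python as follows. The counts are invented, the function name is hypothetical, and using the pooled segment mix as the common set of weights is only one reasonable choice among several.

```python
# Hypothetical test results: segment -> {variant: (users, completions)}.
data = {
    "new": {"A": (100, 30), "B": (900, 315)},        # A 30%, B 35%
    "returning": {"A": (900, 630), "B": (100, 75)},  # A 70%, B 75%
}

def standardized_rates(data):
    """Weight each variant's segment rates by the pooled segment mix."""
    segment_users = {
        segment: sum(users for users, _ in variants.values())
        for segment, variants in data.items()
    }
    total_users = sum(segment_users.values())
    result = {}
    for segment, variants in data.items():
        weight = segment_users[segment] / total_users
        for name, (users, done) in variants.items():
            result[name] = result.get(name, 0.0) + weight * (done / users)
    return result

adjusted = standardized_rates(data)
print({name: round(rate, 3) for name, rate in adjusted.items()})
```

With these made-up numbers, the raw aggregate rates are 66% for Variant A and 39% for Variant B, while the standardized rates are 50% and 55%, which matches the direction seen inside each segment.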
