Simpson’s Paradox is a statistical phenomenon in which a trend appears in several different groups of data but disappears or reverses when those groups are combined. It occurs when the relationship between variables is influenced by a hidden or confounding variable that changes the overall outcome when data is aggregated.
Simpson’s Paradox often arises in real-world datasets where variables are not independent—for example, when different groups have uneven sample sizes, or when a third variable affects both the grouping and the outcome. When data is viewed at the aggregate level, the confounding variable can distort the overall trend, producing a conclusion that contradicts the subgroup trends.
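The effect of uneven group sizes is easiest to see with numbers. The sketch below uses hypothetical completion counts (the variant labels, the "mobile" and "desktop" segment names, and every count are assumptions made up for illustration, not data from a real test). Variant B has the higher rate in each segment, yet the lower rate once the segments are pooled, because most of B's users fall in the low-rate segment.

```python
# Hypothetical (completed, total) counts per variant and segment.
# All names and numbers are illustrative assumptions, not real data.
counts = {
    "A": {"mobile": (20, 100), "desktop": (540, 900)},
    "B": {"mobile": (225, 900), "desktop": (65, 100)},
}


def rate(completed, total):
    """Completion rate as a fraction."""
    return completed / total


# Rate within each segment, per variant.
segment_rates = {
    variant: {seg: rate(*ct) for seg, ct in segs.items()}
    for variant, segs in counts.items()
}

# Rate after pooling all segments together, per variant.
aggregate_rates = {
    variant: rate(
        sum(done for done, _ in segs.values()),
        sum(total for _, total in segs.values()),
    )
    for variant, segs in counts.items()
}

for variant in counts:
    print(variant, segment_rates[variant], round(aggregate_rates[variant], 3))
```

With these made-up counts, B leads in both segments (25% vs. 20% on mobile, 65% vs. 60% on desktop) while A leads in aggregate (56% vs. 29%).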
This phenomenon is especially relevant in product analytics, experimentation, medical research, social science, and any field where decisions rely on segmented vs. aggregated data.
Simpson’s Paradox can lead teams to draw incorrect conclusions from experiments or analyses if they rely solely on aggregated results.
In A/B testing, it may cause a variant to appear to win overall while losing in every major segment—or vice versa—if underlying user distributions shift or if segments behave differently.
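One way to screen an A/B result for this pattern is to compare the direction of the difference in every segment with the direction of the pooled difference. The helper below is a minimal sketch under stated assumptions: the function name, the input layout (variant, then segment, then a pair of conversions and users), and the sample counts are all hypothetical. It compares point estimates only and does not test statistical significance.

```python
def aggregate_disagrees_with_segments(counts, control="A", treatment="B"):
    """Return True if treatment beats control in every segment but loses
    overall, or loses in every segment but wins overall.

    `counts` maps variant -> segment -> (conversions, users).
    Hypothetical helper: compares point estimates only.
    """
    # Per-segment difference in conversion rate (treatment minus control).
    diffs = []
    for seg in counts[control]:
        c_conv, c_users = counts[control][seg]
        t_conv, t_users = counts[treatment][seg]
        diffs.append(t_conv / t_users - c_conv / c_users)

    def overall(variant):
        conv = sum(c for c, _ in counts[variant].values())
        users = sum(u for _, u in counts[variant].values())
        return conv / users

    overall_diff = overall(treatment) - overall(control)
    all_up = all(d > 0 for d in diffs)
    all_down = all(d < 0 for d in diffs)
    return (all_up and overall_diff < 0) or (all_down and overall_diff > 0)


# Illustrative counts only: an unbalanced and a balanced segment mix.
unbalanced = {
    "A": {"mobile": (20, 100), "desktop": (540, 900)},
    "B": {"mobile": (225, 900), "desktop": (65, 100)},
}
balanced = {
    "A": {"mobile": (20, 100), "desktop": (60, 100)},
    "B": {"mobile": (25, 100), "desktop": (65, 100)},
}
print(aggregate_disagrees_with_segments(unbalanced))
print(aggregate_disagrees_with_segments(balanced))
```

The segment rates are identical in both inputs; only the segment mix differs, and only the unbalanced mix triggers the check.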
Recognizing and checking for Simpson’s Paradox is essential: this awareness helps teams distinguish true causal effects from artifacts of data aggregation.
A product team runs an A/B test on a new onboarding flow.
Upon investigation, the team discovers that one segment with significantly lower completion rates is overrepresented in Variant B due to random imbalance, which pulls down Variant B's aggregate completion rate. When the segments are weighted by a common mix, the paradox disappears, confirming that Variant B is actually better.
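The text does not say which weighting the team used. One common option is direct standardization: each variant's segment rates are averaged using the same segment weights (here, each segment's share of all users in the test), so differences in segment mix no longer affect the comparison. The sketch below assumes hypothetical counts and segment names chosen for illustration.

```python
# Hypothetical (completed, total) counts; names and numbers are assumptions.
counts = {
    "A": {"mobile": (20, 100), "desktop": (540, 900)},
    "B": {"mobile": (225, 900), "desktop": (65, 100)},
}
segments = ["mobile", "desktop"]

# Common weights: each segment's share of all users across both variants.
segment_totals = {
    seg: sum(counts[variant][seg][1] for variant in counts) for seg in segments
}
grand_total = sum(segment_totals.values())
weights = {seg: segment_totals[seg] / grand_total for seg in segments}


def standardized_rate(variant):
    """Average the variant's segment rates using the common weights."""
    return sum(
        weights[seg] * counts[variant][seg][0] / counts[variant][seg][1]
        for seg in segments
    )


standardized = {variant: standardized_rate(variant) for variant in counts}
for variant, value in standardized.items():
    print(variant, round(value, 3))
```

With these counts both segments carry a weight of 0.5, giving roughly 0.40 for A and 0.45 for B, so the standardized comparison agrees with the per-segment results.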
This example illustrates how Simpson’s Paradox can obscure true performance unless segmented analysis is performed alongside aggregate metrics.