Simpson's Paradox "proves" smoking is good for you

How do you prove that smoking is beneficial to your health? By employing Simpson's paradox, of course. This paradox shows that a large grouping of data can be worth much less than the sum of its parts.

If I were at a tobacco company, and I wanted to prove that smoking was good for you, I would only have to do two things. First, I would have to wrap my soul in a paper bag, throw it to the ground, and stomp on it. Next, I would have to look at a study done in the UK in the early 1970s.

The study set out to examine how a number of different factors affected people's health. Among other things, it looked at smoking and whether it had any health effects. In particular, it followed women and their survival rates over the next twenty years. Amazingly, forty-three percent of the nonsmokers died, whereas only thirty-eight percent of the smokers died. Clearly cigarettes saved their lives!

Or perhaps it was Simpson's paradox. Simpson's paradox is named after Edward Simpson, though many other people noted it as well. Sometimes clear trends in individual groups of data disappear, or even reverse, when the groups are pooled together. In this case, when the women were grouped by age, decade by decade, every single group showed that smokers had a higher mortality rate than nonsmokers. However, far more of the young women smoked than the older women. Although cigarette smoking increased mortality across the board, young smokers were still more likely to survive the next twenty years than much older nonsmokers. Add all the groups together and tobacco, although bad for people, appears beneficial in the aggregate, because it doesn't take forty years off anyone's life.
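
To see how the reversal can happen, here is a minimal Python sketch. The counts are invented purely for illustration and are not the figures from the UK study; the two age groups and the helper name death_rate are likewise assumptions made for this example. Most of the made-up smokers are young and most of the made-up nonsmokers are old, mirroring the imbalance described above.

```python
# Hypothetical counts, NOT data from the study:
# (age_group, is_smoker) -> (deaths over the follow-up period, women in the group)
counts = {
    ("young", True): (40, 400),
    ("young", False): (10, 200),
    ("old", True): (80, 100),
    ("old", False): (280, 400),
}


def death_rate(is_smoker, age_group=None):
    """Share of women who died; pools all age groups when age_group is None."""
    deaths = 0
    total = 0
    for (group, smoker), (d, n) in counts.items():
        if smoker == is_smoker and (age_group is None or group == age_group):
            deaths += d
            total += n
    return deaths / total


# Within each age group, smokers fare worse.
for group in ("young", "old"):
    print(f"{group}: smokers {death_rate(True, group):.0%}, "
          f"nonsmokers {death_rate(False, group):.0%}")

# Pooled together, the comparison flips.
print(f"pooled: smokers {death_rate(True):.0%}, "
      f"nonsmokers {death_rate(False):.0%}")
```

With these invented numbers, smokers die at a higher rate in both age groups (10 percent versus 5 percent among the young, 80 percent versus 70 percent among the old), yet the pooled figures show 24 percent of smokers dying against about 48 percent of nonsmokers. Nothing changed except that the groups were added together.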

Simpson's paradox, also known as the reversal paradox, arises whenever an unacknowledged third factor is thrown into the mix. Sometimes that third factor is a difference in sample size between the groups. Sometimes it's a factor, like age or general health, that affects the results more dramatically than the factor being tested. There are examples of the paradox in numerous medical studies, performance analyses, and gender-bias cases. Sometimes the whole doesn't reflect its parts. It's the perfect statistical way of keeping people from seeing the forest for the trees.

[Via Ohio State University, American Journal of Epidemiology.]