Dear Science,

In the recent HIV-vaccine trial, the reports said that about 30 percent fewer people in the vaccine group contracted HIV, and that this difference was statistically significant. What does it mean for something to be "statistically significant"? Does that make it true?

Math Monster

Let's think about that HIV-vaccine study. We wanted to know whether a combination of vaccines could prevent HIV infection. To answer the question, 16,000 people were recruited from the Thai population and divided into two groups: one half received the vaccination series, the other half received a placebo. Both groups were then followed for five years, with new HIV infections recorded. Seventy-four of the placebo recipients became newly HIV positive versus 51 of the vaccine recipients, or 23 fewer people. (That gap of 23 infections, divided by the 74 in the placebo group, is where the "about 30 percent" figure comes from.) With this raw information, we want to know whether the vaccine reduced the risk of HIV infection.
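If you'd like to check the reader's number yourself, the arithmetic fits in a few lines of Python (a minimal sketch using only the counts quoted above):

```python
# Counts from the trial described above: 8,000 people per group.
placebo_infected = 74
vaccine_infected = 51
group_size = 8000

# Infection rate in each group.
placebo_rate = placebo_infected / group_size   # 0.00925
vaccine_rate = vaccine_infected / group_size   # 0.006375

# Relative risk reduction: how much lower the vaccine group's rate is,
# as a fraction of the placebo group's rate.
reduction = (placebo_rate - vaccine_rate) / placebo_rate
print(f"Risk reduction: {reduction:.1%}")  # 31.1%, the "about 30 percent"
```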

The question boils down to this: Are the HIV infection rates between these two groups of 8,000 people really different? (Most medical research boils down to asking whether two groups are really different.) On the surface this seems simple. Twenty-three fewer people got HIV after being vaccinated! Surely the vaccine protected those 23 people. Of course it works.

Not so fast. Every measurement is subject to noise and error. If we repeated this study, the numbers of new HIV infections in both groups would almost certainly come out different. That noise can make two groups look different even when they are actually the same. Before embarking on a study, a scientist is expected to pick some risk they're willing to accept of finding a false difference between the groups; for almost all medical studies, that threshold is a 1 in 20 chance, or 5 percent, of getting it wrong. Math (glorious, lovely, and wondrous mathematics) allows us to estimate this noisiness. Using these mathematical estimates of how far the true counts could wander, we can figure out whether the difference we observed between the groups has less than a 5 percent chance of being due to noise alone. For this study, the chance that the difference between the groups was due to noise alone was just under 5 percent. Therefore, the result is statistically significant.
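There are several standard recipes for this estimate. The sketch below uses a two-proportion z-test, one common way to compare two infection rates; the choice of test is mine for illustration, and the published study ran its own, more elaborate analysis:

```python
from math import sqrt, erf

# Counts from the trial described above.
n = 8000                 # people in each group
vaccine_infected = 51
placebo_infected = 74

# How big is the observed difference in infection rates,
# measured in units of the expected noise?
p_vaccine = vaccine_infected / n
p_placebo = placebo_infected / n
p_pooled = (vaccine_infected + placebo_infected) / (2 * n)

# Standard error of the difference, assuming the two groups
# truly share the same underlying infection rate.
se = sqrt(p_pooled * (1 - p_pooled) * (2 / n))
z = (p_placebo - p_vaccine) / se

# Two-sided p-value from the normal distribution: the chance of a
# difference at least this large arising from noise alone.
p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))
print(f"z = {z:.2f}, p = {p_value:.3f}")  # z = 2.07, p = 0.039
```

That p of roughly 0.039 is the "just under 5 percent" above: small enough to clear the 1 in 20 bar, but not by much.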

That's not to say it's true. If the vaccine actually did nothing, noise alone would still produce a difference this large about 1 time in 20, so we could be (honestly) wrong, detecting a difference that isn't actually there. Further, statistical significance does not consider the quality of the study: whether the people recruited really represent the general population, whether the randomization was fairly done, and so on. Statistical significance is the starting place of a valid study. Calling something true takes a bit more thought.
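You can even watch that 1 in 20 failure rate happen. The sketch below simulates a world where the vaccine does nothing (both groups infected at the trial's pooled rate) and counts how often a study this size still clears the significance bar; the simulation setup, including the 10,000 repeats, is my own illustration:

```python
from math import sqrt, erf
import numpy as np

def p_value(vaccine_infected, placebo_infected, n):
    """Two-sided p-value from the same two-proportion z-test as above."""
    p_pooled = (vaccine_infected + placebo_infected) / (2 * n)
    se = sqrt(p_pooled * (1 - p_pooled) * (2 / n))
    if se == 0:
        return 1.0  # no infections at all: no difference to test
    z = abs(placebo_infected - vaccine_infected) / n / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

n = 8000
true_rate = 125 / 16000   # pooled infection rate; identical in both groups
trials = 10_000
false_positives = 0
rng = np.random.default_rng()

for _ in range(trials):
    # A "vaccine" that does nothing: both groups drawn from the same rate.
    vaccine = rng.binomial(n, true_rate)
    placebo = rng.binomial(n, true_rate)
    if p_value(vaccine, placebo, n) < 0.05:
        false_positives += 1

print(f"False positives: {false_positives / trials:.1%}")  # near 5 percent
```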

Samplingly Yours,

Science

Send your science questions to dearscience@thestranger.com.