## Subtitles

• In 2011, a group of researchers conducted a scientific study to find an impossible result: that listening to certain songs can make you younger.

• Their study involved real people, truthfully reported data, and commonplace statistical analyses.

• So how did they do it?

• The answer lies in a statistical method scientists often use to try to figure out whether their results mean something or if they're random noise.

• In fact, the whole point of the music study was to point out ways this method can be misused.

• A famous thought experiment explains the method: there are eight cups of tea, four with the milk added first, and four with the tea added first. A participant must determine which are which according to taste.

• There are 70 different ways the cups can be sorted into two groups of four, and only one is correct. So, can she taste the difference? That's our research question.

• To analyze her choices, we define what's called a null hypothesis: that she can't distinguish the teas.

• If she can't distinguish the teas, she'll still get the right answer 1 in 70 times by chance. 1 in 70 is roughly .014. That single number is called a p-value.

• In many fields, a p-value of .05 or below is considered statistically significant, meaning there's enough evidence to reject the null hypothesis.

• Based on a p-value of .014, we'd rule out the null hypothesis that she can't distinguish the teas.
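The arithmetic behind the tea example can be checked in a few lines. This is a minimal sketch; the variable names are illustrative and not from the talk.

```python
from math import comb

# Number of ways to choose which 4 of the 8 cups had the milk added first.
n_arrangements = comb(8, 4)

# Under the null hypothesis (pure guessing), every arrangement is equally
# likely, so the chance of naming the single correct one is 1 in 70.
p_value = 1 / n_arrangements

print(n_arrangements)     # 70
print(round(p_value, 3))  # 0.014
```

Because .014 is below the conventional .05 threshold, a perfect sorting counts as statistically significant.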

• Though p-values are commonly used by both researchers and journals to evaluate scientific results, they're really confusing, even for many scientists.

• That's partly because all a p-value actually tells us is the probability of getting a certain result, assuming the null hypothesis is true.

• So if she correctly sorts the teas, the p-value is the probability of her doing so assuming she can't tell the difference.

• But the reverse isn't true: the p-value doesn't tell us the probability that she can taste the difference, which is what we're trying to find out.

• So if a p-value doesn't answer the research question, why does the scientific community use it?

• Well, because even though a p-value doesn't directly state the probability that the results are due to random chance, it usually gives a pretty reliable indication.

• At least, it does when used correctly. And that's where many researchers, and even whole fields, have run into trouble.

• Most real studies are more complex than the tea experiment. Scientists can test their research question in multiple ways, and some of these tests might produce a statistically significant result, while others don't.

• It might seem like a good idea to test every possibility. But it's not, because with each additional test, the chance of a false positive increases.
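To see how quickly the risk grows, here is a rough sketch. It assumes the tests are independent and that each uses the .05 threshold; real analyses of the same data are usually correlated, so treat the numbers as an illustration rather than an exact figure. The function name is hypothetical.

```python
ALPHA = 0.05  # significance threshold mentioned in the talk


def chance_of_false_positive(n_tests, alpha=ALPHA):
    """Probability that at least one of n independent tests dips below alpha
    when the null hypothesis is true for every one of them."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 3, 10, 20):
    print(k, round(chance_of_false_positive(k), 3))
```

Under these assumptions the chance of at least one spurious "significant" result is about 14% with three tests and about 40% with ten.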

• Searching for a low p-value, and then presenting only that analysis, is often called p-hacking.

• It's like throwing darts until you hit a bullseye and then saying you only threw the dart that hit the bullseye. This is exactly what the music researchers did.

• They played three groups of participants each a different song and collected lots of information about them.

• The analysis they published included only two out of the three groups.

• Of all the information they collected, their analysis used only participants' fathers' age, to control for variation in baseline age across participants.

• They also paused their experiment after every ten participants, and continued if the p-value was above .05, but stopped when it dipped below .05.

• They found that participants who heard one song were 1.5 years younger than those who heard the other song, with a p-value of .04.
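The effect of stopping as soon as the p-value dips below .05 can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of the study's analysis: both groups are drawn from the same standard normal distribution (so any "significant" result is a false positive), the test is a simple two-sided z-test with known variance, and the batch size of ten, the cap of 100 participants, and all function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
STD_NORMAL = NormalDist()


def p_value(group_a, group_b):
    """Two-sided z-test p-value, assuming both groups have known variance 1."""
    diff = fmean(group_a) - fmean(group_b)
    se = (1 / len(group_a) + 1 / len(group_b)) ** 0.5
    z = abs(diff) / se
    return 2 * (1 - STD_NORMAL.cdf(z))


def run_study(rng, peek, batch=10, max_n=100):
    """Return True if the study ends with p < ALPHA although no effect exists."""
    a, b = [], []
    while len(a) + len(b) < max_n:
        # Add ten participants: five per group, all from the same distribution.
        a += [rng.gauss(0, 1) for _ in range(batch // 2)]
        b += [rng.gauss(0, 1) for _ in range(batch // 2)]
        if peek and p_value(a, b) < ALPHA:
            return True  # stop as soon as the result looks significant
    return p_value(a, b) < ALPHA


def false_positive_rate(peek, n_studies=2000, seed=0):
    """Fraction of simulated no-effect studies that report significance."""
    rng = random.Random(seed)
    return sum(run_study(rng, peek) for _ in range(n_studies)) / n_studies


print("test once at the end:", false_positive_rate(peek=False))
print("peek after every ten:", false_positive_rate(peek=True))
```

In this setup, testing once at the end should give a false positive rate close to the nominal 5%, while peeking after every batch should push it well above that, even though every individual test uses the same .05 threshold.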

• Usually it's much tougher to spot p-hacking, because we don't know the results are impossible: the whole point of doing experiments is to learn something new.

• Fortunately, there's a simple way to make p-values more reliable: pre-registering a detailed plan for the experiment and analysis beforehand that others can check, so researchers can't keep trying different analyses until they find a significant result.

• And, in the true spirit of scientific inquiry, there's even a new field that's basically science doing science on itself: studying scientific practices in order to improve them.

• This new field has emerged in response to a crisis in science, and p-hacking is just one part of that crisis. So, what's going on? And can we fix it? Learn more with this video.

B1 Intermediate

# The method that can "prove" almost anything - James A. Smith

Published by Minjane on August 9, 2021