In 2012, a researcher named Glenn Begley published a commentary in the journal Nature. He said that during his decade as the head of cancer research for Amgen -- an American pharmaceutical company -- he’d tried to reproduce the results of 53 so-called “landmark” cancer studies. But his team wasn’t able to replicate 47 out of those 53 studies. That means that /about 89 percent/ of these really important cancer studies couldn’t be reproduced.

Then, in August 2015, a psychologist named Brian Nosek published /another/ paper, this time in the journal Science. Over the course of the previous three years, he said, he’d organized repeats of 100 psychological studies. 97 of the original experiments had reported statistically significant results -- that is, results that were most likely caused by the variables being tested, and not just coincidence. But when his team tried to reproduce those results, they only got /36/ significant results -- barely more than a third of what the original studies had found.

It seems like every few months, there’s some kind of news about problems with the scientific publishing industry. A lot of people -- both scientists and science enthusiasts -- are concerned. Why does this keep happening? And what can be done to fix the system?

[intro]

Right now, the scientific publishing industry is going through what’s being called a replication, or reproducibility, crisis. Researchers are repeating earlier studies, trying to reproduce the experiments as closely as possible. One group might publish findings that look promising, and other groups might use those results to develop their own experiments. But if the original study was wrong, that’s a whole lot of time and money right down the drain.

In theory, these repeat studies should be finding the same results as the original experiments. If a cancer drug worked in one study, and a separate group repeats that study under the same conditions, the cancer drug should still work. But that’s not what happened with Begley’s cancer studies. And researchers in other fields have been having the same problem. They’re repeating earlier studies, and they aren’t getting the same results.

So why are these inaccurate results getting published in the first place? Well, sometimes people really are just making things up, but that’s relatively rare. Usually, it has to do with misleading research tools, the way a study is designed, or the way data are interpreted.

Take an example from the world of biomedicine. Researchers who work with proteins will often use antibodies to help them with their research. You might know antibodies as a part of your immune system that helps target outside invaders, but in scientific research, antibodies can be used to target specific proteins. And lately, there’s been a lot of evidence that these antibodies aren’t as reliable as scientists have been led to believe. Companies produce these antibodies so researchers can buy them, and they’ll say in their catalog which antibodies go with which proteins. The problem is, those labels aren’t always right. And if researchers don’t check to make sure that their antibody works the way it should, they can misinterpret their results.

One analysis, published in 2011 in a journal called Nature Structural & Molecular Biology, tested 246 of these antibodies, each of which was said to only bind with one particular protein. But it turned out that about a quarter of them actually targeted more than one protein. And four of /those/ … actually targeted the wrong kind of protein.
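To get a feel for why that kind of cross-reactivity matters, here’s a quick back-of-the-envelope simulation. All of the numbers are invented for illustration -- they don’t come from the 2011 analysis or from any real assay -- but they show how an antibody that binds two proteins instead of one can make a rare protein look common:

```python
# A minimal, made-up simulation of how a cross-reactive antibody can inflate
# results. Hypothetical assumptions: the protein we care about is present in
# 10% of samples, but the antibody also binds an unrelated protein that shows
# up in 30% of samples.
import random

random.seed(42)

N_SAMPLES = 10_000
TARGET_PREVALENCE = 0.10      # how often the protein of interest is really there
OFF_TARGET_PREVALENCE = 0.30  # how often the unrelated protein is there

true_positives = 0
apparent_positives = 0

for _ in range(N_SAMPLES):
    has_target = random.random() < TARGET_PREVALENCE
    has_off_target = random.random() < OFF_TARGET_PREVALENCE

    # A specific antibody would only light up for the target protein,
    # but a cross-reactive one lights up for either protein.
    signal = has_target or has_off_target

    true_positives += has_target
    apparent_positives += signal

print(f"True prevalence of the target protein: {true_positives / N_SAMPLES:.1%}")
print(f"What the cross-reactive antibody reports: {apparent_positives / N_SAMPLES:.1%}")
# Roughly 10% vs. roughly 37% -- most of the "detections" are false positives.
```

In a real lab, of course, those extra signals wouldn’t come labeled as mistakes -- they’d just look like detections of the protein being studied.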
Researchers were using these antibodies to detect proteins in their experiments, but the antibodies could have been binding to completely different materials -- creating false positives and, therefore, flawed results. That’s exactly what happened to researchers at Mount Sinai Hospital in Toronto. They wasted two years and half a million dollars using an antibody to look for a specific protein that they thought might be connected to pancreatic cancer. Then they figured out that the whole time, the antibody had actually been binding to a /different/ cancer protein, and didn’t even target the protein they were looking for.

So the antibody-production industry is having some quality control problems, and it’s affecting a lot of biomedical research. Some companies have already taken steps to try and ensure quality -- one reviewed its entire catalog in 2014 and cut about a third of the antibodies it had been offering. Now, researchers /could/ try testing the antibodies themselves, to make sure they only bind to the protein they’re supposed to. But that’s like conducting a whole separate study before they even get to start on the main project. Most research groups don’t have the time or money to do that. But now that scientists are aware of the issue, they can at least be more careful about where they get their antibodies.

Having accurate tools for research isn’t enough, though. Part of the reproducibility crisis also has to do with how experiments are designed. This can be a problem in all kinds of different fields, but it’s especially an issue for psychology, where results often depend on human experience, which can be very subjective. Experiments are supposed to be designed to control for as many external factors as possible, so that you can tell whether your experiment is actually what’s leading to the effect. But in psychology, you can’t really control for all possible external factors. Too many of them just have to do with the fact that humans are human.

One classic experiment, for example, showed that when people had been exposed to words related to aging, they walked more slowly. Another research group tried to replicate that study and failed, but that doesn’t necessarily prove or disprove the effect. It’s possible that the replication study exposed the subjects to too /many/ aging-related words, which might have ruined the subconscious effect. Factors that weren’t directly related to the study could have also affected the results -- like what color the room was, or the day of the week.

When such tiny differences can change the results of a study, it’s not too surprising that when Nosek’s research group reviewed those 100 psychology papers, they were only able to replicate 36 of the 97 that had originally reported significant results. But it also means the results of the original studies are on pretty shaky ground. At the very least, being able to replicate a study can show the strength of its results, so some scientists have been calling for more replication to be done -- in lots of fields, but especially in psychology, where it can be so hard to judge how strong a result really is.

The fact that journals are so selective about what they publish is another reason the results of a study might turn out to be false. A lot of the time, researchers are pressured to make their findings look as strong as possible. When you publish papers, you get funding to do more research, and the more grant money you bring in, the more likely it is that the academic institution sponsoring you will want to keep you.
The problem is, journals are MUCH more likely to publish positive results than negative ones. Say you’re a biologist, and you spend three months working on a potential cancer drug. If after three months, you get positive results -- suggesting the drug worked -- then a journal will probably want to publish those results. But if you end up with negative results -- the drug didn’t work -- that’s just not as interesting or exciting. So negative results almost always go unpublished -- which means that there’s a lot of pressure on researchers to conduct experiments that /do/ have positive results.

For example, Begley -- the biologist who led the cancer replication studies at Amgen -- had tried one experiment 50 times and hadn’t been able to reproduce the original results. The lead researcher on the original study told Begley that /his/ team had tried the experiment 6 times, and gotten positive results once. So that’s what they published -- that one positive result. And if an effect isn’t real, the occasional positive result is exactly what you’d expect: at the usual 5 percent false-positive threshold, running an experiment six times gives you roughly a one-in-four chance of getting at least one “positive” just by luck.

There’s so much pressure to publish significant findings that researchers might not always include all of their data in their analysis. To get around this problem, some experts have suggested creating a new standard, where researchers include a line in their papers saying that they’ve reported any excluded data and all aspects of their analysis. If including that statement became standard practice, then if a paper /didn’t/ have that line, that would be a red flag.

But even if researchers /do/ include all their data, they might just be doing the analysis wrong. Because data analysis involves math. Sometimes a lot of math. But researchers in a lot of fields -- like psychology and biology -- aren’t necessarily trained in all that math. You don’t always need to take courses in advanced statistical methods to get a degree in biology. So, sometimes, the data analysis that researchers do is just … wrong. And the peer reviewers don’t always catch it, because they haven’t been trained in those methods, either.

Then … there are p-values. The term p-value is short for probability value, and it’s often used as a kind of shorthand for the significance of scientific findings. To calculate a p-value, you first assume the /opposite/ of what you want to prove. Say you were testing a cancer drug, for instance, and you found that it seemed to kill cancer cells. To calculate the p-value for your study, you’d start by assuming that the drug /doesn’t/ kill cancer cells. Then, you’d calculate the odds of seeing at least that many cancer cells die anyway, just by chance. That would be your p-value. In other words, a p-value tells you how likely your results would be if they were nothing but a coincidence.

So for that cancer drug you’re testing, a p-value of less than .01 would mean that there’s a less than 1% chance of seeing that many cancer cells die if the drug didn’t actually kill cancer cells. Usually, the standard for whether results are worth publishing is a p-value of less than .05 -- which would translate to less than a 5% chance of seeing that much cell death by coincidence. 5% is a 1 in 20 chance, which is pretty low. And there are /lots/ of studies that get published with p-values just under .05. Odds are that for at least a few of them, the results will be a coincidence -- and the findings will be wrong.

That’s why a lot of people argue that p-values aren’t a good metric for whether results are significant. Instead, they suggest placing more emphasis on things like effect size, which tells you more than just whether an experiment produced some kind of change. It tells you how /big/ the change was.
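To make that concrete, here’s a small sketch of both ideas -- a p-value and an effect size -- for an imaginary version of that cancer-drug experiment. The numbers are made up, and the permutation test below is just one simple way of asking the question described above: assume the drug does nothing, and see how often a difference this big shows up anyway.

```python
# A toy illustration of p-values and effect sizes. The "data" are invented --
# they don't come from any real trial -- and the permutation test is one simple
# way to ask: if the drug did nothing, how often would we see a difference this
# big just by chance?
import numpy as np

rng = np.random.default_rng(0)

# Percent of cancer cells killed in each dish (hypothetical numbers).
control = np.array([22., 25., 19., 24., 21., 23., 20., 26.])
treated = np.array([27., 24., 30., 22., 28., 25., 23., 29.])

observed_diff = treated.mean() - control.mean()

# Permutation test: pretend the drug does nothing, so the group labels are
# meaningless. Shuffle the labels many times and count how often a difference
# at least as big as the observed one shows up anyway.
pooled = np.concatenate([control, treated])
n_control = len(control)
n_permutations = 20_000
count = 0
for _ in range(n_permutations):
    shuffled = rng.permutation(pooled)
    diff = shuffled[n_control:].mean() - shuffled[:n_control].mean()
    if diff >= observed_diff:
        count += 1

p_value = count / n_permutations

# Effect size (Cohen's d): how big the difference is, measured against the
# spread of the data -- not just whether there is a difference at all.
pooled_sd = np.sqrt((control.var(ddof=1) + treated.var(ddof=1)) / 2)
cohens_d = observed_diff / pooled_sd

print(f"Observed difference: {observed_diff:.1f} percentage points")
print(f"p-value (chance of a difference this big if the drug did nothing): {p_value:.4f}")
print(f"Effect size (Cohen's d): {cohens_d:.2f}")
```

The exact numbers don’t matter here -- the point is that the p-value only says how surprising the difference would be under pure chance, while the effect size says how large the difference actually is.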
Those same critics also suggest more sharing of data -- including unpublished data -- something that’s gradually becoming more popular and accepted.

So, yes -- there is a replication crisis, and it’s been highlighting a lot of problems with the scientific research and publication process. But scientists are also doing their best to solve it.

Thanks for watching this episode of SciShow, which was brought to you by our patrons on Patreon. If you want to help support this show, just go to patreon.com/scishow. And don’t forget to go to youtube.com/scishow and subscribe!