## Transcript

• Hi, I'm Adriene Hill, and welcome back to Crash Course Statistics.

• To recap from last time, p-values tell us how rare something is.

• So far, we've been using that information to decide whether or not our hypotheses are

• reasonable, and using P-values to reject or fail to reject an idea.

• Today, we're going to explore p-values a little more and talk about the logic of p-values

• and some of the problems that come up.

• INTRO

• Remember, to calculate a p-value, we first assume that the null distribution is the true

• distribution our sample was taken from.

• Then we calculate how often we'd see a value that is at least as extreme as our observed value.

• So in probability terms, the p-value is the probability of getting a sample as or more

• extreme than ours, given that the null hypothesis is true: P(data as or more extreme than ours | null hypothesis is true).

• So all the values that we see in the sampling distribution are means we could actually get

• if the null hypothesis were true.

• For example, let's say the average cat weighs 10 lbs (or 4.5 kg).

• We might want to calculate the probability of getting a group of 30 randomly selected

• calico cats who have an average weight of 11 lbs (or 5 kg) if calico cats have the same

• average weight as the whole population of cats.
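To make the p-value definition concrete, here is a minimal Python sketch of the calico-cat example. It assumes normally distributed weights and a population standard deviation of 2 lbs (`SIGMA`, a hypothetical value, since the transcript gives only the means). It computes the p-value two ways: with a one-sample z-test, and by simulating many samples of 30 cats from the null distribution and counting how often the sample mean lands at least as far from 10 lbs as 11 lbs does.

```python
import math
import random
from statistics import NormalDist, fmean

# Hypothetical setup: the population mean (10 lbs) comes from the example,
# but SIGMA is an ASSUMED population standard deviation.
MU_NULL = 10.0        # average cat weight under the null hypothesis (lbs)
SIGMA = 2.0           # assumed population standard deviation (lbs)
N = 30                # calico cats in the sample
OBSERVED_MEAN = 11.0  # observed average weight of the calicos (lbs)

def p_value_normal(observed, mu, sigma, n):
    """Two-sided p-value from a one-sample z-test with known sigma."""
    se = sigma / math.sqrt(n)           # standard error of the sample mean
    z = (observed - mu) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def p_value_simulated(observed, mu, sigma, n, trials=20_000, seed=1):
    """Draw many samples from the null distribution and return the fraction
    whose mean is at least as far from mu as the observed mean."""
    rng = random.Random(seed)
    threshold = abs(observed - mu)
    extreme = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(sample_mean - mu) >= threshold:
            extreme += 1
    return extreme / trials

p_exact = p_value_normal(OBSERVED_MEAN, MU_NULL, SIGMA, N)
p_sim = p_value_simulated(OBSERVED_MEAN, MU_NULL, SIGMA, N)
print(f"z-test p-value:    {p_exact:.4f}")
print(f"simulated p-value: {p_sim:.4f}")
```

The two numbers should land close together: the simulation literally counts "samples as or more extreme than ours" drawn from a world where the null is true.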

• The first issue is that even if, in real life, there is no connection between two things like fur

• color and weight, we might still get samples of calicos, mackerel tabbies, or tortoiseshells

• that are different enough to cause us to reject the null hypothesis that

• there is no difference.

• Our alpha tells us how often this will happen.

• Let's say our hypothesis is that the reaction time of older professional chess players is

• different from the reaction time of the general population of professional chess players.

• Even if older chess players are the same as their colleagues, if we ran this study over

• and over with an alpha of 0.05, we'd expect to mistakenly reject the null 5% of the time.
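The "5% of the time" claim can be checked by brute force. This is a minimal sketch, assuming reaction times are standardized to mean 0 and standard deviation 1 (known), 30 older players per study, and a two-sided z-test; all of these are illustrative choices, not details from the video. Every simulated study is run in a world where the null is true, so every rejection is a mistake.

```python
import math
import random
from statistics import NormalDist, fmean

def false_rejection_rate(alpha, n=30, studies=10_000, seed=7):
    """Simulate repeated studies in which the null hypothesis is TRUE and
    return the fraction that (mistakenly) reject it at the given alpha."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # null is true: older players match the population mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(studies):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if abs(sample_mean - mu) / se > z_crit:
            rejections += 1
    return rejections / studies

rate = false_rejection_rate(alpha=0.05)
print(f"Rejected a true null in {rate:.1%} of simulated studies")
```

Passing `alpha=0.01` or `alpha=0.005` instead should shrink the mistaken-rejection rate to roughly 1% or 0.5%.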

• This is one reason why p-values are pretty controversial in the statistical community right now.

• Not everyone agrees that a p-value less than 0.05 is sufficient evidence to reject the

• null hypothesis.

• In fact, some studies that look at incredibly important things, like new medications, have

• already decided that an alpha of 0.05 isn't low enough.

• They want p-values lower than 0.01 so that if the null hypothesis is true, they'll

• only mistakenly reject it 1% of the time.

• Still others argue that 0.005 is the better cutoff.

• As you can see, the standard cutoff is arbitrary.

• Null Hypothesis Significance Testing requires that we draw a line in the sand somewhere,

• but it isn't clear where.
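One way to see where each "line in the sand" falls is to translate each alpha into the test statistic it demands. This sketch assumes a two-sided z-test; other tests have different cutoffs. The cutoffs come out near 1.96, 2.58, and 2.81.

```python
from statistics import NormalDist

# Two-sided z cutoffs: a result is "significant" at level alpha when |z|
# exceeds the cutoff, i.e. when the two-sided p-value is below alpha.
cutoffs = {alpha: NormalDist().inv_cdf(1 - alpha / 2) for alpha in (0.05, 0.01, 0.005)}

for alpha, z_crit in cutoffs.items():
    print(f"alpha = {alpha:<6} -> reject the null if |z| > {z_crit:.3f}")
```

Moving from 0.05 to 0.005 only raises the cutoff from about 2 to about 2.8 standard errors, which is part of why reasonable people disagree about where it belongs.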

• Arguments have been made that we can have different p-value cutoffs (our alphas) depending

• on the situation, and that scientists should be allowed to justify their reasons for picking

• a certain cutoff.

• But on the whole, many fields that regularly use p-values have some sort of official

• cutoff that they use.

• The second, related issue is that a p-value tells you how extreme your data would

• be if you assume the null hypothesis is true.

• But when you really think about it...that's not what we want to know.

• We want to know whether the null is correct, or at least probably correct.

• In other words, the probability of the null, given that we've seen our data.

• A p-value of 0.02 in a study on cancer rates in mice tells you that if your new drug didn't

• work and there was no difference between the cancer rates of mice on and off the drug,

• then you'd only expect 2% of identically run studies to produce a difference in cancer

• rates that's as or more extreme than the one you just observed.

• But we can't use these p-values alone to tell us about the probability of the null

• being true or false, even though it can be tempting to think we can.

• One common misinterpretation of a p-value is that it can tell you the probability that

• the null hypothesis is true.

• For example, if a random sample of tuna has a 10% higher mercury content than a random

• sample of mahi-mahi, it would be incorrect to say that a p-value of 0.02 in this case

• means there's only a 2% chance that the null hypothesis is true.

• This is an especially tempting misinterpretation because it feels like it maybe should be true,

• but again, when we calculate our p-value, we've already assumed for a moment that

• the null hypothesis is true and that any sample differences we see are actually due to just

• random sampling variation.

• If our p-value for the chess study was 0.01, that means that we already assumed older chess

• players were the same as the general population of chess players, so 0.01 can't tell us

• much about the probability that older chess players are the same as their colleagues.

• That would be like saying, "Assuming that grass is green, what's the probability that

• grass is green?"

• It just doesn't make much sense.

• Similarly, p-values can't tell you the probability that you've made an error, given that you

• rejected the null.

• Again, this is because p-values don't tell you about the probability of the null being

• true or false.

• If you've rejected the null hypothesis (say, that drinking orange juice is not associated

• with higher levels of cavities than drinking coffee), either you did so correctly, because

• there really is a difference between cavities in OJ and coffee drinkers, or you did so mistakenly

• because there really is no discernible difference.

• But p-values, since they assume the null is true, don't tell you how likely either of

• these options is.
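To see why the chance of having made an error is not the p-value or alpha, here is a sketch using Bayes' rule. It needs two ingredients a p-value never supplies, both assumed here: `power`, the chance of rejecting when the null is false, and `prior_null`, the fraction of tested hypotheses whose null is actually true. The values 0.8 and 0.1/0.5/0.9 are purely illustrative.

```python
def prob_null_given_rejection(alpha, power, prior_null):
    """Bayes' rule: share of rejections that are mistakes (null actually true).

    alpha      -- P(reject | null true), the false-rejection rate
    power      -- P(reject | null false), ASSUMED; not given by a p-value
    prior_null -- ASSUMED fraction of tested hypotheses whose null is true
    """
    false_rejections = alpha * prior_null
    true_rejections = power * (1 - prior_null)
    return false_rejections / (false_rejections + true_rejections)

for prior_null in (0.1, 0.5, 0.9):
    share = prob_null_given_rejection(alpha=0.05, power=0.8, prior_null=prior_null)
    print(f"prior P(null) = {prior_null}: P(null true | rejected) = {share:.3f}")
```

With the same alpha of 0.05, the share of rejections that are mistakes ranges from under 1% to 36% depending on the assumed inputs, which is exactly the information the p-value alone lacks.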

• Ronald Fisher, one of the first proponents of Null Hypothesis Significance Testing, wrote:

• "In general tests of significance are based on hypothetical probabilities calculated

• from their null hypotheses.

• They do not generally lead to any probability statements about the real world, but to a

• rational and well-defined measure of reluctance to the acceptance of the hypotheses they test."

• In other words, getting a p-value of 0.04 doesn't mean that there's a 4% chance

• that the null hypothesis is true.

• The probability we want to know is the opposite conditional probability from what a p-value

• gives you.

• We want to know the probability of the null hypothesis given that we got this data.

• But that's not what we get.

• From the p-value, we get the probability of the data given the null.

• For example, we calculate P(data | older chess players are the same as the population of chess players),

• but we wish we could calculate P(older chess players are the same as the population of chess players | data).

• And while all the same pieces are there, they're not the same.

• This is made even more clear when you realize the probability of being a child, given that

• you're at Chuck E Cheese is NOT the same as the probability of being at Chuck E Cheese,

• given that you're a child.
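The Chuck E Cheese comparison can be made concrete with a few made-up counts. Every number below is hypothetical; the point is only that the two conditional probabilities share a numerator but divide by different groups.

```python
from fractions import Fraction

# Hypothetical counts for an imaginary town (illustration only).
children_total = 20_000   # all children in town
children_at_cec = 150     # children at Chuck E Cheese right now
adults_at_cec = 50        # adults at Chuck E Cheese right now

people_at_cec = children_at_cec + adults_at_cec

# Same numerator, different denominators:
p_child_given_cec = Fraction(children_at_cec, people_at_cec)    # divide by everyone there
p_cec_given_child = Fraction(children_at_cec, children_total)   # divide by all children

print(f"P(child | at Chuck E Cheese) = {float(p_child_given_cec):.3f}")
print(f"P(at Chuck E Cheese | child) = {float(p_cec_given_child):.4f}")
```

Most people at the restaurant are children, yet almost no child is at the restaurant at any given moment. P(data | null) and P(null | data) can differ just as dramatically.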

• This is one reason why p-values are so perplexing.

• They don't give us the probability that we truly want.

• There are some statistical methods that will give you the probability of a hypothesis given

• the data, and we'll talk about those later.

• A third issue is that if you reject the null, you still don't have much information about

• the alternative.

• When the data is pretty improbable under the null hypothesis, we reject the null and accept

• the hypothesis that the data came from another distribution that is not the null distribution.

• We call this the alternative distribution, and the hypothesis that goes with it, the

• alternative hypothesis.

• If we reject the null that Mrs. Smith and Mr. Kennedy give the same amount of homework

• each week, then the alternative is that they don't give the same amount each week.

• But we don't know whether the difference is 30 minutes, 25 minutes, or 45 minutes.

• Or, for example, we might want to know whether people who were primed with the words "Elderly,"

• "Florida," and "Retired" walked more slowly than the average person, who takes 10 minutes

• to go around our office building, with a standard deviation of 1 minute.

• We think they will.

• We take a sample of 50 people, prime them, and set them off.

• Their mean time is 10.5 minutes, which corresponds to a p-value of 0.00036.

• We already decided beforehand to make our alpha (or predetermined cutoff) 0.005.

• So our p-value, which is less than 0.005, allows us to reject the null hypothesis, in this

• case that the people primed with words about being old take a mean of 10 minutes to walk

• around the building.
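Here is a minimal sketch of the walking-speed test, assuming a one-sample z-test with the known population standard deviation of 1 minute; the video doesn't say which test produced its p-value. Under this assumption the computed p-values (roughly 0.0002 one-sided, 0.0004 two-sided) need not match the quoted 0.00036 exactly, but either version falls well below the 0.005 cutoff.

```python
import math
from statistics import NormalDist

MU_NULL = 10.0      # minutes, average person (from the example)
SIGMA = 1.0         # minutes, population standard deviation (from the example)
N = 50              # primed participants
SAMPLE_MEAN = 10.5  # observed mean walking time (minutes)
ALPHA = 0.005       # cutoff chosen beforehand

se = SIGMA / math.sqrt(N)               # standard error of the mean
z = (SAMPLE_MEAN - MU_NULL) / se
p_one_sided = 1 - NormalDist().cdf(z)   # H1: primed people are slower
p_two_sided = 2 * p_one_sided           # H1: mean is simply not 10 (valid since z > 0)

print(f"z = {z:.2f}, one-sided p = {p_one_sided:.5f}, two-sided p = {p_two_sided:.5f}")
print("reject the null" if p_two_sided < ALPHA else "fail to reject the null")
```

Note what the output does not contain: an estimate of how much slower the primed group is. The test only answers "is the mean 10?", which is the limitation described next.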

• But what now?

• While we've rejected the null hypothesis that the primed subjects take a mean of 10 minutes,

• the alternative hypothesis is just that their mean isn't 10.

• Our p-values can't tell us anything else.

• A fourth common issue for p-values is more about how we interpret non-significant

• p-values.

• If our p-value isn't lower than our predetermined cutoff, our alpha, we fail to reject

• the null hypothesis.

• Notice that we say fail to reject, not accept.

• Null hypothesis testing doesn't allow us to "accept" or provide evidence that the

• null is true; instead, we've only failed to provide evidence that it's false.

• Consider this: your best friend makes the statement, "There are no black swans in China."

• You think she's wrong, so you go to China and you look at a bunch of swans, and none

• of them are black.

• You may, at a certain point, decide that you've seen SO many swans that if there were black

• swans in China, it's unlikely that you wouldn't have seen one yet.

• But you can't PROVE there are no black swans until you've seen EVERY.SINGLE.SWAN.

• Just like you can't prove the null is true (that there's no relationship between two variables),

• you can only show that you didn't find any evidence it's false.

• Absence of evidence is not evidence of absence.

• Failing to reject the null hypothesis doesn't mean that there isn't an effect

• or relationship; it just means we didn't get enough evidence to say there definitely is one.
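The swan argument has a simple arithmetic core. This sketch assumes, hypothetically, that 1% of swans in China are black and that each sighting is an independent random draw; under those assumptions, the chance of having seen no black swan at all after some number of sightings is (1 - 0.01) raised to that number.

```python
def prob_no_black_swan(share_black, swans_seen):
    """Chance of seeing zero black swans in `swans_seen` independent sightings
    when a fraction `share_black` of all swans is actually black (ASSUMED model)."""
    return (1 - share_black) ** swans_seen

for seen in (10, 100, 300, 1000):
    print(seen, round(prob_no_black_swan(0.01, seen), 4))
```

The probability shrinks quickly but never reaches exactly zero while any black swans exist, so a long run of white swans makes black swans unlikely to be common without proving there are none.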

• If we looked at whether bees produce more honey when it's warm than when it's cold, we

• could look at some data and calculate a p-value of 0.25.

• Since we decided beforehand that our alpha would be 0.01, we fail to reject the null

• hypothesis that bees produce the same amount of honey in hot and cold seasons.

• But we can't conclude that there is no difference or even that it's unlikely that there's a difference.

• We can only conclude that we didn't find any evidence of one.
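A non-significant result is easy to get even when a real effect exists, if the study is small. This sketch computes what statisticians call the power of a two-sided one-sample z-test. All inputs are hypothetical: the bee comparison is simplified to a test on the warm-minus-cold difference, with a true effect of 0.5 standard deviations, 10 observations, and the alpha of 0.01 from the example.

```python
import math
from statistics import NormalDist

def power_two_sided_z(effect, sigma, n, alpha):
    """Probability that a two-sided one-sample z-test rejects the null when the
    true mean differs from the null value by `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / (sigma / math.sqrt(n))  # where the z statistic is centred
    upper = 1 - std_normal.cdf(z_crit - shift)  # rejections in the upper tail
    lower = std_normal.cdf(-z_crit - shift)     # rejections in the lower tail
    return upper + lower

# Hypothetical: bees really do make a bit more honey when it's warm.
power = power_two_sided_z(effect=0.5, sigma=1.0, n=10, alpha=0.01)
print(f"Chance of detecting the real difference: {power:.0%}")
print(f"Chance of failing to reject despite a real effect: {1 - power:.0%}")
```

Under these assumptions a real difference would go undetected most of the time, which is why a p-value of 0.25 says little about whether a difference exists.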

• Since null hypothesis significance testing is often the first type of statistical inference

• that people learn, it can seem pretty limiting to know that you can't provide good evidence

• for the null hypothesis being true.

• In some cases the null hypothesis might be what you actually want to demonstrate.

• For example, say there are two groups: people who play a souped-up, bells-and-whistles version

• of a cognitive training game and those who play a less fancy version of the game.

• If these two groups have the same amount of improvement in cognitive abilities (which

• is what our null hypothesis says), that's really interesting.

• It means that researchers could feel comfortable using whichever version of the game that they want.

• If the fancier, more aesthetically pleasing game made people who have had strokes, or

• children with learning differences, more likely to play it, researchers would know that's fine.

• They wouldn't have any concerns that the bells and whistles would detract from the

• cognitive benefits.

• P-values can be perplexing.

• But they give us insight into how to make decisions about data.

• They also remind us that people's perception of evidence can be arbitrary.

• What you consider sufficient evidence might not be enough to convince someone else.

• When you read about the results of scientific studies, you can see the alpha they used and

• decide if you think it's a stringent enough criterion.

• More than that, though, we now know what p-values are and how to interpret them.

• This helps us compare the logic of null hypothesis significance testing with how we normally

• reason about the world.

• Thanks for watching, I'll see you next time.


# P-Value Problems: Crash Course Statistics #22

Published by 林宜悉 on January 14, 2021