  • Hi, I'm Adriene Hill, and welcome back to Crash Course Statistics.

  • We've been talking a lot about how to tell whether two groups are different, like whether

  • there are more car accidents on rainy days than snowy days,

  • or whether the IQ of university students is actually different from the population.

  • Today, we're going to start a conversation about statistical inference, which tells us

  • how we can go from describing data we already have to making inferences about data we don't have.

  • INTRO

  • If you've watched any of the other videos in this series, you've heard a lot about

  • uncertainty.

  • It comes up endlessly in statistics.

  • And uncertainty is at the core of what Inferential Statistics is about: making decisions about

  • ideas, or hypotheses.

  • I might be interested in whether listening to Mozart while doing calculus homework improves

  • my calculus grades.

  • But I need to test my hypothesis; I can't just have an idea and claim it's correct

  • without any evidence.

  • One thing we need for sure is data.

  • So we could randomly sample two groups of 25 people and have one group listen to

  • Mozart and the other do their homework in silence.

  • We collect their calculus grades and see that those who listened to Mozart scored on average

  • 3 points higher than those who didn't.

  • So Mozart's good.

  • Problem solved, break out the sonatas, right?

  • Unfortunately, no.

  • We've seen that sample statistics like the mean are just estimates of the mean of the

  • population that they are taken from.

  • The sample mean score of the Mozart group is higher.

  • But we don't have sufficient evidence that the population mean of Mozart listeners is

  • higher than that of those who did their work in silence.

  • We may have gotten an especially high sample mean that isn't close to the true population mean.

  • So we need a way to test our hypothesis while taking into account the random variation of

  • sample means.

  • In theory, one way you could test a hypothesis or model is by how well it predicts the data

  • you got.

  • For example, you and your best friend really love giraffes, and you've spent a lot of

  • time watching them at the zoo and drawing sketches of them.

  • So you both have a hypothesis about the average number of spots a baby giraffe has, but they're

  • slightly different.

  • You think that baby giraffes have an average of 175 spots, with a standard deviation of

  • 50 spots, and your best friend thinks that baby giraffes have an average of 209 spots

  • with a standard deviation of 45 spots.

  • With the permission of your local zoo, of course, you begin to collect a random sample

  • of baby giraffes and count how many spots they have.

  • Your sample of 25 baby giraffes had a mean of 200 spots.

  • Now that you have data, you can use it to evaluate which one of you is more likely to

  • be right.

  • Both you and your friend have a model or idea about what the population distribution of

  • baby giraffe spots is.

  • If you're right, then the sampling distribution of all the possible sample means we could

  • get looks like this: (RED in chart)

  • And the distribution of sample means for your friend's model looks like this: (black in chart)
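
The two charts are not captured in the transcript, but the sampling distributions they show can be approximated by simulation. The following is a minimal sketch, assuming (this is not stated in the video) that spot counts in each model are normally distributed; the function name `simulate_sample_means` and the number of repetitions are hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_sample_means(pop_mean, pop_sd, n, reps=10_000, seed=0):
    """Draw `reps` random samples of size `n` and return each sample's mean.

    Assumption: the population is normal with the given mean and SD.
    """
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(pop_mean, pop_sd) for _ in range(n))
        for _ in range(reps)
    ]


# Your model: mean 175, SD 50. Your friend's model: mean 209, SD 45.
yours = simulate_sample_means(175, 50, 25)
friends = simulate_sample_means(209, 45, 25)

# The spread of the sample means should be close to SD / sqrt(n):
# 50 / 5 = 10 for your model and 45 / 5 = 9 for your friend's.
print(statistics.fmean(yours), statistics.stdev(yours))
print(statistics.fmean(friends), statistics.stdev(friends))
```

Under these assumptions, sample means cluster much more tightly around each model's mean than individual giraffes do, which is why a sample mean of 200 can be rare under one model and common under the other.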

  • Let's look at where our sample mean of 200 lies on both of these distributions.

  • You can see that you're more likely to see a mean of 200 spots under your friend's

  • hypothesis than yours.

  • If your model were correct, a mean of 200 spots would be pretty rare... it's in the top 1.2%

  • most extreme values we'd expect to see, whereas in your friend's model, a mean of

  • 200 spots is only in the top 32%, which means it's pretty common that we'd see sample

  • means around 200 if your friend's model were correct.
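
The 1.2% and 32% figures can be reproduced with a short calculation. This sketch assumes the sampling distribution of the mean is normal with standard error SD / sqrt(n), and it counts sample means at least as far from the model mean as 200 in either direction; the helper name `two_sided_tail` is hypothetical.

```python
import math


def two_sided_tail(sample_mean, model_mean, model_sd, n):
    """Probability of a sample mean at least this far from the model mean.

    Assumption: sample means are normally distributed around `model_mean`
    with standard error model_sd / sqrt(n).
    """
    standard_error = model_sd / math.sqrt(n)
    z = (sample_mean - model_mean) / standard_error
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


yours = two_sided_tail(200, model_mean=175, model_sd=50, n=25)    # z = 2.5
friends = two_sided_tail(200, model_mean=209, model_sd=45, n=25)  # z = -1.0

print(f"your model:   {yours:.3f}")    # about 0.012
print(f"friend model: {friends:.3f}")  # about 0.317
```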

  • But we don't always have predictions that are as specific as you and your friend's

  • predictions about baby giraffe spots.

  • We might have a more general hypothesis, like that the average number of baby giraffe spots

  • is more than 200... but that's all that you really know.

  • In situations like these, one common method of testing ideas is Null Hypothesis Significance Testing (NHST).

  • You have a hypothesis:

  • that people with a certain gene, we'll call it gene X, eat a different number of calories

  • than the general population.

  • Null Hypothesis Significance Testing asks you to test a different hypothesis, which

  • says there is no difference or effect of this gene.

  • And we'll see how well this null hypothesis predicts the data we've collected.

  • In this case the null hypothesis, or null model, is that the population mean caloric

  • intake for people with gene X is actually 2,300, the same as the regular population.

  • If the null hypothesis is found to be infeasible, we can "reject" it.

  • We can represent this hypothesis like this:
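
The on-screen graphic is not captured in the transcript. In conventional notation (an assumption here, with $\mu$ standing for the population mean caloric intake of people with gene X), the null hypothesis and the "eats a different amount" alternative would typically be written as:

```latex
H_0 : \mu = 2300 \qquad \text{versus} \qquad H_A : \mu \neq 2300
```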

  • This might seem like a pretty roundabout way to test your theory that people with gene

  • X eat differently, and that's because it is.

  • Null Hypothesis Significance Testing is a form of the reductio ad absurdum argument

  • which tries to discredit an idea by assuming the idea is true, and then showing that if

  • you make that assumption, something contradictory happens.

  • For example, you can use reductio ad absurdum to show that there is no largest positive

  • integer.

  • Let's assume there is a largest positive integer.

  • We'll call it AB for "absurdly big".

  • Now add one to AB.

  • Shoot.

  • That would be a larger positive integer...which would be absurd since AB is the largest.

  • Therefore, by reductio ad absurdum, there is no largest positive integer.

  • By the way, if this kind of argument sounds familiar, it might be because reductio ad absurdum

  • is like proof by contradiction.

  • Let's test the null hypothesis for our gene X case.

  • First, we assume that the mean number of calories eaten by people with gene X is 2,300, just

  • like the regular population.

  • If we can show that this assumption makes something "absurd" happen, then we can

  • "reject" the idea that it's true.

  • With data from 60 people with gene X, we see that the mean number of calories eaten was

  • 2,400 with a sample standard deviation of 500 calories.

  • We have to ask how rare or "absurd" it would be to get a sample mean that is this

  • far away from our assumed mean of 2,300.

  • Essentially, we imagine that we take a random sample of 60 people with gene X over and over

  • and over again and calculate the mean.

  • Then we ask how many times, out of all those experiments, we get a sample mean that's

  • as far away from 2,300 as our actual sample mean of 2,400 is.

  • Even if you haven't heard of the term null hypothesis significance testing, you may have

  • heard of p-values, which have been covered everywhere from academic journals to BuzzFeed

  • articles.

  • A p-value answers the question of how "rare" your data is by telling you the probability

  • of getting data that's at least as extreme as the data you observed if the null hypothesis were

  • true.

  • If your p-value was 0.10 you could say that your sample is in the top 10% most extreme

  • samples we'd expect to see based on the distribution of sample means.

  • If we assume that the null hypothesis is true, and the mean caloric intake of people with

  • gene X is 2,300 with a standard deviation of 500 calories, the distribution of sample

  • means will look like this, and tells us which means we expect to see and how often we expect

  • to see each of them.

  • Sample means around 2,300 are most common, but we'll also often see sample means a

  • little bit further away.

  • We can use this distribution to calculate our p-value.

  • This is similar to how we compared the likelihood of 200 giraffe spots in you and your friend's

  • models, but with only 1 model this time.

  • Here's our sample mean of 2,400 on this graph.

  • Only about 8.99 percent of the possible sample means are higher than 2,400.

  • So it's not that unlikely that we'd get a sample mean that's this high if the true

  • population mean was 2,300 calories. This is called a one-sided p-value since it only tells

  • us the probability of getting a sample mean that's higher than 2,400.
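
A minimal sketch of the one-sided calculation, assuming the sampling distribution of the mean is approximately normal with standard error 500 / sqrt(60); the helper name `upper_tail_p` is hypothetical. With exactly these inputs the upper tail comes out near 6%, somewhat below the 8.99% quoted above, so the figure in the video presumably rests on inputs or a method not shown in the transcript. The sketch is meant to show the mechanics rather than reproduce that number.

```python
import math


def upper_tail_p(sample_mean, null_mean, sd, n):
    """One-sided p-value: P(sample mean >= observed) if the null is true.

    Assumption: sample means are normal around `null_mean` with
    standard error sd / sqrt(n).
    """
    standard_error = sd / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    # Upper tail of the standard normal: P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z / math.sqrt(2))


p_one_sided = upper_tail_p(2400, null_mean=2300, sd=500, n=60)
print(f"one-sided p-value: {p_one_sided:.3f}")
```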

  • Often when we ask scientific questions like "Does this medicine have a different level

  • of efficacy than the existing treatment?" we don't know which direction the effect

  • will be in.

  • The new medicine might be better...or it might be worse.

  • Gene X'ers might eat more, or they might eat less.

  • Because of this, and a few other reasons we'll talk about later in the series, p-values are

  • often two-sided, meaning that we look at how far away a value is from the mean, regardless

  • of whether it's higher or lower. This allows us to reject the null hypothesis if our value

  • is significantly higher than the mean, or if the value is significantly lower than the

  • mean.

  • Because the distribution of sample means is symmetrical, if 9% of sample means for caloric

  • intake are higher than 2,400, about 18 percent of sample means would

  • be as far from the population mean as 2,400 is, or further, in either direction.
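
The doubling step relies only on symmetry. As a small sketch (the function name `two_sided_from_one_sided` is hypothetical, and the standard normal is used purely as an example of a symmetric distribution), the quoted one-sided value of about 9% turns into the two-sided value of about 18% like this:

```python
from statistics import NormalDist


def two_sided_from_one_sided(one_sided_p):
    """Double a one-sided p-value taken from the tail the sample falls in.

    Valid only when the null distribution is symmetric; capped at 1.
    """
    return min(1.0, 2 * one_sided_p)


# Symmetry check on a standard normal: the area above +z equals the
# area below -z, so counting both tails doubles the one-sided area.
z = 1.5  # arbitrary illustrative value
std_normal = NormalDist()
upper = 1 - std_normal.cdf(z)
lower = std_normal.cdf(-z)

p_two_sided = two_sided_from_one_sided(0.0899)  # one-sided value quoted above
print(f"upper tail {upper:.4f}, lower tail {lower:.4f}")
print(f"two-sided p-value: {p_two_sided:.2f}")
```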

  • In other words, a two-sided p-value is a measure of how extreme your sample mean is, because

  • it tells you how often you'll get a value that's as or more extreme than the one you

  • got.

  • The smaller your p-value is, the more "rare" it would be to get your sample just by random

  • chance alone if the null is true.

  • In our example, we learned that if we assume that there is no effect of gene X on caloric

  • intake, then there would be an 18% chance, about 1 in 5, that we'd see a sample like

  • this just because of the random variation of samples.

  • To finish our attempt at reductio ad absurdum, we have to decide whether this sample is "absurd"

  • or "extreme" enough to lead us to believe that this sample probably isn't from the

  • null distribution.

  • But that decision isn't always an easy one to make... It's not clear how "rare"

  • or "absurd" a sample needs to be before I decide to "reject" the idea that the

  • sample was taken from a population that has the null distribution.

  • Especially since we don't have another distribution to compare it to, like we did with the giraffes.

  • Our p-value of 0.18 tells us that if we took a sample like this over and over, about 1

  • out of every 5 times we'd get a sample with a mean caloric intake that's at least as far from

  • the mean as 2,400 calories is.

  • 1 in 5's not bad...but a 1 in 20 chance might be better.

  • And 1 in 100 better than that.

  • Some statisticians see a p-value as a continuous measure of evidence.

  • A p-value of 0.18 like ours might be considered pretty weak evidence that our sample isn't

  • taken from the null distribution.

  • But it's better than 0.19, which is in turn better than 0.20 and so on.

  • However, in Null Hypothesis Significance Testing, p-values need a cutoff.

  • We could set a cutoff at 0.05 and say that a p-value that is less than 0.05 is sufficient

  • evidence to allow us to "reject" the idea that the null hypothesis is true.

  • When we can reject the null hypothesis, we consider our result to be "statistically

  • significant", which is basically a phrase that just means "unlikely due to random

  • chance alone".

  • As we'll see later on, it doesn't always mean that it should be "significant" or

  • meaningful to you.

  • A cutoff of 0.05 means that we want our sample value to be at least in the top 5% of most

  • extreme values in our distribution before we consider the value evidence against that

  • hypothesis.

  • And any p-value less than the 0.05 cutoff counts.

  • 0.049 leads to the same conclusion as 0.0001.

  • Both cause you to reject the null hypothesis.
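
The cutoff turns a continuous p-value into a yes-or-no decision. A minimal sketch of that rule, with a hypothetical function name `reject_null` and the 0.05 cutoff as the default:

```python
def reject_null(p_value, alpha=0.05):
    """Return True when the p-value falls below the chosen cutoff (alpha)."""
    return p_value < alpha


# The gene X p-value from above, plus the two values just mentioned.
for p in (0.18, 0.049, 0.0001):
    decision = "reject the null" if reject_null(p) else "fail to reject the null"
    print(f"p = {p}: {decision}")
```

The rule treats 0.049 and 0.0001 identically, which is exactly the loss of information that the "continuous measure of evidence" view objects to.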

  • The current scientific consensus in most fields is that your cutoff, or alpha, should be 0.05.

  • But there's huge disagreement in the field of statistics about whether 0.05 is appropriate,

  • and we're going to dive into that later.

  • In the meantime I'm going to get 24 more giraffes so I can compare my model with my

  • friend's.

  • Thanks for watching.

  • I'll see you next time.

How P-Values Help Us Test Hypotheses: Crash Course Statistics #21

  • Posted by 林宜悉 on January 14, 2021