## Subtitles

• There are many ways to quantify variability; however, we will focus on the most common

• ones: variance, standard deviation, and coefficient of variation.

• In the field of statistics, we will typically use different formulas when working with population

• data and sample data.

• When you have the whole population, each data point is known so you are 100% sure of the

• measures you are calculating.

• When you take a sample of this population and you compute a sample statistic, it is

• interpreted as an approximation of the population parameter.

• Moreover, if you extract 10 different samples from the same population, you will get 10

• different measures.

• Statisticians have solved the problem by adjusting the algebraic formulas for many statistics

• to reflect this issue.

• Therefore, we will explore both population and sample formulas, as they are both used.

• You must be asking yourself why there is only one formula each for the mean, median and mode.

• Well, actually, the sample mean is the average of the sample data points, while the population

• mean is the average of the population data points.

• Technically there are two different formulas, but they are computed in the same way.

• Okay, now.

• After this short clarification, it’s time to get onto variance.

• Variance measures the dispersion of a set of data points around their mean value.

• Population variance, denoted by sigma squared, is equal to the sum of squared differences

• between the observed values and the population mean, divided by the total number of observations.

• Sample variance, on the other hand, is denoted by s squared and is equal to the sum of squared

• differences between observed sample values and the sample mean, divided by the number

• of sample observations minus 1.
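Written out, the two verbal definitions above correspond to the formulas below. The symbols are assumptions, since the lecture names only sigma squared and s squared: N is the population size, n the sample size, x_i an individual observation, mu the population mean and x-bar the sample mean.

```latex
\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}
\qquad
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
```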

• Alright.

• *** When you are getting acquainted with statistics,

• it is hard to grasp everything right away.

• Therefore, let’s stop for a second to examine the formula for the population and try to

• clarify its meaning.

• The main part of the formula is its numerator, so that’s what we want to comprehend.

• The sum of squared differences between the observations and the mean.

• Hmm, so the closer a number is to the mean, the lower the result we will obtain, right?

• And the further away from the mean it lies, the larger this difference.

• Easy.

• But why do we elevate to the second degree?

• Squaring the differences has two main purposes.

• First, by squaring the numbers, we always get non-negative results.

• Without going too deep into the mathematics of it, it is intuitive that dispersion cannot

• be negative.

• Dispersion is about distance and distance cannot be negative.

• If, on the other hand, we calculate the difference and do not elevate to the second degree, we

• would obtain both positive and negative values that when summed would cancel out, leaving

• us with no information about the dispersion.
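The cancelling effect is easy to check numerically. The sketch below is only an illustration: the five values and the variable names are assumptions chosen for the demo, not something dictated by the lecture at this point.

```python
# Illustrative data (an assumption for this demo).
data = [1, 2, 3, 4, 5]
mean = sum(data) / len(data)

# Signed differences from the mean: some are negative, some positive.
deviations = [x - mean for x in data]

# Summing the raw differences lets them cancel each other out.
raw_sum = sum(deviations)

# Squaring first makes every term non-negative, so nothing cancels.
squared_sum = sum(d ** 2 for d in deviations)

print("sum of raw differences:    ", raw_sum)
print("sum of squared differences:", squared_sum)
```

The raw sum carries no information about spread, while the squared sum grows as the points move away from the mean.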

• Second, squaring amplifies the effect of large differences.

• For example, if the mean is 0 and you have an observation of 100, the squared spread

• is 10,000! Alright, enough dry theory.

• It is time for a practical example.

• We have a population of five observations – 1, 2, 3, 4 and 5.

• Let’s find its variance.

• We start by calculating the mean: 1+2+3+4+5 divided by 5 equals 3.

• Then we apply the formula we just saw: (1 minus 3) squared, plus (2 minus 3) squared, plus

• (3 minus 3) squared, plus (4 minus 3) squared, plus (5 minus 3) squared.

• The sum of all these components has to be divided by 5.

• When we do the math, we get 2.

• So, the population variance of the data set is 2.
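The same calculation can be followed step by step in code. This is a minimal sketch of the arithmetic just described; the variable names are illustrative assumptions.

```python
# The five observations, treated as a complete population.
population = [1, 2, 3, 4, 5]
N = len(population)

# Step 1: the population mean.
mu = sum(population) / N

# Step 2: squared difference of each observation from the mean.
squared_diffs = [(x - mu) ** 2 for x in population]

# Step 3: divide the sum of squared differences by the number of observations.
population_variance = sum(squared_diffs) / N

print("mean:", mu)
print("squared differences:", squared_diffs)
print("population variance:", population_variance)
```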

• But what about the sample variance?

• This would only be suitable if we were told that these five observations were a sample

• drawn from a population.

• So, let’s imagine that’s the case.

• The sample mean is once again 3.

• The numerator is the same, but the denominator is going to be 4, instead of 5, giving us

• a sample variance of 2.5.
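For a quick cross-check, Python's standard `statistics` module provides both versions: `pvariance` divides by the number of observations, while `variance` divides by that number minus 1. Using this module is an assumption made for illustration; the lecture does the calculation by hand.

```python
import statistics

observations = [1, 2, 3, 4, 5]

# Treat the data as the whole population: divide by n.
pop_var = statistics.pvariance(observations)

# Treat the data as a sample: divide by n - 1.
sample_var = statistics.variance(observations)

print("population variance:", pop_var)
print("sample variance:    ", sample_var)
```

The numerator is identical in both calls; only the denominator differs, which is why the sample figure comes out larger.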

• To conclude the variance topic, we should interpret the result.

• Why is the sample variance bigger than the population variance?

• In the first case, we knew the population, that is, we had all the data and we calculated

• the variance.

• In the second case, we were told that 1, 2, 3, 4 and 5 was a sample, drawn from a bigger

• population.

• Imagine the population of this sample were these 9 numbers: 1, 1, 1, 2, 3, 4, 5, 5 and

• 5.

• Clearly, the numbers are the same, but there is a concentration around the two extremes

• of the data set – 1 and 5.

• The variance of this population is about 2.89.

• So, our sample variance has rightfully corrected upwards in order to reflect the higher potential

• variability.

• This is the reason why there are different formulas for sample and population data.

• *** While variance is a common measure of data

• dispersion, in most cases the figure you will obtain is pretty large and hard to compare

• as the unit of measurement is squared.

• The easy fix is to calculate its square root and obtain a statistic known as standard deviation.

• In most analyses you perform, standard deviation will be much more meaningful than variance.

• As we saw in the previous lecture, there are different measures for the population and

• sample variance.

• Consequently, there are also population and sample standard deviations.

• The formulas are: the square root of the population variance and square root of the sample variance

• respectively.

• I believe there is no need for an example of the calculation, right?

• If you have a calculator in your hands, you’ll be able to do the job.
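For anyone without a calculator at hand, here is a minimal sketch that reuses the observations 1 to 5 from the variance example. It relies on Python's standard `statistics` module, which is an assumption for illustration and not something the lecture uses.

```python
import math
import statistics

observations = [1, 2, 3, 4, 5]

# Population standard deviation: square root of the population variance.
pop_sd = statistics.pstdev(observations)

# Sample standard deviation: square root of the sample variance.
sample_sd = statistics.stdev(observations)

# Taking the square roots by hand gives the same values.
pop_sd_by_hand = math.sqrt(statistics.pvariance(observations))
sample_sd_by_hand = math.sqrt(statistics.variance(observations))

print("population standard deviation:", pop_sd)
print("sample standard deviation:    ", sample_sd)
```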

• Alright.

• The other measure we still have to introduce is the coefficient of variation.

• It is equal to the standard deviation, divided by the mean.

• Another name for the term is relative standard deviation.

• This is an easy way to remember its formula: it is simply the standard deviation relative

• to the mean.

• As you probably guessed, there are population and sample formulas once again.
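In symbols, the two versions can be written as below. The notation is an assumption, since the lecture gives the formula only in words: sigma and mu are the population standard deviation and mean, s and x-bar their sample counterparts.

```latex
c_v = \frac{\sigma}{\mu}
\qquad
\hat{c}_v = \frac{s}{\bar{x}}
```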

• So, standard deviation is the most common measure of variability for a single data set.

• But why do we need yet another measure such as the coefficient of variation?

• Well, comparing the standard deviations of two data sets with different units or scales is meaningless, but

• comparing coefficients of variation is not.

• Aristotle once said: “Tell me, I’ll forget.

• Show me, I’ll remember.

• Involve me, I’ll understand.”

• To make sure you remember, here’s an example of a comparison between standard deviations.

• Let’s take the prices of pizza at 10 different places in New York.

• They range from 1 to 11 dollars.

• Now, imagine that you only have Mexican pesos and to you the prices look more like 18.81

• pesos to 206.91 pesos, given the exchange rate of 18.81 pesos for one dollar.

• Let’s combine our knowledge so far and find the standard deviations and coefficients of

• variation of these two data sets.

• First, we have to see if this is a sample or a population.

• Are there only 10 restaurants in New York?

• Of course not; this is obviously a sample drawn from all the restaurants in the city.

• Then we have to use the formulas for sample measures of variability.

• Second, we have to find the mean.

• The mean in dollars is equal to 5.5 and the mean in pesos to 103.46.

• The third step of the process is finding the sample variance.

• Following the formula that we showed earlier, we can obtain 10.72 dollars squared and 3793.69

• pesos squared.

• The respective sample standard deviations are 3.27 dollars and 61.59 pesos.
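The transcript does not list the individual prices, so the sketch below uses a hypothetical set of ten prices. They are an assumption, chosen only because they range from 1 to 11 dollars and have a mean of 5.5; they are not the lecture's data. The exchange rate is the one quoted above.

```python
import statistics

# HYPOTHETICAL prices in dollars (not taken from the lecture):
# ten values ranging from 1 to 11 with a mean of 5.5.
prices_usd = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11]

PESOS_PER_DOLLAR = 18.81  # exchange rate quoted in the lecture
prices_mxn = [p * PESOS_PER_DOLLAR for p in prices_usd]

# Sample measures, because the restaurants are a sample of the city.
mean_usd = statistics.mean(prices_usd)
mean_mxn = statistics.mean(prices_mxn)
var_usd = statistics.variance(prices_usd)
var_mxn = statistics.variance(prices_mxn)
sd_usd = statistics.stdev(prices_usd)
sd_mxn = statistics.stdev(prices_mxn)

print(f"mean:     {mean_usd:.2f} dollars, {mean_mxn:.2f} pesos")
print(f"variance: {var_usd:.2f} dollars squared, {var_mxn:.2f} pesos squared")
print(f"std dev:  {sd_usd:.2f} dollars, {sd_mxn:.2f} pesos")
```

Converting the currency multiplies the standard deviation by the exchange rate and the variance by the exchange rate squared, which is why the peso figures look so much larger.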

• Let’s make a couple of observations.

• First, variance gives results in squared units, while standard deviation gives them in the original units.

• This is the main reason why professionals prefer to use standard deviation as the main

• measure of variability.

• It is directly interpretable.

• Squared dollars mean nothing, even in the field of statistics.

• Second, we got standard deviations of 3.27 and 61.59 for the same pizza at the same 10

• restaurants in New York City.

• Seems wrong, right?

• Don’t worry.

• It is time to use our last tool: the coefficient of variation.

• Dividing the standard deviations by the respective means, we get the two coefficients of variation.

• The result is the same – 0.60.

• Notice that it is not dollars, pesos, dollars squared or pesos squared.

• It is just 0.60.

• This shows us the great advantage that the coefficient of variation gives us.

• Now, we can confidently say that the two data sets have the same variability, which was

• what we expected beforehand.
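The unit-free property can be checked with a short sketch. The ten prices are a hypothetical data set, assumed here only because the transcript does not list the actual prices, and `coefficient_of_variation` is an illustrative helper, not a library function.

```python
import statistics

# HYPOTHETICAL prices in dollars (not taken from the lecture).
prices_usd = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11]

PESOS_PER_DOLLAR = 18.81  # exchange rate quoted in the lecture
prices_mxn = [p * PESOS_PER_DOLLAR for p in prices_usd]


def coefficient_of_variation(sample):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(sample) / statistics.mean(sample)


cv_usd = coefficient_of_variation(prices_usd)
cv_mxn = coefficient_of_variation(prices_mxn)

print(f"CV in dollars: {cv_usd:.2f}")
print(f"CV in pesos:   {cv_mxn:.2f}")
```

The exchange rate scales the standard deviation and the mean by the same factor, so it cancels in the ratio and both data sets give the same coefficient.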

• Let’s recap what we have learned so far.

• There are three main measures of variability: variance, standard deviation and coefficient

• of variation.

• Each of them has different strengths and applications.

• You should feel confident using all of them as we are getting closer to more complex statistical

• topics.

• Thanks for watching!

# Variance, Standard Deviation, Coefficient of Variation

Published by 林宜悉 on January 14, 2021