  • There are many ways to quantify variability; however, we will focus on the most common

  • ones: variance, standard deviation, and coefficient of variation.

  • In the field of statistics, we will typically use different formulas when working with population

  • data and sample data.

  • Let’s think about this for a bit.

  • When you have the whole population, each data point is known so you are 100% sure of the

  • measures you are calculating.

  • When you take a sample of this population and you compute a sample statistic, it is

  • interpreted as an approximation of the population parameter.

  • Moreover, if you extract 10 different samples from the same population, you will get 10

  • different measures.
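
To make this concrete, here is a minimal Python sketch that draws 10 samples from one population and prints each sample mean. The population (the integers 1 to 100), the sample size of 5 and the seed are assumptions made only for this illustration; none of them come from the lecture.

```python
import random
import statistics

# Hypothetical population: the integers 1..100 (not from the lecture).
population = list(range(1, 101))
rng = random.Random(0)  # fixed seed so the run is repeatable

# Draw 10 samples of 5 observations each (no repeats within a sample).
sample_means = [statistics.mean(rng.sample(population, 5)) for _ in range(10)]

print("population mean:", statistics.mean(population))
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i}: mean = {m}")
```

Each sample mean is only an approximation of the population mean, and the approximations differ from one sample to the next.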

  • Statisticians have solved the problem by adjusting the algebraic formulas for many statistics

  • to reflect this issue.

  • Therefore, we will explore both population and sample formulas, as they are both used.

  • You must be asking yourself why we did not show separate population and sample formulas for the mean, median and mode.

  • Well, actually, the sample mean is the average of the sample data points, while the population

  • mean is the average of the population data points.

  • Technically there are two different formulas, but they are computed in the same way.

  • Okay, now.

  • After this short clarification, it’s time to get onto variance.

  • Variance measures the dispersion of a set of data points around their mean value.

  • Population variance, denoted by sigma squared, is equal to the sum of squared differences

  • between the observed values and the population mean, divided by the total number of observations.

  • Sample variance, on the other hand, is denoted by s squared and is equal to the sum of squared

  • differences between observed sample values and the sample mean, divided by the number

  • of sample observations minus 1.
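
Written out, the two definitions just described look as follows. Only sigma squared and s squared are named in the lecture; the symbols N and n (population and sample size), mu and x-bar (population and sample mean) and x_i (an individual observation) are conventional notation assumed here.

```latex
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
```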

  • Alright.

  • *** When you are getting acquainted with statistics,

  • it is hard to grasp everything right away.

  • Therefore, let’s stop for a second to examine the formula for the population and try to

  • clarify its meaning.

  • The main part of the formula is its numerator, so that’s what we want to comprehend.

  • The sum of squared differences between the observations and the mean.

  • Hmm, so the closer a number is to the mean, the lower the result we will obtain, right?

  • And the further away from the mean it lies, the larger this difference.

  • Easy.

  • But why do we elevate to the second degree?

  • Squaring the differences has two main purposes.

  • First, by squaring the numbers, we always get non-negative computations.

  • Without going too deep into the mathematics of it, it is intuitive that dispersion cannot

  • be negative.

  • Dispersion is about distance and distance cannot be negative.

  • If, on the other hand, we calculate the difference and do not elevate to the second degree, we

  • would obtain both positive and negative values that when summed would cancel out, leaving

  • us with no information about the dispersion.
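
A short Python sketch of this cancellation, using the values 1 to 5 purely as an illustration:

```python
data = [1, 2, 3, 4, 5]
mean = sum(data) / len(data)

raw_deviations = [x - mean for x in data]
squared_deviations = [d ** 2 for d in raw_deviations]

print(sum(raw_deviations))      # positive and negative deviations cancel out
print(sum(squared_deviations))  # squaring keeps every term non-negative
```

The raw deviations always sum to zero, whatever the data, so only the squared version carries information about spread.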

  • Second, squaring amplifies the effect of large differences.

  • For example, if the mean is 0 and you have an observation of 100, the squared spread

  • is 10,000! Alright, enough dry theory.

  • It is time for a practical example.

  • We have a population of five observations – 1, 2, 3, 4 and 5.

  • Let’s find its variance.

  • We start by calculating the mean: 1+2+3+4+5 divided by 5 equals 3.

  • Then we apply the formula we just saw: 1 minus 3 squared, plus, 2 minus 3 squared, plus,

  • 3 minus 3, squared, plus, 4 minus 3, squared, plus, 5 minus 3, squared.

  • All of these components have to be divided by 5.

  • When we do the math, we get 2.

  • So, the population variance of the data set is 2.
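
The same calculation can be sketched in Python. The function name `population_variance` is a hypothetical helper introduced for this example.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


print(population_variance([1, 2, 3, 4, 5]))  # 2.0
```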

  • But what about the sample variance?

  • This would only be suitable if we were told that these five observations were a sample

  • drawn from a population.

  • So, let’s imagine that’s the case.

  • The sample mean is once again 3.

  • The numerator is the same, but the denominator is going to be 4, instead of 5, giving us

  • a sample variance of 2.5.
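
As a quick check, Python's standard `statistics` module provides both versions: `pvariance` divides by the number of observations, while `variance` divides by that number minus 1. This is a minimal sketch on the five values from the example.

```python
import statistics

data = [1, 2, 3, 4, 5]

pop_var = statistics.pvariance(data)     # denominator N = 5
sample_var = statistics.variance(data)   # denominator n - 1 = 4

print("population variance:", pop_var)
print("sample variance:", sample_var)
```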

  • To conclude the variance topic, we should interpret the result.

  • Why is the sample variance bigger than the population variance?

  • In the first case, we knew the population, that is, we had all the data and we calculated

  • the variance.

  • In the second case, we were told that 1, 2, 3, 4 and 5 was a sample, drawn from a bigger

  • population.

  • Imagine the population of this sample were these 9 numbers: 1, 1, 1, 2, 3, 4, 5, 5 and

  • 5.

  • Clearly, the numbers are the same, but there is a concentration around the two extremes

  • of the data set – 1 and 5.

  • The variance of this population is approximately 2.89 (26 divided by 9).

  • So, our sample variance has rightfully corrected upwards in order to reflect the higher potential

  • variability.
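
The comparison can be reproduced with a short Python sketch. It treats the nine numbers above as the full population and the values 1 to 5 as the sample drawn from it.

```python
import statistics

sample = [1, 2, 3, 4, 5]
population = [1, 1, 1, 2, 3, 4, 5, 5, 5]

# Dividing by n understates the spread of the larger population;
# dividing by n - 1 moves the estimate upwards, towards the true value.
print("sample treated as a population:", statistics.pvariance(sample))
print("sample variance (n - 1):       ", statistics.variance(sample))
print("variance of the 9 values:      ", statistics.pvariance(population))
```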

  • This is the reason why there are different formulas for sample and population data.

  • *** While variance is a common measure of data

  • dispersion, in most cases the figure you will obtain is pretty large and hard to compare

  • as the unit of measurement is squared.

  • The easy fix is to calculate its square root and obtain a statistic known as standard deviation.

  • In most analyses you perform, standard deviation will be much more meaningful than variance.

  • As we saw in the previous lecture, there are different measures for the population and

  • sample variance.

  • Consequently, there is also population and sample standard deviation.

  • The formulas are: the square root of the population variance and square root of the sample variance

  • respectively.
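
For those who prefer code to a calculator, here is a minimal Python sketch on the values 1 to 5 from the variance example. It takes the square roots by hand and then checks them against the `pstdev` and `stdev` functions of the standard `statistics` module.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]

pop_sd = math.sqrt(statistics.pvariance(data))     # population standard deviation
sample_sd = math.sqrt(statistics.variance(data))   # sample standard deviation

# The statistics module also offers both measures directly.
assert math.isclose(pop_sd, statistics.pstdev(data))
assert math.isclose(sample_sd, statistics.stdev(data))

print(round(pop_sd, 2), round(sample_sd, 2))
```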

  • I believe there is no need for an example of the calculation, right?

  • If you have a calculator in your hands, you’ll be able to do the job.

  • Alright.

  • The other measure we still have to introduce is the coefficient of variation.

  • It is equal to the standard deviation, divided by the mean.

  • Another name for the term is relative standard deviation.

  • This is an easy way to remember its formula: it is simply the standard deviation relative

  • to the mean.

  • As you probably guessed, there is a population and sample formula once again.
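
In symbols, the two versions can be written as below. The notation c_v and its hatted sample counterpart are conventional choices assumed here, not symbols used in the lecture.

```latex
c_v = \frac{\sigma}{\mu}
\qquad
\hat{c}_v = \frac{s}{\bar{x}}
```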

  • So, standard deviation is the most common measure of variability for a single data set.

  • But why do we need yet another measure such as the coefficient of variation?

  • Well, comparing the standard deviations of two data sets measured in different units or on different scales can be misleading, but

  • comparing coefficients of variation is not.

  • As the saying goes: “Tell me, I’ll forget.

  • Show me, I’ll remember.

  • Involve me, I’ll understand.”

  • To make sure you remember, here’s an example of a comparison between standard deviations.

  • Let’s take the prices of pizza at 10 different places in New York.

  • They range from 1 to 11 dollars.

  • Now, imagine that you only have Mexican pesos and to you the prices look more like 18.81

  • pesos to 206.91 pesos, given the exchange rate of 18.81 pesos for one dollar.

  • Let’s combine our knowledge so far and find the standard deviations and coefficients of

  • variation of these two data sets.

  • First, we have to see if this is a sample or a population.

  • Are there only 10 restaurants in New York?

  • Of course not; this is obviously a sample drawn from all the restaurants in the city.

  • Then we have to use the formulas for sample measures of variability.

  • Second, we have to find the mean.

  • The mean in dollars is equal to 5.5 and the mean in pesos to 103.46.

  • The third step of the process is finding the sample variance.

  • Following the formula that we showed earlier, we can obtain 10.72 dollars squared and 3793.69

  • pesos squared.

  • The respective sample standard deviations are 3.27 dollars and 61.59 pesos.
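
The steps so far can be sketched in Python. The lecture does not list the individual prices, so the ten values below are hypothetical: they were chosen to run from 1 to 11 dollars and to average 5.5, in line with the figures quoted. Only the exchange rate is taken from the lecture.

```python
import statistics

# Hypothetical prices in dollars; the lecture does not list them.
prices_usd = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11]
PESOS_PER_DOLLAR = 18.81  # exchange rate quoted in the lecture
prices_mxn = [p * PESOS_PER_DOLLAR for p in prices_usd]

for label, prices in (("dollars", prices_usd), ("pesos", prices_mxn)):
    mean = statistics.mean(prices)
    var = statistics.variance(prices)  # sample variance, n - 1 denominator
    sd = statistics.stdev(prices)      # sample standard deviation
    print(f"{label}: mean={mean:.2f} variance={var:.2f} sd={sd:.2f}")
```

Note that converting to pesos multiplies the standard deviation by 18.81 and the variance by 18.81 squared, even though the underlying prices are the same.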

  • Let’s make a couple of observations.

  • First, variance gives results in squared units, while standard deviation in original units.

  • This is the main reason why professionals prefer to use standard deviation as the main

  • measure of variability.

  • It is directly interpretable.

  • Squared dollars mean nothing, even in the field of statistics.

  • Second, we got standard deviations of 3.27 and 61.59 for the same pizza at the same 10

  • restaurants in New York City.

  • Seems wrong, right?

  • Don’t worry.

  • It is time to use our last tool: the coefficient of variation.

  • Dividing the standard deviations by the respective means, we get the two coefficients of variation.

  • The result is the same – 0.60.

  • Notice that it is not dollars, pesos, dollars squared or pesos squared.

  • It is just 0.60.

  • This shows us the great advantage that the coefficient of variation gives us.

  • Now, we can confidently say that the two data sets have the same variability, which was

  • what we expected beforehand.
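
A minimal Python sketch of this last step follows. The helper name `coefficient_of_variation` and the ten prices are hypothetical; the prices were chosen to run from 1 to 11 dollars with a mean of 5.5, since the lecture does not list the actual values.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return statistics.stdev(values) / statistics.mean(values)


prices_usd = [1, 2, 3, 3, 5, 6, 7, 8, 9, 11]   # hypothetical prices in dollars
prices_mxn = [p * 18.81 for p in prices_usd]   # same prices in pesos

cv_usd = coefficient_of_variation(prices_usd)
cv_mxn = coefficient_of_variation(prices_mxn)

print(round(cv_usd, 2), round(cv_mxn, 2))
```

Because the exchange rate multiplies both the standard deviation and the mean, it cancels in the ratio, which is why the result carries no unit.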

  • Let’s recap what we have learned so far.

  • There are three main measures of variability: variance, standard deviation and coefficient

  • of variation.

  • Each of them has different strengths and applications.

  • You should feel confident using all of them as we are getting closer to more complex statistical

  • topics.

  • Thanks for watching!

Variance, Standard Deviation, Coefficient of Variation

Published by 林宜悉 on January 14, 2021