This lecture is going to serve as an overview of what a probability distribution is and what main characteristics it has. Simply put, a distribution shows the possible values a variable can take and how frequently they occur.

Before we start, let us introduce some important notation we will use for the remainder of the course.
Assume that uppercase “Y” represents the actual outcome of an event and lowercase “y” represents one of the possible outcomes. One way to denote the likelihood of reaching a particular outcome “y” is “P of Y equals y”. We can also express it as “p of y”. For example, uppercase “Y” could represent the number of red marbles we draw out of a bag and lowercase “y” would be a specific number, like 3 or 5. Then, we express the probability of getting exactly 5 red marbles as “P of Y equals 5”, or “p of 5”. Since “p of y” expresses the probability for each distinct outcome, we call this the probability function.
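Written out, the notation from the marble example looks like this:

P(Y = 5) = p(5)

where P(Y = y), or simply p(y), stands for the probability that the outcome Y takes the specific value y.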
Good job, folks!

So, probability distributions, or simply probabilities, measure the likelihood of an outcome depending on how often it features in the sample space. Recall that we constructed the probability frequency distribution of an event in the introductory section of the course. We recorded the frequency for each unique value and divided it by the total number of elements in the sample space. Usually, that is the way we construct these probabilities when we have a finite number of possible outcomes.
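In formula form, this construction is simply:

p(y) = (frequency of y) / (total number of elements in the sample space)

As a purely illustrative example, if a value appears 3 times in a sample space of 20 elements, its probability would be 3/20 = 0.15.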
If we had an infinite number of possibilities, then recording the frequency for each one becomes impossible, because… there are infinitely many of them!
For instance, imagine you are a data scientist and want to analyse the time it takes for your code to run. Any single compilation could take anywhere from a few milliseconds to several days. Often the result will be between a few milliseconds and a few minutes. If we record time in seconds, we lose precision, which we want to avoid. To avoid it, we would need to use the smallest possible measurement of time. Since every milli-, micro-, or even nanosecond could be split in half for greater accuracy, no such thing exists. Less than an hour from now we will talk in more detail about continuous distributions and how to deal with them.
Let’s introduce some key definitions.

Now, regardless of whether we have a finite or infinite number of possibilities, we define distributions using only two characteristics – mean and variance. Simply put, the mean of the distribution is its average value. Variance, on the other hand, is essentially how spread out the data is. We measure this “spread” by how far away from the mean all the values are. We denote the mean of a distribution as the Greek letter “mu” and its variance as “sigma squared”.
Okay. When analysing distributions, it is important to understand what kind of data we have – population data or sample data. Population data is the formal way of referring to “all” the data, while sample data is just a part of it. For example, if an employer surveys an entire department about how they travel to work, the data would represent the population of the department. However, this same data would also be just a sample of the employees in the whole company.

Something to remember when using sample data is that we adopt different notation for the mean and variance. We denote the sample mean as “x bar” and the sample variance as “s squared”.
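For reference, given a sample of n observations x1 through xn, the conventional formulas behind this notation are:

x̄ = (x1 + x2 + … + xn) / n
s² = [(x1 − x̄)² + (x2 − x̄)² + … + (xn − x̄)²] / (n − 1)

The division by n − 1 rather than n is the usual correction that keeps the sample variance from underestimating the population variance.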
One flaw of variance is that it is measured in squared units. For example, if you are measuring time in seconds, the variance would be measured in seconds squared. Usually, there is no direct interpretation of that value. To make more sense of variance, we introduce a third characteristic of the distribution, called standard deviation. Standard deviation is simply the positive square root of variance. As you can suspect, we denote it as “sigma” when dealing with a population, and as “s” when dealing with a sample. Unlike variance, standard deviation is measured in the same units as the mean. Thus, we can directly interpret it, and it is often the preferable measure of spread.
One idea which we will use a lot is that any value between “mu minus sigma” and “mu plus sigma” falls within one standard deviation away from the mean. The more congested the middle of the distribution, the more data falls within that interval. Similarly, the less data falls within the interval, the more dispersed the data is.
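In symbols, a value y lies within one standard deviation of the mean whenever:

μ − σ ≤ y ≤ μ + σ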
Fantastic!

It is important to know there exists a constant relationship between mean and variance for any distribution. By definition, the variance equals the expected value of the squared difference from the mean for any value. We denote this as “sigma squared equals the expected value of Y minus mu, squared”. After some simplification, this is equal to the expected value of “Y squared” minus “mu squared”.
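Spelled out, that identity and the simplification step look like this:

σ² = E[(Y − μ)²]
   = E[Y² − 2μY + μ²]
   = E[Y²] − 2μ·E[Y] + μ²
   = E[Y²] − 2μ·μ + μ²
   = E[Y²] − μ²

using the facts that E[Y] = μ and that the expected value of a constant is the constant itself.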
As we will see in the coming lectures, if we are dealing with a specific distribution, we can find a much more precise formula.

Okay, when we are getting acquainted with a certain dataset we want to analyse or make predictions with, we are most interested in the mean, variance and type of the distribution. In our next video we will introduce several distributions and the characteristics they possess.

Thanks for watching!
4.2 Types of Distributions
Hello, again!

In this lecture we are going to talk about various types of probability distributions and what kind of events they can be used to describe. Certain distributions share features, so we group them into types. Some, like rolling a die or picking a card, have a finite number of outcomes. They follow discrete distributions and we use the formulas we already introduced to calculate their probabilities and expected values. Others, like recording time and distance in track & field, have infinitely many outcomes. They follow continuous distributions and we use different formulas from the ones we have mentioned so far.
Throughout the course of this video we are going to examine the characteristics of some of the most common distributions. For each one we will focus on an important aspect of it or when it is used. Before we get into the specifics, you need to know the proper notation we implement when defining distributions. We start off by writing down the variable name for our set of values, followed by the “tilde” sign. This is succeeded by a capital letter depicting the type of the distribution and some characteristics of the dataset in parentheses. The characteristics are usually mean and variance, but they may vary depending on the type of the distribution.
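As an illustration (the specific distribution here is only an example), a Normally distributed variable with mean μ and variance σ² would be written as:

Y ~ N(μ, σ²)

which we read as “Y follows a Normal distribution with mean mu and variance sigma squared”.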
Alright! Let us start by talking about the discrete ones. We will get an overview of them and then we will devote a separate lecture to each one.

So, we looked at problems relating to drawing cards from a deck or flipping a coin. Both examples show events where all outcomes are equally likely. Such outcomes are called equiprobable and these sorts of events follow a Uniform Distribution.

Then there are events with only two possible outcomes – true or false. They follow a Bernoulli Distribution, regardless of whether one outcome is more likely to occur. Any event with two outcomes can be transformed into a Bernoulli event. We simply assign one of them to be “true” and the other one to be “false”.
Imagine we are required to elect a captain for our college sports team. The team consists of 7 native students and 3 international students. We assign the captain being domestic to be “true” and the captain being an international student to be “false”. Since the outcome can now only be “true” or “false”, we have a Bernoulli distribution.
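Filling in the numbers from the team example, and assuming every student is equally likely to be elected (something the lecture implies but does not state outright), we would write:

X ~ Bern(p), with p = 7/10 = 0.7

so the probability of a domestic captain is 0.7 and the probability of an international captain is 1 − 0.7 = 0.3.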
Now, if we want to carry out a similar experiment several times in a row, we are dealing with a Binomial Distribution. Just like the Bernoulli Distribution, the outcomes for each iteration are two, but we have many iterations. For example, we could be flipping the coin we mentioned earlier 3 times and trying to calculate the likelihood of getting heads twice.
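Assuming the coin is fair, so each flip lands heads with probability 0.5, that likelihood works out to:

P(2 heads in 3 flips) = C(3, 2) × (0.5)² × (0.5)¹ = 3 × 0.125 = 0.375

where C(3, 2) = 3 counts the different orders in which the two heads can occur.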
Lastly, we should mention the Poisson Distribution. We use it when we want to test out how unusual an event frequency is for a given interval. For example, imagine we know that so far LeBron James averages 35 points per game during the regular season. We want to know how likely it is that he will score 12 points in the first quarter of his next game. Since the time interval changes, so should our expectation for how many points he scores in it. Using the Poisson distribution, we are able to determine the chance of LeBron scoring exactly 12 points for the adjusted time interval.
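As a rough sketch of that adjustment, if we assume points arrive at a constant rate and a quarter is one fourth of a game (a simplification for illustration, not a claim the lecture makes about basketball), the rate for a single quarter becomes λ = 35/4 = 8.75, and the Poisson probability of exactly 12 points is:

P(Y = 12) = (8.75¹² × e^(−8.75)) / 12! ≈ 0.067

so a 12-point quarter would be somewhat above his adjusted average, but not wildly unusual under this model.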
Great, now on to the continuous distributions! One thing to remember is that since we are dealing with continuous outcomes, the probability distribution would be a curve, as opposed to unconnected individual bars.

The first one we will talk about is the Normal Distribution. The outcomes of many events in nature closely resemble this distribution, hence the name “Normal”. For instance, according to numerous reports throughout the last few decades, the weight of an adult male polar bear is usually around 500 kilograms. However, there have been records of individual bears weighing anywhere between 350kg and 700kg. Extreme values, like 350 and 700, are called outliers and do not feature very frequently in Normal Distributions.
Sometimes, we have limited data for events that resemble a Normal distribution. In those cases, we observe the Student’s-T distribution. It serves as a small-sample approximation of a Normal distribution. Another difference is that the Student’s-T accommodates extreme values significantly better. Graphically, that is represented by the curve having fatter “tails”.

Now imagine only looking at the recorded weights of the last 10 sightings across Alaska and Canada. The lower number of elements would make the occurrence of any extreme value represent a much bigger part of the population than it should. Overall, this results in more values extremely far away from the mean, so the curve would probably more closely resemble a Student’s-T distribution than a Normal distribution.
Good job, everyone!

Another continuous distribution we would like to introduce is the Chi-Squared distribution. It is the first asymmetric continuous distribution we are dealing with, as it only consists of non-negative values. Graphically, that means that the Chi-Squared distribution always starts from 0 on the left. Depending on the average and maximum values within the set, the bulk of the curve sits towards the left and tapers off in a long tail to the right, so the graph is skewed to the right. Unlike the previous two distributions, the Chi-Squared does not often mirror real-life events. However, it is often used in Hypothesis Testing to help determine goodness of fit.
The next distribution on our list is the Exponential distribution. The Exponential distribution is usually present when we are dealing with events that are rapidly changing early on. An easy-to-understand example is how online news articles generate hits. They get most of their clicks when the topic is still fresh; the more time passes, the less relevant an article becomes and interest dies off.
The last continuous distribution we will mention is the Logistic distribution. We often find it useful in forecast analysis when we try to determine a cut-off point for a successful outcome. For instance, take a competitive e-sport like Dota 2. We can use a Logistic distribution to determine how much of an in-game advantage at the 10-minute mark is necessary to confidently predict victory for either team. Just like with other types of forecasting, our predictions would never reach true certainty, but more on that later!
Woah! Good job, folks!

In the next video we are going to focus on discrete distributions. We will introduce formulas for computing Expected Values and Standard Deviations before looking into each distribution individually.

Thanks for watching!
4.3 Discrete Distributions
Welcome back!

In this video we will talk about discrete distributions and their characteristics. Let’s get started!

Earlier in the course we mentioned that events with discrete distributions have finitely many distinct outcomes. Therefore, we can express the entire probability distribution with either a table, a graph or a formula. To do so, we need to ensure that every unique outcome has a probability assigned to it. Imagine you are playing darts. Each distinct outcome has some probability assigned to it based on how big its associated interval is. Since we have finitely many possible outcomes, we are dealing with a discrete distribution.
Great! In probability, we are often more interested in the likelihood of an interval than of an individual value. With discrete distributions, we can simply add up the probabilities for all the values that fall within that range. Recall the example where we drew a card 20 times. Suppose we want to know the probability of drawing 3 spades or fewer. We would first calculate the probability of getting 0, 1, 2 or 3 spades and then add them up to find the probability of drawing 3 spades or fewer. One peculiarity of discrete events is that “the probability of Y being less than or equal to y equals the probability of Y being less than y plus 1”. In our last example, that would mean getting 3 spades or fewer is the same as getting fewer than 4 spades.
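In symbols, the two ways of writing the spades example are:

P(Y ≤ 3) = P(Y = 0) + P(Y = 1) + P(Y = 2) + P(Y = 3)
P(Y ≤ 3) = P(Y < 4)

The second line only holds for discrete variables, because Y cannot take any value strictly between 3 and 4.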
Alright! Now that you have an idea about discrete distributions, we can start exploring each type in more detail. In the next video we are going to examine the Uniform Distribution.

Thanks for watching!
4.4 Uniform Distribution
Hey, there!

In this lecture we are going to discuss the uniform distribution. For starters, we use the letter U to define a uniform distribution, followed by the range of the values in the dataset. Therefore, we read the following statement as “Variable X follows a discrete uniform distribution ranging from 3 to 7”.
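Written in the notation introduced earlier (shown here for reference, since the on-screen statement itself is not part of the transcript), that statement is:

X ~ U(3, 7)

For a discrete uniform distribution over the whole numbers 3 through 7, each of the five possible values carries the same probability, 1/5.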
Events which follow the uniform distribution are ones where all outcomes have equal probability. One such event is rolling a single standard six-sided die. When we roll a standard 6-sided die, we have an equal chance of getting any value from 1 to 6. The graph of the probability distribution would have 6 equally tall bars, all reaching up to one sixth.
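In formula terms, for the die roll:

P(X = x) = 1/6 ≈ 0.167 for every x in {1, 2, 3, 4, 5, 6}

and, more generally, a discrete uniform distribution over the integers a through b assigns each value the probability 1/(b − a + 1).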
Many events in gambling provide such odds, where each individual outcome is equally likely. Not only that, but many everyday situations follow the Uniform distribution. If your friend offers you 3 identical chocolate bars, the probabilities assigned to you choosing one of them also follow the Uniform distribution. One big drawback of uniform distributions is that the expected value provides us no relevant