
  • [MUSIC PLAYING]

  • SPEAKER 1: All right.

  • Welcome back, everyone, to an introduction

  • to Artificial Intelligence with Python.

  • Now last time, we took a look at machine learning-- a set of techniques

  • that computers can use in order to take a set of data

  • and learn some patterns inside of that data, learn how to perform a task,

  • even if we, the programmers, didn't give the computer explicit instructions

  • for how to perform that task.

  • Today, we transition to one of the most popular techniques and tools

  • within machine learning, that of neural networks.

  • And neural networks were inspired as early as the 1940s

  • by researchers who were thinking about how it is that humans learn,

  • studying neuroscience and the human brain,

  • and trying to see whether or not we can apply those same ideas to computers as

  • well, and model computer learning off of human learning.

  • So how is the brain structured?

  • Well, very simply put, the brain consists of a whole bunch of neurons,

  • and those neurons are connected to one another

  • and communicate with one another in some way.

  • In particular, if you think about the structure of a biological neural

  • network-- something like this--

  • there are a couple of key properties that scientists observed.

  • One was that these neurons are connected to each other

  • and receive electrical signals from one another,

  • that one neuron can propagate electrical signals to another neuron.

  • And another point is that neurons process

  • those input signals, and then can be activated, that a neuron becomes

  • activated at a certain point, and then can propagate further signals

  • onto neurons in the future.

  • And so the question then became, could we take this biological idea of how it

  • is that humans learn-- with brains and with neurons--

  • and apply that to a machine as well, in effect,

  • designing an artificial neural network, or an ANN, which

  • will be a mathematical model for learning that is inspired

  • by these biological neural networks?

  • And what artificial neural networks will allow us to do

  • is they will first be able to model some sort of mathematical function.

  • Every time you look at a neural network, which we'll see more of later today,

  • each one of them is really just some mathematical function

  • that is mapping certain inputs to particular outputs,

  • based on the structure of the network: depending

  • on where we place particular units inside of this neural network,

  • that's going to determine how the network is going to function.

  • And in particular, artificial neural networks

  • are going to lend themselves to a way that we can learn what

  • the network's parameters should be.

  • We'll see more on that in just a moment.

  • But in effect, we want a model such that it is easy for us

  • to be able to write some code that allows for the network

  • to be able to figure out how to model the right mathematical function,

  • given a particular set of input data.

  • So in order to create our artificial neural network,

  • instead of using biological neurons, we're

  • just going to use what we're going to call units--

  • units inside of a neural network--

  • which we can represent kind of like a node in a graph,

  • which will here be represented just by a blue circle like this.

  • And these artificial units-- these artificial neurons--

  • can be connected to one another.

  • So here, for instance, we have two units that

  • are connected by this edge inside of this graph, effectively.

  • And so what we're going to do now is think

  • of this idea as some sort of mapping from inputs to outputs,

  • that we have one unit that is connected to another unit,

  • that we might think of this side as the input and that side as the output.

  • And what we're trying to do then is to figure out how to solve a problem,

  • how to model some sort of mathematical function.

  • And this might take the form of something

  • we saw last time, which was something like, we

  • have certain inputs like variables x1 and x2, and given those inputs,

  • we want to perform some sort of task--

  • a task like predicting whether or not it's going to rain.

  • And ideally, given these inputs x1 and x2,

  • which stand for some sort of variables to do with the weather,

  • we would like to be able to predict, in this case,

  • a Boolean classification-- is it going to rain, or is it not going to rain?

  • And we did this last time by way of a mathematical function.

  • We defined some function h for our hypothesis function

  • that took as input x1 and x2--

  • the two inputs that we cared about processing-- in order

  • to determine whether we thought it was going to rain, or whether we thought it

  • was not going to rain.

  • The question then becomes, what does this hypothesis function do in order

  • to make that determination?

  • And we decided last time to use a linear combination of these input variables

  • to determine what the output should be.

  • So our hypothesis function was equal to something

  • like this: weight 0 plus weight 1 times x1 plus weight 2 times x2.

  • So what's going on here is that x1 and x2--

  • those are input variables-- the inputs to this hypothesis function--

  • and each of those input variables is being

  • multiplied by some weight, which is just some number.

  • So x1 is being multiplied by weight 1, x2 is being multiplied by weight 2,

  • and we have this additional weight-- weight 0--

  • that doesn't get multiplied by an input variable

  • at all, that just serves to either move the function up or move the function's

  • value down.

  • You can think of this either as a weight that's

  • just multiplied by some dummy value, like the number 1,

  • since multiplying by 1 leaves the value unchanged.

  • Or sometimes you'll see in the literature,

  • people call this variable weight 0 a "bias,"

  • so that you can think of these variables as slightly different.

  • We have weights that are multiplied by the input

  • and we separately add some bias to the result as well.

  • You'll hear both of those terminologies used

  • when people talk about neural networks and machine learning.

  • So in effect, what we've done here is that in order

  • to define a hypothesis function, we just need

  • to decide and figure out what these weights should be,

  • to determine what values to multiply by our inputs to get some sort of result.
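
  • As a minimal sketch in Python (the weight values below are made up purely

  • for illustration, not learned), the hypothesis so far is just a weighted

  • sum plus a bias:

      # h(x1, x2) = w0 + w1 * x1 + w2 * x2
      # The weights here are illustrative placeholders, not learned values.
      def linear_combination(x1, x2, w0=-1.0, w1=0.5, w2=0.5):
          return w0 + w1 * x1 + w2 * x2

      print(linear_combination(3.0, 2.0))  # -1.0 + 1.5 + 1.0 = 1.5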

  • Of course, at the end of this, what we need

  • to do is make some sort of classification

  • like raining or not raining, and to do that, we use some sort of function

  • to define some sort of threshold.

  • And so we saw, for instance, the step function, which is defined as 1

  • if the result of multiplying the weights by the inputs is at least 0;

  • otherwise as 0.

  • You can think of this line down the middle-- it's kind

  • of like a dotted line.

  • Effectively, it stays at 0 all the way up to one point,

  • and then the function steps--

  • or jumps up-- to 1.

  • So it's zero before it reaches some threshold,

  • and then it's 1 after it reaches a particular threshold.

  • And so this was one way we could define what

  • we'll come to call an "activation function," a function that

  • determines when it is that this output becomes active--

  • changes to a 1 instead of being a 0.
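
  • A sketch of that step function in Python, with the threshold at 0

  • as in the definition above:

      def step(x):
          # 0 before the threshold, 1 once the input reaches 0.
          return 1 if x >= 0 else 0

      print(step(-1))   # 0: before the threshold
      print(step(0.5))  # 1: past the threshold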

  • But we also saw that if we didn't just want a purely binary classification,

  • if we didn't want purely 1 or 0, but we wanted

  • to allow for some in-between real number values,

  • we could use a different function.

  • And there are a number of choices, but the one that we looked at was

  • the logistic sigmoid function that has sort of an S-shaped curve,

  • where we could represent this as a probability--

  • that may be somewhere in between-- a probability of rain of something like

  • 0.5, and maybe a little bit later a probability of rain of 0.8--

  • and so rather than just have a binary classification of 0 or 1,

  • we can allow for numbers that are in between as well.
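
  • A sketch of the logistic sigmoid in Python; its output always lands

  • strictly between 0 and 1, which is what lets us read it as a probability:

      import math

      def sigmoid(x):
          # S-shaped curve: large negative inputs approach 0,
          # large positive inputs approach 1.
          return 1 / (1 + math.exp(-x))

      print(sigmoid(0))    # 0.5
      print(sigmoid(1.4))  # roughly 0.8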

  • And it turns out there are many other different types

  • of activation functions, where an activation function just

  • takes the output of multiplying the weights together and adding that bias,

  • and then figuring out what the actual output should be.

  • Another popular one is the rectified linear unit, otherwise known as ReLU,

  • and the way that works is that it takes its input

  • and takes the maximum of that input and 0.

  • So if the input is positive, it remains unchanged, but if it's negative,

  • it goes ahead and levels out at 0.
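
  • ReLU is even simpler to sketch:

      def relu(x):
          # Positive inputs pass through unchanged;
          # negative inputs level out at 0.
          return max(x, 0)

      print(relu(2.5))   # 2.5
      print(relu(-3.0))  # 0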

  • And there are other activation functions that we can choose as well.

  • But in short, each of these activation functions,

  • you can just think of as a function that gets applied to the result of all

  • of this computation.

  • We take some function g and apply it to the result of all of that calculation.

  • And this then is what we saw last time-- the way of defining

  • some hypothesis function that takes on inputs,

  • calculates some linear combination of those inputs,

  • and then passes it through some sort of activation function to get our output.
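
  • Putting those pieces together, a single unit is just an activation

  • function g applied to the weighted sum; here is a sketch using the step

  • function as g (the weights are again made up for illustration):

      def step(x):
          return 1 if x >= 0 else 0

      def unit(x1, x2, w0, w1, w2, g=step):
          # h(x1, x2) = g(w0 + w1 * x1 + w2 * x2)
          return g(w0 + w1 * x1 + w2 * x2)

      print(unit(1, 0, w0=-1, w1=1, w2=1))  # 1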

  • And this actually turns out to be the model

  • for the simplest of neural networks, that we're

  • going to instead represent this mathematical idea graphically, by using

  • a structure like this.

  • Here then is a neural network that has two inputs.

  • We can think of this as x1 and this as x2.

  • And then one output, which you can think of as classifying whether or not

  • we think it's going to rain or not rain, for example,

  • in this particular instance.

  • And so how exactly does this model work?

  • Well, each of these two inputs represents one of our input variables--

  • x1 and x2.

  • And notice that these inputs are connected

  • to this output via these edges, which are

  • going to be defined by their weights.

  • So these edges each have a weight associated with them--

  • weight 1 and weight 2--

  • and then this output unit, what it's going to do

  • is it is going to calculate an output based on those inputs

  • and based on those weights.

  • This output unit is going to multiply all the inputs by their weights,

  • add in this bias term, which you can think of as an extra w0 term that

  • gets added into it, and then we pass it through an activation function.

  • So this then is just a graphical way of representing the same idea

  • we saw last time, just mathematically.

  • And we're going to call this a very simple neural network.

  • And we'd like for this neural network to be

  • able to learn how to calculate some function,

  • that we want some function for the neural network to learn,

  • and the neural network is going to learn what

  • should the values of w0, w1, and w2 be.

  • What should the activation function be in order

  • to get the result that we would expect?

  • So we can actually take a look at an example of this.

  • What then is a very simple function that we might calculate?

  • Well, if we recall back from when we were looking at propositional logic,

  • one of the simplest functions we looked at

  • was something like the or function, that takes two inputs--

  • x and y-- and outputs 1, otherwise known as true, if either one of the inputs,

  • or both of them, are 1, and outputs a 0 if both of the inputs are 0, or false.

  • So this then is the or function.

  • And this was the truth table for the or function-- that as long

  • as either of the inputs are 1, the output of the function is 1,

  • and the only case where the output is 0 is where both of the inputs are 0.
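
  • That truth table, written out in full:

      x | y | x or y
      --+---+-------
      0 | 0 |   0
      0 | 1 |   1
      1 | 0 |   1
      1 | 1 |   1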

  • So the question is, how could we take this and train a neural network to be

  • able to learn this particular function?

  • What would those weights look like?

  • Well, we could do something like this.

  • Here's our neural network, and I'll propose

  • that in order to calculate the or function,

  • we're going to use a value of 1 for each of the weights,

  • and we'll use a bias of negative 1, and then

  • we'll just use this step function as our activation function.
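
  • As a quick sketch in Python, we can check that this proposed unit--

  • weights of 1 on each input, a bias of negative 1, and a step

  • activation-- reproduces the or function on all four input pairs;

  • the walkthrough that follows traces the same arithmetic by hand:

      def step(x):
          return 1 if x >= 0 else 0

      def or_unit(x1, x2):
          # Weights of 1 and 1, bias of -1, step activation.
          return step(-1 + 1 * x1 + 1 * x2)

      for x1 in (0, 1):
          for x2 in (0, 1):
              print(f"{x1} or {x2} = {or_unit(x1, x2)}")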

  • How then does this work?

  • Well, if I wanted to calculate something like 0 or 0,

  • which we know to be 0, because false or false is false, then

  • what are we going to do?

  • Well, our output unit is going to calculate

  • this input multiplied by the weight.

  • 0 times 1, that's 0.

  • Same thing here.

  • 0 times 1, that's 0.

  • And we'll add to that the bias, minus 1.

  • So that'll give us some result of negative 1.

  • If we plot that on our activation function-- negative 1 is here--

  • it's before the threshold, which means the output is going to be 0.

  • The output is only 1 after the threshold.

  • Since negative 1 is before the threshold,

  • the output that this unit provides is going to be 0.

  • And that's what we would expect it to be, that 0 or 0 should be 0.

  • What if instead we had had 1 or 0, where this is the number 1?

  • Well, in this case, in order to calculate

  • what the output is going to be, we again have to do this weighted sum.

  • 1 times 1, that's 1.

  • 0 times 1, that's 0.