  • What's going on, everybody? And welcome to part five of the reinforcement learning series. In this video and subsequent videos, we're gonna be talking about deep Q-learning, or DQNs, deep Q-networks.

  • Um, and to start, the prerequisites: if you don't know deep learning, stop, pump the brakes. You gotta learn deep learning.

  • So, uh, if you want, go to the home page of pythonprogramming.net, click on machine learning here, and then do this Deep Learning basics with Python, TensorFlow and Keras series. At least do, like, the first two tutorials, I want to say, uh, through loading in your own data. Probably the first three, especially since we're using convnets.

  • Do the first three and then come back to this, because otherwise you're gonna be so lost.

  • So, okay. Deep Q-learning basically got its start with the following paper, "Human-level control through deep reinforcement learning." If you've ever looked up reinforcement learning, first, you found that all the tutorials suck (and hopefully not this one), and then, um, you've seen the following image.

  • So this is how the deep Q-network is gonna learn.

  • So what's gonna happen is you've got this input.

  • In this case, it's an image.

  • You've got some convolution layers.

  • They don't have to be convolutional layers, and the fully connected layers, you don't have to have those either, but you've got some sort of deep neural network. And then you've got your output layer, and your output layer is going to map directly to the various actions that you could take, and it's gonna do so with a linear output.

  • So it's a regression model with many outputs, not just one.

  • Now, some people did try to do just one output per model, so you'd basically have a model per possible action. That doesn't really work well, and it's gonna take a really long time to train.

  • So anyways, here's another example.

  • It's beautiful, really beautiful.

  • Uh, you got input values.

  • So again, it doesn't have to be an image.

  • You know, this could be delta x, delta y for the food, and delta x, delta y for the enemy, for example.

  • That could be your input. Boom.

  • Then you've got your hidden layers again.

  • They could be convolution.

  • They could be dense.

  • They could be recurrent, whatever you want there.

  • And then here you got your output again with the linear activation.

  • So it's just gonna output these scalar values.

  • Well, these are your Q values.

  • They map directly to the various actions you could take.

  • So in this case, um, let's say this was your output.

  • You would take the argmax. Well, the max value here is 9.5, and, you know, if we were to map this and get the index values, they would be 0, 1, 2, 3. So the argmax would be 1.

  • So whatever action 1 is, uh, that would be the move that we would make.
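To make that concrete, here's a minimal sketch of the argmax step; aside from the 9.5, the Q values are made-up numbers for illustration.

```python
import numpy as np

# Hypothetical Q values from the linear output layer, one per action
# (index values 0, 1, 2, 3).
q_values = np.array([4.2, 9.5, 1.1, 3.7])

action = np.argmax(q_values)  # the max value, 9.5, sits at index 1
print(action)                 # -> 1, so we'd take whatever action 1 is
```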

  • Okay, so we've replaced this Q-table with a deep neural network.

  • The benefit here is we can learn more complex environments.

  • First of all, we can learn more complex environments just because a deep neural network is capable of actually mapping that.

  • Also, a deep neural network can take actions in situations it's never seen before.

  • So with Q-learning, if a certain scenario presented itself and it was outside of any of the discrete combinations that we've ever seen, well, it's gonna take a random action, since that value got initialized randomly. A deep neural network, on the other hand, is not. It can actually recognize things that are similar, even though it's never seen this exact thing before, and it can act accordingly.

  • So first of all, a deep neural network is gonna do better in that case, so in that way it can solve for way more complex environments.

  • But also, as we saw, as you just even barely increase the discrete size of your Q-table, the amount of memory that's required to maintain that table just explodes. Right?

  • And that's both in terms of your observation space size, or your discrete observation space size, but also in your actions.
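As a rough back-of-envelope (all numbers assumed, in the spirit of the earlier blob environment), even a tiny grid produces a huge table:

```python
# State = (dx, dy) to the food plus (dx, dy) to the enemy, all discrete.
SIZE = 10                 # grid side length (assumed)
deltas = 2 * SIZE - 1     # each delta spans -9..9 -> 19 possible values
actions = 4

states = deltas ** 4      # four delta dimensions
print(states * actions)   # 19**4 * 4 = 521284 Q values for a 10x10 grid
```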

  • So in our case, up to this point, we've only had four actions. I'd like to introduce a few more actions moving forward.

  • So what we're gonna do is we're gonna keep the diagonal moves, but also allow for just straight cardinal moves, like up, down, left, right, as well as don't move.

  • So we're gonna introduce all of those as well; a sketch of what that mapping might look like is below.
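The index order and the (dx, dy) pairs here are my assumptions, since we haven't updated the environment yet.

```python
# Nine choices: the four diagonals we already had, the four cardinal
# directions, and "don't move".
ACTION_DELTAS = {
    0: (1, 1),    # diagonal moves (what we had before)
    1: (-1, -1),
    2: (-1, 1),
    3: (1, -1),
    4: (1, 0),    # cardinal: right
    5: (-1, 0),   # cardinal: left
    6: (0, 1),    # cardinal: up
    7: (0, -1),   # cardinal: down
    8: (0, 0),    # don't move
}
```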

  • Um, so for those two reasons, neural networks are way better.

  • The downside is neural networks are kind of finicky, so we're gonna have to handle a lot of things that are finicky about neural networks.

  • Also, it's gonna take a lot longer to train.

  • So on an identical model, or an identical environment like the blob env, where it took our Q-table minutes to fully populate. Basically, it's just a brute force operation, so if it's small enough, you know, your CPU can handle it. Where it took, you know, minutes for a Q-table, it's gonna take hours for deep Q-learning.

  • Um, but the benefit here is, for certain environments where it takes a long time, like weeks, for deep Q-learning to learn a new environment, it would require, you know, petabytes of memory for a Q-table to figure it out.

  • Okay, so that's that. There's your difference.

  • So really, they're gonna solve different types of environments, and, honestly, Q-learning is pretty much useless.

  • You can use it for cool, novel little niche things, for sure, but you're not gonna find too much use for it in the real world, I don't think. Whereas deep Q-learning, you can actually start to apply it to really cool things.

  • So anyways, enough jibber jabber. Uh, the last concept I want to cover before we get into, uh, actually writing code is right here, this learned value change.

  • So before, the new Q function was basically this whole thing, whereas the neural network kind of solves for this part, like the learning rate and all that, and just updating values; that's handled through backpropagation and fancy things.

  • But we still want to retain the discounted future values, because neural networks don't care; they don't give a heck about the future. They care about right now: what is this thing, what is this exact value? They don't care about, well, what does this chain of events do? Other than a recurrent neural network, I suppose, but really, a recurrent neural network cares about the history, not the future.

  • So anyways, in this case, we still want to retain this, so we are still going to use it. And basically the way we're gonna do this is, every step this agent takes, we still need to update that Q value.

  • So what we're gonna do is we query for the Q values, and we take that action, or a random one, depending on epsilon.

  • Then, you know, we re-sample the environment, figure out, you know, what the next reward would be here, and then we can calculate this new Q value, um, and then do a fit operation. A rough sketch of one step is below.
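This sketch assumes hypothetical `model` and `env` objects; `env.ACTION_SPACE_SIZE` and an `env.step()` returning `(new_state, reward, done)` are assumptions, not a real API yet, and DISCOUNT is the usual gamma.

```python
import numpy as np

DISCOUNT = 0.99  # gamma -- value assumed for illustration

def naive_q_step(model, env, current_state, epsilon):
    # Query for the Q values and take that action, or a random one,
    # depending on epsilon.
    if np.random.random() > epsilon:
        action = int(np.argmax(model.predict(np.array([current_state]))[0]))
    else:
        action = np.random.randint(0, env.ACTION_SPACE_SIZE)

    new_state, reward, done = env.step(action)

    # The new Q value: the reward now plus the discounted max future Q.
    if not done:
        max_future_q = np.max(model.predict(np.array([new_state]))[0])
        new_q = reward + DISCOUNT * max_future_q
    else:
        new_q = reward

    # Overwrite just the Q value for the action we took, then fit --
    # one fit per step, which is the instability discussed next.
    current_qs = model.predict(np.array([current_state]))[0]
    current_qs[action] = new_q
    model.fit(np.array([current_state]), np.array([current_qs]), verbose=0)
    return new_state, done
```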

  • So people who are familiar with neural networks are already like, wow, that's a lot of fits. Yep, sure is. Also, that's one fit at a time. So as you're going to see when we go to write this code, we actually have to handle for that as well, because that would make for a very unstable neural network.

  • So there's two ways that we're handling for that, but with that, I think we're just gonna jump into the code. I think it'll make more sense coding it and covering it as we get to those points.

  • Um, okay, so hopefully that was enough information.

  • Like I said, all the tutorials you've ever seen on deep Q-learning have been terrible, and there's, like, so much information that's left out.

  • Um, that, you know, to get the overarching concept, honestly, this picture is enough.

  • But then when it comes time to actually look at code and how it really will work, like, can you sit down and code it after you read someone's tutorial? My hope is you really can after this one. Otherwise, I don't think a tutorial exists for doing it.

  • So anyways, let's get started.

  • So the first thing we're gonna do is, we're gonna at least hopefully code our agent, uh, at least the model, and talk about some of the parameters. And then probably in the next tutorial we'll do fitting and training, basically, and all that.

  • So anyway, class DQNAgent, and in this agent, let's just do def create_model first.

  • That should be fairly basic. It's just going to be a convnet. We don't have to do a convnet; I'm just gonna do one just so you can more easily translate this to something else.

  • So the first thing you should do any time you learn something is go try to apply it to something else. So try it, because it might make sense to you as you're listening to me talk about it, but then you go to try to apply it, and suddenly it doesn't work. You're confused, or you realize, oh, I don't really actually know how to do this. So anyway, first thing you should do: try to apply it.

  • Someone complained recently about my drinking on camera, saying they don't want to hear me gulp.

  • Um, I'm drinking because my mouth is dry, and the most annoying thing ever is listening to someone talk with a dry mouth.

  • So, uh, you're welcome, jerk.

  • So anyway, um, create_model. Okay, so we'll start with model equals a Sequential model.

  • And now let's go ahead and pick some imports.

  • So the first thing that I need to bring in is from keras.models import Sequential.

  • And then we're going to say from keras.layers, let's import Dense, Dropout, Conv2D, MaxPool2D, um, Activation, Flatten.

  • I think that's everything.

  • Sorry, that ran over my face.

  • My bad.

  • Anyway, there we go.

  • Dense, Dropout, Conv2D, MaxPool2D. And actually, it's MaxPooling2D. Activation and Flatten.

  • Okay, those are all of the imports.
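For reference, those imports cleaned up (pre-TensorFlow-2, standalone Keras style, matching the video's setup):

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Activation, Flatten
```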

  • Also, you can always go to the text-based version of the tutorials if code runs over my ugly mug. Um, you can check those out as well. Actually, I don't even know if it's up yet; maybe at the very end it'll be there. Anyway, by the time you need it, it'll be there.

  • Anyways, Um Okay, so we got that stuff.

  • Um, the other thing I'll go ahead and import, too, is from keras.callbacks import TensorBoard.
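Cleaned up:

```python
from keras.callbacks import TensorBoard
```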

  • Um, we need other things, but I want to cover them when we get there.

  • So model is equal to a Sequential model. Then we're gonna say model.add, and we're gonna start adding. Conv2D, capital D; let me fix that as well. Cool.

  • Conv2D, uh, which will give us 256 convolutions, and the convolution, um, window will be a three by three, and then input_shape, and this is going to be equal to, uh, we'll say env.OBSERVATION_SPACE_VALUES, and then close this off.

  • And this doesn't quite exist yet; we have to create the environment.

  • So I'm a little uncertain if I'm gonna actually rewrite the environment; it's converted to object-oriented programming now. I may not, actually; we might just copy and paste the updated environment, and I'll just talk about it, because otherwise it'll take, like, an hour to go through that.

  • So anyways, this will exist at some point in the near future.

  • Um, then once we've done that, I guess let me zoom out a little bit, since we're running out of space here.

  • The next thing that we're gonna do is we'll say model.add, and so that's our conv layer. We're gonna add an activation, and the activation here, we'll use rectified linear. And then, uh, model.add MaxPooling2D.

  • We'll use a two by two window. Again, if you don't know what max pooling is, or convolutions, uh, check out that basics tutorial, because I cover it, and I also have beautifully drawn photos.

  • If you really liked my other photo, you'll love those photos. Uh, then after the max pooling, we're just gonna model.add, and we're gonna add a dropout layer, and we'll drop out 20%, and then we're just gonna do the same thing again.

  • So this will be two by 256. So copy pasta; we don't need to include the environment shape, the input shape rather, um, the second time through.

  • Okay, so two by 256. Then we're gonna say model.add, and we're gonna do, ah, a Flatten here so we can pass it through dense layers.

  • And then we'll say model.add, we'll throw in a Dense 64 layer, and then finally model.add a Dense layer, and it'll be env.ACTION_SPACE_SIZE, and then the activation will be linear.

  • And then model.compile, we'll say loss is 'mse', for mean squared error, the optimizer will be the Adam optimizer with a learning rate of 0.001, uh, and then for metrics we will track accuracy.

  • Okay, so that is our model.

  • Again, there will be a link in the description to sample code, so if you missed anything, you can check that out.

  • Okay, so, um, and then Adam. We don't actually have Adam imported, so let's go ahead and grab that as well.

  • So from keras.optimizers import Adam. Just for the record, too, if anybody's watching this in the future: this is still TensorFlow, like, 1.15? I don't really know, actually, what version I'm on, but it's not TensorFlow 2 yet, so keep that in mind; something might change by then.

  • And if it has changed, check the comments.

  • Um, and then when it finally actually does truly matter what version of TensorFlow I'm on, I will let you guys know.

  • Um, it's also kind of expected that you guys will know how to install TensorFlow and Keras. I'm not gonna cover that again; for that, go to the basics tutorial.

  • So anyway, model.compile.

  • Okay, then we will return that model.
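Putting the pieces together, here's a sketch of create_model as described so far. `env.OBSERVATION_SPACE_VALUES` and `env.ACTION_SPACE_SIZE` belong to the environment that doesn't exist yet, so treat them (and the module-level `env`) as assumptions for now.

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Activation, Flatten
from keras.optimizers import Adam

def create_model(self):
    model = Sequential()

    # First conv block: 256 convolutions, 3x3 window; the input shape
    # comes from the (not-yet-written) environment.
    model.add(Conv2D(256, (3, 3), input_shape=env.OBSERVATION_SPACE_VALUES))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))

    # "Two by 256": the same block again, minus the input shape.
    model.add(Conv2D(256, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))

    model.add(Flatten())  # flatten so we can pass through dense layers
    model.add(Dense(64))

    # Linear output: one Q value per possible action.
    model.add(Dense(env.ACTION_SPACE_SIZE, activation='linear'))
    model.compile(loss='mse', optimizer=Adam(lr=0.001), metrics=['accuracy'])
    return model
```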

  • Okay, so that's our model, then.

  • Uh, def __init__. So now we're gonna do the __init__ method for this agent.

  • So we're gonna say self.model equals create_model.

  • And then... so that is gonna... what is your problem? Why are you unhappy? What? What the heck am I missing? Oh, self dot create_model. Uh, more coffee, definitely necessary.

  • Okay. self.model is self.create_model().

  • So that's gonna be our main model, but then we need what's gonna be called our target model. Let's write that first.

  • So self dot, we'll call this target_model, equals self.create_model().

  • But then we're gonna say self.target_model.set_weights, and we want to set the weights to be exactly the same as self.model.get_weights().
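So far the __init__ looks roughly like this, with create_model being the method sketched above:

```python
class DQNAgent:
    def __init__(self):
        # Main model -- the one that gets trained (.fit) every step.
        self.model = self.create_model()

        # Target model -- a second copy whose weights start out identical,
        # to give our predictions some consistency (explained below).
        self.target_model = self.create_model()
        self.target_model.set_weights(self.model.get_weights())
```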

  • Okay, so what's going on here?

  • So we're gonna wind up having two models here. Why do we want to do that? There are a few reasons, but mostly it's because this model is going to be going crazy.

  • So first of all, the model itself is initialized randomly, as all neural networks are. But we're also going to initialize with an epsilon, likely of 1, so the, um, the agent is also gonna be taking random actions, basically meaningless ones.

  • So initially, this model is going to be trying to fit to a whole bunch of randomness, and that's gonna be useless.

  • But eventually, as it's explored, as it's gotten rewards, it's gonna start to, hopefully, figure something out.

  • But the problem is, we're doing a .predict for every single step this agent takes, and what we wanna have is some sort of consistency in those .predicts that we're doing, because besides doing a .predict every single step, we're also doing a .fit every single step.

  • So this model, especially initially, is just gonna be, like, all over the place as it's attempting to figure things out randomly.

  • So the way we're gonna compensate for that is we're gonna have two models.

  • So we've got self.model. This