
  • And you thought we were done with the ML5 neural network

  • tutorials.

  • But no.

  • There is one more because I am leading to something.

  • I am going to-- you will soon see in this playlist

  • a section on convolutional neural networks.

  • But before I get to convolutional neural networks,

  • I want to look at the reasons why you'd want a convolutional layer.

  • I have to answer this question, like, what is a convolution?

  • I've got to get to that.

  • But before I get to that, I want to just see why

  • they exist in the first place.

  • So I want to start with another scenario

  • for training your own neural network.

  • That scenario is an image classifier.

  • Now you might rightfully be sitting

  • there saying to yourself, you've done videos

  • on image classifiers before.

  • And in fact, I have.

  • The very beginning of this whole series

  • was about using a pre-trained model for an image classifier.

  • And guess what?

  • That pre-trained model had convolutional layers in it.

  • So I want to now take the time to unpack what that means more

  • and look at how you could train your own convolutional neural

  • network.

  • Again, first though, let's just think

  • about how we would make an image classifier

  • with what we have so far.

  • We have an image.

  • And that image is being sent into an ML5 neural network.

  • And out of that neural network comes either a classification

  • or regression.

  • And in fact, we could do an image regression.

  • And I would love to do that.

  • But let me start with a classifier

  • because I think it's a lot simpler to think about

  • and consider.

  • So maybe it comes out with one of two things,

  • either a cat or a dog and some type of confidence score.

  • I previously zoomed in on the ML5 neural network

  • and looked at what's inside, right?

  • We have this hidden layer with some number

  • of units and an output layer, which, in this case,

  • would have just two if there's two classes.

  • Everything is connected, and then there are the inputs.

  • With PoseNet, you might recall, there were 34 inputs

  • because there were 17 points on my body,

  • each with an xy position.

  • So what are the inputs this time?

  • Let's just say, for the sake of argument,

  • that this image is 10 by 10 pixels.

  • So I could consider every single pixel

  • to be an individual input into this ML5 neural network.

  • But each pixel has three channels:

  • R, G, and B. So that would make 100 times three inputs,

  • 300 inputs.

  • That's reasonable.

  • So this is actually what I want to implement.

  • Take the idea of a two-layer neural network

  • to perform classification, the same thing I've

  • done in previous videos, but, this time, use as the input

  • the actual raw pixels.

  • Can we get meaningful results from just doing that?

  • After we do that, I want to return back to here

  • and talk about why this is inadequate-- or, I'm not going

  • to say inadequate, but how this can be improved on

  • by adding another layer.

  • So this layer won't--

  • sorry.

  • The inputs will still be there.

  • We're always going to have the inputs.

  • The hidden layer will still be there.

  • And the output layer will still be there.

  • But I want to insert right in here

  • something called a convolutional layer.

  • And I want to do a two-dimensional convolutional

  • layer.

  • So I will come back.

  • If you want to just skip to that next video,

  • if and when it exists, that's when I

  • will start talking about that.

  • But let's just get this working as a frame of reference.

  • I'm going to start with some prewritten code.

  • All this does is run a simple p5.js sketch

  • that opens a connection to the webcam,

  • resizes it to 10 by 10 pixels, and then

  • draws a rectangle in the canvas for each and every pixel.

  • So this could be unfamiliar to you.

  • How do you look at an image in JavaScript, in p5.js,

  • and address every single pixel individually?

  • If that's unfamiliar to you, I would refer

  • to my video on that topic.

  • That's appearing right next to me now.

  • Go take a look at that and then come back here.

  • But really, this is just looking at every x and y position,

  • getting the R, G, B values, filling a rectangle,

  • and drawing it.
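
For reference, the starting sketch is roughly along these lines; the variable names, the 400 by 400 canvas, and the exact structure are assumptions rather than the actual starting code linked in the description:

```javascript
// Rough sketch of the starting point: show the webcam as a 10x10 grid of rectangles.
const videoSize = 10;
let video;

function setup() {
  createCanvas(400, 400);
  video = createCapture(VIDEO);
  video.size(videoSize, videoSize); // shrink the webcam feed to 10x10 pixels
  video.hide();
}

function draw() {
  background(0);
  video.loadPixels();
  const w = width / videoSize; // size of each drawn "pixel"
  for (let y = 0; y < videoSize; y++) {
    for (let x = 0; x < videoSize; x++) {
      const index = (x + y * videoSize) * 4; // 4 slots per pixel: R, G, B, A
      const r = video.pixels[index + 0];
      const g = video.pixels[index + 1];
      const b = video.pixels[index + 2];
      noStroke();
      fill(r, g, b);
      rect(x * w, y * w, w, w);
    }
  }
}
```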

  • So what I want to do next is think about,

  • how do I configure this ML5 neural network,

  • which expects that 10 by 10 image as its input?

  • I'm going to make a variable called pixel brain.

  • And pixel brain will be a new ML5 neural network.

  • I should have mentioned that you can find the link to the code

  • that I'm starting with, in case you

  • want to code along with me; both the finished code

  • and the code I'm starting with will

  • be in this video's description.

  • So to create a neural network, I call the neural network

  • function and give it a set of options.

  • One thing I should mention: in all the videos

  • I've done so far, I've said that you

  • need to specify the number of inputs

  • and the number of outputs to configure your neural network.

  • The truth is ML5 is set up to infer

  • the total number of inputs and outputs

  • based on the data you're training it with.

  • But to be really explicit about things

  • and make the tutorial as clear as possible,

  • I'm going to write those into the options.

  • So how many inputs?

  • Think about that for a second.

  • The number of columns times the number of rows times

  • three, for R, G, and B. Maybe I could use a grayscale image

  • instead, so I wouldn't

  • need a separate input for R, G, and B. But let's keep all three.

  • Why not?

  • I have the 10 by 10 in a variable called video size.

  • So let's make that video size times video size times three.

  • Let's just make a really simple classifier that's

  • like I'm here or not here.

  • So I'm going to make the number of outputs two.

  • The task is classification.

  • And I want to see debugging when I train the model.

  • Now I have my pixel brain, my neural network.

  • Oops.

  • That should be three.
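
Put together, that configuration might look something like this; the option names follow ml5's neuralNetwork API, and the canvas/video setup is assumed to be the same as in the starting sketch:

```javascript
// Configuring the "pixel brain": 300 raw pixel inputs, 2 output classes.
let pixelBrain;

function setup() {
  // ...canvas and video setup as in the starting sketch...
  const options = {
    inputs: videoSize * videoSize * 3, // 10 * 10 pixels * (R, G, B) = 300
    outputs: 2,                        // two classes, e.g. "here" / "not here"
    task: 'classification',
    debug: true                        // show the training visualization
  };
  pixelBrain = ml5.neuralNetwork(options);
}
```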

  • Let's go with my typical, terrible interface,

  • meaning no interface.

  • And I'm just going to train the model based on when

  • I press keys on the keyboard.

  • So I'll add a keyPressed function.

  • And then let me be a little goofy here:

  • I'm just going to say, when I press a key,

  • add example with that key.

  • So I need a new function called add example

  • that takes a label.

  • So basically, I'm going to make the key that I press the label.

  • So I'm going to press a bunch of keys

  • when I'm standing in front of the camera

  • and then press a different key when I'm not standing

  • in front of the camera.
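
A minimal sketch of that "no interface" interface:

```javascript
// Whatever key I press becomes the label for the current frame.
function keyPressed() {
  addExample(key); // e.g. one key while in front of the camera, another while away
}
```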

  • Now comes the harder work.

  • I need to figure out how to make an array of inputs

  • out of all of the pixels.

  • Luckily for me, this is something

  • that I have done before.

  • And in fact, I actually have some code

  • that I could pull from right in here,

  • which is looking at how to go through all the pixels

  • to draw them.

  • But here's the thing.

  • I am going to do something to flatten the data.

  • I am not going to keep the data in its original columns

  • and rows orientation.

  • I'm going to take the pixels and flatten them out

  • into one single array.

  • Guess what?

  • This is actually the problem that

  • convolutional neural networks will address.

  • It's bad to flatten the data because its spatial arrangement

  • is meaningful.

  • I'll start by creating an empty array called inputs.

  • Then I'll loop through all of the pixels.

  • And to be safe, I should probably

  • say video dot load pixels.

  • The pixels may already be loaded because I'm

  • already doing that down here.

  • And I could do something where, since I'm drawing them anyway,

  • I might as well create the data there.

  • But I'm going to be redundant about it.

  • And I'm going to say--

  • ah, but this is weird.

  • Here's the weird thing.

  • I thought I wasn't going to talk about the pixel array

  • in this video and just refer you to the previous one.

  • But I can't escape it right now.

  • For every single pixel in an image in P5JS,

  • there are four spots in the array, a red value,

  • a green value, a blue value, and an alpha value.

  • Alpha value for transparency.

  • The alpha value, I can ignore because it's

  • going to be 255 for everything.

  • There's no transparency.

  • If I wanted to learn transparency,

  • I could make that an input and have 10 by 10 times 4.

  • But I don't need to do that here.

  • So in other words, pixel zero takes up indices 0, 1, 2, and 3.

  • And the second pixel starts at index four.

  • So as I'm iterating over all of the pixels,

  • I want to move through the array four spaces at a time.

  • There's a variety of ways I could approach this,

  • but that's going to make things easiest for me.
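
As a picture of that layout (entirely in comments, just to illustrate the indexing):

```javascript
// video.pixels is one flat array with four slots (R, G, B, A) per pixel:
//
//   index:   0   1   2   3    4   5   6   7    8   9  10  11  ...
//   value: [ r0, g0, b0, a0,  r1, g1, b1, a1,  r2, g2, b2, a2, ... ]
//            \__ pixel 0 __/  \__ pixel 1 __/  \__ pixel 2 __/
//
// For a 10 by 10 video that is 10 * 10 * 4 = 400 entries, and every alpha
// slot is 255 here, so it can be skipped.
```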

  • So that means right over here, this

  • should be plus equals four.

  • Then I can say the red value is video dot pixels index

  • I. The green value is at I plus one.

  • And the blue value is at I plus two.

  • And just to be consistent, I'm going

  • to just put a plus zero in there so everything lines up nicely.

  • So that's the R, G, and B values.

  • Then I want those R, G, and B values

  • for this particular pixel to go in the inputs array.

  • The chat is making a very good point,

  • which is that I have all of the stuff in an array already.

  • And all I'm really doing is making a slightly smaller array

  • that's removing every fourth element.

  • I could do that with the filter function

  • or some kind of higher order function

  • or maybe just use the original array.

  • I'm not really sure why I'm doing it this way.

  • But I'm going to emphasize this data preparation step.

  • So I look forward to hearing your comments about this,

  • and maybe seeing reimplementations that just

  • use the pixel array directly.

  • But I'm going to keep it this way for right now.

  • So I'm taking the R, G, and B and putting them

  • all into my new array.

  • Then the target is just the label,

  • a single label in an array.

  • And I can now add this as training data,

  • pixel brain add data inputs target.

  • Let's console log something just to see that this is working.

  • So I'm going to console log the inputs.

  • And let's also console log the target,

  • just to see that something is coming out.
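
Putting those pieces together, the addExample function might look something like this (without the normalization step that comes up in a moment):

```javascript
// Flatten the 10x10 frame into one array of 300 numbers and store it
// as a training example with the pressed key as its label.
function addExample(label) {
  video.loadPixels();
  const inputs = [];
  for (let i = 0; i < video.pixels.length; i += 4) {
    const r = video.pixels[i + 0];
    const g = video.pixels[i + 1];
    const b = video.pixels[i + 2];
    inputs.push(r, g, b); // index i + 3 is alpha, which is skipped
  }
  const target = [label];
  console.log(inputs, target); // sanity check: 300 values plus the label
  pixelBrain.addData(inputs, target);
}
```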

  • So, pressing 'a'-- yeah.

  • We can see there's an array there.

  • And there's the 'a'.

  • And now if I press 'b', I'm getting a different array with 'b' there.

  • So I'm going to assume this is working.

  • I could log inputs dot length to make sure

  • that the count is right.

  • Yeah.

  • It's got 300 things in it.

  • OK.

  • Next step is to train the model.

  • So I'm going to say, if the key pressed is T,

  • don't add an example but rather train the model.

  • And let's train it over 50 epochs

  • and have a callback for when it's finished training.

  • Let's also add an option to save the data,

  • just in case I want to stop and start a bunch of times

  • and not collect the data again.
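
The keyPressed function might grow into something like this; the 's' shortcut for saving and the finishedTraining name are assumptions, not necessarily what the finished code uses:

```javascript
function keyPressed() {
  if (key === 't') {
    // 50 epochs, with a callback for when training completes
    pixelBrain.train({ epochs: 50 }, finishedTraining);
  } else if (key === 's') {
    // download the collected examples so they can be reloaded later
    pixelBrain.saveData('data');
  } else {
    addExample(key);
  }
}

function finishedTraining() {
  console.log('training complete');
}
```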

  • And I'm ready to go, except I missed something important.

  • I have emphasized before that when

  • working with neural networks, it's

  • important to normalize your data,

  • to take the data that you're using as inputs or outputs,

  • look at its range, and standardize it

  • to some specific range, typically between zero and one

  • or maybe between negative one and one.

  • And it is true that ML5 will do this for you.

  • I could just call normalizeData.

  • But this is a nice opportunity to show that I can just

  • do the normalization myself.

  • For example, I know-- this is another reason

  • to make a separate array sort of.

  • I know that the range of any given pixel color

  • is between zero and 255.

  • So let me take the opportunity to just divide every R, G,

  • B value by 255 to squash it, to normalize it

  • between zero and one.
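
Inside the addExample loop, that's just a divide-by-255 on each channel, roughly:

```javascript
// Same loop as before, but every channel is squashed into the range 0 to 1.
for (let i = 0; i < video.pixels.length; i += 4) {
  const r = video.pixels[i + 0] / 255;
  const g = video.pixels[i + 1] / 255;
  const b = video.pixels[i + 2] / 255;
  inputs.push(r, g, b);
}
```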

  • Let's see if this works.

  • I'm going to collect the data.

  • So I'm going to press-- this is a little bit silly,

  • but I'm going to press H for me being

  • here in front of the camera.

  • Then I'm going to move off to the side,

  • and I'm going to use N for not being in front of the camera.

  • So I'm not here.

  • And I'm just going to do a little bit right now,

  • and then I'm going to hit T for train.

  • And the loss function is going crazy.

  • But eventually, it comes down.

  • It's a very small amount of data that I gave it to train.

  • But we can see that I'm getting a low loss value.

  • If I had built the inference stage into the code,

  • it would start to guess Dan or no Dan.

  • So let's add that in.

  • When I'm finished training, then I'll start classifying.

  • The first thing I need to do if I'm going to classify the video

  • is pack all of those pixels into an input array again.

  • Then I can call classify on pixel brain

  • and add a function to receive the results.

  • Let's do something fun and have it say hi to me.

  • So I'm going to make this label a global variable with nothing

  • in it.

  • And then I'll say, label equals results label.

  • After I draw the pixels, let's either write hi or not

  • write hi.

  • So just to see that this works, let's make the label H

  • to start.

  • It says hi.

  • Now let's not make it H. And let's go

  • through the whole process.

  • Train the model.

  • And it says hi.

  • Oh, I forgot to classify the video again

  • after I get the results.

  • So it classified it only once.

  • And I want to then recursively continue

  • after I get the results to classify the video again.
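
A sketch of that inference loop; the function names classifyVideo and gotResults are my own, and results[0].label assumes the shape of ml5's classification results:

```javascript
// Once training finishes, classify the current frame over and over.
function finishedTraining() {
  classifyVideo();
}

function classifyVideo() {
  video.loadPixels();
  const inputs = [];
  for (let i = 0; i < video.pixels.length; i += 4) {
    inputs.push(
      video.pixels[i + 0] / 255,
      video.pixels[i + 1] / 255,
      video.pixels[i + 2] / 255
    );
  }
  pixelBrain.classify(inputs, gotResults);
}

function gotResults(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  label = results[0].label; // global variable read in draw()
  classifyVideo();          // classify the next frame once results arrive
}

// Then in draw(), after the pixels are drawn:
//   if (label === 'h') {
//     text('HI!', width / 2, height / 2);
//   }
```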

  • Just so we can finish this out, I actually

  • saved all of the data I collected to a file

  • called data dot JSON.

  • And now I can say, pixel brain load data data dot JSON.

  • And when the data is loaded, then I can train the model.

  • So now I've eliminated the need to collect

  • the data every single time.
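
Something along these lines, assuming the saved file sits next to the sketch as data.json:

```javascript
// Load previously collected examples instead of gathering them again,
// then train as soon as the data is ready.
function setup() {
  // ...canvas, video, and pixelBrain setup as before...
  pixelBrain.loadData('data.json', dataLoaded);
}

function dataLoaded() {
  pixelBrain.train({ epochs: 50 }, finishedTraining);
}
```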

  • Let's run the sketch.

  • It's going to train the model.

  • I don't really even need to see this.

  • When it gets to the end, hi.

  • Hooray.

  • I'm pleased that that worked.

  • I probably shouldn't, but I just want

  • to try having three outputs.

  • So let's try something similar to what

  • I did in my previous videos using teachable machine

  • to train an image classifier.

  • And we'll look at this ukulele, this Coding Train notebook,

  • and a Rubik's cube.

  • So let me collect a whole lot of data.

  • I'm going to press U for ukulele, R for Rubik's cube,

  • and N for notebook.

  • Save the data in case I need it later and train the model.

  • All right, so now ukulele, U, N for notebook.

  • And can we get an R?

  • I stood to the side when I was doing the Rubik's cube,

  • so that is pretty important.

  • So it's not working so well.

  • So that's not a surprise.

  • I don't expect it to work that well.

  • This is why I want to make another video that

  • covers how to take this very simplistic approach

  • and improve upon it by adding something

  • called a convolutional layer.

  • So what is a convolution?

  • What are the elements of a convolutional layer?

  • How do I add one with the ML5 library?

  • That's what I'm going to start looking at in the next section

  • of videos.

  • But before I go, I can't resist just

  • doing one more thing because I really

  • want to look at and demonstrate to you what happens if you

  • change from using pixel input to perform a classification

  • to a regression.

  • So I took code from my previous examples that just demonstrated

  • how regression in ML5 works, and I

  • changed the task to regression.

  • I had to lower the learning rate.

  • Thank you to the live chat who helped me figure this

  • out after like over an hour of debugging.

  • I had to lower the learning rate to get this to work.

  • I trained the model with me standing

  • in different positions, each associated

  • with a different frequency that the p5.sound library played.

  • And you can see some examples of me training it over here.

  • And now, I am going to run it and see if it works,

  • and that'll be the end of this video.

  • So I had saved the data.

  • And now it's training the model.

  • And as soon as it finishes training,

  • you'll be able to hear.

  • All right, so I will leave that to you as an exercise.

  • I'll obviously include the link to the code

  • for this in the video's description

  • and on the web page at thecodingtrain.com

  • for this particular video.

  • I can come back and implement it.

  • You can go find the link to a Livestream

  • where I spend over an hour implementing it.

  • But I'll leave that to you as an exercise.

  • So, if you followed this video and have image classification

  • working, can you change it to a regression

  • and have it control something with a continuous output?
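
One possible direction for that exercise, sketched under several assumptions: the learningRate value, the idea that the training targets were stored as numbers between 0 and 1, and the 100 to 600 Hz mapping are all illustrative guesses, not the settings from the livestream.

```javascript
// A regression version of the pixel brain driving a p5.sound oscillator.
const videoSize = 10;
let video;
let pixelBrain;
let osc;

function setup() {
  createCanvas(400, 400);
  video = createCapture(VIDEO);
  video.size(videoSize, videoSize);
  video.hide();

  pixelBrain = ml5.neuralNetwork({
    inputs: videoSize * videoSize * 3,
    outputs: 1,             // one continuous value instead of class labels
    task: 'regression',
    learningRate: 0.0001,   // lowered from the default, as mentioned in the video
    debug: true
  });

  osc = new p5.Oscillator(); // defaults to a sine wave
  osc.start();
  osc.amp(0.5);

  // Training data would be added as pixelBrain.addData(inputs, [someValue]),
  // where someValue encodes the desired frequency (assumed here to be 0 to 1).
}

function predictVideo() {
  video.loadPixels();
  const inputs = [];
  for (let i = 0; i < video.pixels.length; i += 4) {
    inputs.push(
      video.pixels[i + 0] / 255,
      video.pixels[i + 1] / 255,
      video.pixels[i + 2] / 255
    );
  }
  pixelBrain.predict(inputs, gotResults);
}

function gotResults(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  // Regression results come back as a value rather than a label.
  const freq = map(results[0].value, 0, 1, 100, 600);
  osc.freq(freq);
  predictVideo(); // same recursive pattern as with classification
}
```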

  • OK, if you made it this far, [KISSING NOISE] thank you.

  • And I will be back and start to talk

  • about convolutional neural networks and what

  • they mean in the next video.

  • [MUSIC PLAYING]


ml5.js: Train a Neural Network with Pixels as Input
