And you thought we were done with the ml5 neural network tutorials. But no. There is one more, because I am leading to something. You will soon see in this playlist a section on convolutional neural networks. But before I get to convolutional neural networks, I want to look at the reasons why a convolutional layer exists. I have to answer the question, what is a convolution? I've got to get to that. But before I get to that, I want to just see why they exist in the first place.

So I want to start with another scenario for training your own neural network. That scenario is an image classifier. Now you might rightfully be sitting there saying to yourself, you've done videos on image classifiers before. And in fact, I have. The very beginning of this whole series was about using a pre-trained model for an image classifier. And guess what? That pre-trained model had convolutional layers in it. So I want to now take the time to unpack what that means and look at how you could train your own convolutional neural network.

First, though, let's just think about how we would make an image classifier with what we have so far. We have an image. And that image is being sent into an ml5 neural network. And out of that neural network comes either a classification or a regression. In fact, we could do an image regression, and I would love to do that. But let me start with a classifier because I think it's a lot simpler to think about. So maybe it comes out with one of two things, either a cat or a dog, and some type of confidence score.

I previously zoomed in on the ml5 neural network and looked at what's inside, right? We have this hidden layer with some number of units and an output layer, which, in this case, would have just two units if there are two classes. Everything is connected, and then there are the inputs.
With PoseNet, you might recall, there were 34 inputs because there were 17 points on my body, each with an x, y position. What are the inputs here? Let's just say, for the sake of argument, that this image is 10 by 10 pixels. So I could consider every single pixel to be an individual input into this ml5 neural network. But each pixel has three channels: R, G, and B. So that would make 100 times three inputs, 300 inputs. That's reasonable.

So this is actually what I want to implement. Take the idea of a two-layer neural network to perform classification, the same thing I've done in previous videos, but, this time, use as the input the actual raw pixels. Can we get meaningful results from just doing that?

After we do that, I want to return here and talk about why this is, I'm not going to say inadequate, but how it can be improved on by adding another layer. The inputs will still be there. We're always going to have the inputs. The hidden layer will still be there. And the output layer will still be there. But I want to insert right in here something called a convolutional layer. And I want to do a two-dimensional convolutional layer. So I will come back. If you want to just skip to that next video, if and when it exists, that's when I will start talking about that. But let's just get this working as a frame of reference.

I'm going to start with some prewritten code. It's a simple p5.js sketch that opens a connection to the webcam, resizes the video to 10 by 10 pixels, and then draws a rectangle on the canvas for each and every pixel. This could be unfamiliar to you. How do you look at an image in JavaScript in p5 and address every single pixel individually? If that's unfamiliar to you, I would refer you to my video on that topic. That's appearing next to me right now. Go take a look at that and then come back here.
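As a rough sketch of how a single pixel can be addressed: p5.js keeps image data in one flat `pixels` array with four entries per pixel (red, green, blue, alpha). The helper name `pixelAt` and the tiny 2 by 2 test image below are made up for illustration, and the index formula assumes a pixel density of 1.

```javascript
// Look up the R, G, B values of the pixel at column x, row y
// in a flat RGBA array (four entries per pixel).
// `pixelAt` is a hypothetical helper, not part of p5.js.
function pixelAt(pixels, width, x, y) {
  const index = (x + y * width) * 4;
  return {
    r: pixels[index + 0],
    g: pixels[index + 1],
    b: pixels[index + 2]
  };
}

// A made-up 2 by 2 image: red, green on the top row; blue, white below.
const tiny = [
  255, 0, 0, 255,     0, 255, 0, 255,
  0, 0, 255, 255,     255, 255, 255, 255
];

console.log(pixelAt(tiny, 2, 1, 0)); // { r: 0, g: 255, b: 0 }
```

In a real sketch the array would come from `video.pixels` after calling `video.loadPixels()`, and the three values would go into `fill()` before drawing the rectangle.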
But really, this is just looking at every x and y position, getting the R, G, B values, setting the fill, and drawing a rectangle. So what I want to do next is think about how I configure this ml5 neural network, which expects that 10 by 10 image as its input. I'm going to make a variable called pixel brain. And pixel brain will be a new ml5 neural network. I should have mentioned that you can find the link to the code that I'm starting with, in case you want to code along with me. Both the finished code and the code I'm starting with will be in this video's description.

So to create a neural network, I call the neural network function and give it a set of options. One thing I should mention: in all the videos I've done so far, I've said that you need to specify the number of inputs and the number of outputs to configure your neural network. The truth is ml5 is set up to infer the total number of inputs and outputs from the data you're training it with. But to be really explicit about things and make the tutorial as clear as possible, I'm going to write those into the options.

So how many inputs? Think about that for a second. The number of columns times the number of rows times three, for R, G, B. Maybe I could use a grayscale image, and then I wouldn't need a separate input for R, G, and B. But let's do that. Why not? I have the 10 by 10 dimension in a variable called video size. So let's make that video size times video size times three. For the outputs, let's just make a really simple classifier that's like I'm here or not here. So I'm going to make that two. The task is classification. And I want to see debugging when I train the model. Now I have my pixel brain, my neural network. Oops. That should be three.

Let's go with my usual, typical, terrible interface, meaning no interface. I'm just going to train the model based on when I press keys on the keyboard. So I'll add a key pressed function.
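The configuration described above might look roughly like this. The option names (`inputs`, `outputs`, `task`, `debug`) follow the ml5 neural network options as I understand them; `createPixelBrain` is a made-up wrapper that takes the library as an argument so the snippet can be read without ml5 loaded.

```javascript
// Side length of the downscaled video, as in the sketch.
const videoSize = 10;

// One input per color channel per pixel: 10 * 10 * 3 = 300.
const options = {
  inputs: videoSize * videoSize * 3,
  outputs: 2,               // "here" or "not here"
  task: 'classification',
  debug: true               // show the training chart
};

// Hypothetical wrapper: in a p5.js sketch this would simply be
// pixelBrain = ml5.neuralNetwork(options);
function createPixelBrain(ml5Library) {
  return ml5Library.neuralNetwork(options);
}

console.log(options.inputs); // 300
```

Writing the counts out is optional, since ml5 can infer them from the training data, but it makes the shape of the network explicit.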
And then let me be a little goofy here: when I press a key, I'm just going to call add example with the key. So I need a new function called add example that takes a label. Basically, I'm going to make the key that I press the label. So I'm going to press one key a bunch of times when I'm standing in front of the camera and then press a different key when I'm not standing in front of the camera.

Now comes the harder work. I need to figure out how to make an array of inputs out of all of the pixels. Luckily for me, this is something that I have done before. And in fact, I have some code that I could pull from right in here, which goes through all the pixels to draw them. But here's the thing. I am going to flatten the data. I am not going to keep the data in its original columns-and-rows orientation. I'm going to take the pixels and flatten them out into one single array. Guess what? This is actually the problem that convolutional neural networks will address. It's bad to flatten the data because its spatial arrangement is meaningful.

I'll start by creating an empty array called inputs. Then I'll loop through all of the pixels. And to be safe, I should probably say video dot load pixels. The pixels may already be loaded because I'm doing that down here for drawing. And since I'm drawing them anyway, I could create the data there. But I'm going to be redundant about it. Ah, but here's the weird thing. I thought I wasn't going to talk about the pixel array in this video and just refer you to the previous one. But I can't escape it right now. For every single pixel in an image in p5.js, there are four spots in the array: a red value, a green value, a blue value, and an alpha value. The alpha value is for transparency. The alpha value I can ignore because it's going to be 255 for everything. There's no transparency.
If I wanted to learn transparency, I could make that an input and have 10 by 10 times 4. But I don't need to do that here. So in other words, the first pixel takes up indices 0, 1, 2, 3. And the second pixel starts at index four. So as I'm iterating over all of the pixels, I want to move through the array four spaces at a time. There's a variety of ways I could approach this, but that's going to make things easiest for me. So that means right over here, this should be plus equals four. Then I can say the red value is video dot pixels index I. The green value is at I plus one. And the blue value is at I plus two. And just to be consistent, I'm going to put a plus zero in the first one so everything lines up nicely. So that's the R, G, and B values. Then I want those R, G, and B values for this particular pixel to go in the inputs array.

The chat is making a very good point, which is that I have all of the stuff in an array already. And all I'm really doing is making a slightly smaller array that's removing every fourth element. I could do that with the filter function or some kind of higher-order function, or maybe just use the original array. I'm not really sure why I'm doing it this way. But I want to emphasize this data preparation step. So I look forward to hearing your comments about this, and maybe seeing reimplementations that just use the pixel array directly. But I'm going to keep it this way for right now.

So I'm taking the R, G, and B and putting them all into my new array. Then the target is just the label, a single label in an array. And I can now add this as training data: pixel brain add data, inputs, target. Let's console log something just to see that this is working. So I'm going to console log the inputs. And let's also console log the target, just to see that something is coming out. So, a, yeah. We can see there's an array there. And there's the a. And now if I do b, I'm getting a different array with b there. So I'm going to assume this is working.
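Here is a minimal sketch of the flattening step just described, written as a plain function so it can be tried outside a sketch. The names `pixelsToInputs` and `pixelsToInputsWithFilter` and the two-pixel test array are made up for illustration; in the sketch the array would be `video.pixels`.

```javascript
// Step through an RGBA pixel array four entries at a time and
// keep only R, G, B for each pixel, dropping alpha.
function pixelsToInputs(pixels) {
  const inputs = [];
  for (let i = 0; i < pixels.length; i += 4) {
    const r = pixels[i + 0];
    const g = pixels[i + 1];
    const b = pixels[i + 2];
    inputs.push(r, g, b);
  }
  return inputs;
}

// The alternative suggested in the chat: remove every fourth element.
function pixelsToInputsWithFilter(pixels) {
  return Array.from(pixels).filter((value, i) => i % 4 !== 3);
}

// Two made-up pixels: opaque red, then an opaque light blue.
const twoPixels = [255, 0, 0, 255, 0, 128, 255, 255];

console.log(pixelsToInputs(twoPixels)); // [ 255, 0, 0, 0, 128, 255 ]
```

With the inputs built this way, adding one training example comes down to `pixelBrain.addData(inputs, [label])`, as described in the transcript.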
I could log inputs dot length to make sure that it's the right size. Yeah. It's got 300 things in it. OK.

Next step is to train the model. So I'm going to say, if the key pressed is T, don't add an example but rather train the model. And let's train it over 50 epochs and have a callback for when it's finished training. Let's also add an option to save the data, just in case I want to stop and start a bunch of times and not collect the data again.

And I'm ready to go, except I missed something important. I have emphasized before that when working with neural networks, it's important to normalize your data: take the data that you're using as inputs or outputs, look at its range, and standardize it to some specific range, typically between zero and one or maybe between negative one and one. And it is true that ml5 will do this for you. I could just call normalize data. But this is a nice opportunity to show that I can do the normalization myself. This is another reason to make a separate array, sort of. I know that the range of any given pixel color is between zero and 255. So let me take the opportunity to divide every R, G, B value by 255 to squash it, to normalize it between zero and one.

Let's see if this works. I'm going to collect data. This is a little bit silly, but I'm going to press H for me being here in front of the camera. Then I'm going to move off to the side, and I'm going to use N for not being in front of the camera. So I'm not here. And I'm just going to do a little bit right now, and then I'm going to hit T for train. And the loss is going crazy. But eventually, it gets down. It's a very small amount of data that I gave it to train. But we can see that I'm getting a low loss. If I had built the inference stage into the code, it would start to guess Dan or no Dan. So let's add that in. When I'm finished training, then I'll start classifying.
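A hedged sketch of the manual normalization and the key handling described above. The `train`, `saveData`, and `addData` calls are ml5 neural network methods as I understand them; the helper names `normalizePixels` and `handleKey`, and the choice of `s` as the save key, are assumptions for illustration (the transcript only names `t` for training).

```javascript
// Keep R, G, B for each pixel and squash each value from 0..255 to 0..1.
function normalizePixels(pixels) {
  const inputs = [];
  for (let i = 0; i < pixels.length; i += 4) {
    inputs.push(pixels[i + 0] / 255, pixels[i + 1] / 255, pixels[i + 2] / 255);
  }
  return inputs;
}

// Hypothetical dispatcher for p5's keyPressed():
// 't' trains, 's' saves the collected data (assumed key),
// and any other key becomes the label of a new example.
function handleKey(key, brain, inputs, onFinished) {
  if (key === 't') {
    brain.train({ epochs: 50 }, onFinished);
  } else if (key === 's') {
    brain.saveData();
  } else {
    brain.addData(inputs, [key]);
  }
}

console.log(normalizePixels([255, 0, 51, 255])); // [ 1, 0, 0.2 ]
```

In the sketch, `brain` would be the pixel brain network and `inputs` would be built from `video.pixels` at the moment the key is pressed.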
The first thing I need to do if I'm going to classify the video is pack all of those pixels into an inputs array again. Then I can call classify on pixel brain and add a function to receive the results. Let's do something fun and have it say hi to me. So I'm going to make label a global variable with nothing in it. And then I'll say, label equals the label from the results. After I draw the pixels, let's either write hi or not write hi. So just to see that this works, let's make the label H to start. It says hi. Now let's not make it H. And let's go through the whole process. Train the model. And it says hi. Oh, I forgot to classify the video again after I get the results. So it classified it only once. What I want is to continue recursively: after I get the results, classify the video again.

Just so we can finish this out, I actually saved all of the data I collected to a file called data dot JSON. And now I can say, pixel brain load data, data dot JSON. And when the data is loaded, then I can train the model. So now I've eliminated the need to collect the data every single time. Let's run the sketch. It's going to train the model. I don't really even need to see this. When it gets to the end, hi. Hooray. I'm pleased that that worked.

I probably shouldn't, but I just want to try having three outputs. So let's try something similar to what I did in my previous videos using Teachable Machine to train an image classifier. And we'll look at this ukulele, a Coding Train notebook, and a Rubik's cube. So let me collect a whole lot of data. I'm going to press U for ukulele, R for Rubik's cube, and N for notebook. Save the data in case I need it later and train the model. All right, so now ukulele, U. N for notebook. And can we get an R? I stood to the side when I was doing the Rubik's cube, so that is pretty important. So it's not working so well. That's not a surprise. I don't expect it to work that well.
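The classify-again loop and the load-then-train shortcut could be sketched like this. It assumes the error-first callback and `results[0].label` shape of ml5 0.x releases (newer releases may differ); `classifyVideo`, `trainFromSavedData`, and the `getInputs` parameter are made-up names for illustration.

```javascript
// Global label that draw() would check to decide whether to write "hi".
let label = '';

// Classify the current frame, store the top label, then classify again.
// `getInputs` is a hypothetical function returning the current
// normalized pixel inputs.
function classifyVideo(brain, getInputs) {
  brain.classify(getInputs(), (error, results) => {
    if (error) {
      console.error(error);
      return;
    }
    label = results[0].label;
    classifyVideo(brain, getInputs); // keep going with the next frame
  });
}

// Load previously saved examples, train, then start classifying.
function trainFromSavedData(brain, getInputs) {
  brain.loadData('data.json', () => {
    brain.train({ epochs: 50 }, () => classifyVideo(brain, getInputs));
  });
}
```

With ml5 the callback fires asynchronously, so calling `classifyVideo` from inside its own callback behaves like a loop over frames rather than a deep recursion.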
This is why I want to make another video that covers how to take this very simplistic approach and improve upon it by adding something called a convolutional layer. So what is a convolution? What are the elements of a convolutional layer? How do I add one with the ml5 library? That's what I'm going to start looking at in the next section of videos.

But before I go, I can't resist doing one more thing, because I really want to demonstrate what happens if you change from using pixel input to perform a classification to using it to perform a regression. So I took code from my previous examples that demonstrated how regression works in ml5, and I changed the task to regression. I had to lower the learning rate to get this to work. Thank you to the live chat, who helped me figure this out after over an hour of debugging. I trained the model with me standing in different positions, each associated with a different frequency played by the p5 sound library. And you can see some examples of me training it over here. And now, I am going to run it and see if it works, and that'll be the end of this video. So I had saved the data. And now it's training the model. And as soon as it finishes training, you'll be able to hear.

All right, so I will leave that to you as an exercise. I'll obviously include the link to the code for this in the video's description and on the web page for this particular video on thecodingtrain.com. I can come back and implement it. You can go find the link to a livestream where I spend over an hour implementing it. But I'll leave that to you as an exercise. See, if you followed this video and have image classification working, whether you can change it to a regression and have it control something with continuous output. OK, if you made it this far, [KISSING NOISE] thank you. And I will be back to start talking about convolutional neural networks and what they mean in the next video. [MUSIC PLAYING]
ml5.js: Train a Neural Network with Pixels as Input. Published by 林宜悉 on January 14, 2021.