♪ (music) ♪

Hi, and welcome to episode three of Zero to Hero with TensorFlow. In the previous episode, you saw how to do basic computer vision using a deep neural network that matched the pixels of an image to a label. So an image like this was matched to a numeric label that represented it, like this. But there was a limitation to that. The image you were looking at had to have the subject centered in it, and it had to be the only thing in the image. So the code you wrote would work for that shoe, but what about these? It wouldn't be able to identify all of them, because it's not trained to do so. For that, we have to use something called a convolutional neural network, which works a little differently from what you've just seen.

The idea behind a convolutional neural network is that you filter the images before training the deep neural network. After filtering, features within the images come to the forefront, and you can then spot those features to identify something.

A filter is simply a set of multipliers. So, for example, in this case, if you're looking at a particular pixel that has the value 192, and the filter is the values in the red box, then you multiply 192 by 4.5, and each of its neighbors by the respective filter value. Its neighbor above and to the left is zero, so you multiply that by -1. Its upper neighbor is 64, so you multiply that by zero, and so on. Sum up the results, and you get the new value for the pixel.

Now, this might seem a little odd, but check out the results for some filters, like this one, which, when multiplied over the contents of the image, removes almost everything except the vertical lines. And this one, which removes almost everything except the horizontal lines.

This can then be combined with something called pooling, which groups up the pixels in the image and filters them down to a subset. So, for example, 2x2 max pooling will group the image into sets of 2x2 pixels and simply pick the largest. The image will be reduced to a quarter of its original size, but the features can still be maintained. So the previous image, after being filtered and then max pooled, could look like this. The image on the right is one quarter the size of the one on the left, but the vertical line features were maintained, and indeed they were enhanced. (This filter-and-pooling arithmetic is sketched in NumPy below.)

So where do these filters come from? That's the magic of a convolutional neural network: they're actually learned. They're just parameters, like those in the neurons of the neural network that we saw in the last video. So as our image is fed into the convolutional layer, a number of randomly initialized filters will pass over the image. The results of these are fed into the next layer, where matching is performed by the neural network. Over time, the filters that give us the image outputs with the best matches will be learned, and this process is called feature extraction.

Here is an example of how a convolutional filter layer can help a computer visualize things. Across the top row here, you can see that you actually have a shoe, but it has been filtered down to the sole and the silhouette of a shoe by filters that learned what a shoe looks like. You'll run this code for yourself in just a few minutes.

Now, let's take a look at the code to build a convolutional neural network like this. This code is very similar to what you used earlier: we have a flattened input that's fed into a dense layer, which in turn is fed into the final dense layer that is our output.
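Before the model code, a quick aside: the filter and pooling arithmetic described above can be sketched in a few lines of NumPy. This is not code from the episode; the tiny image and the filter values here are made-up illustrations, not values learned by a network.

```python
import numpy as np

# A tiny 6x6 grayscale "image" with a vertical edge down the middle
# (made-up values, purely for illustration).
image = np.zeros((6, 6))
image[:, 3:] = 255.0

# A 3x3 filter that responds to vertical edges (illustrative values).
kernel = np.array([
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
])

def convolve2d(img, k):
    """For each pixel, multiply its neighborhood by the filter values
    and sum the results -- that sum becomes the new pixel value."""
    kh, kw = k.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * k)
    return out

def max_pool_2x2(img):
    """Group the image into 2x2 blocks and keep only the largest pixel
    in each, quartering the image while keeping the strongest features."""
    h, w = img.shape[0] // 2, img.shape[1] // 2
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            out[y, x] = img[2 * y:2 * y + 2, 2 * x:2 * x + 2].max()
    return out

filtered = convolve2d(image, kernel)  # 4x4: strong responses at the edge
pooled = max_pool_2x2(filtered)       # 2x2: a quarter of the size, strongest responses kept
print(filtered)
print(pooled)
```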
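And here is a minimal sketch of the flatten-into-dense model just described. The exact layer sizes aren't shown in this transcript; they're assumptions carried over from the previous episode's Fashion MNIST model.

```python
import tensorflow as tf

# Flatten the 2D image into a 1D vector, feed it through a hidden
# dense layer, and classify with a 10-way softmax output layer.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),                        # note: no input shape given here
    tf.keras.layers.Dense(128, activation='relu'),    # hidden layer (assumed size)
    tf.keras.layers.Dense(10, activation='softmax')   # one unit per clothing label
])
```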
The only difference here is that I haven't specified the input shape. That's because I'll put a convolutional layer on top of it, like this. This layer takes the input, so we specify the input shape, and we're telling it to generate 64 filters with this parameter. That is, it will generate 64 filters and multiply each of them across the image. Then, each epoch, it will figure out which filters gave the best signals to help match the images to their labels, in much the same way it learned which parameters worked best in the dense layers.

The max pooling that compresses the image and enhances the features looks like this, and we can stack convolutional layers on top of each other to really break down the image and try to learn from very abstract features, like this. (A sketch of the full model appears at the end of this transcript.)

With this methodology, your network starts to learn based on the features of the image instead of just the raw patterns of pixels. Two sleeves: it's a shirt. Two short sleeves: it's a t-shirt. Sole and laces: it's a shoe. That type of thing.

Now, we're still looking at just simple images of fashion items at the moment, but the principles will extend to more complex images, and you'll see that in the next video. But before going there, try out the notebook to see convolutions for yourself. I've made a link to it in the description below. Before we get to the next video, don't forget to hit that subscribe button. Thank you.

♪ (music) ♪
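For reference, the full convolutional model described in this episode might be sketched as follows. The 64 filters match the narration; the 3x3 kernel size, the second convolution/pooling pair, and the dense layer sizes are assumptions in line with the series, since the transcript doesn't show the code itself.

```python
import tensorflow as tf

model = tf.keras.models.Sequential([
    # First convolution: 64 filters passed over the image. This layer
    # sees the raw input, so the input shape is specified here.
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu',
                           input_shape=(28, 28, 1)),
    # Max pooling compresses the image 2x2, keeping the strongest signal.
    tf.keras.layers.MaxPooling2D(2, 2),
    # A second convolution/pooling pair breaks the image down into
    # more abstract features.
    tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
    tf.keras.layers.MaxPooling2D(2, 2),
    # The same flatten-into-dense layers as before sit underneath.
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
```

Note that only the first convolutional layer specifies the input shape, which matches the point in the narration about leaving it off the Flatten layer.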