♪ (music) ♪

Hi, everyone, and welcome to episode 2 of TensorFlow Zero to Hero. In the last episode, you learned about machine learning and how it works. You saw a simple example of matching numbers to each other and how, using Python code, a computer could learn through trial and error what the relationship between the numbers was. In this episode, you're going to take it a little further by teaching a computer how to see and recognize different objects.

For example, look at these pictures. How many shoes do you see? You might say two, right? But how do you know they are shoes? Imagine if somebody had never seen shoes before. How would you tell them that, despite the great difference between the high heel and the sports shoe, they're still both shoes? Maybe they would think if it's red, it's a shoe, because all they've seen are these two, and they're both red. But, of course, it's not that simple. So how do you know that these two are shoes? Because, in your life, you've seen lots of shoes, and you've learned to understand what makes a shoe a shoe.

So it follows logically that if we show a computer lots of shoes, it will be able to recognize what a shoe is. And that's where the dataset called Fashion MNIST is useful. It has 70,000 images in 10 different categories, so there are 7,000 examples of each category, including shoes. Hopefully, seeing 7,000 shoes is enough for a computer to learn what a shoe looks like. The images in Fashion MNIST are only 28x28 pixels, so they're pretty small, and the less data used, the faster it is for a computer to process it. That being said, they still look like recognizable items of clothing. In this case, you can still see that it's a shoe.

In the next few minutes, I'll show you the code you can use to train a computer to recognize items of clothing from this training data. The code you'll write is almost identical to what you did in the last video. That's part of the power of TensorFlow: it lets you design neural networks for a variety of tasks with a consistent programming API.

We'll start by loading the data. The Fashion MNIST dataset is built into TensorFlow, so it's easy to load it with code like this. The training images are a set of 60,000 images, like our ankle boot here. The other 10,000 are a test set that we can use to check how well our neural network performs. We'll see them later. The label is a number indicating the class of that item of clothing. So, in this case, the number 9 indicates an ankle boot.

Why do you think it would be a number and not just the text "ankle boot"? There are two main reasons: first, computers deal better with numbers; but perhaps more importantly, there's the issue of bias. If we label it as "ankle boot," we're already showing a bias towards the English language. By using a number instead, you can point to a text description in any language, as shown here. Can you guess all of the languages that we used here?

When looking at a neural network design, it's always good to explore the input values and the output values first. Here we can see that our neural network is a little more complex than the one in the first episode. Our first layer has an input of shape 28x28, which, if you remember, was the size of our image. Our last layer has 10 units, which, if you remember, is the number of different items of clothing represented in our dataset. So our neural network will act a bit like a filter, which takes in a 28x28 set of pixels and outputs 1 of 10 values.
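The notebook code appears on screen rather than in the transcript, so here is a minimal sketch of loading the data and defining this network in tf.keras, based on the description above. The variable names are assumptions, but the dataset loader and the layer shapes follow what the episode describes.

```python
import tensorflow as tf

# Fashion MNIST is built into TensorFlow: 70,000 28x28 grayscale images
# in 10 categories, split into 60,000 training and 10,000 test images.
fashion_mnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = fashion_mnist.load_data()

# Scaling pixel values from 0-255 down to 0-1 isn't mentioned in the
# narration, but it's a standard preprocessing step that helps training.
training_images = training_images / 255.0
test_images = test_images / 255.0

# The network described above: a 28x28 input, a middle layer of
# 128 functions, and 10 output values, one per class of clothing.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax')
])
```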
So what about this number, 128? What does that do? Well, think of it like this: we're going to have 128 functions, each one of which has parameters inside of it. Let's call these f0 through f127. What we want is that when the pixels of the shoe get fed into them, one by one, the combination of all of these functions will output the correct value, in this case 9. In order to do that, the computer will need to figure out the parameters inside these functions to get that result. It will then extend this to all of the other items of clothing in the dataset. The logic is, once it has done this, it should be able to recognize items of clothing.

If you remember from the last video, there's the optimizer function and the loss function. The neural network will be initialized with random values. The loss function will then measure how good or how bad the results were, and then, with the optimizer, it will generate new parameters for the functions to see if it can do better.

You've probably also wondered about these. They're called activation functions. The first one is on the layer of 128 functions, and it's called relu, or rectified linear unit. What it really does is as simple as returning a value only if it's greater than zero: if a function outputs zero or less, that output just gets filtered out. And softmax has the effect of picking the biggest number in a set. The output layer in this neural network has 10 items in it, representing the probability that we're looking at that specific item of clothing. So, in this case, there's a high probability that it's item 9, which is our ankle boot. Instead of us searching through to find the largest value, softmax effectively sets the largest to 1 and the rest to 0, so all we have to do is find the 1.

Training is then very simple: we fit the training images to the training labels. This time, we'll try it for just 5 epochs. Remember earlier we had 10,000 images and labels that we didn't train with? These are images that the model hasn't previously seen, so we can use them to test how well our model performs. We can do that test by passing them to the evaluate method, like this. And then, finally, we can get predictions back for new images by calling model.predict, like this.

And that's all it takes to teach a computer how to see and recognize images. You can try this out for yourself in the notebook that I've linked in the description below. Having gone through this, you've probably seen one drawback, and that's the fact that the images are always 28x28 grayscale, with the item of clothing centered. So what if it's just a normal photograph and you want to recognize its contents, without the luxury of the item being the only thing in the picture, or of it being centered? That's where the process of spotting features becomes useful, and the tool of convolutional neural networks is your friend. You'll learn all about that in the next video, so don't forget to hit that subscribe button, and I'll see you there.

♪ (music) ♪
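For reference, the compile, fit, evaluate, and predict steps described in this episode look roughly like this in tf.keras. The narration doesn't name the optimizer or the loss function, so adam and sparse categorical crossentropy are assumptions here, a common pairing for integer labels like 9.

```python
# The optimizer and loss are not named in the narration; 'adam' and
# sparse_categorical_crossentropy are assumed, a common pairing for
# integer class labels such as 9 for "ankle boot".
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Fit the training images to the training labels for 5 epochs.
model.fit(training_images, training_labels, epochs=5)

# Check performance on the 10,000 test images the model has never seen.
model.evaluate(test_images, test_labels)

# Get predictions for new images: each prediction is a list of 10 values,
# with the largest at the index of the predicted class.
classifications = model.predict(test_images)
print(classifications[0])
```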