字幕表 動画を再生する 英語字幕をプリント So today, I thought we talk about generative adversarial networks because they're really cool, and they've They can do a lot of really cool things people have used them for all kinds of things Things like you know you draw a sketch of a shoe And it will render you an actual picture of a shoe or a handbag They're fairly low-resolution right now, but it's very impressive the way that they can produce real quite good-looking images You could make a neural network That's a classifier right you give it lots and lots of pictures of cats and lots and lots of pictures of dogs and you say you know you present it with a picture of a cat and It says it outputs a number. Let's say between zero and one and Zero represents cats and one represents dogs and so you give it a cat and it puts out one and you say no That's not right should be zero and you keep training it until eventually it can tell the difference right? so somewhere inside that Network It's... it must have formed some model of what cats are and what dogs are, at least as far as images of images of them are concerned But That model really... you can only really use it to classify things You can't say "ok draw me a new cat picture", "draw me a cat picture I haven't seen before" It doesn't know how to do that so quite often you want a model that can generate new Samples you have so you give it a bunch of samples from a particular distribution, and you want it to Give you more samples which are also from that same distribution, so it has to learn the underlying Structure of what you've given it. And that's kind of tricky, actually. There's a lot of... Well there's a lot of challenges involved in that. Well, let's be honest I don't think as a human you can find that tricky You know if... if I know what a cat looks like but, uh, being not the greatest artist in the world I'm not sure that I could draw you a decent cat. So, you know, that this is not confined to just Computing is it? This... Yeah, that's true. That's really true. but if you take Let's do like a really simple example of a generative model say you you give your network one thing It looks like this. And then you give it another one you're like these are your training samples looks like this You give it another one that looks like this, and then... What are those dots in the systems? Instances of something on two dimensions? Yeah, I mean right now, it's literally just data. We just... it doesn't matter what it is Just some... yeah, these are these are data points And so these are the things you're giving it, and then it will learn You can train it. It will learn a model, and the model it might learn is something like this, right? It's figured out that these dots all lie along a path, and if its model was always to draw a line Then it could learn by adjusting the parameters of that line It would move the line around until it found a line that was a good fit, and generally gave you a good prediction. But then if you were to ask this model: "Okay, now make me a new one" unless you did something clever, what you get is probably this, because that is on average The closest to any of these, because any of these dots you don't know if they're going to be above or below or, you know, to the left or the right. There's no pattern there. It's kind of random. So the best place you can go that will minimize your error, is to go just right on the line every time. But anybody looking at this will say: "well, that's fake" That's not a plausible example of something from this distribution, even though for a lot of the like, error functions, that people use when training networks this would perform best, so it's this interesting situation where There's not just one right answer. you know, generally speaking the way that neuron networks work is: you're training them towards a specific you have a label or you have a you have an output a target output and You get penalty the further away you are from that output, whereas in in a in an application like this There's effect... there's basically an infinite number of perfectly valid Outputs here But, so, to generate this what you actually need is to take this model and then apply some randomness, you say: "they're all Within, you know, They occur randomly and they're normally distributed around this line with this standard deviation" or whatever. But a lot of models would have a hard time actually picking one of all of the possibilities And they would have this tendency to kind of smooth things out and go for the average, whereas we actually just want "Just pick me one doesn't matter". So that's part of the problem of generating. Adversarial training is is help is a way of training Not just networks, actually, a way of training machine learning systems. Which involves focusing on the system's weaknesses. So, if you are learning... let's say you're teaching your Network to recognize handwritten digits. The normal way you would do that you have your big training sample of labeled samples You've got an array of pixels that looks like a three and then it's labeled with three and so on. And the normal way that you would train a network with this is you would just Present all of them pretty much at random. You'd present as many ones as two as threes and just keep throwing examples at it "What's this?", you know, "Yes, you got that right", "no. You've got that wrong, It should really be this". And keep doing that and the system will eventually learn but If you were actually teaching a person to recognize the numbers, if you were teaching a child you wouldn't do that, like, if you'd been teaching them for a while, presenting them and You know, getting the response and correcting them and so on, and you noticed that they can do... you know... with 2 3 4 5 6 8 & 9 they're getting like 70 80 percent You know, accuracy recognition rate. But 1 & 7 it's like 50/50, because any time they get a 1 or a 7 they just guess because they can't Tell the difference between them. If you noticed that you wouldn't keep training those other numbers, right? You would stop and say: "Well, You know what? we're just gonna focus on 1 & 7 because this is an issue for you". "I'm gonna keep showing you Ones and 7s and correcting you until The error rate on ones and 7s comes down to the error rate that you're getting on your other numbers". You're focusing the training on the area where the student is failing and there's kinda of a balance there when you're teaching humans because if you keep relentlessly focusing on their weaknesses and making them do stuff they can't do all the time They will just become super discouraged and give up. But neural networks don't have feelings yet, so that's really not an issue. You can just continually hammer on the weak points Find whatever they're having trouble with and focus on that. And so, that behavior, and I think some people have had teachers where it feels like this, It feels like an adversary, right? it feels like they want you to fail. So in fact you can make them an actual adversary. If you have some process which is genuinely Doing its best to make the network give as high an error as possible that will produce this effect where if it spots any weakness it will focus on that and Thereby force the learner To learn to not have that weakness anymore. Like one form of adversarial training people sometimes Do is if you have a game playing program you make it play itself a lot of times Because all the time. They are trying to look for weaknesses in their opponent and exploit those weaknesses and when they do that They're forced to then improve or fix those weaknesses in themselves because their opponent is exploiting those weaknesses, so Every time the Every time the system finds a strategy that is extremely good against this opponent The the opponent, who's also them, has to learn a way of dealing with that strategy. And so on and so on. So, as the system gets better it forces itself to get better Because it's continuously having to learn how to play a better and better opponent It's quite elegant, you know. This is where we get to generative adversarial. Networks. Let's say You've got a network you want to... Let's say you want cat pictures You know, you want to be able to give it a bunch of pictures of cats and have it Spit out a new picture of a cat that you've never seen before that looks exactly like a cat the way that the generative adversarial network works is it's this architecture where you actually have two networks one of the networks is the discriminator How's my spelling? Yeah, like that The discriminator Network is a classifier right it's a straightforward classifier You give it an image And it outputs a number between 0 & 1 and your training that in standard supervised learning way Then you have a generator and the generator Is... Usually a convolutional neural network, although actually both of these can be other processes But people tend to use in your networks for this. And the generator, you give it some random noise, and that's the random, that's where it gets its source of randomness, so That it can give multiple answers to the same question effectively. You give it some random noise and it generates an image From that noise and the idea is it's supposed to look like a cat So the way that we do this with a generative adversarial Network is it's this architecture whereby you have two networks Playing a game Effectively it's a competitive game. It's adversarial between them and in fact It's a very similar to the games we talked about in the Alpha go video. it's a min/max game Because these two networks are fighting over one number one of them wants the number to be high one of them wants the number to be low. And what that number actually is is the error rate of the discriminator? so The discriminator Wants a low error rate the generator wants a high error rate the discriminators job is to look at an image which could have come from the original data set or It could have come from the generator and its job is to say yes. This is a real image or no. This is a fake any outputs a number between 0 & 1 like 1 for its real and 0 for its fake for example and the generator Gets fed as its input. Just some random noise and it then generates an image from that and it's Reward you know it's training is Pretty much the inverse of what the discriminator says for that image so if it produces an image Which the discriminator can immediately tell this fake? It gets a negative reward you know it's a That's it's trained not to do that if it manages to produce an image that the discriminator Can't tell is fake Then that's really good so you train them in a inner cycle effectively you you give the discriminator a real image get its output, then you generate a fake image and get the discriminator that and Then you give it a real so the discriminator gets alternating real image fake image real image fake image usually I mean there are things you can do where you Train them at different rates and whatever but by default they're generally to get any help with this at all, or is it purely Yes, so if you this is this is like part of what makes this especially clever actually the generator does get help because if You set up the networks right you can use the gradient of the discriminator to train the generator so when I Know you done back propagation before about how neural networks are trained its gradient descent right and in fact we talked about this in like 2014 sure if you were a You're a blind person climbing a mountain or you're it's really foggy, and you're climbing a mountain you can only see directly What's underneath your own feet? You can still climb that mountain if you just follow the gradient you just look directly under me which way is the You know which way is the ground sloping? This is what we did the hill climb algorithm exactly Yeah, sometimes people call it hill climbing sometimes people call it gradient descent It's the same metaphor Upside down effectively if we're climbing up or we're climbing down you're training them by gradient descent, which means that You're not just you're not just able to say Yes, that's good. No. That's bad You're actually able to say and you should adjust yours you should adjust your weights in this direction so that you'll move down the gradient right So generally you're trying to move down the gradient of error for the network If you're like if you're training if your training the thing to just recognize cats and dogs you're just moving it You're moving it down the gradient towards the correct label whereas in this case The generator is being moved sort of up the gradient for the discriminators error So it can find out not just you did well you did badly But here's how to tweak your weights so that you will so that the discriminator would have been more wrong So so that you can confuse the discriminator more so you can think of this whole thing? An analogy people sometimes use is like a a forger and An expert investigator person right at the beginning, you know let's assume There's one forger in there's one investigator and all of the art buyers of the world are idiots at the beginning the the Level of the the quality of the forgeries is going to be quite low right the guy Just go get some paint, and he he then he just writes you know Picasso on it And he can sell it for a lot of money and the investigator comes along and says yeah I do I don't know that's right or maybe it is. I'm not sure I haven't really figured it out And then as time goes on the investigator who's the discriminator will? Start to spot certain things that are different between the things that the forger produces and real paintings And then they'll start to be able to reliably spot. Oh, this is a fake You know this uses the wrong type of paint or whatever So it's fake and once that happens the forger is forced to get better right you can't sell his fakes anymore He has to find that kind of paint So he goes and you know Digs up Egyptian mummies or whatever to get the legit paint and now he can forge again and now of the discriminator the investigator is fooled and They have to find a new thing That distinguishes the real from the fakes and so on and so on in a cycle they force each other to improve And it's the same thing here So at the beginning the generator is making just random noise basically because it's it's it's getting random noisy And it's doing something to it who knows what and it spits out an image and the discriminator goes that looks nothing like a cat you know and then eventually because the discriminator is also not very smart at the beginning right and And they just they both get better and better The generator gets better at producing cat looking things and the discriminator gets better and better at identifying them until eventually in principle if you run this for long enough theoretically you end up with a situation where the Generator is creating images that look exactly Indistinguishable from Images from the real data set and the discriminator if it's given a real image, or a fake image always outputs 0.5 5050 I Don't know could be either these things are literally indistinguishable, then you pretty much can throw away the discriminator And you've got a generator, which you give random noise to and it outputs brand-new Indistinguishable images of cats there's another cool thing about this Which is every every time we ask the generator to generate new image We're giving it some random data, right we give it just this vector of random numbers Which you can think of as being a randomly selected point in a space because you know if you give it If you give it ten random numbers you know between zero and one or whatever that is effectively a point in a 10 dimensional space and the thing that's cool is that as the generator learns It's forced to You if the generator is effectively making a mapping from that space into cat pictures This is called the lateness base by the way generally Any two nearby points in that latent space will when you put them through the generator produce similar cabbages you know similar pictures in general Which means sort of as you move Around if you sort of take that point and smoothly move it around the latent space you get a smooth léa varying picture of a cat and so the directions you can move in the space Actually end up corresponding to Something that we as humans might consider meaningful about cats so there's one you know there's one direction, and it's not necessarily one dimension of the space or whatever but And it's not necessarily linear or a straight line or anything But there will be a direction in that space which corresponds to How big the cat is in the frame for example or another dimension will be the color of the cat or? whatever so That's really cool, because it means that by Intuitively you think The fact that the generator can reliably produce a very large number of images of cats means it must have some like understanding understanding of What cats are right or at least what images of cats are And it's nice to see that it has actually Structured its latent space in this way that it's by looking at a huge number of pictures of cats it has actually extracted some of the structure of cat pictures in general In a way, which is meaningful when you look at it? So and that means you can do some really cool things, so one example was they trained Annette one of these systems on a really large Database of just face photographs and so it could generate arbitrarily large number of well as largest the input space a number of different faces and So they found that actually by doing basic arithmetic like just adding and subtracting vectors on the Latent space would actually produce meaningful changes in the image if you took a bunch of latent vectors, which when you give them to the generator produce pictures of men and a bunch of them that produce pictures of women and average those you get a point in your latent space which corresponds to a picture of a man or a picture of a woman which is not one of your input points, but it's sort of representative and Then you could do the same thing and say oh, I only want Give me the average point of all of the things that correspond to pictures of men wearing sunglasses right and Then if you take your sunglass vector, you're you're men wearing sunglasses vector Subtract the man vector and add the woman vector you get a point in your space And if you run that through the generator you get a woman wearing sunglasses right So doing doing basic vector arithmetic in your input space actually is? Meaningful in terms of images in a way that humans would recognize, which means that? There's there's a sense in which the generator really does Have an understanding of wearing sunglasses or not or being a man or being a woman Which is kind of an impressive result All the way along But it's not a truly random thing because if I know the key and I can start want to generate the same Yeah I'm so I mean that's about Unfortunate is the problem with cryptography is that we couldn't ever use truly random because we wouldn't be able to decrypt it again We have our message bits, which are you know naught 1 1 naught something different? And we XOR these together one bit at a time, and that's how we encrypt
B1 中級 生成的敵対者ネットワーク(GAN) - コンピュータマニア (Generative Adversarial Networks (GANs) - Computerphile) 5 0 林宜悉 に公開 2021 年 01 月 14 日 シェア シェア 保存 報告 動画の中の単語