So today, I thought we'd talk about generative adversarial networks, because they're really cool.
They can do a lot of really cool things. People have used them for all kinds of things,
things like, you know, you draw a sketch of a shoe
and it will render you an actual picture of a shoe or a handbag.
They're fairly low-resolution right now, but it's very impressive the way that they can produce
real, quite good-looking images.
You could make a neural network
that's a classifier, right? You give it lots and lots of pictures of cats and lots and lots of pictures of dogs,
and you present it with a picture of a cat, and
it outputs a number, let's say between zero and one.
Zero represents cats and one represents dogs. So you give it a cat and it puts out one, and you say, "No,
that's not right, it should be zero", and you keep training it until eventually it can tell the difference, right?
So somewhere inside that network
it must have formed some model of what cats are and what dogs are, at least as far as
images of them are concerned.
But that model... you can only really use it to classify things.
You can't say "OK, draw me a new cat picture", "draw me a cat picture I haven't seen before".
It doesn't know how to do that. So quite often you want a model that can generate new
samples. You give it a bunch of samples from a particular distribution, and you want it to
give you more samples which are also from that same distribution, so it has to learn the underlying
structure of what you've given it. And that's kind of tricky, actually.
There are a lot of challenges involved in that.
Well, let's be honest,
I think as a human you can find that tricky.
You know, I know what a cat looks like but, not being the greatest artist in the world,
I'm not sure that I could draw you a decent cat. So this is not confined to just
computing, is it?
Yeah, that's true. That's really true.
But let's do a really simple example of a generative model.
Say you give your network one thing.
It looks like this.
And then you give it another one, these are your training samples, that looks like this.
You give it another one that looks like this, and then...
What are those dots in the system?
Instances of something in two dimensions?
Yeah, I mean right now it's literally just data. It doesn't matter what it is.
These are data points.
And so these are the things you're giving it, and then it will learn
You can train it. It will learn a model, and the model it might learn is something like this, right?
It's figured out that these dots all lie along a path, and if its model was always to draw a line
Then it could learn by adjusting the parameters of that line
It would move the line around until it found a line that was a good fit, and generally gave you a good prediction.
But then if you were to ask this model:
"Okay, now make me a new one"
unless you did something clever, what you get is probably this, because that is, on average,
the closest to any of these. For any of these dots, you don't know if they're going to be above or below,
or to the left or the right. There's no pattern there. It's kind of random.
So the best place you can go that will minimize your error is to go right on the line every time.
But anybody looking at this will say: "Well, that's fake."
That's not a plausible example of something from this distribution, even though for a lot of the
error functions that people use when training networks this would perform best. So it's this interesting situation where
there's not just one right answer.
Generally speaking, the way that neural networks work is
you're training them towards a specific label, or you have
a target output, and
you get a penalty the further away you are from that output. Whereas in an application like this
there's basically an infinite number of perfectly valid
outputs.
So to generate this, what you actually need is to take this model and then apply some randomness. You say:
"They occur randomly, and they're normally distributed around this line with this standard deviation", or whatever.
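As a rough sketch of that idea (the data, the function names and the noise level are all made up for illustration, not taken from the video), you can fit a line to some dots and then compare "always answer on the line" with "line plus normally distributed scatter":

```python
import random
import statistics

def fit_line(points):
    """Least-squares fit of y = slope * x + intercept to (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def average_sample(x, slope, intercept):
    """Always answer with the point on the line: low error, but it looks fake."""
    return slope * x + intercept

def plausible_sample(x, slope, intercept, sigma, rng):
    """Add normally distributed scatter around the line to pick one valid output."""
    return slope * x + intercept + rng.gauss(0.0, sigma)

rng = random.Random(0)
# Hypothetical training dots scattered around the line y = 2x + 1.
training = [(x, 2 * x + 1 + rng.gauss(0.0, 0.5)) for x in range(20)]

slope, intercept = fit_line(training)
# Estimate the scatter from the residuals of the fitted line.
residuals = [y - (slope * x + intercept) for x, y in training]
sigma = statistics.pstdev(residuals)

print(average_sample(5, slope, intercept))                # same value every time
print(plausible_sample(5, slope, intercept, sigma, rng))  # varies from call to call
```

The first print is the "go right on the line" answer; the second is one of the many equally valid outputs.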
But a lot of models would have a hard time actually
picking one of all of the possibilities.
They would have this tendency to kind of smooth things out and go for the average, whereas we actually just want
"Just pick me one, it doesn't matter which". So that's part of the problem of generating.
Adversarial training is a way of
training, not just networks, actually, but machine learning systems in general,
which involves focusing on
the system's weaknesses.
So let's say you're teaching your
network to recognize handwritten digits.
The normal way you would do that is you have your big training set of labeled samples.
You've got an array of pixels that looks like a three, and it's labeled with three, and so on.
And the normal way
that you would train a network with this is you would just
present all of them pretty much at random. You'd present as many ones as twos as threes, and just keep throwing examples at it:
"What's this?", you know, "Yes, you got that right", "No, you've got that wrong, it should really be this".
And keep doing that, and the system will eventually learn.
But
if you were actually teaching a person to recognize the numbers, if you were teaching a child,
you wouldn't do that. Say you'd been teaching them for a while, presenting them and
getting the response and correcting them and so on, and you noticed that
with 2, 3, 4, 5, 6, 8 and 9 they're getting like 70 or 80 percent
accuracy, recognition rate,
but with 1 and 7 it's like 50/50, because any time they get a 1 or a 7 they just guess, because they can't
tell the difference between them.
If you noticed that, you wouldn't keep training those other numbers, right? You would stop and say:
"Well, you know what? We're just gonna focus on 1 and 7, because this is an issue for you.
I'm gonna keep showing you 1s and 7s and correcting you until
the error rate on 1s and 7s comes down to the error rate that you're getting on your other numbers."
You're focusing the training on the area where the student is failing, and
there's kind of a balance there when you're teaching humans,
because if you keep relentlessly focusing on their weaknesses and making them do stuff they can't do all the time,
they will just become super discouraged and give up. But neural networks don't have feelings yet, so that's really not an issue.
You can just
continually hammer on the weak points.
Find whatever they're having trouble with and focus on that. And that behavior,
and I think some people have had teachers where it feels like this,
feels like an adversary, right? It feels like they want you to fail.
So in fact
you can make them an actual adversary. If you have some process which is genuinely
doing its best to make the network give as high an error as possible,
that will produce this effect where, if it spots any weakness, it will focus on that and
thereby force the learner
to learn to not have that weakness anymore. One form of adversarial training people sometimes
do is, if you have a game-playing program, you make it play itself a lot of times,
because all the time they are trying to look for weaknesses in their opponent and exploit those weaknesses, and when they do that
they're forced to then improve or fix those weaknesses in themselves, because their opponent is exploiting those weaknesses.
So every time the system finds a strategy that is extremely good against this opponent,
the opponent, who's also them, has to learn a way of dealing with that strategy. And so on and so on.
So as the system gets better it forces itself to get better,
because it's continuously having to learn how to play a better and better opponent.
It's quite elegant, you know.
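The "focus on 1 and 7" idea from the digit example can be sketched as weighted sampling: pick the next training examples in proportion to how badly each class is currently doing. This is a simple heuristic for illustration, not a full adversary, and the error rates and function names below are hypothetical (25% error on most digits, 50% on 1 and 7, roughly as in the example).

```python
import random

def focus_weights(error_rates, floor=0.05):
    """Turn per-class error rates into sampling weights.

    The floor keeps classes that are already easy from disappearing entirely.
    """
    return {label: max(err, floor) for label, err in error_rates.items()}

def pick_next_labels(error_rates, n, rng):
    """Choose which classes to show next, favouring the ones with high error."""
    weights = focus_weights(error_rates)
    labels = list(weights)
    return rng.choices(labels, weights=[weights[label] for label in labels], k=n)

# Hypothetical error rates for the student in the example.
error_rates = {digit: 0.25 for digit in (2, 3, 4, 5, 6, 8, 9)}
error_rates[1] = 0.5
error_rates[7] = 0.5

batch = pick_next_labels(error_rates, 10000, random.Random(0))
print(batch.count(1), batch.count(7), batch.count(2))
```

In a real training loop the error rates would be re-measured as the learner improves, so the focus shifts to whatever is weakest at the time.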
This is where we get to generative adversarial networks. Let's say
you want cat pictures.
You want to be able to give it a bunch of pictures of cats and have it
spit out a new picture of a cat that you've never seen before, that looks exactly like a cat.
The way that the generative
adversarial network works is it's this architecture where you actually have two networks. One of the networks is the discriminator.
How's my spelling?
Yeah, like that.
The discriminator network is a classifier, right? It's a straightforward classifier.
You give it an image
and it outputs a number between 0 and 1, and you're training that in the standard supervised learning way.
Then you have a generator, and the generator
is
usually a convolutional neural network, although actually both of these can be other processes,
but people tend to use neural networks for this.
And the generator, you
give it some random noise,
and that's where it gets its source of randomness, so
that it can give multiple answers to the same question, effectively.
You give it some random noise and it generates an image
from that noise, and the idea is it's supposed to look like a cat.
So the way that we do this with a generative adversarial network is it's this architecture whereby you have two networks
playing a game.
Effectively it's a competitive game. It's adversarial between them, and in fact
it's very similar to the games we talked about in the AlphaGo video.
It's a minimax game,
because these two networks are fighting over one number:
one of them wants the number to be high, one of them wants the number to be low.
And what that number actually is is the error rate of the discriminator.
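The video only describes this number informally. One common way to write the game down, taken from the standard GAN formulation rather than from anything said here, uses a value function V with D for the discriminator, G for the generator, x for real samples and z for the random noise:

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}}\left[ \log\left( 1 - D(G(z)) \right) \right]
```

In this form V is high when the discriminator is right, so it plays the role of the inverse of the error rate: the discriminator pushes V up (low error) and the generator pushes V down (high error).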
So
the discriminator
wants a low error rate, the generator wants a high error rate. The discriminator's job is to look at an image,
which could have come from the original data set or
could have come from the generator, and its job is to say "Yes, this is a real image" or "No, this is a fake",
and it outputs a number between 0 and 1, like 1 for real and 0 for fake, for example. And
the generator
gets fed, as its input, just some random noise, and it then generates an image from that, and
its
reward, you know, its training, is
pretty much the inverse of what the discriminator says
for that image. So if it produces an image
which the discriminator can immediately tell is fake, it gets a negative reward, you know,
it's trained not to do that. If it manages to produce an image that the discriminator
can't tell is fake,
then that's really good. So you train them in a cycle, effectively. You give the discriminator a real image,
get its output, then you generate a fake image and give the discriminator that, and
then you give it a real one. So the discriminator gets alternating real image, fake image, real image, fake image,
usually. I mean, there are things you can do where you
train them at different rates and whatever, but by default they're generally...
Does the generator get any help with this at all, or is it purely...?
Yes, so this is part of what makes this especially clever, actually.
The generator does get help, because
if
you set up the networks right, you can use the gradient of the discriminator
to train the generator.
So, I
know you've done backpropagation before, about how neural networks are trained. It's gradient descent, right? And in fact we talked about this in, like,
2014. Sure. If you were a
blind person climbing a mountain, or it's really foggy and you're climbing a mountain, you can only see directly
what's underneath your own feet.
You can still climb that mountain if you just follow the gradient. You just look directly under you:
which way is the ground sloping? This is what we did with the hill climb algorithm. Exactly.
Yeah, sometimes people call it hill climbing, sometimes people call it gradient descent.
It's the same
metaphor,
upside down, effectively, whether we're climbing up or we're climbing down. You're training them by gradient descent, which means that
you're not just able to say
"Yes, that's good", "No, that's bad".
You're actually able to say "And you should adjust your weights in this direction, so that you'll move down the gradient",
right?
So generally you're trying to move down the gradient of error for the network.
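A minimal sketch of "look at the slope under your feet and step downhill", using a made-up one-parameter error function (the function, learning rate and step count are illustrative assumptions):

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to move downhill on the error."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)  # flip the sign to "+=" and this becomes hill climbing
    return x

# Hypothetical error function error(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # ends up very close to 3, the bottom of the valley
```

The only information used at each step is the local slope, which is the point of the foggy-mountain picture.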
If you're training the thing to just recognize cats and dogs,
you're moving it down the gradient towards the correct label, whereas in this case
the generator is being moved
sort of up the gradient of the discriminator's error.
So it can find out not just "you did well", "you did badly",
but "here's how to tweak your weights so that the discriminator would have been more wrong",
so that you can confuse the discriminator more.
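Here is a toy sketch of that training cycle, assuming one-dimensional "images" so that the gradients can be written by hand. Everything in it is an illustrative assumption: the real data is drawn from a normal distribution around 4, the generator is G(z) = a * z + b, and the discriminator is D(x) = sigmoid(w * x + c). Real GANs use deep networks and automatic differentiation, and a toy like this is not guaranteed to settle down.

```python
import math
import random

def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def train_toy_gan(steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters: G(z) = a * z + b
    w, c = 0.0, 0.0  # discriminator parameters: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        # Discriminator sees a real sample (target 1) and steps down its error.
        x_real = rng.gauss(4.0, 1.0)
        d_real = sigmoid(w * x_real + c)
        w += lr * (1.0 - d_real) * x_real
        c += lr * (1.0 - d_real)

        # Discriminator sees a fake sample (target 0) and steps down its error.
        x_fake = a * rng.gauss(0.0, 1.0) + b
        d_fake = sigmoid(w * x_fake + c)
        w -= lr * d_fake * x_fake
        c -= lr * d_fake

        # Generator step: use the discriminator's gradient with respect to the
        # fake sample, and move UP the discriminator's error on that sample.
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b
        d_fake = sigmoid(w * x_fake + c)
        grad_x = d_fake * w  # d(-log(1 - D(x))) / dx
        a += lr * grad_x * z
        b += lr * grad_x
    return a, b, w, c

a, b, w, c = train_toy_gan()
print("generator:", a, b, "discriminator:", w, c)
```

The generator never sees a real sample directly. The only signal it gets is `grad_x`, the direction in which its output should move to make the discriminator more wrong.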
An analogy people sometimes use is a forger and
an expert investigator, right? At the beginning, let's assume
there's one forger and there's one investigator, and all of the art
buyers of the world are idiots. At the beginning, the
quality of the forgeries is going to be quite low, right? The guy
just goes and gets some paint, and then he just writes, you know, "Picasso" on it,
and he can sell it for a lot of money. And the investigator comes along and says, "Yeah,
I don't know if that's right, or maybe it is. I'm not sure. I haven't really figured it out."
And then as time goes on, the investigator, who's the discriminator, will
start to spot certain things that are different between the things that the forger produces and real paintings,
and then they'll start to be able to reliably spot, "Oh, this is a fake.
This uses the wrong type of paint", or whatever,
"so it's fake". And once that happens the forger is forced to get better, right? He can't sell his fakes anymore.
He has to find that kind of paint.
So he goes and, you know,
digs up Egyptian mummies or whatever to get the legit paint, and now he can forge again,
and now the discriminator, the investigator, is fooled, and
they have to find a new thing
that distinguishes the real from the fakes. And so on and so on, in a cycle, they force each other to improve.
And it's the same thing here.
So at the beginning the generator is making just random noise, basically, because it's getting random noise in
and it's doing something to it, who knows what, and it spits out an image, and the discriminator goes, "That looks nothing like a cat."
And then,
because the discriminator is also not very smart at the beginning, right,
they both just get better and better.
The generator gets better at producing cat-looking things, and the discriminator gets better and better at identifying them,
until eventually, in principle, if you run this for long enough, theoretically you end up with a situation where the
generator is creating images that are
indistinguishable from
images from the real data set, and the discriminator, if it's given a real image or a fake image, always outputs 0.5:
50/50, "I
don't know, could be either". These things are literally indistinguishable. Then you pretty much can throw away the discriminator,
and you've got a generator, which you give random noise to, and it outputs
brand-new,
indistinguishable images of cats. There's another cool thing about this,
which is: every time we ask the generator to generate a new image,
we're giving it some random data, right? We give it just this vector of random numbers,
which you can think of as being a randomly selected point in a space, because, you know,
if you give it ten random numbers between zero and one or whatever, that is effectively a point in a 10-dimensional space.
And
the thing that's cool is that as
the generator learns,
it's forced to... the generator is effectively making a mapping from that space into cat pictures.
This is called the latent space, by the way. Generally,
any two nearby points in that latent space will, when you put them through the generator, produce similar
cat pictures, you know, similar pictures in general.
Which means, as you move
around, if you take that point and smoothly move it around the latent space, you get a smoothly varying
picture of a cat. And so the directions you can move in the space
actually end up
corresponding to
something that we as humans might consider meaningful about cats.
So there's one direction, and it's not necessarily one dimension of the space or whatever,
and it's not necessarily linear or a straight line or anything,
but there will be a direction in that space which corresponds to
how big the cat is in the frame, for example, or another dimension will be the color of the cat, or
whatever.
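To make "smoothly move a point around the latent space" concrete, here is a small sketch that draws two random 10-dimensional latent points and walks in a straight line between them. The helper names are hypothetical, and there is no trained generator here; in practice each point on the path would be fed through one to get a gradually changing picture.

```python
import random

def random_latent(dim, rng):
    """A random point in the latent space: `dim` numbers between 0 and 1."""
    return [rng.random() for _ in range(dim)]

def interpolate(z_start, z_end, steps):
    """Evenly spaced points on the straight line from z_start to z_end (steps >= 2)."""
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1 - t) * s + t * e for s, e in zip(z_start, z_end)])
    return path

rng = random.Random(0)
z_a = random_latent(10, rng)
z_b = random_latent(10, rng)
path = interpolate(z_a, z_b, steps=5)

for z in path:
    print([round(value, 2) for value in z])
```

Neighbouring points on the path differ only slightly, which is why the generated pictures would be expected to change gradually rather than jump.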
So
that's really cool, because
intuitively you think
the fact that the generator can reliably produce a very large number of images of cats
means it must have some, like, understanding of
what cats are, right, or at least what images of cats are.
And it's nice to see that it has actually
structured its latent space in this way, that by looking at a huge number of pictures of cats it has actually extracted
some of the structure of
cat pictures in general,
in a way which is meaningful when you look at it.
And that means you can do some really cool things. So one example was they trained a net, one of these systems,
on a really large
database of just face photographs, and so it could generate
an arbitrarily large number of, well, as large as the input space, a number of different faces. And
they found that actually doing
basic
arithmetic, like just adding and subtracting vectors in the
latent space, would actually produce meaningful changes in the image. If you took a bunch of latent
vectors which, when you give them to the generator, produce pictures of men, and
a bunch of them that produce pictures of women, and average those, you get a point in your
latent space which
corresponds to
a picture of a man or a picture of a woman, which is not one of your input points, but it's sort of representative. And
then you could do the same thing and say, "Oh, I only want...
give me the average point of all of the things that correspond to pictures of men wearing sunglasses",
right? And
then if you take your men-wearing-sunglasses vector,
subtract the man vector and add the woman vector, you get a point in your space,
and if you run that through the generator you get a woman wearing sunglasses,
right?
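The arithmetic itself is just averaging, subtracting and adding lists of numbers. A sketch with made-up three-dimensional latent vectors (real latent spaces are much larger, and these values are purely illustrative):

```python
def mean_vector(vectors):
    """Component-wise average of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def add(u, v):
    return [x + y for x, y in zip(u, v)]

def subtract(u, v):
    return [x - y for x, y in zip(u, v)]

# Hypothetical latent vectors that a trained generator maps to each kind of face.
men = [[1.0, 0.0, 0.5], [1.0, 0.0, -0.5]]
women = [[-1.0, 0.0, 0.5], [-1.0, 0.0, -0.5]]
men_with_sunglasses = [[1.0, 1.0, 0.5], [1.0, 1.0, -0.5]]

man = mean_vector(men)
woman = mean_vector(women)
man_with_sunglasses = mean_vector(men_with_sunglasses)

# "men wearing sunglasses" - "man" + "woman"
woman_with_sunglasses = add(subtract(man_with_sunglasses, man), woman)
print(woman_with_sunglasses)
```

Feeding the resulting point through the generator is the step that, in the face example, produced a woman wearing sunglasses.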
So doing basic vector arithmetic in your input space actually is
meaningful in terms of images, in a way that humans would recognize, which means that
there's a sense in which the generator really does
have an understanding of wearing sunglasses or not, or being a man or being a woman,
which is kind of an impressive result.
All the way along.
But it's not a truly random thing, because if I know the key, I can generate the same...
Yeah.
I mean, that's the
unfortunate problem with cryptography: we couldn't ever use truly random, because we wouldn't be able to decrypt it again.
We have our message bits, which are, you know, nought, 1, 1, nought, something different,
and we XOR these together one bit at a time, and that's how we encrypt.