Machine Learning Zero to Hero (Google I/O '19)

  • [MUSIC PLAYING]

  • LAURENCE MORONEY: So the first question

  • that comes out, of course, is that whenever

  • you see machine learning or you hear about machine learning,

  • it seems to be like this magic wand.

  • Your boss says, put machine learning into your application.

  • Or if you hear about startups, they

  • put machine learning into their pitch somewhere.

  • And then suddenly, they become a viable company.

  • But what is machine learning?

  • What is it really all about?

  • And particularly for coders, what's machine

  • learning all about?

  • Actually, quick show of hands if any of you are coders.

  • Yeah, it's I/O. I guess pretty much all of us,

  • right, are coders.

  • I do talks like this all the time.

  • And sometimes, I'll ask how many people are coders,

  • and three or four hands show up.

  • So it's fun that we can geek out and show a lot of code today.

  • So I wanted to talk about what machine learning is

  • from a coding perspective by picking a scenario.

  • Can you imagine if you were writing a game

  • to play rock, paper, and scissors?

  • And you wanted to write something

  • so that you could move your hand as a rock, a paper,

  • or a scissors.

  • The computer would recognize that and be

  • able to play that with you.

  • Think about what that would be like to actually write code

  • for.

  • You'd have to pull in images from the camera,

  • and you'd have to start looking at the content of those images.

  • And how would you tell the difference

  • between a rock and a scissors?

  • Or how would you tell the difference

  • between a scissors and a paper?

  • That would end up being a lot of code

  • that you would have to write, and a lot

  • of really complicated code.

  • And not only the difference in shapes--

  • think about the difference in skin tones,

  • and male hands and female hands, large hands and small hands,

  • people with gnarly knuckles like me,

  • and people with nice smooth hands like Karmel.

  • So how is it that you would end up

  • being able to write all the code to do this?

  • It'd be really, really complicated and, ultimately,

  • not very feasible to write.

  • And this is where we start bringing

  • machine learning into it.

  • This is a very simple scenario, but you

  • can think about there are many scenarios where it's

  • really difficult to write code to do something,

  • and machine learning may help in that.

  • And I always like to think of machine learning in this way.

  • Think about traditional programming.

  • And in traditional programming, something that has been

  • our bread and butter for many years--

  • all of us here are coders--

  • what it is is that we think about expressing something

  • and expressing rules in a programming language,

  • like Java, or Kotlin, or Swift, or C++.

  • And those rules generally act on data.

  • And then out of that, we get answers.

  • Like in rock, paper, scissors, the data would be an image.

  • And my rules would be all my if-thens

  • looking at the pixels in that image to try and determine

  • if something is a rock, a paper, or a scissors.

  • Machine learning then turns this around.

  • It flips the axes on this.

  • And we say, hey, instead of doing it this way

  • where it's like we have to think about all of these rules,

  • and we have to write and express all of these rules in code,

  • what if we could provide a lot of answers,

  • and we could label those answers and then have a machine infer

  • the rules that map one to the other?

  • So for example, in something like the rock, paper,

  • and scissors, we could say, these

  • are the pixels for a rock.

  • And this is what a rock looks like.

  • And we could get hundreds or thousands of images

  • of people doing a rock--

  • so we get diverse hands, diverse skin tones,

  • those kind of things.

  • And we say, hey, this is what a rock looks like.

  • This is what a paper looks like.

  • And this is what a scissors looks like.

  • And if a computer can then figure out

  • the patterns between these and can be taught

  • and it can learn what the patterns are between these,

  • now, we have machine learning.

  • Now, we have an application, and we

  • have a computer that has determined these things for us.

  • So if we take a look at this diagram again,

  • and if we look at this again and we replace

  • what we've been talking about by us creating rules, and we say,

  • OK, this is machine learning, we're going to feed in answers,

  • we're going to feed in data, and the machine

  • is going to infer the rules--

  • what's that going to look like at runtime?

  • How can I then run an application

  • that looks like this?

  • So this is what we're going to call the training phase.

  • We've trained what's going to be called a model on this.

  • And that model is basically a neural network.

  • And I'm going to be talking a lot about neural networks

  • in the next few minutes.

  • But what that neural network is-- we're going to wrap that.

  • We're going to call that a model.

  • And then at runtime, we're going to pass in data,

  • and it's going to give us out something called predictions.

  • So for example, if I've trained it on lots of rocks,

  • lots of papers, and lots of scissors,

  • and then I'm going to hold my fist up to a webcam,

  • it's going to get the data of my fist.

  • And it's going to give back what we

  • like to call a prediction that'll

  • be something like, hey, there's an 80% chance that's a rock.

  • There's a 10% chance it's a paper and 10% chance

  • it's a scissors.

  • Something like that.

  • So a lot of the terminology of machine learning

  • is a little bit different from traditional programming.

  • We're calling it training, rather than

  • coding and compiling.

  • We're calling it inference, and we're getting

  • predictions out of inference.

  • So when you hear us using terms like that,

  • that's where it all comes from.

  • It's pretty similar to stuff that you've been doing already

  • with traditional coding.

  • It's just slightly different terminology.

  • So I'm going to kick off a demo now

  • where I'm going to train a model for rock, paper, and scissors.

  • The demo takes a few minutes to train,

  • so I'm just going to kick it off before I get back to things.

  • So I'm going to start it here.

  • And it's starting.

  • And as it starts to run, I just want to show something

  • as it goes through.

  • So if you can imagine a computer,

  • I'm going to give it a whole bunch of data of rock, paper,

  • and scissors, and I'm going to ask

  • it to see if it can figure out the rules for rock, paper,

  • and scissors.

  • So any one individual item of data

  • I give to it, there's a one in three

  • chance it gets it right first time.

  • If it was purely random, and I said, what is this,

  • there's a one in three chance it would get it correct as a rock.

  • So as I start training, that's one

  • of the things I want you to see here

  • is the accuracy that, the first time through this,

  • the accuracy was actually--

  • it was exactly 0.333.

  • Sometimes, when I run this demo, it's a little bit more.

  • But the idea is once it started training,

  • it's guessing at random.

  • It's like, OK, I'm just throwing stuff at random.

  • I'm making guesses of this.

  • And it was, like, one in three right.

  • As we continue, we'll see that it's actually

  • getting more and more accurate.

  • The second time around, it's now 53% accurate.

  • And as it continues, it will get more and more accurate.

  • But I'm going to switch back to the slides

  • and explain what it's doing before we get back

  • to see that finish.

  • Can we go back to the slides, please?

  • OK.

  • So the code to be able to write something like this

  • looks like this.

  • This is a very simple piece of code

  • for creating a neural network.

  • And what I want you to focus on, first of all,

  • are these things that I've outlined in the red box.

  • So these are the input to the neural network and the output

  • coming from the neural network.

  • That's why I love talking about neural networks

  • at I/O, because I/O, Input/Output.

  • And you'll see I'll talk a lot about inputs and outputs

  • in this.

  • So the input to this is the size of the images.

  • All of the images that I'm going to feed

  • to the neural network of rocks, papers, and scissors

  • are 150 pixels square, and they have a 3-byte color depth.

  • And that's why you see 150 by 150 by 3.

  • And then the output from this is going

  • to be three things, because we're

  • classifying for three different things--

  • a rock, a paper, or a scissors.

  • So always when you're looking at a neural network,

  • those are really the first things to look at.

  • What are my inputs?

  • What are my outputs?

  • What do they look like?

  • But then there's this mysterious thing

  • in the middle where we've created

  • this tf.keras.layers.Dense, and there's a 512 there.
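
A minimal sketch of the kind of model being described here, assuming a flatten step plus the 512-unit dense layer and a 3-unit output (the open-sourced notebook may differ in detail):

    import tensorflow as tf

    # Input: 150 x 150 RGB images; output: 3 classes (rock, paper, scissors).
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(150, 150, 3)),  # unroll the pixels into one long vector
        tf.keras.layers.Dense(512, activation='relu'),       # the 512 "functions" discussed below
        tf.keras.layers.Dense(3, activation='softmax')        # one output value per class
    ])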

  • And a lot of people wonder, well,

  • what are those 512 things?

  • Well, let me try and explain that visually.

  • So visually, what's going on is what those 512 things are

  • in the center of this diagram--

  • consider them to be 512 functions.

  • And those functions all have internal variables.

  • And those internal variables are just

  • going to be initialized with some random states.

  • But what we want to do is when we start

  • passing the pixels from the images into these,

  • we want them to try and figure out

  • what kind of output, based on those inputs,

  • will give me the desired output at the bottom?

  • So function 0 is going to grab all those pixels.

  • Function 1 is going to grab all those pixels.

  • Function 2 is going to grab all those pixels.

  • And if those pixels are the shape of a rock,

  • then we want the output of function 0, 1,

  • and 2 all the way up to 511 to be outputting

  • to the box on the left at the bottom-- to stick a 1

  • in that box.

  • And similarly for paper.

  • If we say, OK, when the pixels look like this,

  • we want your outputs of F0, F1, and F2 to go to this box.

  • And that's the process of learning.

  • So all that's happening-- all that learning,

  • when we talk about machine learning,

  • is setting those internal variables in those functions

  • so we get that desired output.

  • Now, those internal variables, just

  • to confuse things a little bit more,

  • in machine learning parlance, tend to be called parameters.

  • And so for me, as a programmer, it was hard at first

  • to understand that.

  • Because for me, parameters are something

  • I pass into a function.

  • But in this case, when you hear a machine learning person talk

  • about parameters, those are the values

  • inside those functions that are going to get set

  • and going to get changed as it tries

  • to learn how I'm going to match those inputs to those outputs.

  • So if I go back to the code and try

  • to show this again in action--

  • now, remember, my input shape that I spoke about earlier on,

  • the 150 by 150 by 3, those are the pixels

  • that I showed in the preview.

  • I'm simulating them here with gray boxes,

  • but those are the pixels that I showed

  • in the previous diagrams.

  • My functions, now, are that dense layer in the middle, those 512.

  • So that's 512 functions randomly initialized

  • or semi-randomly initialized that I'm

  • going to try to train to match my inputs to my outputs.

  • And then, of course, the bottom-- those three

  • are the three neurons that are going to be my outputs.

  • And I've just said the word neuron for the first time.

  • But ultimately, when we talk about neurons

  • and neural networks, it's not really anything

  • to do with the human brain.

  • It's a very rough simulation of how

  • the human brain does things.

  • And these internal functions that try and figure

  • out how to match the inputs to the outputs,

  • we call those neurons.

  • And on my output, those three at the bottom

  • are also going to be neurons too.

  • And that's what lends the name "neural networks" to this.

  • It tends to sound a little bit mysterious and special

  • when we call it like that.

  • But ultimately, just think about them

  • as functions with randomly initialized variables

  • that, over time, are going to try

  • to change the value of those variables

  • so that the inputs match our desired outputs.

  • So then there's this line, the model.compile line.

  • And what's that going to do?

  • That's a kind of fancy term.

  • It's not really doing compilation

  • where we're turning code into bytecode as before.

  • But think about the two parameters to this.

  • And these are the most important part

  • to learn in machine learning-- and these

  • are the loss and the optimizer.

  • So the idea is the job of these two

  • is-- remember, earlier on, I said

  • it's going to randomly initialize all those functions.

  • And if they're randomly initialized

  • and I pass in something that looks like a rock,

  • there's a one in three chance it's

  • going to get it right as a rock, or a paper, or scissors.

  • So what the Loss function does is

  • it measures the results of all the thousands of times

  • I do that.

  • It figures out how well or how badly it did.

  • And then based on that, it passes that data

  • to the other function, which is called the Optimizer function.

  • And the Optimizer function then generates the next guess

  • where the guess is set to be the parameters of those 512

  • little functions, those 512 neurons.

  • And if we keep repeating this, we'll pass our data in.

  • We'll take a look.

  • We'll make a guess.

  • We'll see how well or how badly we did.

  • Then based on that, we'll optimize,

  • and we'll make another guess.

  • And we'll repeat, repeat, repeat,

  • until our guesses get better, and better, and better.

  • And that's what happens in the model.fit.

  • Here, you can see I have model.fit where epochs--

  • "ee-pocks," "epics," depending on how you pronounce it--

  • is 100.

  • All it's doing is doing that cycle 100 times.

  • For every image, take a look at the parameters.

  • Fit those parameters.

  • Take a guess.

  • Measure how good or how bad you did,

  • and then repeat and keep going.

  • And the optimizer then will make it

  • better and better and better.

  • So you can imagine the first time through,

  • you're going to get it right roughly one in three times.

  • Subsequent times, it's going to get

  • closer, and closer, and closer, and better, and better.
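
As a rough sketch of what that compile-and-fit cycle looks like in code, continuing the model sketched earlier (the loss and optimizer named here are typical choices for a three-class image problem, and train_images / train_labels are placeholders for however the data is loaded; the demo itself may differ):

    # The loss measures how well or badly the guesses did;
    # the optimizer uses that to make the next, hopefully better, guess.
    model.compile(loss='categorical_crossentropy',
                  optimizer='rmsprop',
                  metrics=['accuracy'])

    # Repeat the guess / measure / optimize cycle 100 times over the data.
    history = model.fit(train_images, train_labels, epochs=100)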

  • Now, those of us who know a little bit about images

  • and image processing go, OK, that's nice,

  • but it's a little naive.

  • I'm just throwing all of the pixels of the image--

  • and maybe a lot of these pixels aren't even set--

  • into a neural network and having it

  • try to figure out from those pixel values.

  • Can I do it a little bit smarter than that?

  • And the answer to that is, yes.

  • And one of the ways that we can do it

  • a little bit smarter than that is using something

  • called convolutions.

  • Now, convolutions is a convoluted term,

  • if you'll excuse the pun.

  • But the idea behind convolutions is

  • if you've ever done any kind of image processing,

  • the way you can sharpen images or soften images

  • with things like Photoshop, it's exactly the same thing.

  • So with a convolution, the idea is you take a look

  • at every pixel in the image.

  • So for example, this picture of a hand, and I'm

  • just looking at one of the pixels on the fingernail.

  • And so that pixel is value 192 in the box on the left here.

  • So if you take a look at every pixel in the image

  • and you look at its immediate neighbors,

  • and then you get something called a filter, which

  • is the gray box on the right.

  • And you multiply out the value of the pixel

  • by the corresponding value in the filter.

  • And you do that for all of the pixel's neighbors

  • to get a new value for the pixel.

  • That's what a convolution is.
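
A small sketch of that multiply-and-add step for a single pixel, using NumPy and a hypothetical 3 x 3 filter (this particular filter tends to preserve vertical edges):

    import numpy as np

    # A 3 x 3 filter; image is a 2D grayscale array, (y, x) the pixel being recomputed.
    filt = np.array([[-1, 0, 1],
                     [-2, 0, 2],
                     [-1, 0, 1]])

    def convolve_pixel(image, y, x):
        # Multiply each neighbor by the matching filter value and sum the results
        # to get the new value for this pixel.
        patch = image[y - 1:y + 2, x - 1:x + 2]
        return np.sum(patch * filt)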

  • Now, many of us, if you've never done this

  • before, you might be sitting around thinking, why on earth

  • would I do that?

  • Well, the reason for that is that when learning convolutions

  • and learning filters, it becomes really,

  • really good at extracting features in an image.

  • So let me give an example.

  • So if you look at the image on the left here

  • and I apply a filter like this one,

  • I will get the image on the right.

  • Now, what has happened here is that the image

  • on the left, I've thrown away a lot of the noise in the image.

  • And I've been able to detect vertical lines.

  • So just simply by applying a filter like this,

  • vertical lines are surviving through the multiplication

  • of the filter.

  • And then similarly, if I apply a filter like this one,

  • horizontal lines survive.

  • And there are lots of filters out there

  • that can be randomly initialized and that

  • can be learned that do things like picking out

  • items in an image, like eyes, or ears, or fingers,

  • or fingernails, and things like that.

  • So that's the idea behind convolutions.

  • Now, the next thing is, OK, if I'm

  • going to be doing lots of processing

  • on my image like this, and I'm going to be doing training,

  • and I'm going to have to have hundreds of filters

  • to try and pick out different features in my image, that's

  • going to be a lot of data that I have to deal with.

  • And wouldn't it be nice if I could compress my images?

  • So compression is achieved through something

  • called pooling.

  • And it's a very, very simple thing.

  • Sometimes, it seems a very complex term

  • to describe something simple.

  • But when we talk about pooling, I'm

  • going to apply, for example, a 2 x 2 pool to an image.

  • And what that's going to do is it's going

  • to take the pixels 2 x 2--

  • like if you look at my left here,

  • if I've got 16 simulated pixels--

  • I'm going to take the top four in the top left-hand corner.

  • And of those four, I'm going to pick the biggest value.

  • And then the next four on the top right-hand corner,

  • of those four, I'll pick the biggest value,

  • and so on, and so on.

  • So what that's going to do is effectively throw away

  • 75% of my pixels and just keep the maximums in each of these 2

  • x 2 little units.
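
A minimal sketch of that 2 x 2 max pooling on a plain NumPy array, keeping only the largest value in every 2 x 2 block:

    import numpy as np

    def max_pool_2x2(image):
        # Assumes the height and width are even. Groups the pixels into 2 x 2
        # blocks and keeps only the maximum of each block, discarding 75% of them.
        h, w = image.shape
        return image.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))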

  • But the impact of that is really interesting

  • when we start combining it with convolutions.

  • So if you look at the image that I created earlier

  • on where I applied the filter to that image of a person

  • walking up the stairs, and then I pool that,

  • I get the image that's on the right, which is 1/4

  • the size of the original image.

  • But not only is it not losing any vital information,

  • it's even enhancing some of the vital information that

  • came out of it.

  • So pooling is your friend when you start using convolutions

  • because, if you have 128 filters, for example,

  • that you apply to your image, you're

  • going to have 128 copies of your image.

  • You're going to have 128 times the data.

  • And when you're dealing with thousands of images,

  • that's going to slow down your training time really fast.

  • But pooling, then, really speeds it up

  • by shrinking the size of your image.

  • So now, when we want to start learning with a neural network,

  • now it's a case of, hey, I've got my image at the top,

  • I can start applying convolutions to that.

  • Like for example, my image might be a smiley face.

  • And one convolution will keep it as a smiley face.

  • Another one might keep the circle outline of a head.

  • Another one might kind of change the shape

  • of the head, things like that.

  • And as I start applying more and more convolutions to these

  • and getting smaller and smaller images, instead of me now

  • having a big, fat image that I'm trying to classify,

  • that I'm trying to pick out the features of to learn from,

  • I can have lots of little images highlighting features in that.

  • So for example, in rock, paper, scissors,

  • my convolutions might show, in some cases, five fingers,

  • or four fingers and a thumb.

  • And I know that that's going to be a paper.

  • Or it might show none, and I know that's going to be a rock.

  • And it then begins to make the process of the machine learning

  • these much, much simpler.

  • So to show this quickly--

  • I've been putting QR codes on these slides, by the way.

  • So I've open-sourced all the code that I'm showing here

  • and we're talking through.

  • And this is a QR code to a workbook

  • where you can train a rock, paper,

  • scissors model for yourself.

  • But once we do convolutions-- and earlier in the slide,

  • you saw I had multiple convolutions moving down.

  • And this is what the code for that would look like.

  • I just have a convolution layer, followed by a pooling.

  • Another convolution, followed by a pooling.

  • Another convolution, followed by a pooling,

  • et cetera, et cetera.

  • So the impact of that-- and remember, first of all,

  • at the top, I have my input shape,

  • and I have my output at the bottom

  • where the Dense equals 3.
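
Put together, the convolution-plus-pooling model being described might look like this sketch (the filter counts and the number of convolution/pooling pairs are illustrative; the open-sourced notebook has the exact layers):

    import tensorflow as tf

    model = tf.keras.models.Sequential([
        # Convolution + pooling pairs: extract features, then shrink the image.
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu',
                               input_shape=(150, 150, 3)),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(64, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        tf.keras.layers.Conv2D(128, (3, 3), activation='relu'),
        tf.keras.layers.MaxPooling2D(2, 2),
        # Then the same dense classifier as before.
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dense(3, activation='softmax')
    ])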

  • So I'm going to switch back to the demo

  • now to see if it's finished training.

  • And we can see it.

  • So we started off with 33% accuracy.

  • But as we went through the epochs--

  • I just did this one, I think, for 15 epochs--

  • it got steadily, and steadily, and steadily more accurate.

  • So after 15 loops of doing this, it's now 96.83% accurate.

  • So as a result, we can see, using these techniques,

  • using convolutions like this, we've

  • been actually able to train something in just a few minutes

  • to be roughly 97% accurate at detecting rock, paper,

  • and scissors.

  • And if I just take a quick plot here, we can see this

  • is a plot of that accuracy-- the red line showing the accuracy

  • where we started at roughly 33%.

  • And we're getting close to 100%.

  • The blue line is I have a separate data

  • set of rock, paper, scissors that I tested with, just

  • to see how well it's doing.

  • And it's pretty close.

  • I need to do a little bit of work in tweaking it.

  • And I can actually try an example to show you.

  • So I'm going to upload a file.

  • I'm going to choose a file from my computer.

  • I've nicely named that file Paper,

  • so you can guess it's a paper.

  • And if I open that and upload that,

  • it's going to upload that.

  • And then it's going to give me an output.

  • And the output is 1, 0, 0.

  • So you think, ah, I got it wrong.

  • It detected it's a rock.

  • But actually, my neurons here, based on the labels,

  • are in alphabetical order.

  • So the alphabetical order would be paper, then rock,

  • then scissors.

  • So it actually classified that correctly by giving me a 1.

  • So it's actually a paper.
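
A small sketch of mapping that output vector back to a label, assuming the class indices follow the alphabetically sorted label names as described:

    import numpy as np

    class_names = sorted(['rock', 'paper', 'scissors'])   # ['paper', 'rock', 'scissors']
    prediction = np.array([1, 0, 0])                       # the model's output for the upload
    print(class_names[int(np.argmax(prediction))])         # prints 'paper'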

  • And we can try another one at random.

  • I'll choose a file from my machine.

  • I'll choose a scissors and open that and run it.

  • And again, paper, rock, scissors,

  • so we see it actually classified that correctly.

  • So this workbook is online if you

  • want to download it and have a play with it

  • to do classification yourself and to see

  • how easy it is for you to train a neural network to do this.

  • And then once you have that model,

  • you can implement that model in your applications

  • and then maybe play rock, paper, scissors in your apps.

  • Can we switch back to the slides, please?

  • So just to quickly show the idea of how convolutions really

  • help you with an image, this is what that model

  • looks like when I defined it.

  • And at the top here, it might look like a little bit of a bug

  • at first, if you're not used to doing this.

  • But at the top here-- remember, we said my image is coming

  • in 150 x 150--

  • it's actually saying, hey, I'm going to pass out

  • an image that's 148 x 148.

  • Anybody know why?

  • Is it a bug?

  • No, it's not a bug.

  • OK.

  • So the reason why is if my filter was

  • 3 x 3, for me to be able to look at a pixel, I have to throw--

  • for me to start on the image, I have

  • to start one pixel in and one pixel down in order for it

  • to have neighbors.

  • So as a result, I have to throw away all the pixels

  • at the top, at the bottom, and either side of my image.

  • So I'm losing one pixel on all sides.

  • So my 150 x 150 becomes a 148 x 148.

  • And then when I pool that, I halved each of the axes.

  • So it becomes 74 x 74.

  • Then through the next iteration, it becomes 36 x 36, then

  • 17 x 17, and then 7 x 7.
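
As a quick sanity check of those sizes: each 3 x 3 'valid' convolution trims one pixel from every edge, and each 2 x 2 pooling halves what is left (rounding down). Assuming a convolution before each of the four pooling steps:

    size = 150
    for _ in range(4):       # four convolution + pooling pairs
        size -= 2            # 3 x 3 convolution: 148, 72, 34, 15
        size //= 2           # 2 x 2 pooling:      74, 36, 17, 7
    print(size)              # 7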

  • So if you think about all of these 150 squared images

  • passing through all of these convolutions

  • are coming up with lots of little 7 x 7 things.

  • And those little 7 x 7 things should

  • be highlighting a feature--

  • it might be a fingernail.

  • It might be a thumb.

  • It might be a shape of a hand.

  • And then those features that come through the convolutions

  • are then passed into the neural network

  • that we saw earlier on to generate those parameters.

  • And then from those parameters, hopefully, it

  • would make a guess, and a really accurate guess,

  • about something being a rock, a paper, or scissors.

  • So if you prefer an IDE instead of using Colab,

  • you can do that also.

  • I tend to really like to use PyCharm for my developments.

  • Any PyCharm fans here, out of interest?

  • Yeah, nice.

  • A lot of you.

  • So here's a screenshot of PyCharm

  • when I was writing this rock, paper, scissors thing before I

  • pasted it over to Colab, where you can run it from Colab.

  • So PyCharm is really, really nice.

  • And you can do things like step-by-step debugging.

  • If we can switch to the demo machine for a moment.

  • Now, I'll do a quick demo of PyCharm

  • doing step-by-step debugging.

  • So here, we can see we're in rock, paper, scissors.

  • And for example, if I hit the Debug,

  • I can even set breakpoints.

  • So now, I have a breakpoint on my code.

  • So I can start taking a look at what's happening

  • in my neural network code.

  • Here, this is where I'm preloading the data into it.

  • And I can step through, and I can

  • do a lot of debugging to really make

  • sure my neural network is working the way

  • that I want it to work.

  • It's one of the things that I hear

  • a lot from developers when they first

  • get started with machine learning

  • is that your models seem to be

  • very much a black box.

  • You have all this Python code for training a model,

  • and then you have to do some rough guesswork.

  • With TensorFlow being open-sourced,

  • I can actually step into the TensorFlow code

  • in PyCharm, like I'm doing here, to see

  • how the training is going on, to help me to debug my models.

  • And Karmel, later, is also going to show

  • how something called TensorBoard can

  • be used for debugging models.

  • Can we switch back to the slides, please?

  • So with that in mind, we've gone from really just beginning

  • to understand what neural networks are all

  • about and basic "Hello, world!" code

  • to taking a look at how we can use something called

  • convolutions.

  • And they're something that sounds really complicated

  • and really difficult. But once you start using them,

  • you'll see they're actually very, very easy to use,

  • particularly for image and text classification.

  • And we saw then how, in just a few minutes,

  • we were able to train a neural network to be

  • able to recognize rock, paper, and scissors with 97%, 98%

  • accuracy.

  • So that's just getting started.

  • But now, to show us how to actually stretch the framework,

  • and to make it real, and to do really

  • cool and production-quality stuff,

  • Karmel is going to share with us.

  • Thank you.

  • [APPLAUSE]

  • KARMEL ALLISON: Hi.

  • So quick show of hands for, how many of you

  • was that totally new, and now you're

  • paddling as fast as you can to keep your head above water?

  • All right, a fair number of you.

  • I'm going to go over, now, some of the tools and features

  • that TensorFlow has to take you from when you've actually

  • got your model to all the way through production.

  • Don't worry, there is no test at the end.

  • So for those of you who are just trying to keep up right now,

  • track these words, store somewhere

  • in the back of your head that this is all available.

  • For the rest of you where you've already got a model

  • and you're looking for more that you can do with it,

  • pay attention now.

  • All right.

  • So Laurence went through an image classification problem.

  • In slides, we love image classification problems,

  • because they look nice on slides.

  • But maybe your data isn't an image classification problem.

  • What if you've got categorical data or text-based data?

  • TensorFlow provides a number of tools

  • that allow you to take different data types

  • and transform them before loading them

  • into a machine learning model.

  • In particular, for example, here, maybe we've

  • got some user clickstreams, right?

  • And we've got a user ID.

  • Now, if we fed that directly into a deep learning model,

  • our model would expect that that is real valued and numeric.

  • And it might think that user number 125 has some relation

  • to user 126, even though in reality, that's not true.

  • So we need to be able to take data like this

  • and transform it into data that our model can understand.

  • So how do we do that?

  • Well, in TensorFlow, one of the tools

  • that we use extensively inside of Google are feature columns.

  • These are configurations that allow

  • you to configure transformations on incoming data.

  • So here, you can see we're taking our categorical column,

  • user ID, and we're saying, hey, this

  • is a categorical column when we pass in data for it.

  • And we don't want the model to use it as a categorical column.

  • We want to transform this, in this case, into an embedding,

  • right?

  • So you could do a one-hot representation.

  • Here, we're going to do an embedding that actually gets

  • learned as we train our model.

  • This embedding and other columns that you have can then get

  • directly fed into Keras layers.

  • So here, we have a Dense Features layer

  • that's going to take all these transformations

  • and run them when we pass our data through.

  • And this feeds directly downstream into our Keras model

  • so that when we pass input data through,

  • the transformations happen before we actually

  • start learning from the data.

  • And that ensures that our model is

  • learning what we want it to learn, using

  • real-valued numerical data.
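
A rough sketch of what that looks like with feature columns (the column name, bucket count, and embedding size here are made up for illustration):

    import tensorflow as tf

    # Declare user_id as categorical, then learn an 8-dimensional embedding for it.
    user_id = tf.feature_column.categorical_column_with_identity(
        'user_id', num_buckets=10000)
    user_embedding = tf.feature_column.embedding_column(user_id, dimension=8)

    # DenseFeatures applies the transformation as data flows into the model.
    feature_layer = tf.keras.layers.DenseFeatures([user_embedding])
    model = tf.keras.Sequential([
        feature_layer,
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])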

  • And what do you do with that layer

  • once you've got it in your model?

  • Well, in Keras, we provide quite a few layers.

  • Laurence talked you through convolutional layers,

  • pooling layers.

  • Those are some of the popular ones in image models.

  • But we've got a whole host of layers

  • depending on what your needs are--

  • so many that I couldn't fit them in a single screenshot here.

  • But there are RNNs, drop out layers,

  • batch norm, all sorts of sampling layers.

  • So no matter what type of architecture

  • you're building, whether you're building something

  • for your own small use case and image classification model,

  • whatever it is, or the latest and greatest research model,

  • there are a number of built-in layers

  • that are going to make that a lot easier for you.

  • And if you've got a custom use case that's actually not

  • represented in one of the layers,

  • and maybe you've got custom algorithms or custom

  • functionality, one of the beauties of Keras

  • is that it makes it easy to subclass layers

  • to build in your own functionality.

  • Here, we've got a Poincare normalization layer.

  • This represents a Poincare embedding.

  • This is not provided out-of-the-box with TensorFlow,

  • but a community member has contributed this layer

  • to the TensorFlow add-ons repository,

  • where we provide a number of custom special use case layers.

  • It's both useful, if you need Poincare normalization,

  • but also a very good example of how you might write a custom

  • layer to handle all of your needs,

  • if we don't have that out-of-the-box for you.

  • Here, you write the call method, which handles

  • the forward pass of this layer.

  • So you can check out the TensorFlow add-ons repository

  • for more examples of layers like this.
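
This is not the actual Poincare normalization layer, just a minimal sketch of what subclassing a Keras layer looks like, with the forward pass implemented in the call method:

    import tensorflow as tf

    class UnitNormalization(tf.keras.layers.Layer):
        """Illustrative custom layer: scales each input vector to unit length."""

        def __init__(self, epsilon=1e-6, **kwargs):
            super().__init__(**kwargs)
            self.epsilon = epsilon

        def call(self, inputs):
            # The layer's forward pass lives here.
            norm = tf.norm(inputs, axis=-1, keepdims=True)
            return inputs / (norm + self.epsilon)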

  • In fact, everything in Keras can be subclassed,

  • or almost everything.

  • You've got metrics, losses, optimizers.

  • If you need functionality that's not provided out-of-the-box,

  • we try to make it easy for you to build on top of what Keras

  • already provides, while still taking advantage of the entire

  • Keras and TensorFlow ecosystem.

  • So here, I'm subclassing a model.

  • So if I need some custom forward pass in my model,

  • I'm able to do that easily in the call method.

  • And I can define custom training loops within my custom model.

  • This makes it easy to do-- in this case, a trivial thing,

  • like multiply by a magic number.

  • But for a lot of models where you

  • need to do something that's different than the standard fit

  • loop, you're able to customize in this way

  • and still take advantage of all the tooling

  • that we provide for Keras.
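
A sketch of that kind of model subclassing, with a trivially customized forward pass (the magic number stands in for whatever custom logic you actually need):

    import tensorflow as tf

    class MagicModel(tf.keras.Model):
        def __init__(self, magic_number=2.0):
            super().__init__()
            self.magic_number = magic_number
            self.classifier = tf.keras.layers.Dense(3, activation='softmax')

        def call(self, inputs):
            # Custom forward pass: scale the inputs, then classify.
            return self.classifier(inputs * self.magic_number)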

  • So one of the problems with custom models

  • and more complicated models is it's

  • hard to know whether you're actually

  • doing what you think you're doing

  • and whether your model is training.

  • One of the tools we provide for Keras, and TensorFlow

  • more broadly, is TensorBoard.

  • This is a visualization tool.

  • It's web based, and it runs a server

  • that will take in the data as your model trains

  • so that you can see real time, epoch by epoch,

  • or step by step, how your model is doing.

  • Here, you can see accuracy and loss as the model trains

  • and converges.

  • And this allows you to track your model as you train

  • and ensure that you're actually progressing

  • towards convergence.

  • And when you're using Keras, you can also

  • see that you get the full graph of the layers that you've used.

  • You can dig into those and actually get the op-level graph

  • in TensorFlow.

  • And this is really helpful in debugging, to make sure

  • that you've correctly wired your model

  • and you're actually building and training

  • what you think you are training.

  • In Keras, the way you add this is

  • as easy as a few lines of code.

  • Here, we've got our TensorBoard callback that we define.

  • We add that to our model during training.

  • And that's going to write out to the logs,

  • to disk, a bunch of different metrics

  • that then get read in by the TensorBoard web GUI.
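
Those few lines look roughly like this (the log directory is arbitrary, and model / train_data are placeholders for the model and data you already have):

    import tensorflow as tf

    tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir='./logs')
    model.fit(train_data, epochs=15, callbacks=[tensorboard_cb])

    # Then, from a shell:  tensorboard --logdir ./logs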

  • And as an added bonus, you get built-in performance

  • profiling with that.

  • So one of the tabs in TensorBoard

  • is going to show you where all of your ops

  • are being placed, where you've got performance bottlenecks.

  • This is extremely useful as you begin

  • to build larger and more complex models, because you

  • will see that performance during training

  • can become one of the bottlenecks in your process.

  • And you really want to make that faster.

  • Speaking of performance, this is a plot

  • of how long it takes ResNet-50, one of the most popular machine

  • learning models for image classification,

  • to train using one GPU.

  • Don't even ask how long it takes with one CPU,

  • because nobody likes to sit there and wait

  • until it finishes.

  • But you can see that it takes a better

  • part of a week with one GPU.

  • One of the beauties of deep learning

  • is that it is very easily parallelizable.

  • And so what we want to provide as TensorFlow

  • are ways to take this training pipeline and parallelize it.

  • The way we do that in TensorFlow 2.0 is we're

  • providing a series of distribution strategies.

  • These are going to make it very easy for you to take

  • your existing model code.

  • Here, we've got a Keras model that

  • looks like many of the others you've

  • seen throughout this talk.

  • And we're going to distribute it over multiple GPUs.

  • So here, we add the mirrored strategy.

  • With these few lines of code, we're

  • now able to distribute our model across multiple GPUs.

  • These strategies have been designed from the ground up

  • to be easy to use and to scale with lots of different

  • architectures and to give you great out-of-the-box

  • performance.
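
Roughly, those few lines look like this sketch (the model body is elided; the key point is building and compiling it inside the strategy's scope):

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()   # one replica per available GPU

    with strategy.scope():
        # Variables created here are mirrored and kept in sync across devices.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(3, activation='softmax')
        ])
        model.compile(loss='categorical_crossentropy',
                      optimizer='adam', metrics=['accuracy'])

    model.fit(train_dataset, epochs=10)   # train_dataset is a placeholder tf.data.Dataset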

  • So what this is actually doing--

  • here, you can see that with those few lines of code,

  • by building our model under the strategy scope, what we've done

  • is we've taken the model, we've copied it across all

  • of our different devices.

  • In this picture, let's say we've got four GPUs.

  • We copy our model across those GPUs,

  • and we shard the input data.

  • That means that you're actually going

  • to be processing the input in parallel

  • across each of your different devices.

  • And in that way, you're able to scale

  • model training approximately linearly with the number

  • of devices you have.

  • So if you've got four GPUs, you can run approximately four

  • times faster.

  • What that ends up looking like-- on ResNet,

  • you can see that we get great scaling.

  • And just out-of-the-box, what you're getting with that is

  • that your variables are getting mirrored and synced across all

  • available devices.

  • Batches are getting prefetched.

  • All of this goes into making your models much more

  • performant during training time, all without changing code

  • when you're using Keras.

  • All right.

  • And mirrored strategy with multi GPUs is just the beginning.

  • As you scale models, as we do at Google, for example,

  • you might want to use multiple nodes and multiple servers,

  • each of which have their own set of GPUs.

  • You can use the multiworker mirrored strategy

  • for that, which is going to take your model,

  • replicate it across multiple machines,

  • all working synchronously to train your model,

  • mirroring variables across all of them.

  • This allows you to train your model faster than ever before.

  • And this API is still experimental,

  • as we're developing it.

  • But in TensorFlow 2.0, you'll be able to run this out-of-the-box

  • and get that great performance across large scale clusters.

  • All right.

  • So everything I've talked about so far

  • falls under the heading of training models.

  • And you will find that a lot of model builders

  • only ever think about the training portion.

  • But if you've got a machine learning model

  • that you're trying to get into production,

  • you know that's only half the story.

  • There's a whole other half, which is, well,

  • how do I take what I've learned and actually serve

  • that to customers or to whoever the end user is, right?

  • In TensorFlow, the way we do that is you're

  • going to have to serialize your model into a saved model.

  • This saved model becomes the serialized format of your model

  • that then integrates with the rest of the TensorFlow

  • ecosystem.

  • That allows you to deploy that model into production.

  • So for example, we've got a number of different libraries

  • and utilities that can take this saved model.

  • For TensorFlow Serving, we're going

  • to be able to take that model and do

  • web-based serving requests.

  • This is what we use at Google for some of our largest scale

  • systems.

  • TensorFlow Lite is for mobile development.

  • TensorFlow.js is a web-native solution

  • for serving your models.

  • I'm not going to have time to go over

  • all of these in the next few minutes,

  • but I will talk about TensorFlow Serving and TensorFlow Lite

  • a little bit more.

  • But first, how do you actually get to a saved model?

  • Again, in TensorFlow 2.0, this is going to be easy

  • and out-of-the-box where you're going to take your Keras model,

  • you call .save.

  • And this is going to write out the TensorFlow saved model

  • format.

  • This is a serialized version of your model.

  • It includes the entire graph, and all of the variables,

  • and weights, and everything that you've learned,

  • and it writes that out to disk so that you can take it,

  • pass it to somebody else, let's say.

  • You can load it back into Python.

  • You're going to get all of that Python object state back,

  • as you can see here.
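
A sketch of that save-and-reload round trip (the path is arbitrary, and model is the trained Keras model from earlier):

    import tensorflow as tf

    # Write out the SavedModel: the graph plus all the learned variables and weights.
    model.save('saved_model/rps', save_format='tf')

    # Later, possibly in another process, load it back and keep using it.
    reloaded = tf.keras.models.load_model('saved_model/rps')
    predictions = reloaded.predict(some_images)   # some_images is a placeholder batch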

  • And you could continue to train, continue to use that.

  • You could fine-tune based on that.

  • Or you could take that model and load it into TF Serving.

  • So TensorFlow Serving responds to gRPC or REST requests.

  • It acts as a front end that takes the requests,

  • it sends them to your model for inference.

  • It's going to get the result back.

  • So if you're building a web app for our rock, paper,

  • scissors game, you could take a picture,

  • send it to your server.

  • The server is going to ask the model, hey, what is this?

  • Send back the answer, based on what the model found.

  • And in that way, you get that full round trip.
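
Once TensorFlow Serving is hosting the model, that round trip from a web app might look something like this sketch (host, port, and model name are placeholders; TensorFlow Serving's REST API takes a JSON body with an 'instances' list):

    import requests

    # image_batch: a nested list holding one 150 x 150 x 3 image,
    # preprocessed the same way as the training data.
    response = requests.post(
        'http://localhost:8501/v1/models/rps:predict',
        json={'instances': image_batch})
    print(response.json()['predictions'])   # e.g. [[0.8, 0.1, 0.1]]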

  • TensorFlow Serving is what we use internally

  • for many of our largest machine learning models.

  • So it's been optimized to have low latency and high

  • throughput.

  • You can check it out at TensorFlow.org.

  • There's an entire suite of production pipelining

  • and processing components that we call

  • TensorFlow Extended, or TFX.

  • You can learn more about those at TensorFlow.org,

  • using that handy dandy QR code right there.

  • And maybe you've got a model, and you've got your web app.

  • But really, you want it on a phone, right?

  • Because the future is mobile.

  • You want to be able to take this anywhere.

  • So TensorFlow Lite is the library

  • that we provide for converting your saved model into a very

  • tiny, small footprint.

  • So that can fit on your mobile device.

  • It can fit on embedded devices--

  • Raspberry Pis, Edge TPUs.

  • We now run these models across a number of different devices.

  • The way you do this is you take that same saved model

  • from that same model code that you wrote originally.

  • You use the TF Lite converter, which shrinks

  • the footprint of that model.

  • And then it can be loaded directly onto device.
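
A sketch of that conversion step, starting from a SavedModel directory (the paths are placeholders):

    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model('saved_model/rps')
    tflite_model = converter.convert()

    # Write the converted flat buffer to disk so it can be bundled
    # with a mobile or embedded app.
    with open('rps.tflite', 'wb') as f:
        f.write(tflite_model)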

  • And this allows you to run on-device, without internet,

  • without a server in the background, whatever

  • your model is.

  • And you can take it, take TensorFlow,

  • wherever you want to be.

  • Now, we've run through, really quickly,

  • from some machine learning fundamentals,

  • through building your first model, all the way

  • through some of the tools that TensorFlow

  • provides for taking those and deploying those to production.

  • What do you do now?

  • Well, there's a lot more out there.

  • You can go to google.dev.

  • You can go to TensorFlow.org where

  • we've got a great number of tutorials.

  • You can go to GitHub.

  • This is all open source.

  • You can see the different libraries there, ask questions,

  • send PRs.

  • We love PRs.

  • And with that, I'd like to say, thank you.

  • LAURENCE MORONEY: Thank you, very much.

  • KARMEL ALLISON: Laurence, back out.

  • [APPLAUSE]

  • [MUSIC PLAYING]
