ASHLEY: My name is Ashley.
I'll be your host for today.
We'll get this show on the road.
I'd like to introduce our first speaker, Josh Gordon, who's
on the TensorFlow team.
Josh is going to talk about the ease of TensorFlow 2.0
and will walk us through the three
styles of model building APIs complete with coding examples.
So please help me in welcoming Josh.
[APPLAUSE]
JOSHUA GORDON: Thanks so much.
How's it going everybody?
AUDIENCE: Good.
JOSHUA GORDON: So let me just unlock this laptop.
And we will get started.
So I have only good news about TensorFlow 2.0,
which is about a month old.
It's wonderful.
It is massively easier to use.
And it's great both for beginners and for experts.
And also from a teaching perspective,
it has a lot of things to recommend it
that I'll talk about too.
And no big deal.
But maybe we can close the doors in the back.
It's a little bit loud.
Thanks.
All right.
So I'm going to get into an outline in a sec.
But one thing I wanted to mention right off the bat
is TensorFlow 2.0 has all the power of graphs that we
had in TensorFlow 1.0 except they're massively, massively,
massively easier to use.
About the name "TensorFlow": a tensor
is basically a fancy word for an array.
So a scalar's a tensor.
A list is a tensor.
A cube is a tensor.
And flow refers to a data flow graph.
And in TensorFlow 1.0, you would manually define a graph.
And you would execute it with a session.
And this felt a little bit like metaprogramming.
And this is exactly the system you
would have wanted several years back
if you were an engineer and the challenge you faced was
massively distributed training.
That's great for that.
However, as a developer or a student or as a researcher,
you want something that feels a lot more like Python.
And on the right, you're seeing how TensorFlow 2.0 looks.
Basically, you can think of a tensor in a very similar way
to a NumPy ndarray.
And you can work with them imperatively,
exactly as you would expect in Python.
So you no longer need to use things like sessions
or anything like that.
And it works as you would expect, which is great.
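A minimal sketch of the imperative style described above, assuming TensorFlow 2.x is installed. The values and the threshold are made up for illustration; the point is only that results are available immediately, with no session.

```python
import tensorflow as tf

# Operations run eagerly in TensorFlow 2.x: no graph to build, no session to run.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[10.0, 20.0], [30.0, 40.0]])

total = a + b  # evaluated right away, much like adding two NumPy arrays
print(total)

# Tensor values can drive ordinary Python control flow.
if tf.reduce_sum(total) > 100:
    message = "large"
else:
    message = "small"
print(message)
```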
But that's low-level details.
And here's some of the things I'd like to talk about.
So this is a rough schematic of how TensorFlow 2.0 looks.
And TensorFlow 2.0 is a very, very large system
with many moving pieces.
It's a whole framework for doing machine learning.
And what I'd like to do here is show you a couple of the pieces
and what some of your options are to use it.
So we will start with designing models using my favorite API
of all time, which is Keras.
And what's awesome in TensorFlow 2.0--
and this is a really important point--
there is a spectrum of use cases.
And all of these are built into the same framework.
And you can mix and match as you go.
And what this means is if you're a total novice
to deep learning, you can start with something
called the Sequential API, which is
by far the easiest and clearest way
to develop deep learning models today.
And it's wonderful.
You can build a stack of layers.
You can call things like compile and fit.
And that is 100% valid TensorFlow 2.0 code.
It is just as fast as any other way of writing code.
There are no downsides to it at all
if your use case falls into that bucket.
And what's really important in TensorFlow 2.0 is that, as is helpful
to you, you can optionally scale up in complexity
using things like the Functional API
or going all the way to subclassing
all in the same framework.
And you can mix and match as you go.
And what this means is when I'm teaching this, for example,
I can start in these really simple, clear, easy ways.
And then when I want to write gradient descent from scratch,
I can do that.
Or write custom layers from scratch, I can do that.
It's very easy.
So let's take a look at what some of these pieces look like.
And I only have one slide on sequential.
And the reason is we have so many tutorials on it.
There's an entire book on it, which
I'll recommend at the end.
So I've just one slide on this.
But in case you're new to this, the Sequential API
lets you define a stack of layers.
And this is by far the most common way
to build your models.
And something like, you know, I'd
say 80% to 90% of machine learning models
will fit into this framework.
So you define a stack of layers.
And that's great.
What's interesting is when you're using the Sequential
API and the Functional, although a lot of developers
don't realize this, what you're actually doing
is defining a data structure.
And this means you can do things like model.summary
and see a printout of all the layers and all the weights.
It also means that we can do compile time checks.
So when you call model.compile, we
can make sure all your layers are compatible.
It also means that when you share your model
with other people, for example, imagine
that you want to do fine tuning for transfer learning.
If you have a model that's defined with a Sequential
API or the Functional API, because it
has a data structure of a stack of layers or, with a Functional
API, a graph of layers, you can inspect that data structure
and you can pull layers out of it and get the activations.
And you can do fine tuning things like that really easily.
Anyway, defining a stack of layers is very common.
In TensorFlow 2.0, this works exactly
like it does in Keras.io, the multi-backend Keras.
So that's great.
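A small sketch of the Sequential workflow just described: a stack of layers, then compile, summary, and fit. The random data, layer sizes, and epoch count are assumptions chosen only to keep the example quick to run.

```python
import numpy as np
import tensorflow as tf

# Toy data (an assumption for illustration): 256 samples, 20 features, 3 classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(256, 20)).astype("float32")
labels = rng.integers(0, 3, size=256)

# A stack of layers. Stating the input shape up front lets Keras check
# layer compatibility early.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Because the model is a data structure, it can be inspected.
model.summary()

history = model.fit(features, labels, epochs=2, batch_size=32, verbose=0)
```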
One thing that's also extremely powerful that a lot of people
are new to is the Functional API.
And the Sequential API is for building stacks.
The Functional API is for building DAGs,
or directed acyclic graphs.
And I just want to show you how powerful this is.
So to be honest, most of what I've been doing myself
is using either the Sequential API
or going all the way to subclassing and just
writing everything from scratch.
But I heard a really awesome talk on the Functional API
a couple weeks ago in Montreal.
And I've been using it a lot since.
And I love it.
So I just want to show you what it can do.
And so I just want to show you what a quick model would
look like for something like visual question answering.
And a lot of the time, when you start with machine learning,
you spend a lot of your time-- or deep learning,
rather-- you spend a lot of your time building image classifiers
and doing things like cats and dogs.
But we can take a look at a slightly more sophisticated
model.
And this is VQA.
And in VQA, you're given two inputs.
One, you're given an image.
So here we have a pair of dogs.
And you're given a question in natural language.
And here, the question is asking, what color
is the dog on the right?
And so to answer a question like this,
you need a much, much, much more sophisticated model
than just an image classifier.
You can still phrase this as a classification problem.
And if you Google for VQA, there's
two really excellent papers that will go into detail.
But you can imagine you have some model--
well, let's talk about how we would do this.
Here, we have a model with two inputs.
We have an image and a question.
And if you take a machine learning course or deep
learning course, rather, you'll learn
about processing images with convolutional layers
and max pooling layers.
And you'll learn about processing text with things
like LSTMs and embeddings.
And one thing that's really powerful about deep learning
is all of these layers, regardless of what they are,
they take vectors as input.
And if you're a dense layer, you don't
care if your input happens to be the output
of some convolutional layer or if your input happens
to be the output of some LSTM.
It's just numbers that you're taking as input.
So in deep learning, there's no reason
that we can't process the image with the CNN,
process the text with an LSTM, and then concatenate the result
and feed that into a dense layer and phrase
this as a classification problem.
So you can imagine the output of our dense layer
might have 1,000 different classes.
And each class corresponds to one possible answer.
So here, if the answer is golden,
we want to classify both of these inputs jointly as golden.
And I want to show you how quickly we can design something
like this with the Functional API, which is really amazing.
So, cool, I probably should've flipped to this slide.
But this is what I was just talking through.
So this is the architecture that we want.
This is one model.
And it's going to have two heads.
And the first head is going to be a standard stack of CNNs
and max pooling layers.
And this is exactly the same model
you would use to classify cats and dogs.
And you can do all the same tricks
that you will learn about there.
You can import, like here, I want to show you
how to write it from scratch.
But there's no reason that you couldn't import like MobileNet
v whatever, and use that to get activations for the images.
But basically we're going to go from an image to a vector.
And in the other head, we're going to process the question.
And we're going to go from a question to a vector.
And to do that, we use an embedding and an LSTM.
At the end, we can concatenate the results and classify it.
And this is nearly the complete code for this entire VQA model.
Actually it is the complete code for the model-- a Hello World
version of it, which is nuts when you look at it.
So here, this is our image classifier.
And you would want something much deeper.
But this would be your Hello World image classifier.
So a vector is going to go in.
And just some random tips.
I can see my slides.
That's cool.
So you'll notice in the first layer,
this is a convolutional layer.
It has 64 filters, each of which is 3 by 3.
And it has relu activation.
And you see in the input shape there,
I'm specifying how large my image is.
Although you'll find that Keras can often
infer the input shape, whenever you can you should specify it.
And it's just one less thing that can go wrong.
So fully specify it, catch bugs early.
After that we're doing max pooling.
And the important part is we're flattening it.
So that's actually a sequential model
that we're using inside the Functional API.
After that, we're creating an input layer.
And this is for the Functional API.
And we're beginning to chain layers
together to build up a graph.
So here what we're doing is we're chaining the vision model
to the input layer.
So that's the first half of our model.
And here's the second half.
And we're almost done.
So this is the model that's going to process the question.
Here we're creating another input.
And I don't have the preprocessing here.
But you can imagine that we've tokenized the text.
And we vectorized it.
We've padded it.
And then what we're doing is we're
feeding that into an embedding and then into an LSTM.
And this is exactly what you might
do if you were training a text classifier.
The important thing is that a vector is coming out,
and we're chaining these together to build a graph.
And at the very end--
and this is the magic bit about deep learning-- we can simply
concatenate the results.
Nothing simple, but it's one line of code, which is nice.
Nothing simple conceptually.
But what we can do is we can concatenate the results.
And now we just have a vector.
And just like any other problem, now that we have this vector,
we can feed it into dense layers, and we can classify it.
And so here's the tail of our model.
And now we have a TensorFlow 2.0 model that will do VQA.
And this will work exactly like any other Keras model.
So if you want, you can call model.fit on this thing.
You can call model.train_on_batch.
You can use callbacks.
If you want, you can write a custom training
loop using GradientTape.
And so I think this is really powerful.
And what's nice about these Functional APIs,
just like sequential models, because there's
a data structure behind the scenes, there's a graph.
TensorFlow 2.0 can run compatibility checks
and make sure your layers work with each other.
So it's really, really nice.
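The slides are not part of this transcript, so here is a hedged reconstruction of the two-headed VQA model as described: a Sequential vision model chained to an image input, an embedding plus LSTM for the question, a concatenation, and a dense classifier. The image size, vocabulary size, question length, and embedding width are all assumptions; only the 64 filters of size 3 by 3, the relu activation, and the 1,000 answer classes come from the talk.

```python
import tensorflow as tf

layers = tf.keras.layers

# Illustrative sizes (assumptions, not values from the slides).
IMAGE_SHAPE = (32, 32, 3)
VOCAB_SIZE = 10000
QUESTION_LENGTH = 20
NUM_ANSWERS = 1000  # one class per possible answer

# Head 1: a "Hello World" image model that goes from an image to a vector.
vision_model = tf.keras.Sequential([
    tf.keras.Input(shape=IMAGE_SHAPE),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
])
image_input = tf.keras.Input(shape=IMAGE_SHAPE, name="image")
encoded_image = vision_model(image_input)  # chain the Sequential model to the input

# Head 2: question (already tokenized and padded) to a vector.
question_input = tf.keras.Input(shape=(QUESTION_LENGTH,), dtype="int32",
                                name="question")
embedded_question = layers.Embedding(VOCAB_SIZE, 64)(question_input)
encoded_question = layers.LSTM(64)(embedded_question)

# Tail: concatenate both vectors and classify over the answers.
merged = layers.Concatenate()([encoded_image, encoded_question])
answer = layers.Dense(NUM_ANSWERS, activation="softmax")(merged)

vqa_model = tf.keras.Model(inputs=[image_input, question_input], outputs=answer)
vqa_model.summary()
```

The resulting object is an ordinary Keras model, so it can be compiled and trained with fit, or driven from a custom loop.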
So basically, if you haven't used the Functional API,
either you just learned about Keras Sequential
from a lot of books or you're coming from PyTorch
and you've only used things like subclassing,
I really encourage you to try it out.
I've had only positive results in the last few weeks.
So I love it.
Other things, of course, so that graph
I just made in Google Slides.
But because you have a data structure,
instead of calling something like model.summary,
you can call plot_model on the model.
And you'll actually get a nice rendering
of a graph that looks exactly like what I just showed you.
So for complicated models--
so this is cool.
The only time I found it really useful
is when you have complicated models, things like ResNets,
you can actually plot out the whole graph
and just make sure that it looks as you
expect as you're assembling it.
So it's really nice.
Anyway, and then there's another style in TensorFlow 2.0.
So the last two things I showed you
were built into what you'll find at Keras.io,
and that's multi-backend Keras.
Keras in TensorFlow 2.0 is a superset
of what you find in Keras.io.
And this is something that's new.
So this is subclassing.
And this is a Chainer/PyTorch style
of developing models, which is also really, really nice.
And basically you'll see this little spectrum here.
You're getting increasing control as you move up.
And so in this style, you can be a researcher
or a student learning this for the first time.
But what you're saying is, I just
want to write everything from scratch to learn how it works
or because I have some special use case that doesn't
fit into the other ones.
So here what we're doing is we're
defining a subclass model.
And I also really love this.
This feels a lot like object-oriented NumPy
development.
So what we're doing-- and the idea is basically--
it's very, very similar in all these frameworks.
The framework gives you a class.
And here this class happens to be model.
And there's two chunks to writing a subclass model.
You have the constructor and the call method
or the forward method or the predict method.
And in the constructor, you define your layers.
So here I'm creating a pair of dense layers.
And it's the exact same layers that you'd
find in the Sequential and Functional model APIs,
which is great.
So basically, you learn these layers once,
you can use them all over the place.
And in the call method or the forward method,
you describe how these layers are chained together.
So here I have some inputs.
And I'm feeding the inputs through my dense layer.
And then I'm feeding that result through my second dense layer
and returning it.
And what's nice is this is not symbolic.
So if you're curious what x is, you can just
do print(x), of course, like you would in Python,
and that will give you the activations
of that first dense layer.
If you want you can modify it.
So for example, here I've highlighted relu.
Let's say for some reason I'm not
interested in using the built-in relu activation.
I want to write my own.
What you can do simply is just remove that.
And you can write your own with just regular Python flow right
there.
So there, I've written relu using the built-in method.
But I could also just write that using regular Python.
So this is great for hacking on things.
It's great.
If you want to really know the details of what exactly
is flowing in and out of these layers,
it's a perfect way to do it.
Also for example, there's nothing here
that's saying that you have to use these built-in dense layers.
So if you look at the code for the dense layer, that's
doing something like wx plus b, you
can absolutely just write that from scratch in Python
and use that here instead.
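A sketch of the subclassing style under stated assumptions: the class names `WxPlusB` and `MyModel` and the layer sizes are hypothetical, chosen for illustration. It shows a constructor that defines layers, a call method that chains them, a hand-written relu, and a hand-written dense layer computing wx plus b.

```python
import tensorflow as tf


class WxPlusB(tf.keras.layers.Layer):
    """Hypothetical hand-written dense layer: outputs = inputs @ w + b."""

    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # Weights are created once the input size is known.
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="glorot_uniform", trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


class MyModel(tf.keras.Model):
    def __init__(self, num_classes=10):
        super().__init__()
        # Layers are defined in the constructor.
        self.dense_1 = WxPlusB(32)
        self.dense_2 = tf.keras.layers.Dense(num_classes, activation="softmax")

    def call(self, inputs):
        # The forward pass is plain Python. When called eagerly, a print(x)
        # here would show concrete activations.
        x = self.dense_1(inputs)
        x = tf.maximum(x, 0.0)  # relu written by hand
        return self.dense_2(x)


model = MyModel()
outputs = model(tf.ones((4, 8)))
print(outputs.shape)
```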
And then these are just a whole bunch of references.
I'm going to tweet out these slides when we finish so you
don't have to write it down.
But what these are, these are guides
from TensorFlow 2.0 that will go into detail on how you write--
how do you use each of these three styles of the Keras API?
How do you write custom layers and stuff like that?
They're great.
And then we have a couple of tutorials
that I recommend that are really, really nice
that show different ways of building stuff.
If you haven't seen these, by the way,
the segmentation one is super-nice.
We wrote it this summer.
It runs really, really fast.
And I think you'll like it.
Also, by the way, because this is an Intro to TensorFlow 2.0
talk, let me just show you what the tutorials are
in case you're new to these.
And there's just one thing I want to point out.
So this is the tutorial on our website.
Big surprise.
What I wanted to mention, and I think
this is a really nice feature of TensorFlow.org,
obviously this is a web page HTML.
But this is just a direct rendering
of this Jupyter Notebook.
So the web page is just the Jupyter Notebook.
And the reason we've done that is all the tutorials
are runnable end to end.
So you can install TensorFlow 2.0 locally
and run this tutorial.
Or if you click run in Colab, this is exactly the same page
as you have on TensorFlow.org.
For all of these, you can do Runtime, Run all.
And this has the complete code to reproduce the results
that you see here.
And what this means is that our tutorials are testable.
And if you know the expression, trust but verify?
So for a long time, I've seen nifty tutorials
that are like, yeah, like let me show you
how to write this like neural machine translation model.
And then the code doesn't work or it's
missing a key piece that's left as an exercise to the reader.
So at least all the tutorials on the website, all of them
run end to end, which is really, really nice.
For some, we still have plenty of work to do cleaning them up.
But at least we can guarantee you they have the complete code
to do the thing.
So I really like that a lot.
All right.
I wanted to talk a little bit about training models.
And so basically, there are several ways to train models.
And again, you can use-- the nice thing
about TensorFlow 2.0 is you can use the one that's
most helpful for your use case.
So you don't always need to write a custom training
loop from scratch.
So you have other options.
And the first, which you might be familiar with from Keras,
is just simply calling model.fit.
And what's really nice about model.fit,
it doesn't care if you have a sequential model, a functional
model, or a subclass model.
It works for all of them.
And model.fit-- it's fast.
It's performant.
It's simple.
One thing that's a little bit less obvious,
when you do model.fit, this is not just
the baby way of training models.
So if you're working in a team and you call model.fit,
you've reduced your code footprint by a lot.
This is one less thing that your friends
need to worry about when they're playing with your models
down the road.
So if you can use the simple things,
you should unless there's a reason for more
complexity, of course, just like in regular software
engineering.
The nice thing about fit is you can pass in different metrics.
In a lot of examples, you'll see things like accuracy.
By the way, TensorFlow 2.0 has really nice metrics
for things like precision and recall built in.
You can also write custom metrics.
And something that's really helpful too is callbacks.
So for instance, these are things
that I don't see a lot of new developers using.
And they're super helpful.
So callbacks, and one of my favorites is EarlyStopping.
And so typically, when we're training models,
we need to prevent overfitting.
And a really wonderful way to do that
is to make plots of your loss over time
and so on and so forth.
These callbacks can do things like that for you
automatically, which can be helpful as well.
You can also write custom callbacks.
So a cool thing would be like let's say
you're training a model that takes
a very long time to train.
You could write a callback to send you a Slack notification
after every epoch of training completes.
And so that can be really nifty too.
So callbacks are great.
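A hedged sketch of fit with built-in metrics, EarlyStopping, and a custom callback. The `EpochLogger` class is hypothetical: it only collects a message per epoch, standing in for where a real notification (for example to Slack) would be sent. The toy data and model are assumptions.

```python
import numpy as np
import tensorflow as tf


class EpochLogger(tf.keras.callbacks.Callback):
    """Hypothetical stand-in for a notification hook; it only records messages."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        loss = logs.get("loss", float("nan"))
        # A real callback could post this text to a chat service instead.
        self.messages.append(f"epoch {epoch + 1} finished, loss={loss:.4f}")


# Toy binary classification data (an assumption for illustration).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 10)).astype("float32")
labels = (features.sum(axis=1, keepdims=True) > 0).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy",
                       tf.keras.metrics.Precision(),
                       tf.keras.metrics.Recall()])

# Stop when the validation loss has not improved for two epochs in a row.
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=2)
logger = EpochLogger()

history = model.fit(features, labels,
                    validation_split=0.2,
                    epochs=10,
                    callbacks=[early_stopping, logger],
                    verbose=0)
print(logger.messages[-1])
```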
And then I don't have slides for train_on_batch here.
But I did want to show you custom training
with a GradientTape because this is also very powerful,
especially for students who are learning this
for the first time and don't want a black box
or for researchers.
And so here is a custom training loop.
And I have an example of this for you in a minute
just with linear regression so you
can see exactly how this works.
But this is a custom training loop.
And what we're doing here is we have some function.
And for now you can pretend that the @tf.function in the orange box
doesn't exist.
So that's optional.
So just pretend that doesn't exist.
We have some function that's taking features and labels
as input.
And whenever we're doing training in deep learning,
we're doing gradient descent.
The first step in doing gradient descent
is getting the gradients.
And the way all frameworks do this is by backprop,
which is reverse-mode autodiff.
And the implementation in TensorFlow
is we start recording operations on a tape.
So here we're creating a tape.
And we're recording what's happening beneath that tape.
So we have just regular Python code
that's calling the forward method or the call
method, rather, on your model.
So we forward the features through our model.
And we're computing some loss.
And maybe we're doing regression.
And that's squared error.
And then what we're doing is we're
getting the gradients of the loss with respect
to all the variables in the model.
And if you print those out, you'll
see exactly what the gradients are.
And then here, we're doing gradient descent manually.
We're applying them on an optimizer.
We can also write our own optimizer.
And I'll show you that in a second.
Anyway, this is a custom training loop from scratch.
And what this means is that if you--
so in model.fit, you can use optimizers
like RMSprop and Adam and all this stuff.
But if you'd like to write like the [? Sarah ?] optimizer,
you can go ahead and write it in Python.
And it will fit right in with your model.
So this is great for research.
tf.function, by the way, first of all,
you never need to write it.
Your code will work the same.
But if you do want a graph in TensorFlow 2.0 or if you--
basically, if you want to compile your code
and have it run faster, you can write that tf.function
annotation.
And what this means is that TensorFlow 2.0 will trace
your computation, compile it, and from the second time on that
you run this function, it will be much, much,
much faster because it's running entirely in C++.
So all there is to graphs in TensorFlow 2.0 is basically
@tf.function.
But it's optional.
You don't even need to use it.
But it's easy performance if you need it.
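Since the slide is not in the transcript, this is a sketch of what such a training step might look like, not the slide's exact code. The one-layer model, the noiseless toy data, the SGD optimizer, and the step count are assumptions. The structure follows the description: record the forward pass on a tape, compute a loss, get gradients with respect to the model's variables, and apply them with an optimizer.

```python
import tensorflow as tf

# Toy regression data (an assumption): y = 3x + 2 on 64 points in [-1, 1].
features = tf.reshape(tf.linspace(-1.0, 1.0, 64), (64, 1))
labels = 3.0 * features + 2.0

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_fn = tf.keras.losses.MeanSquaredError()


@tf.function  # optional: without it the step runs eagerly and gives the same result
def train_step(features, labels):
    with tf.GradientTape() as tape:
        predictions = model(features, training=True)  # forward pass is recorded
        loss = loss_fn(labels, predictions)
    # Gradients of the loss with respect to every trainable variable.
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss


losses = [float(train_step(features, labels)) for _ in range(100)]
print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.6f}")
```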
And then I just want to make this super concrete
because this is a Getting Started with TensorFlow 2.0
talk.
There's a lot of awesome tutorials
that will quickly show you on the website
how to train image classifiers and whatnot.
But I think a good place to start too
is just looking at linear regression.
And the reason is it's gradient descent.
All deep neural networks are trained by gradient descent.
And a nice place to start is seeing exactly what that is.
And because I have a--
I know it's tiny.
But because I have a graphic here on the left,
I'm just briefly going to explain
how linear regression works.
And it's the same pattern for deep neural networks
too, which is really surprising.
So in linear regression or deep neural networks,
you need three things.
The first thing you need is a model,
which is a function that makes a prediction.
And so a model for linear regression,
you might have learned in high school, could be y
equals mx plus b.
We're trying to find the best fit line.
And we can define a line y equals mx plus b.
That means we have two parameters or variables
that we need to set.
We have m, which is the slope, right?
And we have b, which is the intercept.
And by wiggling those variables, we
can fit the line to our data.
So on the right, you'll see a plot.
And we have a scatter plot with a bunch of points.
And we have the best fit line.
And the idea is now that we have a model or a line
that we can wiggle, we need a way of saying or quantifying,
how well does this line fit the data?
One way to quantify how well the line fits the data
is squared error.
What that means is you drop a line on the page.
And then you measure the distance
from your line to all the points.
And you take the sum of the squares of that.
The higher the sum of the squares is,
the worse your line fits the data.
The better your line fits the data,
the lower the sum of the squares.
So you can have a single number, which
is called loss, that describes how badly your line fits
the data.
And then you want to reduce that loss.
And you know that if the loss gets to a minimum,
your line will fit the data well.
And you found the best fit line.
The way we reduce the loss is gradient descent.
And on the left, you'll see a gradient descent plot.
And we're looking at loss or a squared error
as a function of two variables-- m and b.
And you can see that if we set m and b with a random guess
to start, our loss is pretty high.
And then as we wiggle them, we can reduce the loss.
The trick is, how do we figure out
which way to wiggle m and b?
And briefly-- I don't want to go on too much of a tangent--
there's two ways to do that.
If you forget calculus, you can find the gradient numerically.
And it's not rocket science.
You take m, and you wiggle it up a little bit,
recompute your loss.
Then you take m, and you wiggle it down a little bit.
And you recompute your loss.
You figure out which way makes the loss go down.
That's the direction you're going to be wiggling m.
Do the same thing for b.
That's very, very slow.
And there's faster ways to do it too.
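The "wiggle it up, wiggle it down" idea can be sketched in plain Python as a central-difference estimate. The function names and the four data points are made up for illustration.

```python
def squared_error(m, b, xs, ys):
    """Mean squared error of the line y = m*x + b on the data."""
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def numerical_gradient(m, b, xs, ys, eps=1e-5):
    """Estimate d(loss)/dm and d(loss)/db by nudging each parameter up and down."""
    dm = (squared_error(m + eps, b, xs, ys)
          - squared_error(m - eps, b, xs, ys)) / (2 * eps)
    db = (squared_error(m, b + eps, xs, ys)
          - squared_error(m, b - eps, xs, ys)) / (2 * eps)
    return dm, db


# Points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

dm, db = numerical_gradient(0.0, 0.0, xs, ys)
print(dm, db)
```

Both estimates come out negative at m = 0, b = 0, which says that increasing m and b reduces the loss. Each parameter costs two extra loss evaluations, which is why this approach is slow for models with many variables.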
But I want to show you what this code looks
like in TensorFlow 2.0.
So basically, you don't have to use Keras at all.
You can also use TensorFlow 2.0 a lot like you would use NumPy.
And basically, whenever you see something like tensor,
just replace that in your head with NumPy ndarray.
So we have constants.
And you see as you print out a constant,
it has shape and a data type.
One really nice thing about TensorFlow tensors
is they have a NumPy method.
So you can go straight from tensors to NumPy,
which is great.
So you're free of the clutches of TensorFlow 2.0.
Tensors have a shape and a data type.
And then just like you would expect in NumPy,
we have things like distributions.
So if you want to create some random normal,
here's how you would do it.
I just want to fly through this really quick.
And you can do math in TensorFlow 2.0 a lot
like you would do math in NumPy.
So basically, just like you have things like numpy.square
and numpy.matmul, TensorFlow
has all these same things too.
And the idea is the same.
The names might be slightly different.
You might have to poke around a little bit.
But the names are all there.
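A short sketch of the NumPy-like basics mentioned above: constants with a shape and a data type, the numpy method, a random normal draw, and a couple of math ops next to their NumPy counterparts. The specific values are illustrative.

```python
import numpy as np
import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
print(x.shape, x.dtype)  # every tensor has a shape and a data type

as_numpy = x.numpy()     # go straight from a tensor to a NumPy ndarray

samples = tf.random.normal(shape=(3, 2))  # draws from a standard normal

squared = tf.square(x)      # counterpart of np.square
product = tf.matmul(x, x)   # counterpart of np.matmul
print(squared.numpy())
print(product.numpy())
```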
And here's a very, very simple example.
Here's how we get the gradients using the GradientTape.
But this is more concrete.
So here we have a constant.
That's 3.
And we have a function, which is x squared.
And so if you think about your rules of calculus,
the derivative of x squared is 2x. So you take the 2.
And you multiply it by the 3.
And you get 6.
And if you walk through this code,
you'll see that it returns you 6 too.
So basically, this is how we get the gradients using
GradientTape.
And you can also do that with all the variables and layers
at once.
So here we have a pair of dense layers.
And we're calling the dense layers on some data.
And we're getting the gradients also under the tape.
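A sketch of both tape examples as described, assuming the slide looked roughly like this. The layer sizes, the input of ones, and the mean-square loss in the second half are assumptions.

```python
import tensorflow as tf

# Part 1: the derivative of x squared at x = 3.
x = tf.constant(3.0)
with tf.GradientTape() as tape:
    tape.watch(x)  # constants are not tracked automatically, variables are
    y = x * x
dy_dx = tape.gradient(y, x)  # 2 * x evaluated at x = 3
print(float(dy_dx))

# Part 2: gradients for all the variables of a pair of dense layers at once.
dense_1 = tf.keras.layers.Dense(4, activation="relu")
dense_2 = tf.keras.layers.Dense(1)
data = tf.ones((2, 3))

with tf.GradientTape() as tape:
    outputs = dense_2(dense_1(data))
    loss = tf.reduce_mean(tf.square(outputs))

variables = dense_1.trainable_variables + dense_2.trainable_variables
gradients = tape.gradient(loss, variables)  # one gradient per kernel and bias
```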
And let me just show you what this looks
like in linear regression just to make this concrete.
So this is code for y equals mx plus b.
And the first thing I wanted to mention
is how you install TensorFlow 2.0.
If you're running in Colab--
right now in Colab, TensorFlow 1.0 is installed by default.
But there's a magic flag you can run.
So that magic command will give you the latest
version of TensorFlow 2.0.
If you're running this locally, you can visit
Tensorflow.org/install.
And you can do pip install tensorflow.
So anything you can do in Colab, you can do locally.
But I have this here.
This is convenient.
TensorFlow is just a Python library.
You can import as you always would.
And the first thing we do in this notebook--
I know I flew through some of those code examples--
but what we're going to do is just create a scatterplot,
so just some random data.
And then we're going to find the best fit line.
So here what we're doing is we're creating some data.
Let me see if I've run this before.
Yeah.
And then we're plotting the data.
And here's what we get.
The slides had TensorFlow constants.
And these are TensorFlow variables.
And basically, constants are constant,
and variables can be adjusted over time.
You almost never need to write code this low level.
This is pretending that we don't have Keras.
We don't have any built-in fit methods.
We just want to do this from scratch.
But here's how you would do it from scratch.
So I'm creating some variables.
Here I've initialized them to 0.
I probably should have initialized them
to a random number.
But this will work too.
And then here, this doesn't look scary at all.
This is the predict function for linear regression.
So this is our equation for a line y equals mx plus b.
And our goal is going to be to find good values for m and b.
Here's our loss function.
And what we're doing is we're taking the results
that we predicted minus the results that we wanted.
We're squaring it.
And we're taking the average.
So that's squared error.
And then as we go through this notebook,
we can see our squared error when we start.
And here's gradient descent from scratch
pretending that we didn't have anything like model.fit.
So for some number of steps, what we're doing is
we're taking our x's, and we're forwarding through the model
to predict our y's.
We're getting the squared error, which is a single number.
And then we're getting the gradients of the loss
with respect to m and b.
And this will literally tell us-- if you print those out,
those are just numbers.
And the gradients point in the direction of steepest ascent.
So if we move in the direction of gradients,
our loss will increase.
We want the loss to decrease.
So we move in the reverse direction of the gradient,
which is gradient descent.
And here, again, this is like the lowest possible level
way to write this code.
We're doing gradient descent from scratch.
So we don't have any optimizer.
What we're doing is we're taking a step
in the negative gradient multiplied by our learning
rate.
And that adjusts m and b as we go.
And if you run this code, you'll see the losses decreasing.
And you'll see the final values for m and b
and plot the best fit line.
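The notebook itself is not reproduced in this transcript, so what follows is a sketch of the same steps rather than the notebook's code. The synthetic data (a line with slope 3 and intercept 2 plus noise), the learning rate, and the step count are assumptions. Plotting is left out.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Synthetic scatter plot data (an assumption): y = 3x + 2 plus a little noise.
x = tf.linspace(-1.0, 1.0, 100)
y = 3.0 * x + 2.0 + tf.random.normal(shape=[100], stddev=0.1)

# The two parameters to fit, initialized to 0 as in the talk.
m = tf.Variable(0.0)
b = tf.Variable(0.0)


def predict(x):
    return m * x + b  # the equation for a line


def squared_error(y_pred, y_true):
    return tf.reduce_mean(tf.square(y_pred - y_true))


learning_rate = 0.1
initial_loss = float(squared_error(predict(x), y))

for step in range(200):
    with tf.GradientTape() as tape:
        loss = squared_error(predict(x), y)
    # Gradients of the loss with respect to m and b.
    grad_m, grad_b = tape.gradient(loss, [m, b])
    # Gradients point uphill, so step in the opposite direction.
    m.assign_sub(learning_rate * grad_m)
    b.assign_sub(learning_rate * grad_b)

final_loss = float(squared_error(predict(x), y))
print(f"m={m.numpy():.3f}, b={b.numpy():.3f}, loss {initial_loss:.3f} -> {final_loss:.4f}")
```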
And then what I did for you is I wrote--
this is a little bit uglier.
But I wrote some code to produce this diagram
just so you can see exactly what the gradient descent is doing.
So that's how you would write things
from scratch in TensorFlow 2.0.
And what's really, really awesome,
when you move to things like neural networks,
this code is basically copy and pasted.
So if you can compare this custom training
loop for linear regression to the custom training
loop for DeepDream or any of the fancy models on the website,
it looks almost identical.
You always have this tape, and the steps are the same.
You make a prediction.
You get your loss.
You get the gradients.
And you go from there.
So that's really nice.
All right.
Other things I wanted to mention.
Oh, I got to move a lot faster here.
So in terms of data sets, basically
you have two options in TensorFlow 2.0.
The first option is at the top.
And these are all the existing Keras data
sets that you find at Keras.io.
And these are great.
They're good to start with.
They're in NumPy format.
And they're usually really tiny.
They fit into memory no problem.
Then we have this enormous collection of research data
sets, which is awesome.
And that's called TensorFlow Datasets.
And here, I'm showing you how you can download something
like cycle_gan in TensorFlow data sets.
What's important to be aware of, I just
have a couple of quick tips.
If you're downloading a data set from TensorFlow Datasets,
by the way, it's going to give you
something called-- the data set's not going to be in NumPy.
It's going to be in tf.data.
And tf.data, it's a high-performance format
for data.
It's slightly trickier to use than what you might be used to.
And so if you're using TensorFlow Datasets,
you have to be very, very careful to benchmark your input
pipeline.
If you just import a data set and try and call
model.fit on the data set, it might be slow.
So it's important to take your time
and make sure that your data pipeline can read images
off disk and things like that efficiently.
And I have just a couple tips that might be helpful.
TensorFlow Datasets recently added an in_memory flag.
So if you don't want to write fast input pipelines,
you can pass in_memory.
And you can load the whole thing into RAM.
So that will make it really easy.
And it also added this caching function,
which is really, really nice.
So here's some code for tf.data.
And maybe we have an image data set,
and we have some code to preprocess the images.
And let's pretend that that preprocessing code is expensive
and we don't want to run it every time, on every epoch.
What you can do is you can add this cache line at the end.
Cache will keep the results of the preprocessing in memory.
And it will make subsequent runs of your pipeline much faster.
So cache is a really handy thing to be aware of.
The goal here is not to give you all the details for this.
It's just to point you to some things that
are useful to know about.
You can also cache to files.
So cache without any parameters will cache it into RAM.
If you pass a file name, you can actually
cache to a file on disk too.
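A small sketch of the caching idea, under stated assumptions: the range data set stands in for images, and expensive_preprocess and the cache path are hypothetical names invented for this example. Only the cache() call itself, with and without a file name, comes from the talk.

```python
import os
import tempfile

import tensorflow as tf


def expensive_preprocess(x):
    # Hypothetical stand-in for costly work such as decoding and resizing images.
    return tf.cast(x, tf.float32) / 255.0


raw = tf.data.Dataset.range(8)  # toy stand-in for an image data set

# cache() with no arguments keeps the preprocessed elements in RAM.
in_memory = raw.map(expensive_preprocess).cache().batch(4)

# cache(filename) writes them to files on disk instead.
cache_path = os.path.join(tempfile.mkdtemp(), "preprocessed_cache")
on_disk = raw.map(expensive_preprocess).cache(cache_path).batch(4)

for epoch in range(2):
    # The first full pass fills the cache; later passes read from it
    # instead of running the preprocessing again.
    ram_batches = [batch.numpy() for batch in in_memory]
    disk_batches = [batch.numpy() for batch in on_disk]
    print("epoch", epoch, "batches:", len(ram_batches), len(disk_batches))
```

Where the cache line goes matters: anything placed before it is computed once and reused, anything after it still runs on every epoch.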
One thing that's awesome in TensorFlow 2.0 that you--
if you're an expert, you'll care about,
and if not, you'll care about down
the road-- is distributed training.
And I'm going to skip some slides to move faster.
What I wanted to say briefly is distributed training
in TensorFlow 2.0 is awesome.
And it's awesome because if you're doing single machine,
multiple GPU synchronous data parallel training
or you're doing multimachine, multi-GPU synchronous data
parallel training, you don't need
to change the code of your model, which
is exactly what I care about.
And so it's awesome.
And basically, here is some Keras model.
This happens to be a built-in application for ResNet.
But it doesn't matter.
And I just want to show you how we run this code on one
machine with multiple GPUs.
We just wrap it in a block.
That's it.
And model.fit is distribution-aware and will work.
So you don't need to change your model to run on multiple GPUs.
And this particular strategy is called the MirroredStrategy.
There's different strategies you can
use to distribute your models.
There's another strategy-- it's MultiWorkerMirroredStrategy,
where you just change that one line.
And then if you have a network with multiple machines on it,
again, your code doesn't change.
So that's awesome.
I really, really like the design of this.
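A rough sketch of the "wrap it in a block" pattern. The tiny dense model and the random data are assumptions standing in for the ResNet application on the slide, so that the example runs quickly; the strategy scope around model creation is the part the talk describes.

```python
import numpy as np
import tensorflow as tf

# MirroredStrategy does synchronous data-parallel training across the GPUs
# of one machine. With no GPU available it runs as a single replica.
strategy = tf.distribute.MirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Build and compile inside the scope so the variables are mirrored.
    # The model code itself is the same as it would be without the scope.
    inputs = tf.keras.Input(shape=(20,))
    hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(1)(hidden)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="sgd", loss="mse")

# Random toy data, made up for this sketch.
features = np.random.rand(256, 20).astype("float32")
targets = np.random.rand(256, 1).astype("float32")

# model.fit splits each batch across the available replicas.
history = model.fit(features, targets, batch_size=64, epochs=2, verbose=0)
print("loss per epoch:", history.history["loss"])
```

Moving to several machines should mostly mean swapping the strategy line, although a multi-worker setup also needs cluster configuration that is not shown here.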
All right.
Other things that are awesome about TensorFlow 2.0
that I'd really encourage you to check out, especially
if you're learning or you have students,
is going beyond Python.
So we've talked about training models in Keras.
And I just want to show you some of the cool things you
can do to deploy them.
And roughly, there's a bunch of different ways
that you can deploy your models.
The way that I was used to a few years ago
as a Python developer was I'd throw up a REST API.
And I can do that using like TensorFlow Serving or Flask
or whatever you want.
And I'd serve the thing behind an API
because that's what I know how to do.
There's a couple of things that I've been learning since then,
which have been great, and that's basically
deploying models in the browser with TensorFlow.js,
deploying them on Android and iOS
using TensorFlow Lite, and very recently,
running them on Arduino using TensorFlow Lite Micro.
And I have a couple of suggested projects for you
that I just wanted to point you to.
So the first is tinyML.
And this was a blog post--
this was a guest article on our blog
a few weeks ago by the Arduino team.
And this article is basically a tutorial.
And what we're looking at is an Arduino.
It's a microcontroller.
And if you're new to microcontrollers,
it's a system on a chip.
But it also has pins that it can run voltage to
or read voltage from.
And so for instance, you can plug an LED light bulb
into one of those pins, and you can
have C code which runs voltage to the pin
and turns on the light.
Likewise, you could have an accelerometer attached
to one of the pins, and you could
have C code that reads from the accelerometer
and gives you some time series data.
So that's what a microcontroller is.
It's a computer plus these, basically, pins
that you can read and write to.
TensorFlow Lite is the code we use to deploy
TensorFlow models onto phones.
But recently, TensorFlow Lite Micro
now lets you deploy them onto Arduino.
And these things are smaller than a stick of gum.
This is a really nice one.
It has-- this is the Nano that has built-in sensors.
I have one at home.
But it's still about $30.
And it has a built-in accelerometer, a temperature
sensor, stuff like that.
But anyway, what we're looking at here,
this is a demo using the accelerometer. Someone trained
a model to recognize two gestures.
One is like a punch, and one is an uppercut.
And you can see they're holding the Arduino in their hand.
And as they're moving it, the laptop
is recognizing the gestures.
But the workflow-- and this is in the blog post--
is not bad at all considering how much power you're
getting out of it.
So what you're doing--
and let me see if I can show you the steps.
Basically, the first thing you need to do is capture the data.
And I wanted to bring an Arduino with me.
I thought it would be better just
to quickly show you these GIFs.
But you can plug the Arduino into your laptop
with a USB cable.
You need to collect training data for your model.
So what you do is you hold the Arduino in your hand.
And you collect a bunch of data for your punching gesture.
And you save that to disk as a time series.
And the diagram on the right there--
I was just capturing the IDE this morning--
as you move the Arduino around,
you're reading from the accelerometer.
And what you get out is just a CSV file with time series data,
exactly like you would have if you were looking
through the time series forecasting tutorial
on TensorFlow.org.
And you can do the same thing for your other gestures.
So what you do is you gather data.
You save CSV files.
You upload them to Colab.
In Colab, you write a model using TensorFlow 2.0 in Keras
to classify the data.
And that's just a regular Python model.
You don't need to know anything special about TensorFlow Lite
to do it.
And then what I wanted to show you--
and you can find the complete code in the blog post--
there's a very small amount of code
that you need to convert your model from Python down
into TensorFlow Lite format to run on device.
And this is a tiny amount of code.
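A sketch of what that conversion step can look like, assuming a small untrained Keras classifier in place of the real gesture model. The input size, the layer sizes, and the file name gesture_model.tflite are placeholders for this example; the complete, real code is in the blog post.

```python
import os
import tempfile

import tensorflow as tf

# A small Keras classifier standing in for the gesture model.
# The input size (600) and layer sizes are assumptions for illustration.
inputs = tf.keras.Input(shape=(600,))
hidden = tf.keras.layers.Dense(16, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(2, activation="softmax")(hidden)  # two gestures
model = tf.keras.Model(inputs, outputs)

# Convert the Keras model into the TensorFlow Lite format.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()  # the converted model, as bytes

# Save the converted model so it can be turned into a C array later.
tflite_path = os.path.join(tempfile.mkdtemp(), "gesture_model.tflite")
with open(tflite_path, "wb") as f:
    f.write(tflite_model)

print("converted model size in bytes:", len(tflite_model))
```

In practice you would train the model before converting it; the conversion calls stay the same.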
Once you have this model--
and this is a little unusual--
we're going to convert it to a C array.
And that's because this Arduino, it
has like 1 megabyte of disk space
and I think like 256 KB of RAM.
So we're converting it into the smallest,
simplest possible format.
But you convert it to a C array.
And then what you can do-- we have an example.
And you can paste the C array into the example.
And now you're running your TensorFlow model on device.
And it's amazing.
I've had so much fun doing this.
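To make the C array step concrete, here is a small sketch in plain Python. The helper bytes_to_c_array and the array name model are hypothetical, made up for this example; a command-line tool such as xxd -i is a common way to get the same kind of output. The input bytes below are placeholders, not a real model.

```python
def bytes_to_c_array(data: bytes, name: str = "model", per_line: int = 12) -> str:
    """Render raw bytes as C source text: an unsigned char array plus its length."""
    lines = [f"const unsigned char {name}[] = {{"]
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        # One row of hex literals, each followed by a comma.
        lines.append("  " + ", ".join(f"0x{byte:02x}" for byte in chunk) + ",")
    lines.append("};")
    lines.append(f"const unsigned int {name}_len = {len(data)};")
    return "\n".join(lines)


# Placeholder bytes; in practice this would be the contents of the .tflite file.
example = bytes([0, 1, 2, 254, 255])
c_source = bytes_to_c_array(example, name="model")
print(c_source)
```

The resulting text is what gets pasted into the C example, so the model ships as part of the compiled program and nothing has to be read from a file system on the device.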
The reason I wanted to point it out to you: if you have kids or you
have students that want to play with this stuff,
and you teach them just to train a time series forecasting
model, that's a really valuable skill.
We have a pretty good tutorial for it.
I mean, it's interesting.
But it's vastly more interesting if they can then
deploy it on device.
It gives them something tangible that they can play with
and show their friends.
It's super cool.
Also this is a brand new area.
So tinyML, which refers to doing machine
learning on small devices, I think has a lot of promise.
Another way to deploy your models,
which is super-powerful, is in JavaScript.
And this is using TensorFlow.js.
Here's another project suggestion.
One of the first tutorials you might run through
in deep learning is sentiment analysis.
So given a sentence, predict if it's positive or negative.
Also a really valuable skill, but it can be somewhat dry, right?
But the first time I've ever been super-excited about
sentiment analysis, other than many years ago when I found
that I could make money from it--
that was not the link that I wanted.
If you go into this GitHub repo link from the slides,
you will find that you can run sentiment analysis live
in JavaScript.
So for example, this is just a web page.
And down at the bottom here, the movie was awesome.
So I wrote a sentence.
And you can see that's predicted positive.
And here, you can see that's predicted negative.
And what's nice about doing JavaScript in the browser
from a user perspective is there's nothing to install,
which means if you're a Python developer
and your goal is to have a cool demo,
instead of throwing up a REST API, you can create a web page
and share it with your friends.
And what's nice, this model was written
in TensorFlow 2.0 using Keras.
And we have a converter script that
will convert it into TensorFlow.js format
to run in the browser.
So following examples, I am not a JavaScript developer at all.
I can get through the examples.
And it's not too bad.
So it's possible to do.
But it's a really good opportunity
if you have friends that are good JavaScript developers.
This is a huge collaboration opportunity
where you can develop models, and your friends
can help you deploy them in the browser.
Also if you haven't seen it, you can also write models
from scratch in JavaScript.
And I just wanted to show you a couple of demos here.
I'm almost certainly going to accidentally unplug
this laptop.
But another super-convincing thing
about why you might want to do JavaScript in the browser,
this is a model called PoseNet.
This could be the end of the presentation.
This is a model called PoseNet.
It's running entirely client-side in the browser.
So nothing's being sent to a server.
And it's not meant for this many people.
But you can see that it's starting to recognize where
people are in the audience.
And what's cool--
I know this is all obvious for web developers.
But I'm not.
So this is all new to me.
For me to do this in Python would have been a nightmare.
Like I would have to be streaming data
from the video camera, sending it to a server,
classifying it server-side, sending the results back.
There's no way I'd be able to do that in real time like that.
Also for privacy reasons, that would not be cool.
But because this is running client-side in JavaScript,
we can do that.
And so immediately, like you can see all the things
that we can do on top of this.
There's other models like that too that are just
really compelling.
And then for people looking for applications,
so there is sentiment analysis.
But a model built right on top of that in the browser
is just a text classification.
But this is multiclass text classification.
And this is a toxicity detector.
It's basically a sentiment analysis model.
But the idea is you're given a sentence,
and you want to figure out if this is like a comment
that you might want to post on YouTube or something like this.
But you can build tools like this
that analyze text privately, quickly, client-side.
So you can imagine, for example, if you
had a job as like a Wikimedia moderator
and you wanted to take a look at article edits
to see if they were something you wanted to publish or not,
you might spend a lot of your time looking
for toxic comments.
With something like this, you could very quickly
preprocess-- you could immediately have code
that right in the web page highlights
the bad parts of the article.
And I need to move a little bit quicker.
So let me just point you to one demo and then I'll stop.
If you're new to TensorFlow.js, by far the best demo--
there's two, and they're right here.
One is PoseNet, which you can get at that link.
There's also Pac-Man.
If you haven't seen Pac-Man, you can
control Pac-Man with your face.
You can train a model live in the browser
to do that, which is awesome.
And then flying through this, last comment,
then learning more.
If you're working in Colab and you're
used to using Keras, what you don't want to do
is import keras.
You want to say from tensorflow import keras.
And that will give you the version
of Keras in TensorFlow 2.0 that's
a superset of regular Keras.
This is only a problem in Colab because Keras is installed
there by default. If you ever see a message that
says "Using TensorFlow backend," you've
imported the wrong version of Keras.
And then last slide, here are four books that I'd recommend.
The very first book is about TensorFlow 2.0.
And this will give you low-level details.
It's great.
Only buy the second edition.
The first edition teaches TensorFlow 1.0,
which you do not need to learn
in order to use TensorFlow 2.0.
The second book doesn't mention the word "TensorFlow" at all.
That's the Keras book by Francois Chollet.
But it's outstanding if you're new to deep learning.
It's a perfectly good place to start.
All the code from the second book
will also work inside TensorFlow 2.0
just by saying from tensorflow import keras.
Nothing else will change.
It's all completely good.
The next book is Keras in JavaScript,
so Deep Learning with JavaScript, which is great.
The fourth book is brand new.
It's by Pete Warden, TinyML.
So thanks very much.
And I'll stop there.
And I'll be around after for questions.
[APPLAUSE]
ASHLEY: Thank you so much, Josh.
Where can we find you?
JOSHUA GORDON: I'll be right outside.
ASHLEY: OK.
Awesome.
All right.
We're going to take a 10-minute break.
For those of you who are standing in the back,
there's some more seats over on this side of the room.
We'll be back in about 10 minutes.
See you soon.
Thanks again.
JOSHUA GORDON: Thanks a lot.