Swift for TensorFlow: The Next-Generation Machine Learning Framework (TF Dev Summit '19)

[MUSIC PLAYING]
CHRIS LATTNER: Hi, everyone.
I'm Chris.
And this is Brennan.
And we're super excited to tell you about a new approach
to machine learning.
So here in the TensorFlow team, it
is our job to push the state of the art
in machine learning forward.
And we've learned a lot over the last few years
with deep learning.
And we've incorporated most of that into TensorFlow 2.
And we're really excited about it.
But, here, we're looking a little bit further
beyond TensorFlow 2.
And what do I mean by further?
Well, eager mode makes it really easy to train a dynamic model.
But deploying it still requires you to take that and then write
a bunch of C++ code to help drive it.
And that could be better.
Similarly, some researchers are interested in taking machine
learning models and integrating them into larger applications.
That also often requires writing C++ code.
We always want more flexible and expressive autodifferentiation
mechanisms.
And one of the things we're excited about
is being able to define reusable types that
then can be put into new places and used
with automatic differentiation.
And we always love improving your developer workflow.
We want to make you more productive
by taking errors in your code and bringing them
to your source and also by just improving your iteration time.
Now, what we're really trying to do here
is lift TensorFlow to entirely new heights.
And to do that, we need to be able to innovate
at all levels of the stack.
This includes the compiler and the language.
And that's what Swift for TensorFlow is all about.
We think that applying new solutions to old problems
can help push machine learning even further than before.
Well, let's jump into some code.
So first, what is Swift?
Swift is a modern and cross-platform programming
language that's designed to be easy to learn and use.
Swift uses types.
And types are great, because they can
help you catch errors earlier.
And also, they encourage good API design.
Now, Swift uses type inference, so it's really easy to use
and very elegant.
But it's also open source and has an open language evolution
process, which allows us to change the language
and make it better for machine learning which is really great.
Let's jump into a more relevant example.
This is how you define a simple model in Swift for TensorFlow.
As you can see, we're laying out our layers here.
And then we define a forward function, which composes them
together in a linear sequence.
You've probably noticed that this looks a lot like Keras.
That's no accident, of course.
We want you to be able to take what you know about Keras
and bring it forward into this world as well.
Now, once we have a simple model, let's train it.
How do we do that?
All we have to do is instantiate our model,
pick an optimizer and some random input data,
and then pick a training loop.
And, here, we'll write it by hand.
One of the reasons we like writing by hand
is that it gives you the maximum flexibility
to play with different kinds of constructs.
And you can do whatever you want, which is really great.
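A hand-written training loop of this kind can be sketched in plain Swift, without the TensorFlow module. Everything in it is an assumption made for illustration: the one-parameter linear model, the fixed data, the learning rate, and the hand-derived gradient that stands in for what automatic differentiation would compute.

```swift
// Toy data for the target relationship y = 3x (illustrative values only).
let inputs: [Double] = [1, 2, 3, 4]
let targets: [Double] = [3, 6, 9, 12]

// Mean squared error of a candidate weight over the whole data set.
func meanSquaredError(_ w: Double) -> Double {
    var total = 0.0
    for (x, y) in zip(inputs, targets) {
        let error = w * x - y
        total += error * error
    }
    return total / Double(inputs.count)
}

// A one-parameter "model": prediction = weight * x.
var weight = 0.0
let learningRate = 0.01
let initialLoss = meanSquaredError(weight)

for epoch in 1...10 {
    // Hand-derived gradient of the loss with respect to the weight:
    // d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x).
    var gradient = 0.0
    for (x, y) in zip(inputs, targets) {
        gradient += 2 * (weight * x - y) * x
    }
    gradient /= Double(inputs.count)

    // Gradient-descent update, written out by hand.
    weight -= learningRate * gradient
    print("epoch \(epoch): loss \(meanSquaredError(weight))")
}

let finalLoss = meanSquaredError(weight)
```

Because the loop body is ordinary code, any step in it (the gradient computation, the update rule, the logging) can be swapped out, which is the flexibility being described.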
But some of the major advantages of Swift for TensorFlow
are the workflow.
And so instead of telling you about it, what do you think,
Brennan, should we show them?
BRENNAN SAETA: Let's do it.
All right, the team has thought long and hard
about what's the easiest way for people to get started
using Swift for TensorFlow.
And what could be easier than just opening up a browser tab?
This is Google Colab, hosted Jupyter notebooks.
And it comes with Swift for TensorFlow built right in.
Let's see it in action.
Here is the layer model, the model
that Chris just showed you a couple of slides ago.
And we're going to run it using some random training
data right here in the browser.
So we're going to instantiate the model.
We're going to use the stochastic gradient descent SGD
optimizer.
And here we go.
We have now just trained a model using
Swift for TensorFlow in our browser on some training data
right here.
Now, we can see the training loss is decreasing over time.
So that's great.
But if you're ever like me and whenever I try and use
machine learning in any application,
I start with a simple model.
And I've got to iterate.
I've got to tweak the model to make it fit better
to the task at hand.
So since we're trying to show you the workflow,
let's actually edit this model.
Let's make it more accurate.
So here we are.
Now, let's think a little for a moment.
What changes do we want to make to our model?
Well, this is deep learning after all.
So the answer is always to go deeper, right?
But you may have been following the recent literature in state
of the art in that not just sequential layers,
but skip connections or residual connections
are a really good idea to make sure your model continues
to train effectively.
So let's go through and actually add an extra layer
to our model.
Let's add some skip connections.
And we're going to do it all right now in under 90 seconds.
Are you ready?
All right, here we go.
So the first thing that we want to do
is we need to define our additional layer.
So we're going to fill in this dense layer.
Whoops.
Flow.
And one thing you can see is that we're
using Tab autocomplete to help fill
in code as we're trying to develop and modify our model.
Now, we're going to fix up the shapes right here really
quick, so that the residual connections will all work.
If I can type properly, that would go better.
All right, great.
We have now defined our model with the additional layers.
All we need to do is modify the forward pass,
so that we add those skip connections.
So here we go.
The first thing we need to do is we
need to store in a temporary variable
the output of the flatten layer.
Then we're going to feed the output of the flatten layer
to our first dense layer.
So dense.applied(to: tmp, in: context).
Now, for the coup de grace, here is our residual connection.
So dense2.applied(to: tmp + tmp2, in: context).
Run that.
And, yes, that works.
We have now just defined a new model
that has residual connections and is
one additional layer deeper.
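The shape of that forward pass can be sketched in plain Swift. This is not the code from the demo: `TinyDense`, `ResidualModel`, `flatten`, and `add` are hypothetical stand-ins over plain arrays, there is no activation function or context argument, and the identity weights are chosen only to keep the arithmetic easy to follow.

```swift
// A tiny dense layer over plain arrays; a stand-in for a real dense layer.
struct TinyDense {
    var weights: [[Double]]   // shape: [inputSize][outputSize]
    var bias: [Double]        // shape: [outputSize]

    func applied(to input: [Double]) -> [Double] {
        var output = bias
        for (i, x) in input.enumerated() {
            for j in output.indices {
                output[j] += x * weights[i][j]
            }
        }
        return output
    }
}

// Flatten a 2-D input into a single vector.
func flatten(_ input: [[Double]]) -> [Double] {
    return input.flatMap { $0 }
}

// Element-wise sum; both vectors must have the same length.
func add(_ a: [Double], _ b: [Double]) -> [Double] {
    precondition(a.count == b.count, "residual shapes must match")
    return zip(a, b).map { $0 + $1 }
}

struct ResidualModel {
    var dense: TinyDense
    var dense2: TinyDense

    func applied(to input: [[Double]]) -> [Double] {
        let tmp = flatten(input)            // output of the flatten step
        let tmp2 = dense.applied(to: tmp)   // first dense layer
        // Residual connection: the second layer sees tmp + tmp2.
        return dense2.applied(to: add(tmp, tmp2))
    }
}

let identity: [[Double]] = [[1, 0], [0, 1]]
let model = ResidualModel(
    dense: TinyDense(weights: identity, bias: [0, 0]),
    dense2: TinyDense(weights: identity, bias: [1, 1]))
let output = model.applied(to: [[1], [2]])
```

The `add` helper makes the shape constraint explicit: the first dense layer has to map its input to an output of the same size, which is why the shapes had to be fixed up before the skip connection would work.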
Let's see how it does.
So we're going to reinstantiate our model
and rerun the training loop.
And if you recall from the loss that we saw before,
this one is now substantially lower.
This is great.
This is an example of what it's like to use Swift
for TensorFlow to develop and iterate as you apply models
to applications and challenges.
But Swift for TensorFlow-- thank you.
[APPLAUSE]
But Swift for TensorFlow was designed for researchers.
And researchers often need to do more than just change models
and change the way the architecture fits together.
Researchers often need to define entirely
new abstractions or layers.
And so let's actually see that live right now.
Let's define a new custom layer.
So let's say we had the brilliant idea
that we wanted to modify the standard dense layer that
takes weights and biases and we
wanted to add an additional bias set of parameters, OK?
So we're going to define this double bias dense layer right
here.
So I'm going to type this really quickly.
Stand by 15 seconds.
Here we go.
[LAUGHTER]
Woo, all right, that was great.
So let's actually walk through the code
so that you can see what's going on.
So the first thing that we have is we define our parameters.
So these are a W, like our weights for our neurons,
and B1, bias one, and B2, our second bias.
We defined an initializer that takes
an input size and an output size just like dense does.
We use that to initialize our parameters.
The forward pass is very simple to write.
So here's just applied(to:).
And we just take the matrix multiplication of input
by our weights, and we add in our bias terms.
That's it.
We've now just defined a custom layer right in Colab
in just a few lines of code.
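A plain-Swift sketch of such a layer follows. It is an illustration under stated assumptions, not the code typed in the demo: it uses arrays of `Double` instead of tensors, it does not conform to any layer protocol, and the constant initial values are chosen only so the output is predictable.

```swift
struct DoubleBiasDense {
    var w: [[Double]]   // weights, shape [inputSize][outputSize]
    var b1: [Double]    // first bias, shape [outputSize]
    var b2: [Double]    // second bias, shape [outputSize]

    // Takes an input size and an output size, like a regular dense layer.
    // Constant initial values are an assumption for this sketch.
    init(inputSize: Int, outputSize: Int) {
        w = Array(repeating: Array(repeating: 0.5, count: outputSize), count: inputSize)
        b1 = Array(repeating: 0.1, count: outputSize)
        b2 = Array(repeating: 0.2, count: outputSize)
    }

    // Forward pass: matmul(input, w) + b1 + b2 for a single input vector.
    func applied(to input: [Double]) -> [Double] {
        precondition(input.count == w.count, "input size mismatch")
        var output = [Double](repeating: 0, count: b1.count)
        for j in output.indices {
            for i in input.indices {
                output[j] += input[i] * w[i][j]
            }
            output[j] += b1[j] + b2[j]
        }
        return output
    }
}

let layer = DoubleBiasDense(inputSize: 2, outputSize: 3)
let result = layer.applied(to: [1, 3])
print(result)
```

Since `b1` and `b2` are simply added together, the extra bias adds parameters without adding expressive power.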
All right, let's see how it goes.
Here's model two.
And so we're going to use our double bias dense layer.
And we're going to instantiate it.
We're going to train it using, again, our custom
handwritten training loop.
Here's an example of another way that we
think Swift for TensorFlow makes your life easier.
Because Swift for TensorFlow can statically analyze your code,
it can be really helpful to you.
I don't know about you, but I regularly put typos in my code.
I don't know if you saw me typing earlier.
And Swift for TensorFlow here is helping you out, right?
It's saying, look, you mistyped softmaxCrossEntropy.
This should be labels, OK?
All right, so we run it.
We train it.
And our loss isn't as good.
This was not the right idea.
But this is an example of how easy
it is for researchers to experiment with new ideas
really easily in Swift for TensorFlow.
But let's go deeper.
Swift for TensorFlow is, again, designed for researchers.
And researchers need to be able to customize everything, right?
That's the whole point of research.
And so let's show an example of how
to customize something other than just a model or a layer.
So you may have heard that large GPU clusters or TPU super pods
are, like, delivering massive breakthroughs in research
and advancing the state of the art in certain applications
and domains.
And you may have also heard that, as you scale up
to effectively utilize these massive hardware pools,
you need to increase your batch size.
And so let's say you're a researcher,
and you want to try and figure out
what are the best ways to train deep neural networks at larger
batch sizes.
Well, if you're a researcher, you probably
can't buy a whole GPU cluster or rent a whole TPU super pod all
the time for your experiments.
But you often have a GPU under your desk.
So let's see how we can simulate running on a super large data
parallel GPU or TPU cluster on a single machine.
We're going to do it all in a few lines of code right here.
So here's our custom training loop.
Well, here's the standard part, right?
This is 1 to 10 training epochs.
And what we're going to do is, instead of just applying
our model forward once, we have an additional inner loop,
right?
So we're going to run our forward pass.
We're going to run our model--
whoops-- four times.
And we're going to take the gradients for each step.
And we're going to aggregate them in this grads variable.
OK?
This simulates running on four independent accelerators, four
GPUs or four TPUs in a data parallel fashion
on a batch that's actually four times as large
as what we actually run.
We're going to then use our optimizer
to update our model along these aggregated gradients,
again simulating a data parallel synchronous training process.
That's it.
That's all there is to it.
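The accumulation pattern can be sketched in plain Swift. The model (a single weight), the data, the learning rate, and the hand-derived gradient are all assumptions for illustration; in the demo the gradients come from automatic differentiation over a real model.

```swift
// Four micro-batches, one per simulated accelerator (illustrative values).
let microBatches: [[(x: Double, y: Double)]] = [
    [(x: 1, y: 2)], [(x: 2, y: 4)], [(x: 3, y: 6)], [(x: 4, y: 8)],
]

// Gradient of the summed squared error for prediction = weight * x.
func lossGradient(weight: Double, batch: [(x: Double, y: Double)]) -> Double {
    var gradient = 0.0
    for example in batch {
        gradient += 2 * (weight * example.x - example.y) * example.x
    }
    return gradient
}

var weight = 0.0
let learningRate = 0.005

for _ in 1...10 {                       // training epochs
    var grads = 0.0                     // aggregated gradient
    for batch in microBatches {         // four forward/backward passes
        grads += lossGradient(weight: weight, batch: batch)
    }
    // One synchronous update along the aggregated gradient.
    weight -= learningRate * grads
}
print(weight)
```

Summing the four micro-batch gradients gives the same number as one gradient over the combined batch, which is why a single device can stand in for four here.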
We're really excited by this sort of flexibility
and capabilities that Swift for TensorFlow
brings to researchers.
Back over to you, Chris.
CHRIS LATTNER: Thanks, Brennan.
[APPLAUSE]
So I think that the focus on catching errors early and also
productivity enhancements like code completion
can help you in a lot of ways.
And it's not just about, like, automating typing of code.
But it can also be about discovery of APIs.
So another thing that's really cool about Swift as a language
is that it has really good interoperability with C code.
And so in Swift, you can literally just
import a C header file and call symbols directly
from C without wrappers, without boilerplate or anything
involved.
It just works.
So we've taken this approach.
In the TensorFlow team, we've taken this approach
and brought it to the world of Python.
And one of the cool things about this
is that that allows you to combine the power of Swift
for TensorFlow with all the advantages of the Python
ecosystem.
How about we take a look?
BRENNAN SAETA: Thanks, Chris.
The Python data science ecosystem
is incredibly powerful and vibrant.
And we wanted to make sure that, as you start
using Swift for TensorFlow, you didn't
miss all your favorite libraries and utilities that you
were used to.
And so we've built a seamless Python interoperability
capability to Swift for TensorFlow.
And let's see how it works in the context
of my favorite Python data science library, NumPy.
So the first thing you need to do
is import TensorFlow and import Python.
And once you do that, that defines this Python object
that allows you to import arbitrary Python libraries.
So here we import pyplot from the matplotlib library
and NumPy.
And we assign it to np, OK?
After that, we can just use np just as if we were in Python.
So, here, we call linspace.
We're going to call sine and cosine.
And we're going to pass those values to pyplot.
When we run the cell, it just works exactly as you'd expect.
[APPLAUSE]
Thank you.
[APPLAUSE]
Now, this sort of kind of looks like the Python code
you're used to writing, but this is actually pure Swift.
It just works seamlessly.
But this is maybe a bit of a toy example.
So let's see this a little bit more in context.
OpenAI has done a lot of work in the area
of reinforcement learning.
And in order to help that along, they
developed a Python library called OpenAI Gym.
Gym contains a collection of environments
that are very useful when you're trying
to train a reinforcement learning agent
across a variety of different challenges.
Let's use OpenAI Gym to train a reinforcement learning
agent in Swift for TensorFlow right now in our browsers.
So the first thing we need to do is we need to import Gym.
We're going to define a few hyperparameters here.
And, now, we define our neural network.
In this case, we're going to pick
a simple two-layer dense network.
And it's just a sequential model, OK?
After that, we have some helper code
to filter out bad or short episodes and whatnot.
But here's the real meat of it.
We're going to use Gym to instantiate
the CartPole-v0 environment.
So that's our env.
We're going to then instantiate our network right
here and our optimizer.
And here's our training loop.
There we go.
So we're going to get a bunch of episodes.
We're going to run our model, get the gradients.
And we're going to apply those to our optimizer.
And we're going to record the mean rewards as we train, OK?
It's all very simple, straightforward Swift.
And here you can see us training a Swift for TensorFlow model
in an OpenAI Gym environment using the Python bridge,
totally seamless.
And of course, afterwards, you can
keep track of the parameters or the rewards.
In this case, we're going to plot the mean rewards
as the model trained using Python NumPy, totally seamless.
You can get started using Swift for TensorFlow using
all the libraries you know and love
and take advantage of what Swift for TensorFlow
brings to the table.
Back over to you, Chris.
CHRIS LATTNER: Thanks, Brennan.
So one of the things that I love about this is it's
not just about being able to leverage
big important libraries like NumPy.
We're working on the ability to integrate Swift for TensorFlow
and Python for TensorFlow code together,
which we think will provide a nice transition
path to make you able to incrementally move code
from one world to the other.
Now, I think it's fair to say that calculus
is an integral part of machine learning.
[LAUGHTER]
And we think that differentiable programming
is so important that we've built it right into the language.
This has a number of huge advantages,
including enabling more flexible and custom work
with differentiables, with derivatives.
And we think this is really cool.
So I'd like to take a look.
BRENNAN SAETA: So we've been using
Swift for TensorFlow's differentiable programming
capabilities throughout all of our demos so far.
But let's really break it down and see what's going on
at a fundamental level.
So here we define myFunction that
takes two doubles and returns a double based on some products,
and sums, and quotients.
If we want Swift for TensorFlow to automatically compute
the derivative for us, we just annotate it @differentiable.
Swift for TensorFlow will then derive the derivative
for this function right when we run the cell.
To use this autogenerated derivative, use gradient.
So gradient takes two things.
It takes a closure to evaluate and a point that you want
to evaluate your closure at.
So here we go.
This is what it is to take the derivative of a function
at a particular point.
So we can change it around.
This one's my favorite tasty number.
And that works nicely.
Now, one thing to note, we've just
been taking the partial derivatives of myFunction
with respect to a.
But, of course, you can take the partial derivatives
and get a full gradient of myFunction, like so.
Often with neural networks, however, you
want to get not just the gradients for your network
as you're trying to train it and optimize your loss function.
You often want what the network predicted, right?
This is really useful to compute accuracy or other debugging
sort of information.
And for that you can use valueWithGradient.
And that returns a tuple containing
both the value and the gradient, shockingly enough.
Now, one thing to note, in Swift,
tuples can actually have named parameters.
They aren't just ordered.
And so you can actually see that it prints out really nicely.
And you can access values.
We think this is, again, another nice little thing that helps
make writing and debugging code and, more importantly,
reading it and understanding it later a little bit easier.
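The idea of getting a value and a gradient back as a labeled tuple can be sketched in plain Swift. Two things here are assumptions: the body of `myFunction` is hypothetical (the talk only says it uses products, sums, and quotients), and `approximateValueWithGradient` is an illustrative helper that uses central finite differences rather than automatic differentiation.

```swift
// Hypothetical function of two doubles built from a product, a sum, and a quotient.
func myFunction(_ a: Double, _ b: Double) -> Double {
    return a * b + a / 2
}

// Illustrative stand-in for a value-with-gradient helper. It approximates
// the partial derivatives numerically instead of deriving them.
func approximateValueWithGradient(
    at a: Double, _ b: Double,
    in f: (Double, Double) -> Double
) -> (value: Double, gradient: (Double, Double)) {
    let h = 1e-6
    let da = (f(a + h, b) - f(a - h, b)) / (2 * h)
    let db = (f(a, b + h) - f(a, b - h)) / (2 * h)
    return (value: f(a, b), gradient: (da, db))
}

let result = approximateValueWithGradient(at: 3, 4, in: myFunction)
// Tuple elements can be read by label rather than by position.
print(result.value)
print(result.gradient)
```

Reading `result.value` and `result.gradient` by name is the readability benefit being described; a positional tuple would force the reader to remember which element is which.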
But the one thing that I want to call out
is that throughout this we've been using just normal types.
These aren't tensor of something.
It's just plain old double.
This is because automatic differentiation
is built right into the language in Swift for TensorFlow.
It makes it really easy to express your thoughts
very clearly.
But even though it's built into the language,
we've actually worked very hard to make sure
that automatic differentiation is totally flexible so that you
can customize it to whatever needs you have.
And instead of telling you about that, let's show you.
So let's say you want to define an algebra in 2D space.
Well, you're certainly going to need a point data type.
So here we define a point struct with x and y.
And we just mark it differentiable.
We can define helper functions on it like dot or other helper
functions.
And Swift for TensorFlow, when you try and use your code,
will often automatically infer when
you need gradients to be automatically computed
for you by the compiler.
But often, it's a good idea to document your intentions.
And so you can annotate your helper functions
as @differentiable.
The other reason why we recommend doing this
is because this helps catch errors.
So here, Swift for TensorFlow is actually
telling you that, hey, you can only differentiate functions
that return values that conform to Differentiable.
But Int doesn't conform to Differentiable, right?
What this is telling you is that my helper function
returns an Int.
And differentiation is all about taking infinitesimally small steps
as you optimize and take gradients, right?
And integers just are very discrete.
And so Swift for TensorFlow is helping to catch errors,
you know, right when you write the code very easily
and tell you what's going on.
So the solution, of course, is just
to not mark that as @differentiable.
The cell runs just fine.
But let's say we also wanted to go beyond just defining the dot
product.
Let's say we also wanted to define the magnitude helper
function.
That is the magnitude of the vector defined by the origin
to the point in question.
So to do that, we can use the distance formula
if you're going to do Euclidean distance.
And we can define an extension on point that does this.
But we're going to pretend for a moment
that Swift doesn't include a square root function,
because I want a good excuse for you
to see the interoperability with C, OK?
So we're actually going to use C's square root function that
operates on doubles, OK?
So based on the definition of Euclidean distance,
we can define the magnitude.
And it totally just--
no, it doesn't quite work.
OK.
Let's see what's going on.
So we wanted magnitude to be differentiable.
And it's saying that you can't differentiate the square root
function, because this is an external function that hasn't
been marked as differentiable.
OK.
What's that saying?
Well, the square root, it's a C function.
It was compiled by the C compiler.
And as of today, the C compiler can't automatically
compute derivatives for you.
So Swift for TensorFlow is saying like, hey, this
isn't going to work.
This is excellent, because it gives me a great excuse
to show you how to write custom gradients.
All right, so here we define a wrapper function,
mySqrt, that just calls down
in the forward pass to the C square root function.
In the backwards pass, we take our double
and we return a tuple of two values.
Rather, the first element in the tuple
is the normal value in the forward pass.
And the second is a pullback closure.
And this is where you define the backwards pass
capturing whatever values you need from the forward pass.
OK?
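The value-plus-pullback shape can be sketched in plain Swift. This omits the compiler attribute that registers a custom derivative, and the name `mySqrtWithPullback` and the tuple labels are illustrative choices rather than a specific library API.

```swift
import Foundation  // exposes the C library's sqrt for Double

// Returns the forward value together with a pullback closure.
func mySqrtWithPullback(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    let root = sqrt(x)   // forward pass calls down to C's sqrt
    // Backward pass: d/dx sqrt(x) = 1 / (2 * sqrt(x)), scaled by the
    // incoming gradient v. The closure captures `root` from the forward pass.
    return (value: root, pullback: { v in v / (2 * root) })
}

let (value, pullback) = mySqrtWithPullback(9)
let derivative = pullback(1)
print(value, derivative)
```

Capturing `root` in the closure is the point of the design: the backward pass reuses work from the forward pass instead of recomputing the square root.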
So we're going to run that.
We're going to go back up to our definition of magnitude
and change it from sqrt to mySqrt,
rerun the cell, and it works.
We've now defined point and two methods
on it, dot and magnitude.
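A plain-Swift sketch of such a point type is given here for reference. It carries no differentiability annotations, and `magnitudeGradient` is a hypothetical method that writes out by hand the derivative the compiler would otherwise generate.

```swift
import Foundation

struct Point {
    var x: Double
    var y: Double

    // Dot product with another point treated as a vector.
    func dot(_ other: Point) -> Double {
        return x * other.x + y * other.y
    }

    // Euclidean distance from the origin to this point.
    func magnitude() -> Double {
        return sqrt(dot(self))
    }

    // Hand-derived gradient of magnitude with respect to x and y:
    // d/dx sqrt(x^2 + y^2) = x / magnitude, and likewise for y.
    // Undefined at the origin, where the magnitude is zero.
    func magnitudeGradient() -> Point {
        let m = magnitude()
        return Point(x: x / m, y: y / m)
    }
}

let p = Point(x: 3, y: 4)
let q = Point(x: 1, y: 2)
print(p.dot(q), p.magnitude(), p.magnitudeGradient())
```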
And we can now combine these in arbitrary
other silly differentiable functions.
So, here, I've defined the silly function.
And we've marked it as differentiable.
And we're going to take two points.
We're also going to take a double, right?
You can mix and match differentiable data types
totally fluidly.
We're going to return double, and we're
going to take magnitudes and do dot products.
It's a silly function after all.
And we can then use it, compute the gradient of this function
at arbitrary data points.
Just like you'd expect, you can get the value of the function,
get full gradients in addition to partial derivatives
with respect to individual values.
That's been a quick run through of how
to use customization, custom gradients, custom data
types with the language-integrated automatic differentiation built
into Swift for TensorFlow.
But let's go one step further.
Let's put all this together and show
how you can write your own debuggers
as an example of how this power is all in your hands.
So often when you're debugging models,
you often want to be able to see the gradients
at different points within your model.
And so here, we can just define in regular Swift code
a gradient debugger.
Now, it's going to take as input a double.
And it's going to return it just like normal
for the forward pass, right?
It's an identity function.
On the backwards pass, we're going to get the gradient.
We're going to print the gradient.
And then we're going to return it.
So we're just passing it through just printing it out.
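The same idea can be sketched in plain Swift using an explicit value-and-pullback pair. The function name, the tuple shape, and the hand-chained usage are assumptions for illustration; in the demo the compiler wires the pullback into the backward pass automatically.

```swift
// Identity in the forward pass. The pullback prints the incoming
// gradient and returns it unchanged.
func gradientDebugger(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    let pullback: (Double) -> Double = { gradient in
        print("Gradient: \(gradient)")
        return gradient
    }
    return (value: x, pullback: pullback)
}

// Hand-chained example for y = 3 * a, with the debugger wrapped around a.
let a = 2.0
let debugged = gradientDebugger(a)
let y = 3 * debugged.value           // forward pass is unaffected
let upstream = 3.0                   // dy/d(debugged.value)
let dyda = debugged.pullback(upstream)
```

Because the debugger is the identity in both directions, dropping it into a computation changes neither the value nor the gradient; it only adds the printout.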
Now that we've defined this gradient debugger ourselves,
we can use it in our silly function
to see what's going on as we take derivatives.
So gradient debugger, there we go.
We can rerun that.
And when we take the gradients, we can now see that for that
point in the silly function of a dot b, the gradient is 3.80.
That's been a brief tour through how automatic differentiation
works in Swift for TensorFlow and how it's customizable
so that you can harness the power in whatever abstractions
or systems you need to build.
Back over to you, Chris.
CHRIS LATTNER: Thanks, Brennan.
[APPLAUSE]
So the funny thing about all this
is that the algorithms that we're building on
were defined back in the 1970s.
And so it really took language integration
to be able to bring these things forward
into the world of machine learning.
There's a tremendous amount of depth here.
And I'm really excited to see what you all can do with it.
And we think that this is going to enable new kinds of research
which we're very excited about.
There's also a ton of depth.
And if you're interested in learning more,
we have a bunch of detailed design documents
available online.
Let's talk about performance a little bit.
Now, Swift is fast.
And this comes from a number of different things, one of which
is that the language itself has really good low level
performance.
There's also no GIL to get in the way of concurrency.
Swift for TensorFlow also has some advanced compiler
techniques to automatically identify graphs for you
and extract them.
So you don't have to think about that.
The consequence of all this together
is that we think Swift has the world's
most advanced eager mode.
Now, you may not care that much about performance.
You may wonder, like, why do we care about this stuff?
Well, we're seeing various trends
in the industry where people are defining neural nets
and then want to integrate them into other larger applications.
And typically what this requires is this requires you to export
graphs and then write a bunch of C++ code to load
and orchestrate them in various ways.
So let's take a look at an example of this.
AlphaGo Zero is really impressive work
that combines three major classes of techniques.
Of course, you have deep learning on the one hand.
But it also drives it through Monte Carlo tree search to actually
find and evaluate these spaces.
And then it runs them at scale on industry
leading TPU accelerators.
And so it's the combination of all three of these things
that make AlphaGo Zero possible.
Now, this is possible today.
And if you're an advanced team like DeepMind,
you can totally do this.
But it's much more difficult than it should be.
And we think that breaking down barriers like this
can lead to new breakthroughs in science.
And we think that this is what can drive progress forward.
So instead of talking about it again, let's take a look.
BRENNAN SAETA: MiniGo is an open source Go
player inspired by DeepMind's AlphaGo Zero project.
It's available on GitHub.
And you can certainly check out the code.
And I encourage you to.
They're also going to be here.
And they have some other presentations tomorrow.
But the MiniGo project, when they started out,
they were writing everything in normal TensorFlow.
And it was working great until they
started trying to run at scale on large clusters of TPUs.
There, they ran into performance problems and had to rewrite
things like Monte Carlo tree search into C++ in order
to effectively utilize modern accelerators.
Here, we've reimplemented Monte Carlo tree
search and the rest of the MiniGo self-play in pure Swift.
And we're going to let you see it running right here in Colab.
So here we define a helper function
where we take in a game configuration
and a couple of participants.
These are our white and black players.
And we're going to run, basically, play
the game until we have a winner or loser.
And so let's actually run this.
Here, we define a game configuration.
We're going to play between a Monte Carlo tree search powered
by neural networks versus just a random player just
to see how easy it is to flip back and forth
or mix and match between deep learning
and other arbitrary machine learning algorithms
right here in Swift.
So here you go.
You can see them playing white, black, playing different moves
back and forth.
And it just goes.
We think that Swift for TensorFlow is going to unlock
whole new classes of algorithms and research,
because of how easy it is to do everything in one language with
no barriers, no having to rewrite things into C++.
Back over to you, Chris.
CHRIS LATTNER: Thanks, Brennan.
The cool thing about this, of course,
is that you can actually do something
like this in a notebook, which is pretty phenomenal.
And we've seen many different families of new techniques
that can be combined together and fused in different ways.
And bringing this to more people we think
will lead to new kinds of interesting research.
Now, our work on usability and design
is not just about high-end researchers.
So we love them, but Swift is also
widely used to teach new programmers how to code.
And education is very close to our hearts.
And so I'm very excited to announce a collaboration
that we're embarking on with none other than Jeremy Howard.
But instead of talking about this,
I'd rather have Jeremy speak about it now.
JEREMY HOWARD: At Fast.AI, we're always
looking to push the boundaries of what's
possible with deep learning, especially
pushing to make recent advances more accessible.
We've been involved with setting ImageNet speed
records at a cost of just $25 and building the world's best
document classifier.
Hundreds of thousands have become deep learning
practitioners through our courses
and are producing state of the art results with our library.
We think that with Swift for TensorFlow,
we can go even further.
So we're announcing today that our next course will include
a big Swift component co-taught by someone
that knows Swift pretty well.
BRENNAN SAETA: Chris, I think he means you.
CHRIS LATTNER: Yeah, we'll see how this goes.
So I'm super excited to be able to help
teach the next generation of learners.
But I'm also really excited that Jeremy
will be bringing his expertise in API design
and helping us shape the high level
APIs in Swift for TensorFlow.
So we've talked about many things.
But the most important part is Swift
for TensorFlow is really TensorFlow at its core.
And we think this is super important,
because we've worked really hard to make sure that it integrates
with all the things going on in the big TensorFlow family.
And we're very excited about that.
Now, you may be wondering where you could get this.
So Swift for TensorFlow is open source.
You can find it on GitHub now.
And you can join our community.
It also works great in Colab as you've seen today.
We have tutorials.
We have examples.
And all the demos you saw today are available now
in Colab, which is great.
We've also released our 0.2 release, which
includes all the basic infrastructure
and underlying technology to power these demos and examples.
And we're actively working on high level APIs right now.
So this is not ready for production
yet as you could guess.
But we're very excited about shaping this future,
building this out, exploring this new programming model.
And this is a great opportunity for advanced researchers
to get involved and help shape the future of this platform.
So we'd love it for you to try it out and let
us know what you think.
Thank you.
[APPLAUSE]
[MUSIC PLAYING]