  • ALEX PASSOS: I'm Alex Passos, and I'm

  • here again to talk about functions, not sessions.

  • tf.function is the new way of using graphs

  • in TensorFlow 2.

  • All the material I'm going to cover here,

  • the design and the motivation, is mostly

  • described in one of the RFCs in the TensorFlow community GitHub

  • repo.

  • So if you go to github.com/tensorflow/community/rfcs,

  • you will see an RFC with exactly this title

  • in there, where we go over a bunch of the motivation and a bunch

  • of the high-level design.

  • So here I'm mostly going to focus

  • on some nitty gritty details of the motivation

  • and more details about the implementation.

  • And if you're working on TensorFlow

  • and you're using functions to do something

  • or you're curious about function internals,

  • I hope to at least point you to the right

  • places to start reading the code to understand what's happening.

  • I'm mostly going to focus today on the high-level Python

  • side of things.

  • And there's another training session later.

  • I think the title's going to be eager execution runtime.

  • That's going to focus more on the C++ side of stuff.

  • So I think to understand functions,

  • it helps if you understand where we're coming from,

  • which is the session.run world in TensorFlow 1.

  • And I think in TF1, when TF was originally designed,

  • it was designed as a C++ runtime first and only later came

  • a Python API.

  • And as far as a C++ runtime goes,

  • the API of graphs and sessions is pretty reasonable.

  • So you build a graph by some function

  • that the runtime does not care about,

  • and then you connect to the runtime

  • by opening this session.

  • This connection is important because a runtime can

  • be local, can be distributed.

  • There are all sorts of in between things.

  • And to actually run computation, you just call session.run.

  • Because you have a graph, you give it

  • the names of your inputs, the names of your outputs,

  • the names of particular nodes that you want to run.

  • And the runtime will go do its thing and return the results

  • to you as normal C++ arrays that you can use to manipulate

  • your data.
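
At the Python level, the same build-a-graph, open-a-session, run-by-name flow looks roughly like this (a minimal sketch using the tf.compat.v1 API; the tensor names are made up for illustration):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Build a graph however you like; the runtime only ever sees the result.
graph = tf.Graph()
with graph.as_default():
    x = tf.placeholder(tf.float32, shape=[None, 3], name="x")
    w = tf.constant([[1.0], [2.0], [3.0]], name="w")
    y = tf.matmul(x, w, name="y")

# Connect to a runtime by opening a session (local here; it could be remote).
with tf.Session(graph=graph) as sess:
    # Name the outputs you fetch and the inputs you feed; get back plain arrays.
    result = sess.run("y:0", feed_dict={"x:0": [[1.0, 2.0, 3.0]]})
    print(result)  # [[14.]]
```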

  • So this is very nice and convenient if you're writing

  • in C++ and if you're programming at this level.

  • You generally write the code that looks like this once,

  • and you spend your entire life as a TensorFlow developer

  • writing the little part that I abstracted out

  • called BuildMyGraph.

  • And I think it's an understatement

  • to say that just manually writing protocol buffers

  • is very awkward.

  • So we very, very quickly decided this is not a good way to go

  • and built an API around it.

  • And the first version of the API was very explicit.

  • So you created a graph, and then every time you created an op,

  • you pass the graph as an argument,

  • and this is still fine because it's very explicit

  • that you're building a graph.

  • So you can have this mental model

  • that you're building a graph that then you're

  • going to give to a runtime to execute.

  • This is not really idiomatic Python code.

  • So it's also very easy to see how to make

  • this idiomatic Python code.

  • You just stick the graph in a global context manager

  • and add a bunch of operator overloads and things like that.
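
The two styles look roughly like this (a minimal sketch with today's tf.compat.v1 API; the very first API passed the graph to each op explicitly, which this only approximates):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

# Explicit style: you can see exactly which graph every op is added to.
g = tf.Graph()
with g.as_default():
    a = tf.constant(1.0, name="a")
    b = tf.constant(2.0, name="b")
    c = tf.add(a, b, name="c")

# Implicit style: a global default graph plus operator overloads makes this
# look like ordinary Python, even though z is a symbolic tensor with no value.
x = tf.constant(1.0)
y = tf.constant(2.0)
z = x + y
```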

  • And you end up with code that looks

  • like what TensorFlow code looks like today, which

  • is a little unfortunate, because by reading the same code,

  • you can't really tell whether an object is a tensor--

  • and hence only has a value during

  • an execution of a session, and is

  • this deferred quantity, et cetera, et cetera, that

  • has a name and might have some known

  • properties about the shape, but not all--

  • or if this is just a normal Python object or NumPy array.

  • And this creates a lot of confusion

  • and I think leads to a very unnatural programming model.

  • The session.run thing also has a granularity problem,

  • which is that in the way it was originally built,

  • the graph--like, the stuff that you pass to session.run--

  • is the quantum of all the stuff you want to execute.

  • And around it is this very rigid boundary

  • where you keep stuff in the host memory of your client program,

  • give it to session.run, and then get results back

  • into host memory of your client program.

  • So one example that I think is illustrative of why this is not

  • ideal is if you have a reinforcement learning agent

  • that's implemented over a recurrent neural network,

  • in that scenario, your agent's going to run a loop where it's

  • going to read an observation from your environment, which

  • is some arbitrary code that runs in your host,

  • and has some state.

  • The state is initialized to 0.

  • The agent takes the observation

  • and runs it through the neural network.

  • And that neural network spits out a new state and an action

  • for the agent to perform in the environment.

  • You take that action, bring it to client memory,

  • give it to the C++ code for an environment, your Atari game,

  • or whatever.

  • That will run for a while and then give you back a new state.

  • You want to ship this new observation and the old state

  • back to the RNN.

  • But if your RNN is running on another device, say, a GPU,

  • there was really no reason for you to ship your RNN state back

  • to your client and then from the client back to the device.

  • So the boundary for stuff you want to run here

  • is not really working.

  • The boundary for stuff you want to run

  • is not the same as the boundary for stuff

  • that wants to live on a device

  • or wants to live on the host.
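
A sketch of that loop in TF1 style (illustrative code, not from the talk), showing the RNN state being dragged through host memory on every step:

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

obs_ph = tf.placeholder(tf.float32, shape=[1, 16])
state_ph = tf.placeholder(tf.float32, shape=[1, 32])
w_in = tf.get_variable("w_in", shape=[16, 32])
w_rec = tf.get_variable("w_rec", shape=[32, 32])
w_out = tf.get_variable("w_out", shape=[32, 4])

# A hand-rolled recurrent cell standing in for the agent's RNN.
new_state = tf.tanh(tf.matmul(obs_ph, w_in) + tf.matmul(state_ph, w_rec))
action = tf.argmax(tf.matmul(new_state, w_out), axis=-1)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    state = np.zeros([1, 32], np.float32)            # RNN state, in host memory
    obs = np.zeros([1, 16], np.float32)              # stand-in for env.reset()
    for _ in range(10):
        # The RNN state is copied device -> host -> device on every step,
        # even though only the action is needed by the environment.
        act, state = sess.run([action, new_state],
                              feed_dict={obs_ph: obs, state_ph: state})
        obs = np.random.rand(1, 16).astype(np.float32)  # stand-in for env.step(act)
```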

  • And this gets even more complicated

  • once you put automatic differentiation into the story,

  • because TensorFlow uses the symbolic representation

  • for your computation that we call a graph.

  • We do automatic differentiation on

  • this symbolic representation.

  • So now the graph not only has to be a quantum for stuff

  • you want to run, but it has to be a quantum for stuff you

  • differentiate.

  • So if you stick to this reinforcement learning agent

  • example, a popular thing that people

  • used to do, before we had the substantially better

  • deep reinforcement learning algorithms we have now, is policy gradient.

  • And the simplest policy gradient algorithm is called REINFORCE.

  • And what it amounts to doing is it will run your agent

  • for m time steps.

  • You'll get the probability for the agent

  • to take the actions that it actually took.

  • And you'll take the gradient of that probability,

  • multiply it by the reward your agent got,

  • and apply this to the weights.

  • And now not only do we want to avoid transferring the RNN

  • state back and forth between the host and your accelerator,

  • but also you want to back prop through a number of steps

  • that might not even be known before you

  • start your computation.

  • Another issue is that session.run has a kind

  • of like--

  • it asks for too much information every single time.

  • So what every training loop or inference loop or anything using

  • TensorFlow looks like is not--

  • well, not every, but what most look

  • like is not a single call to session.run,

  • but a bunch of calls to session.run in a loop.

  • And in all those calls, you're executing the same tensors,

  • you're fetching the same tensors,

  • and you're feeding the same symbolic tensors slightly

  • different numerical values.

  • And because the session.run API doesn't

  • know that you're going to be calling

  • those things in a loop where most of the arguments-- where

  • some things don't change and some things do change,

  • it has to re-perform a bunch of validation.

  • And so we put a cache in front of that validation.

  • And computing the cache key becomes a performance problem.

  • Derek had the idea of just separating the stuff

  • that changes from the stuff that doesn't change

  • into this session.make_callable API,

  • where you call it once with this stuff that doesn't change,

  • and you get back a function that you

  • call with just the stuff that changes.

  • So now all the validation that you're performing n times

  • is of the stuff that changes.

  • And the validation that you're performing only once

  • is of the stuff that stays the same.

  • This is not just a performance win,

  • but it's also kind of a usability win,

  • because just by looking at the call to your code,

  • you know what is fixed and what is variant.
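
A minimal sketch of that make_callable pattern (the graph and placeholder here are made up for illustration):

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

x = tf.placeholder(tf.float32, shape=[None, 4])
y = tf.reduce_sum(x * 2.0)

with tf.Session() as sess:
    # The fixed part -- which tensors are fed and fetched -- is validated once.
    run_step = sess.make_callable(y, feed_list=[x])
    # The loop only passes the part that actually changes: the values.
    for _ in range(100):
        batch = np.random.rand(8, 4).astype(np.float32)
        value = run_step(batch)
```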

  • And finally, the last very awkward thing about session.run

  • is that graph pruning is a very complicated model to program

  • against when you're writing in an imperative host programming

  • language.

  • So for example, I have my first function in there

  • where I create a variable, I assign a value to it,

  • I increment it a little bit, and then

  • it returns something that uses the variable times

  • some constant.

  • And if you just write code like this, because I didn't look

  • at the return value of the assign_add,

  • that assignment will never happen.

  • Like, there's no way to make that assignment happen

  • in TensorFlow because you created a tensor

  • and you threw it away, and you did not

  • keep a reference to it so that you can session.run it later.
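
That first function is roughly this shape (a reconstruction for illustration, not the actual slide code):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

def fn1():
    v = tf.Variable(0.0)
    v.assign(1.0)       # result never fetched...
    v.assign_add(2.0)   # ...so these assignments get pruned away
    return v * 3.0

out = fn1()
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(out))  # 0.0 -- neither assignment ever ran
```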

  • And you think well, that's crazy.

  • Why don't you just keep those references under the hood

  • and do something magical to fix it?

  • And the problem is that it's very easy for you

  • as a user to rely on the fact that this pruning is going

  • to be performed to try to encapsulate

  • your code a little better.

  • So a design pattern that I've seen a lot

  • is that when you have some structure--

  • so for example, my fn2 there has a reinforcement

  • learning environment.

  • And that env object is some complicated Python thing

  • that knows how to build a bunch of graphs.

  • And you can encapsulate that in a single function

  • in your code that returns how to get the current observation,

  • how to apply an action, and how to reset that environment.

  • So your code is now very concise,

  • but in practice, you have a function

  • that returns three things.

  • And you never want those three things to run together.

  • You always want, at most, one of them

  • to run at any point in time.
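
A sketch of that pattern (the env class and names here are hypothetical, just to show the shape):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

class ToyEnv:
    """Stand-in for a Python object that builds graph pieces around some state."""
    def __init__(self):
        self._state = tf.Variable(0.0)

    def observation(self):
        return self._state.read_value()

    def apply_action(self, action):
        return self._state.assign_add(action)

    def reset(self):
        return self._state.assign(0.0)

def fn2(env, action):
    # Concise to write, but it returns three things that must never run together;
    # the caller relies on pruning to execute at most one of them per call.
    return env.observation(), env.apply_action(action), env.reset()

action = tf.placeholder(tf.float32, shape=[])
obs_t, step_op, reset_op = fn2(ToyEnv(), action)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(reset_op)                          # run only the reset
    sess.run(step_op, feed_dict={action: 1.0})  # run only the step
    print(sess.run(obs_t))                      # 1.0 -- run only the read
```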

  • So this is a little frustrating, because we've

  • kind of locked ourselves out of being able to fix this problem.

  • And TensorFlow has a few partial solutions to this problem.

  • I think the most comprehensive partial solution

  • to the problems in session.run is called partial_run.

  • But it's inherently limited, because it requires you

  • to have a fully unrolled graph.

  • It does not work with arbitrary control flow.

  • And it requires a complicated dance of, like,

  • specifying everything you're likely

  • going to fetch in the future, then

  • the things you're going to fetch now,

  • and keeping and passing tensor handles around.

  • And it's very, very easy to make mistakes when you're doing it.
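
That dance looks roughly like this (a minimal sketch of the partial_run API with made-up tensors):

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

a = tf.placeholder(tf.float32, shape=[])
b = tf.placeholder(tf.float32, shape=[])
y = a * 2.0
z = y + b

with tf.Session() as sess:
    # Declare up front everything you might ever feed or fetch...
    handle = sess.partial_run_setup(fetches=[y, z], feeds=[a, b])
    # ...then fetch pieces incrementally, threading the handle through.
    y_val = sess.partial_run(handle, y, feed_dict={a: 3.0})   # 6.0
    z_val = sess.partial_run(handle, z, feed_dict={b: 4.0})   # 10.0
```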

  • Plus, what happens is that you as a user often

  • write a Python function.

  • TensorFlow then runs the function to create a graph.

  • Then we take a graph, we validate it, we prune it,

  • we do a bunch of transformations.

  • And we hope that what we got out to run

  • is exactly the nodes that you had

  • intended to run in that Python function in the first place.

  • But because we have all these steps in the middle,

  • it's very easy to drop things and confuse things.

  • So all these usability problems are

  • inherent, I think, to coupling this session.run

  • API with a host programming language that

  • tries to make your code look very imperative,

  • like native Python code.

  • And so the way we break this and solve those problems

  • is with tf.function.

  • So what are the core ideas of tf.function?

  • It's that your function's inputs and outputs

  • live on devices.

  • They don't have to live on the host.
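
As a quick illustration of the shape of the API (a minimal example, not the speaker's slide code):

```python
import tensorflow as tf

@tf.function
def dense_relu(x, w, b):
    # Traced into a graph; inputs and outputs are tensors that can stay
    # on an accelerator rather than bouncing through host memory.
    return tf.nn.relu(tf.matmul(x, w) + b)

y = dense_relu(tf.ones([2, 3]), tf.ones([3, 4]), tf.zeros([4]))
```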

  • Another thing is that a function is differentiable,