  • [MUSIC PLAYING]

  • ALEXANDRE PASSOS: Hello, my name is Alex,

  • and I work on TensorFlow.

  • I am here today to tell you all a little bit about how

  • you can use TensorFlow to do deep learning research more

  • effectively.

  • What we're going to do today is we're

  • going to take a little tour of a few TensorFlow features that

  • show you how controllable, flexible, and composable

  • TensorFlow is.

  • We'll take a quick look at those features, some old

  • and some new.

  • And these are by far not all the features

  • that are useful for research.

  • But these features let you accelerate

  • your research using TensorFlow in ways

  • that perhaps you're not aware of.

  • And I want to start by helping you control how TensorFlow

  • represents state.

  • If you've used TensorFlow before,

  • and I am sure you have at this point,

  • you know that a lot of our libraries

  • use TF variables to represent state,

  • like your model parameters.

  • And for example, a Keras dense layer

  • has one kernel matrix and an optional bias

  • vector stored in it.

  • And these parameters are updated when you train your model.

  • And part of the whole point of training models

  • is so that we find out what value those parameters should

  • have had in the first place.

  • And if you're making your own layers library,

  • you can control absolutely everything about how

  • that state is represented.

  • But you can also crack open the black box

  • and control how state is represented,

  • even inside the libraries that we give you.

  • So for example, we're going to use this little running example

  • of what if I wanted to re-parametrize a Keras

  • layer so it does some computation to generate

  • the kernel matrix, say to save space

  • or to get the correct inductive bias.

  • The way to do this is to use tf.variable_creator_scope.

  • It is a tool we have that lets you take control of the state

  • creation process in TensorFlow.

  • It's a context manager, and all variables created under it

  • go through a function you specify.

  • And this function can choose to do nothing.

  • It can delegate.

  • Or it can modify how variables are created.

  • Under the hood, this is what tf.distribute.Strategy's scope

  • usually implies.

  • So it's the same tool that we use

  • to build TensorFlow that we make available to you,

  • so you can extend it.
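
As a minimal sketch of those three options (my own toy example, not code from the talk; the name noisy_creator is made up), here is a creator that just peeks at the request and then delegates to the default behavior:

```python
import tensorflow as tf

def noisy_creator(next_creator, **kwargs):
    # Peek at the creation request, then delegate to the default creator.
    # A creator could instead modify kwargs, or return a custom object entirely.
    print("creating a variable named", kwargs.get("name"))
    return next_creator(**kwargs)

with tf.variable_creator_scope(noisy_creator):
    v = tf.Variable(tf.zeros([3]), name="bias")  # routed through noisy_creator
```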

  • And here, if I wanted to do this re-parametrization of the Keras

  • layer, it's actually pretty simple.

  • First, I define what type I want to use to store those things.

  • Here, I'm using this FactorizedVariable type,

  • which is a tf.Module.

  • tf.Modules are a very convenient type.

  • You can have variables as members,

  • and we can track them automatically

  • for you and all sorts of nice things.

  • And once we define this type, it's

  • really just a left half and right half.

  • I can tell TensorFlow how to use

  • objects of this type as a part of TensorFlow computations.

  • And what we do here is we do a matrix multiplication

  • of the left component and the right component.

  • And now that I know how to use this object, I can create it.

  • And this is all that I need to make

  • my own little variable_creator_scope.

  • In this case, I want to peek at the shape.

  • And if I'm not creating a matrix,

  • just delegate to whatever TensorFlow

  • would have done, normally.

  • And if I am creating a matrix, instead

  • of creating a single matrix, I'm going

  • to create this factorized variable that

  • has the left half and the right half.

  • And finally, I now get to just use it.

  • And here, I create a little Keras layer.

  • I apply it.

  • And I can check that it is indeed using

  • my factorized representation.
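
A rough sketch of the whole thing, assuming a small fixed rank for the factorization (the class and function names here are mine, and details may differ from the code shown on the slides):

```python
import tensorflow as tf

class FactorizedVariable(tf.Module):
    """Stores an [m, n] matrix as the product of an [m, k] and a [k, n] factor."""
    def __init__(self, shape, rank=4, dtype=tf.float32):
        super().__init__()
        m, n = shape
        self.left = tf.Variable(tf.random.normal([m, rank], stddev=0.05, dtype=dtype))
        self.right = tf.Variable(tf.random.normal([rank, n], stddev=0.05, dtype=dtype))

# Tell TensorFlow how to use this object inside ordinary computations:
# whenever a plain tensor is needed, multiply the two halves together.
def _factorized_to_tensor(value, dtype=None, name=None, as_ref=False):
    del name, as_ref  # unused in this sketch
    result = tf.matmul(value.left, value.right)
    return tf.cast(result, dtype) if dtype is not None else result

tf.register_tensor_conversion_function(FactorizedVariable, _factorized_to_tensor)

def factorized_creator(next_creator, **kwargs):
    # Keras may pass the initializer as a callable; evaluate it to see the shape.
    init = kwargs["initial_value"]
    shape = (init() if callable(init) else init).shape
    if len(shape) != 2:
        return next_creator(**kwargs)  # not a matrix: delegate to the default
    return FactorizedVariable(shape)   # a matrix: store it factorized instead

with tf.variable_creator_scope(factorized_creator):
    layer = tf.keras.layers.Dense(10)
    _ = layer(tf.zeros([2, 20]))

print(type(layer.kernel))  # FactorizedVariable
```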

  • This gives you a lot of power.

  • Because now, you can take large libraries of code

  • that you did not write and do dependency injection

  • to change how they behave.

  • Probably if you're going to do this at scale,

  • you might want to implement your own layer

  • so you can have full control.

  • But it's also very valuable for you

  • to be able to extend the ones that we provide you.

  • So use tf.variable_creator_scope to control the state.

  • A big part of TensorFlow and why we

  • use these libraries to do research at all,

  • as opposed to just writing plain Python code,

  • is that deep learning is really dependent on very

  • fast computation.

  • And one thing that we're making more and more

  • easy to use in TensorFlow is our underlying compiler, XLA, which

  • we've always used for TPUs.

  • But now, we're making it easier for you to use

  • for CPUs and GPUs, as well.

  • And the way we're doing this is using tf.function with

  • the experimental_compile=True annotation.

  • What this means is if you mark a function as a function

  • that you want to compile, we will compile it,

  • or we'll raise an error.

  • So you can trust the code you write

  • inside a block is going to run as quickly as if you had

  • handwritten your own fused TensorFlow kernel for CPUs,

  • and a fused CUDA kernel for GPUs, and all the machinery, yourself.

  • But you get to write high-level, fast, Python TensorFlow code.

  • One example where you might easily

  • find yourself writing your own little custom kernel

  • is if you want to do research on activation functions, which

  • is something that people want to do.

  • Here's an activation function-- this is a terrible one,

  • but they tend to look a little like this.

  • They have a bunch of nonlinear operations

  • and a bunch of element-wise things.

  • But in general, they apply lots of

  • little element-wise operations to each element of your vector.

  • And these things, if you try to run them

  • in the normal TensorFlow interpreter,

  • they're going to be rather slow, because they're

  • going to do a new memory allocation and a copy of things

  • around for every single one of these little operations.

  • Whereas if you were to make a single fused kernel,

  • you'd just write a single thing for each coordinate that

  • does the exponentiation, and logarithm, and addition,

  • and all the things like that.

  • But what we can see here is that if I take this function,

  • and I wrap it with experimental_compile=True,

  • and I benchmark running a compiled version versus running

  • a non-compiled version, on this tiny benchmark,

  • I can already see a 25% speedup.
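
The shape of that experiment, as a hedged sketch rather than the exact benchmark from the talk (the activation function here is made up):

```python
import tensorflow as tf

@tf.function(experimental_compile=True)  # renamed jit_compile=True in later releases
def silly_activation(x):
    # A pile of tiny element-wise ops that XLA can fuse into a single kernel.
    return tf.math.log1p(tf.exp(-tf.abs(x))) + tf.maximum(x, 0.0) * tf.tanh(x)

x = tf.random.normal([1000, 1000])
y = silly_activation(x)  # first call traces and compiles; later calls reuse the result
```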

  • And it's even better than this, because we

  • see speedups of this sort of magnitude or larger,

  • even on fairly large models, including BERT.

  • Because in large models, we can fuse more computation

  • into the linear operations, and your reductions,

  • and things like that.

  • And this can get you compounding wins.

  • So try using experimental_compile=True

  • for automatic compilation in TensorFlow.

  • You should be able to apply it to small pieces of code

  • and replace what you'd normally have to do with fused kernels.

  • So do you know what type of research code a lot

  • of people rely on has lots of very small element-wise

  • operations and would greatly benefit from the fusion

  • powers of a compiler--

  • I think it's optimizers.

  • And a nice thing about doing your optimizer research

  • in TensorFlow is that Keras makes it very easy

  • for you to implement your own stochastic gradient descent-style

  • optimizer.

  • You can make a class that subclasses

  • the tf.keras Optimizer and overrides three methods.

  • You can define your initialization,

  • where you compute your learning rate or whatever,

  • in your init.

  • You can create any accumulator variables, like your momentum,

  • or higher-order powers of gradients, or anything else

  • you need, in create slots.

  • And you can define how to apply this optimizer

  • update to a single variable.

  • Once you've defined those three things,

  • you have everything TensorFlow needs

  • to be able to run your custom optimizer.

  • And normally, TensorFlow optimizers

  • are written with hand-fused kernels, which

  • can make the code very complicated to read,

  • but ensures that they run very quickly.

  • What I'm going to show here is an example

  • of a very simple optimizer-- again, not

  • a particularly good one.

  • This is a weird variation that has

  • some momentum and some higher order powers,

  • but it doesn't train very well.

  • However, it has the same sorts of operations that you

  • would have on a real optimizer.

  • And I can just write them as regular TensorFlow operations

  • in my model.

  • And by just adding this line with experimental_compile=True,

  • I can get it to run just as fast as a hand-fused kernel.

  • And the benchmarks are written here.

  • It was over a 2x speedup.
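
Putting the three methods and the one compile annotation together, a hedged sketch might look like this; it is a toy momentum-style update, not the optimizer from the slides, and it uses the tf.keras OptimizerV2 API the talk describes (tf.keras.optimizers.legacy.Optimizer in recent releases):

```python
import tensorflow as tf

class ToyMomentum(tf.keras.optimizers.Optimizer):
    # 1. Initialization: declare hyperparameters such as the learning rate.
    def __init__(self, learning_rate=0.01, momentum=0.9, name="ToyMomentum", **kwargs):
        super().__init__(name, **kwargs)
        self._set_hyper("learning_rate", learning_rate)
        self._set_hyper("momentum", momentum)

    # 2. Slot creation: one accumulator variable per trainable variable.
    def _create_slots(self, var_list):
        for var in var_list:
            self.add_slot(var, "momentum")

    # 3. The per-variable update, written as plain TensorFlow ops; the
    #    added experimental_compile line asks XLA to fuse the element-wise math.
    @tf.function(experimental_compile=True)
    def _resource_apply_dense(self, grad, var, apply_state=None):
        lr = self._get_hyper("learning_rate", var.dtype)
        mu = self._get_hyper("momentum", var.dtype)
        acc = self.get_slot(var, "momentum")
        acc.assign(mu * acc + (1.0 - mu) * grad)
        return var.assign_sub(lr * acc)
```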

  • So this can really matter when you're

  • doing a lot of research that looks like this.

  • Something else-- so Keras optimizers and compilation

  • let you experiment really fast with fairly intricate things,

  • and I hope you will use this to accelerate your research.

  • The next thing I want to talk about is vectorization.

  • It's, again, super important for performance.

  • I'm sure you've heard, at this point,

  • that Moore's Law is over, and we're no longer

  • going to get a free lunch in terms

  • of processors getting faster.

  • The way we're making our machine learning models faster

  • is by doing more and more things in parallel.

  • And this is great, because we get to unlock

  • the potential of GPUs and TPUs.

  • This is also a little scary, because now,

  • even though we know what we want to do to a single, little data

  • point, we have to write these batched operations, which

  • can be fairly complicated.

  • In TensorFlow, we've recently been developing

  • automatic vectorization for you,

  • where you can write the element-wise code that you want

  • to write and get the performance of the batched computation

  • that you want.

  • So the working example I'm going to use here is Jacobians.

  • If you're familiar with TensorFlow's gradient tape,

  • you know that tape.gradient computes an element-wise--

  • computes a gradient of a scalar, not a gradient

  • of a vector-valued or matrix-valued function.

  • And if you want the Jacobian of a vector-valued

  • or matrix-valued function, you can just

  • call tape.gradient many, many times.

  • And here, I have a very, very simple function

  • that is just the exponential of the square of a matrix.

  • And I want to compute the Jacobian.

  • And I do this by writing this double

  • for loop, where for every row, for every column,

  • I compute the gradient of the output at that row and column,

  • and then stack the results together

  • to get my higher-order Jacobian tensor.

  • This is fine.

  • This has always worked.
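
As a sketch of that explicit double loop (one tape per output element for simplicity; the code on the slides may be organized differently):

```python
import tensorflow as tf

x = tf.random.normal([4, 4])

def grad_of_output_element(i, j):
    # Gradient of the single output element y[i, j] with respect to all of x.
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = tf.exp(tf.matmul(x, x))
        y_ij = y[i, j]
    return tape.gradient(y_ij, x)

jacobian = tf.stack([
    tf.stack([grad_of_output_element(i, j) for j in range(4)])
    for i in range(4)
])  # shape [4, 4, 4, 4]: one [4, 4] gradient per output element
```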

  • However, you can replace these explicit loops with

  • tf.vectorized_map.

  • And one, you get a small readability win.

  • Because now we're saying that, yes, you're

  • just applying this operation everywhere.

  • But also, you get a very big performance win.

  • And this version that uses tf.vectorized_map is

  • substantially faster than the version that doesn't use it.
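
A hedged sketch of the vectorized version; mapping over one-hot "seed" matrices is my way of expressing "one gradient per output element", and the slide code may differ:

```python
import tensorflow as tf

x = tf.random.normal([4, 4])

def grad_for_seed(seed):
    # seed is a flattened one-hot selector for a single output element.
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = tf.exp(tf.matmul(x, x))
    return tape.gradient(y, x, output_gradients=tf.reshape(seed, [4, 4]))

seeds = tf.eye(16)  # one row per element of the [4, 4] output
jacobian = tf.reshape(tf.vectorized_map(grad_for_seed, seeds), [4, 4, 4, 4])
```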

  • But of course, you don't want to have

  • to write this all the time, which

  • is why, really, for Jacobians, we implemented it directly

  • in the gradient tape.

  • And you can call tape.jacobian to get the Jacobian computed

  • for you.
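
Which boils all of the above down to a couple of lines:

```python
import tensorflow as tf

x = tf.random.normal([4, 4])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.exp(tf.matmul(x, x))

jacobian = tape.jacobian(y, x)  # shape [4, 4, 4, 4]
```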