[MUSIC PLAYING] ALEXANDRE PASSOS: Hello, my name is Alex, and I work on TensorFlow. I am here today to tell you a little bit about how you can use TensorFlow to do deep learning research more effectively. Today we're going to take a little tour of a few TensorFlow features that show how controllable, flexible, and composable TensorFlow is. We'll take a quick look at those features, some old and some new. This is by no means a list of every feature that is useful for research, but these features let you accelerate your research with TensorFlow in ways you may not be aware of. I want to start by helping you control how TensorFlow represents state. If you've used TensorFlow before, and I am sure you have at this point, you know that a lot of our libraries use TF variables to represent state, like your model parameters. For example, a Keras dense layer stores one kernel matrix and an optional bias vector. These parameters are updated when you train your model, and part of the whole point of training is to find out what values those parameters should have had in the first place. If you're writing your own layers library, you can control absolutely everything about how that state is represented. But you can also crack open the black box and control how state is represented even inside the libraries that we give you. As a running example, suppose I want to re-parametrize a Keras layer so that it does some computation to generate its kernel matrix, say to save space or to get the right inductive bias. The way to do this is to use tf.variable_creator_scope. It is a tool that lets you take control of the state-creation process in TensorFlow. It's a context manager, and every variable created under it goes through a function you specify. That function can choose to do nothing. It can delegate.
Or it can modify how variables are created. Under the hood, this is the same mechanism that a distribution strategy's scope() usually relies on. So it's the same tool we use to build TensorFlow, and we make it available to you so you can extend it. If I want to do this re-parametrization of the Keras layer, it's actually pretty simple. First, I define the type I want to use to store the parameters. Here, I'm using this factorized variable type, which is a tf.Module. tf.Module is a very convenient type: you can have variables as members, we track them automatically for you, and you get all sorts of other nice things. Once I've defined this type, which is really just a left half and a right half, I can tell TensorFlow how to use objects of this type as part of TensorFlow computations. What we do here is a matrix multiplication of the left component and the right component. Now that TensorFlow knows how to use this object, I can create it, and this is all I need to write my own little variable creator for tf.variable_creator_scope. In this case, I want to peek at the shape. If I'm not creating a matrix, I just delegate to whatever TensorFlow would normally have done. If I am creating a matrix, then instead of creating a single matrix, I create this factorized variable with a left half and a right half. Finally, I get to just use it. Here, I create a little Keras layer, I apply it, and I can check that it is indeed using my factorized representation. This gives you a lot of power, because now you can take large libraries of code that you did not write and do dependency injection to change how they behave. If you're going to do this at scale, you probably want to implement your own layer so you have full control. But it's also very valuable to be able to extend the layers that we provide. So use tf.variable_creator_scope to control the state.
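A minimal sketch of the factorized-variable trick, assuming TensorFlow 2.x. The names `FactorizedVariable`, `factorizing_creator`, and `FACTOR_RANK` are hypothetical, not taken from the talk's slides, and to stay independent of Keras-version details the sketch intercepts plain `tf.Variable` creation instead of a Keras layer. The creator builds the two factors with `next_creator`: calling `tf.Variable` directly inside the creator would route those factors back through the same creator.

```python
import tensorflow as tf

FACTOR_RANK = 2  # hypothetical rank of the low-rank factorization


class FactorizedVariable(tf.Module):
    """Hypothetical container representing an (m, n) matrix as left @ right."""

    def __init__(self, left, right, name=None):
        super().__init__(name=name)
        self.left = left    # shape (m, FACTOR_RANK); tracked by tf.Module
        self.right = right  # shape (FACTOR_RANK, n); tracked by tf.Module

    @property
    def shape(self):
        return tf.TensorShape([self.left.shape[0], self.right.shape[1]])


def _factorized_to_tensor(value, dtype=None, name=None, as_ref=False):
    # Whenever TensorFlow needs a tensor, materialize the full matrix.
    return tf.matmul(value.left, value.right, name=name)


tf.register_tensor_conversion_function(FactorizedVariable, _factorized_to_tensor)


def factorizing_creator(next_creator, **kwargs):
    """Variable creator: factorize matrices, delegate everything else."""
    initial_value = kwargs["initial_value"]
    if callable(initial_value):
        initial_value = initial_value()
    shape = tf.convert_to_tensor(initial_value).shape
    if shape.rank != 2:
        # Not a matrix: do whatever TensorFlow would normally have done.
        return next_creator(**kwargs)
    m, n = shape
    # next_creator skips this creator, so the factors are ordinary variables.
    left = next_creator(
        initial_value=tf.random.normal([m, FACTOR_RANK], stddev=0.1), name="left")
    right = next_creator(
        initial_value=tf.random.normal([FACTOR_RANK, n], stddev=0.1), name="right")
    return FactorizedVariable(left, right)


# Usage: variables created inside the scope go through factorizing_creator.
with tf.variable_creator_scope(factorizing_creator):
    kernel = tf.Variable(tf.zeros([8, 4]))  # a matrix: gets factorized
    bias = tf.Variable(tf.zeros([4]))       # a vector: delegated unchanged

x = tf.ones([2, 8])
y = tf.matmul(x, kernel) + bias  # kernel is converted to left @ right here
print(type(kernel).__name__, len(kernel.trainable_variables))
```

The trade-off is that the dense kernel is rebuilt on every use. The payoff is that the stored parameters are the two small factors, and code that only ever calls `tf.matmul` on the kernel does not need to know.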
A big part of why we use libraries like TensorFlow to do research at all, as opposed to writing plain Python code, is that deep learning really depends on very fast computation. One thing we're making easier and easier to use in TensorFlow is our underlying compiler, XLA, which we've always used for TPUs. Now we're making it easier to use for CPUs and GPUs as well. The way we're doing this is with tf.function and the experimental_compile=True annotation. This means that if you mark a function as one you want compiled, we will either compile it or raise an error. So you can trust that the code you write inside that block will run as quickly as if you had handwritten your own fused TensorFlow kernel for CPUs, a fused CUDA kernel for GPUs, and all the surrounding machinery yourself. But you get to write high-level, fast Python TensorFlow code. One place where you might easily find yourself writing your own custom kernel is research on activation functions, which is something people want to do. As activation functions go, this one is terrible, but they tend to look a little like this: a bunch of nonlinear operations and a bunch of element-wise operations. In general, they apply lots of little element-wise operations to each element of your vector. If you run these in the normal TensorFlow interpreter, they're going to be rather slow, because every one of these little operations does a new memory allocation and copies data around. If you instead wrote a single fused kernel, you would write one piece of code per coordinate that does the exponentiation, the logarithm, the addition, and so on.
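A minimal sketch of compiling an element-wise-heavy function, assuming TensorFlow 2.5 or later, where the flag is spelled `jit_compile=True`; `experimental_compile=True` is the older name used in this talk. The activation function `toy_activation` is an arbitrary, hypothetical stand-in for the one on the slide, and the timing loop implies no particular speedup.

```python
import timeit

import tensorflow as tf


def toy_activation(x):
    # A deliberately odd activation built from many small element-wise ops.
    # Unfused, each op allocates a new buffer and makes a separate pass over x.
    return tf.math.log(1.0 + tf.exp(-tf.abs(x))) * tf.tanh(x) + 0.1 * x * tf.sigmoid(x)


plain_fn = tf.function(toy_activation)                       # graph mode, no XLA
compiled_fn = tf.function(toy_activation, jit_compile=True)  # must compile with XLA

x = tf.random.normal([1024, 1024])

# Both versions compute the same values, up to floating-point differences.
max_diff = float(tf.reduce_max(tf.abs(plain_fn(x) - compiled_fn(x))))

# Rough timing; results depend entirely on your hardware and TensorFlow build.
for name, fn in [("plain", plain_fn), ("xla", compiled_fn)]:
    fn(x)  # warm up: trace (and compile) once before timing
    seconds = timeit.timeit(lambda: fn(x).numpy(), number=20)
    print(f"{name}: {seconds / 20 * 1e3:.2f} ms per call")
```

Because `jit_compile=True` means "compile or fail", an operation XLA cannot handle surfaces as an error instead of a silent fallback to the slower path.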
But what we can see here is that if I take this function, wrap it with experimental_compile=True, and benchmark the compiled version against the non-compiled version, I already see a 25% speedup on this tiny benchmark. It's even better than that, because we see speedups of this magnitude or larger even on fairly large models, including BERT. In large models, we can fuse more computation into the linear operations, the reductions, and things like that, and this can get you compounding wins. So try using experimental_compile=True for automatic compilation in TensorFlow. You should be able to apply it to small pieces of code and replace what you'd normally have to do with fused kernels. Do you know what kind of research code a lot of people rely on that has lots of very small element-wise operations and would greatly benefit from the fusion powers of a compiler? I think it's optimizers. A nice thing about doing your optimizer research in TensorFlow is that Keras makes it very easy to implement your own stochastic-gradient-descent-style optimizer. You make a class that subclasses tf.keras.optimizers.Optimizer and override three methods. In the constructor, you define your initialization, where you compute your learning rate or whatever else you need. You create any accumulator variables you need, like your momentum or higher-order powers of gradients, as slots. And you define how to apply the optimizer update to a single variable. Once you've defined those three things, TensorFlow has everything it needs to run your custom optimizer. Normally, TensorFlow optimizers are written with hand-fused kernels, which can make the code very complicated to read but ensures that it runs very quickly. What I'm going to show here is an example of a very simple optimizer; again, not a particularly good one.
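The kind of update described here can be sketched as a compiled update step. This is a minimal sketch, assuming TensorFlow 2.5 or later (`jit_compile=True`); the update rule is a made-up toy with momentum and a cubed-gradient accumulator, not the optimizer from the slides, and the hyperparameter names are hypothetical. It is written as a standalone function on `tf.Variable`s rather than as a Keras subclass, because the Keras optimizer base-class API has changed across releases; in the TF 2.x-era `OptimizerV2` API, the same body would live in `_resource_apply_dense`, with the accumulators created in `_create_slots`.

```python
import tensorflow as tf

LEARNING_RATE = 0.01  # hypothetical hyperparameters for the toy rule
BETA = 0.9
EPSILON = 1e-7


@tf.function(jit_compile=True)
def toy_update(var, grad, momentum, cube_acc):
    """One step of a made-up optimizer; XLA fuses these element-wise ops."""
    # Exponential moving average of the gradient (momentum).
    momentum.assign(BETA * momentum + (1.0 - BETA) * grad)
    # Exponential moving average of |grad|**3 (a "higher-order power").
    cube_acc.assign(BETA * cube_acc + (1.0 - BETA) * tf.abs(grad) ** 3)
    # Scale the momentum by the cube root of the accumulator.
    var.assign_sub(LEARNING_RATE * momentum / (tf.pow(cube_acc, 1.0 / 3.0) + EPSILON))


# Usage: one variable with its two accumulator ("slot") variables.
var = tf.Variable([1.0, -2.0])
momentum = tf.Variable(tf.zeros_like(var))
cube_acc = tf.Variable(tf.zeros_like(var))
toy_update(var, tf.constant([0.5, -0.5]), momentum, cube_acc)
print(var.numpy())  # each entry moves against the sign of its gradient
```

Without `jit_compile=True` the function still runs, just as a sequence of separate kernels; the decorator is the only change needed to request fusion.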
It's a weird variation that has some momentum and some higher-order powers, and it doesn't train very well. However, it has the same sorts of operations you would find in a real optimizer, and I can just write them as regular TensorFlow operations. By adding this one line with experimental_compile=True, I get it to run just as fast as a hand-fused kernel. The benchmarks are shown here: it was over a 2x speedup. So this can really matter when you're doing a lot of research that looks like this. Keras optimizers plus compilation let you experiment really fast, even with fairly intricate things, and I hope you will use this to accelerate your research. The next thing I want to talk about is vectorization. It's, again, super important for performance. I'm sure you've heard by now that Moore's Law is over, and we're no longer going to get a free lunch in terms of processors getting faster. The way we're making our machine learning models faster is by doing more and more things in parallel. This is great, because it unlocks the potential of GPUs and TPUs. It's also a little scary, because even though we know what we want to do to a single little data point, we now have to write batched operations, which can be fairly complicated. In TensorFlow, we've recently been developing automatic vectorization, so you can write the per-element code you want to write and get the performance of the batched computation you want. The working example I'm going to use here is Jacobians. If you're familiar with TensorFlow's gradient tape, you know that tape.gradient computes the gradient of a scalar, not the derivative of a vector-valued or matrix-valued function. If you want the Jacobian of a vector-valued or matrix-valued function, you can just call tape.gradient many, many times.
Here, I have a very, very simple function that is just the exponentiation of the square of a matrix, and I want to compute its Jacobian. I do this by writing a double for loop where, for every row and every column, I compute the gradient of that output entry with respect to the input, and then stack the results together to get my higher-order Jacobian tensor. This is fine; it has always worked. However, you can replace these explicit loops with tf.vectorized_map. First, you get a small readability win, because now you're saying that, yes, you're just applying this operation everywhere. But you also get a very big performance win: the version that uses tf.vectorized_map is substantially faster than the version that doesn't. Of course, you don't want to have to write this all the time, which is why, for Jacobians, we implemented it directly in the gradient tape. You can call tape.jacobian to get the Jacobian computed for you.
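A minimal sketch of the three approaches, assuming TensorFlow 2.x. The function `f` reads "the exponentiation of the square of a matrix" as an element-wise `tf.exp` of the matrix product `x @ x`, which is an assumption, and the helper names are hypothetical. Rather than slicing `y` inside the tape, each output entry is selected by passing a one-hot tensor as `tape.gradient`'s `output_gradients` argument. That turns the per-entry work into a function of the seed alone, which is exactly the shape of problem `tf.vectorized_map` handles.

```python
import tensorflow as tf


def f(x):
    # Assumed reading of "the exponentiation of the square of a matrix":
    # element-wise exp of the matrix product x @ x.
    return tf.exp(tf.matmul(x, x))


def jacobian_loop(x):
    """Double for loop: one tape.gradient call per output entry y[i, j]."""
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(x)
        y = f(x)
    m, n = y.shape
    rows = []
    for i in range(m):
        row = []
        for j in range(n):
            # One-hot cotangent that selects the single output entry y[i, j].
            seed = tf.tensor_scatter_nd_update(tf.zeros_like(y), [[i, j]], [1.0])
            row.append(tape.gradient(y, x, output_gradients=seed))
        rows.append(tf.stack(row))
    return tf.stack(rows)  # shape: y.shape + x.shape


def jacobian_vectorized(x):
    """Same computation, with the loops replaced by tf.vectorized_map."""
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(x)
        y = f(x)
    m, n = y.shape
    # All one-hot cotangents at once, one per output entry: shape (m * n, m, n).
    seeds = tf.reshape(tf.eye(m * n), [m * n, m, n])
    grads = tf.vectorized_map(
        lambda seed: tape.gradient(y, x, output_gradients=seed), seeds)
    return tf.reshape(grads, [m, n] + x.shape.as_list())


def jacobian_builtin(x):
    """The built-in version: GradientTape.jacobian."""
    with tf.GradientTape() as tape:
        tape.watch(x)
        y = f(x)
    return tape.jacobian(y, x)


x = tf.random.normal([3, 3], stddev=0.1)
j_loop = jacobian_loop(x)
j_vec = jacobian_vectorized(x)
j_tape = jacobian_builtin(x)
print(j_loop.shape, j_vec.shape, j_tape.shape)
```

All three return a tensor indexed as `[i, j, a, b]`, the derivative of `y[i, j]` with respect to `x[a, b]`. No timings are shown here; how much the vectorized versions gain depends on the shapes and hardware.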