JIRI SIMSA: Hi, everyone. My name is Jiri. I'm a software engineer on the TensorFlow team. And today, I'm going to be talking to you about tf.data and tf.distribute, which are TensorFlow's APIs for input pipeline and distribution strategy, respectively.
To set the stage for what I'm going to be talking about, let's think about what the basic building blocks of a machine learning workflow are. Machine learning operates over data. It runs some computation. And it uses some sort of hardware to do this task. This hardware can be a single CPU on your laptop, or possibly a workstation that has one or multiple accelerators, either GPUs or TPUs, attached to it.
But you can also run the computation across a large number of machines that each have one or multiple accelerators attached to them. Now, let's talk about how these machine learning building blocks are served or reflected in the APIs that TensorFlow provides.
So for the data handling part of the machine learning task, TensorFlow provides the tf.data API. It's the input pipeline API for TensorFlow. For the computation itself, such as supervised learning, TensorFlow offers a number of different APIs, both high level and low level. You might be familiar with Keras or Estimators-- they've been mentioned in earlier talks today-- as well as lower level APIs for building custom training loops.
And finally, to hide the hardware details of your computation, TensorFlow provides the tf.distribute API, which allows you to create your input pipeline and model in a way that's agnostic to the environment in which it's going to execute. You can write your program as if it were going to run on a single device and then, with minimal changes, deploy it on a large set of different devices, possibly with different architectures.
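As a rough preview of what that looks like in code (the details come in the second part of the talk), here is a minimal sketch; the MirroredStrategy choice and the toy Keras model are illustrative assumptions, not from the talk:

```python
import tensorflow as tf

# Synchronous data parallelism across the GPUs of a single machine; other
# strategies (multi-worker, TPU) would slot in here without changing the
# model code below.
strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
    # Variables created inside the scope are replicated across devices; the
    # model itself is written as if it targeted a single device.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(20,))])
    model.compile(optimizer="sgd", loss="mse")
```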
In this talk, I'm going to talk about the tf.data input pipeline API. And then, in the second part, I'm also going to talk about tf.distribute, the distribution strategy API. I'm not going to talk about Keras, Estimator, and other APIs for the modeling itself, as those have been covered in previous talks.
So without further ado, let's get started with tf.data, which is TensorFlow's input pipeline API. So let's ask ourselves a question. Why do we need an input pipeline API in the first place? Why don't we just load the data into memory, maybe in our Python program as a NumPy array, and pass it into a Keras model?
Well, there are actually a number of good reasons why we need such an API, or why using one will benefit us. First of all, the data might not fit into memory. For example, the ImageNet data set is 140 gigabytes of data, which does not necessarily fit into memory on every laptop or workstation. The data itself might also require randomized preprocessing, which means that we cannot preprocess everything ahead of time offline and then have the data ready for training.
We actually need to have an input pipeline that performs the preprocessing, such as, in the case of ImageNet, image cropping or randomized image distortions or transformations, on the fly as we're running the machine learning computation. Having an input pipeline API as an abstraction might also allow us, in the runtime of this API, to implement things in a way that allows the computation to efficiently utilize the underlying hardware. And I'm actually going to spend a fair amount of the first part of my talk on how to efficiently utilize the hardware through the tf.data input pipeline abstraction.
Last, but not least, and this is something that ties the tf.data API to the tf.distribute API, using an input pipeline abstraction allows us to decouple the task of loading and preprocessing the data from the task of distributing the computation. The abstraction allows you to create your input pipeline assuming it's going to run in one place, and then the distribution strategy will distribute the data without you having to worry about the fact that the input pipeline might actually be evaluated in multiple places in parallel.
So for those reasons, we created tf.data, TensorFlow's input pipeline API. The way I like to think about an input pipeline created through tf.data is as an ETL process. What I mean by that is that E, T, and L stand for the different stages of the input pipeline. E stands for Extract. This is the stage in which we read the data, either from memory or from local or remote storage, and possibly parse the file format that the data is stored in. Perhaps it's compressed.
Then there is the T, the Transform stage. In this stage, we perform either domain specific or domain agnostic transformations. The domain specific transformations are specific to the type of data we're dealing with. So, for instance, text vectorization, image transformation, or temporal video sampling are examples of domain specific transformations. Domain agnostic transformations include things like shuffling of your data during training, or batching, that is, combining multiple elements into a single, higher dimensional element. And, finally, the last stage of the input pipeline, Loading, pertains to efficiently transferring the data onto the accelerator, which is either a GPU or a TPU.
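To make the three stages concrete, here is a minimal, schematic sketch of how they might map onto tf.data calls; the in-memory source and the specific transformations are illustrative assumptions, and the prefetch call used for the Load stage is discussed in more detail later in the talk:

```python
import tensorflow as tf

# Extract: here a simple in-memory source; reading TFRecord files from local
# or remote storage would use tf.data.TFRecordDataset instead.
dataset = tf.data.Dataset.range(1000)

# Transform: domain agnostic examples -- shuffle, per-element map, and batch.
dataset = dataset.shuffle(buffer_size=1000)
dataset = dataset.map(lambda x: tf.cast(x, tf.float32) / 1000.0)
dataset = dataset.batch(32)

# Load: let the runtime prepare upcoming batches while the consumer (e.g. the
# accelerator) works on the current one; device copies happen as it is iterated.
dataset = dataset.prefetch(1)
```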
What I should point out here is that, traditionally, the input pipeline portion of your machine learning computation happens on the CPU, because some of the operations are naturally only possible on the CPU, which leaves the GPU and TPU resources available for your machine learning specific computations, such as your models.
This puts extra pressure on the efficiency with which the input pipeline performs. The reason for that, which is what I'm trying to illustrate here with the graph, is that over time the rate at which CPUs perform has plateaued, while the computational power of GPUs and TPUs, thanks to recent hardware advances, continues to grow at an exponential rate. This opens up a performance gap between the raw CPU and the GPU/TPU processing power available in a single machine. The consequence of this could be that the CPU part of your machine learning computation, namely the input pipeline, becomes the bottleneck of your computation. So it's really important that the CPU input pipeline performs as efficiently as it can.
So let's take a look at an example of what a tf.data-based input pipeline actually looks like. Here, I'm using an example of how a common image processing input pipeline would look. We first create a data set using the TFRecordDataset operation. It's a data set constructor that takes a set of file names or file patterns and produces the elements that are stored in those files in a sequence-like manner. And once you create a data set, you can chain transformations onto it, thus creating new types of data sets. A very common and very powerful one is the map transformation, which allows you to apply arbitrary preprocessing to the elements of the data set. This preprocessing is expressed as a function that ends up being traced using the mechanisms available in TensorFlow, meaning the function used to transform elements of the data set is executed as a data flow graph, which has important implications for performance and for how the runtime can actually execute this function. And the last thing illustrated here is the batch transformation, which combines multiple elements of the input data set and produces a single, higher dimensional element as output, which is a common practice for training efficiency.
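The slide code itself isn't captured in the transcript, but a minimal sketch of the pipeline being described might look like the following; the file pattern, feature names, and preprocessing body are assumptions for illustration:

```python
import tensorflow as tf

# Hypothetical per-record preprocessing. tf.data traces this Python function
# into a TensorFlow graph, so it executes as data flow ops, not as Python.
def parse_and_preprocess(serialized_example):
    features = tf.io.parse_single_example(
        serialized_example,
        {"image": tf.io.FixedLenFeature([], tf.string),
         "label": tf.io.FixedLenFeature([], tf.int64)})
    image = tf.io.decode_jpeg(features["image"], channels=3)
    image = tf.image.resize(image, [224, 224])      # fixed shape so batching works
    image = tf.image.random_flip_left_right(image)  # randomized distortion
    return image, features["label"]

# Extract: a source dataset over TFRecord files (hypothetical file pattern).
files = tf.data.Dataset.list_files("/path/to/train-*.tfrecord")
dataset = tf.data.TFRecordDataset(files)

# Chain transformations: per-element preprocessing, then batching into a
# single higher dimensional element per 32 inputs.
dataset = dataset.map(parse_and_preprocess)
dataset = dataset.batch(32)
```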
Now, one thing that's not illustrated here, but that actually does happen under the hood inside the tf.data runtime, is that for certain combinations of transformations, tf.data provides more efficient fused implementations. For instance, if a map transformation is followed by a batch transformation, we actually have a highly efficient C++ based implementation for the combination of the two that can give you up to a 2x speedup in the performance of your input pipeline. And that happens kind of magically behind the scenes.
And the important bit that I want to highlight here is that the user doesn't need to worry about it. The user doesn't really need to do anything with respect to optimizing the performance. They focus on creating an input pipeline with the functional preprocessing in mind. And once you create the data set that you would like, you can pass it into TensorFlow's high level APIs, such as Keras or Estimator, which all support the data set abstraction as an input for the data.
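Continuing the earlier sketch, handing the dataset to Keras might look like this; the model architecture is an arbitrary assumption:

```python
# A small hypothetical Keras model consuming the dataset of (image, label)
# pairs directly; no manual batching or feeding loop is needed.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(224, 224, 3)),
    tf.keras.layers.Dense(1000, activation="softmax"),
])
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
model.fit(dataset, epochs=2)
```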
So let's talk a bit more about input pipeline performance. If you were to implement the input pipeline in a naive fashion, using the CPU for the input pipeline processing or data preparation and the GPU or TPU for the training computation, you might end up in a situation like the one illustrated on the slide, where at any given point in time you're only utilizing one of the two resources available to you. And you could probably tell that this seems rather inefficient.
Well, a common technique that can be used to make this style of computation more efficient is called software pipelining. The idea is that while you're working on the current training step on the GPU or TPU, you've already started preprocessing the data for the next training step on the CPU. And thus, you overlap the computation that happens on the two devices, or the two resources, available to you.
Achieving the effect of software pipelining in tf.data is pretty straightforward. All you do is chain a .prefetch transformation at a particular point in your input pipeline. The effect of doing that is that the producer of the data up to that point is decoupled from the consumer of the data, in this case, the Keras model. The two will operate independently, coordinating through an internal buffer, and this will have the desired effect of software pipelining that I illustrated on the previous slide.
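In code, reusing the hypothetical files and parse_and_preprocess from the earlier sketch, the change is a single added line; AUTOTUNE asks the runtime to pick the buffer size, though a fixed integer works as well:

```python
dataset = tf.data.TFRecordDataset(files)
dataset = dataset.map(parse_and_preprocess)
dataset = dataset.batch(32)
# New: decouple the producer above from the consumer (e.g. the Keras model),
# so preprocessing of the next batch overlaps with training on the current one.
dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
```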
Another opportunity for improving the performance of your input pipeline is to parallelize the transformation. The top part of this diagram illustrates that we're using sequential processing to apply the map transformation to the individual elements of the batch that we're then going to create. But there is no reason you need to do that, unless there were, in effect, some sort of data or control dependency. And commonly, there is not. In that case, you can parallelize and overlap the preprocessing of all the individual elements that the batch is going to be created out of.
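As a quick preview of the change that's about to be described, a minimal sketch reusing the hypothetical parse_and_preprocess from before; num_parallel_calls is the argument the next part introduces:

```python
# Process several elements of the map concurrently; AUTOTUNE lets the runtime
# choose the degree of parallelism based on the available CPU.
dataset = dataset.map(parse_and_preprocess,
                      num_parallel_calls=tf.data.experimental.AUTOTUNE)
dataset = dataset.batch(32)
```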
So let's take a look at how we would do that using the tf.data API. And similar to the software pipelining idea, this is pretty straightforward. You simply add a single argument, num_parallel_calls, to the map transformation,