[MUSIC PLAYING]
CHRISTINA GREER: Hi.
My name is Christina and I'm a software engineer
on the Google Brain team.
I'm here today to tell you about some tools
that my team and I have built to help
make the end-to-end lifecycle of the machine learning pipeline
easier.
I'm going to start by talking about model analysis
and validation.
These are two different components in TFX,
but they are very similar in how they're actually executed.
The main difference is how you as an end user will use them.
I'm going to start by talking about the evaluator.
So why is model evaluation important?
Well, for one thing, we have gathered data.
We've cleaned that data.
We've trained a model.
But we really want to make sure that model works.
And so, model evaluation can help
you assess the overall quality of your model.
You also may want to analyze how your model is performing
on specific slices of the data.
So in this case, with the Chicago taxi example
that Clemens started this off with,
why are my tip predictions sometimes wrong?
Slicing the data and looking at where you're doing poorly
can be a real benefit, because it identifies some low-hanging
fruit where you can get gains in accuracy by adding more data
or making other changes to improve those segments.
You also want to track your performance over time.
You're going to be continuously training models and updating
them with fresh data, so that your models don't get stale.
And you want to make sure that your metrics are improving
over time and not regressing.
And model evaluation can help you with all of this.
The component of TFX that supports this
is called the evaluator.
And it is based on a library called TensorFlow Model
Analysis.
From the pipeline perspective, you have inputs:
the eval split that was generated by your ExampleGen
and the saved model that the trainer outputs.
You also need to specify the slices of your data
that you find most interesting, so that the evaluator can
precompute metrics for those slices.
Your data then goes into the evaluator.
And a process is run to generate metrics
for the overall slice and the slices that you have specified.
The output of the evaluator is evaluation metrics.
This is a structured data format that contains the slices
you specified and the metrics that correspond
to each of those slices.
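As a rough sketch, wiring the evaluator into a TFX pipeline might look something like this in Python. The exact argument names vary a bit across TFX releases, the 'tips' label key and 'trip_start_hour' slice are illustrative stand-ins for the Chicago taxi example, and example_gen and trainer are the upstream pipeline components mentioned earlier.

```python
# Minimal sketch of an Evaluator in a TFX pipeline (argument names differ
# slightly across TFX releases; label and feature keys are illustrative).
import tensorflow_model_analysis as tfma
from tfx.components import Evaluator

eval_config = tfma.EvalConfig(
    model_specs=[tfma.ModelSpec(label_key='tips')],   # hypothetical label key
    slicing_specs=[
        tfma.SlicingSpec(),                                  # the overall slice
        tfma.SlicingSpec(feature_keys=['trip_start_hour']),  # slice by hour of day
    ],
)

evaluator = Evaluator(
    examples=example_gen.outputs['examples'],  # eval split from ExampleGen
    model=trainer.outputs['model'],            # SavedModel from the trainer
    eval_config=eval_config,
)
```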
The TensorFlow Model Analysis library
also has a visualization tool that
allows you to load up these metrics
and dig around in your data in a user friendly way.
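For example, in a notebook you might load the evaluator's output and open the slice browser roughly like this. This is a sketch: the output path is a placeholder, and the TFMA notebook extension needs to be installed.

```python
# Load evaluation metrics written by the evaluator and render the
# interactive slice browser.
import tensorflow_model_analysis as tfma

result = tfma.load_eval_result(output_path='/path/to/evaluator/output')
tfma.view.render_slicing_metrics(result, slicing_column='trip_start_hour')
```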
So going back to our Chicago taxi example,
you can see how the model evaluator can help you look
at your top line objective.
How well can you predict trips that result in large tips?
The TFMA visualization shows the overall slice of data here.
The numbers are probably too small to read, but accuracy is 94.7%.
That's pretty good.
You'd get an A for that.
But maybe you want to hit 95%.
95% accuracy sounds a lot better than 94.7%.
So maybe you want to bump that up a bit.
So then you can dig into why your tip predictions are
sometimes wrong.
We have sliced the data here by the hour of day
that the trip starts, and we've sorted by poor performance.
When I look at this data, I see that trips
that start at around 2:00 or 3:00 AM are performing quite poorly.
Because of the statistics generation tool
that Clemens talked about, I do know
that the data is sparse here.
But if I didn't know that, perhaps I would think,
maybe there's something that people
that get taxis at 2:00 or 3:00 in the morning
might have in common that causes erratic tipping behavior.
Someone smarter than me is going to have to figure that one out.
You also want to know if you are getting better
at these predictions over time.
So you are continuously training these models with new data,
and you're hoping that they get better.
So the TensorFlow Model Analysis tool
that powers the evaluator in TFX
can show you the trends of your metrics over time.
And so here you see three different models
and the performance of each, with accuracy and AUC.
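A rough sketch of how that comparison might be pulled up in a notebook, assuming each training run wrote its TFMA output to its own directory; the paths are placeholders and exact signatures vary a bit between TFMA versions.

```python
# Load results from several runs and render metrics as a time series.
import tensorflow_model_analysis as tfma

eval_results = tfma.load_eval_results(
    ['/path/to/run_1', '/path/to/run_2', '/path/to/run_3'],
    mode=tfma.constants.MODEL_CENTRIC_MODE)
tfma.view.render_time_series(eval_results)
```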
Now I'm going to move on to talking
about the ModelValidator component.
With the evaluator, you were an active user.
You generated the metrics.
You loaded them up in the UI.
You dug around in your data.
You looked for issues that you could
fix to improve your model.
But eventually, you're going to iterate.
Your data is going to get better.
Your model's going to improve.
And you're going to be ready to launch.
You're also going to have a pipeline continuously
feeding new data into this model.
And every time you generate a new model with new data,
you don't want to have to do a manual process of pushing
this to a server somewhere.
The ModelValidator component of TFX
acts as a gate that keeps you from pushing
bad versions of your model, while allowing you to automate
pushing of quality models.
So why is model validation important?
We really want to avoid pushing models with degraded quality,
especially when pushes happen in an automated fashion.
If you train a model with new data
and the performance drops, but say
it increases in certain segments of the data
that you really care about, maybe you
make the judgment call that this is an improvement overall.
So we'll launch it.
But you don't want to do this automatically.
You want to have some say before you do this.
So this acts as your gatekeeper.
You also want to avoid breaking downstream components.
If your model suddenly started outputting something
that your server binary couldn't handle,
you'd want to know that also before you push.
The TFX component that supports this
is called the ModelValidator.
Its inputs and outputs are very similar to the evaluator's,
and the libraries that compute the metrics
are pretty much the same under the hood.
However, instead of one model, you provide two--
the new model that you're trying to evaluate and the last good
evaluated model.
It then runs on your eval split data
and compares the metrics on the same data between the two
models.
If your metrics have stayed the same or improved,
then you go ahead and bless the model.
If the metrics that you care about have degraded,
you will not bless the model.
You'll get some information about which metrics failed,
so that you can do some further analysis.
The output of this is a validation outcome,
which simply says "blessed" if everything went right.
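In the TFX releases this talk is based on, wiring up the ModelValidator might look roughly like this. It's a sketch: the previously blessed model is looked up from pipeline metadata rather than passed in explicitly, argument names have varied across versions, and in newer TFX releases this validation role has been folded into the Evaluator.

```python
# Sketch of a ModelValidator in a TFX pipeline. It compares the new model
# from the trainer against the last blessed model (resolved from metadata)
# on the eval split, and emits a blessing only if metrics hold up.
from tfx.components import ModelValidator

model_validator = ModelValidator(
    examples=example_gen.outputs['examples'],  # eval data from ExampleGen
    model=trainer.outputs['model'],            # the newly trained model
)
```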
Another thing to note about the ModelValidator
is that it allows you to do a next-day eval of your previously
pushed model.
The last model that you blessed was trained with older data;
the ModelValidator evaluates it on the new data.
And finally, I'm going to talk about the pusher.
The pusher is probably the simplest component
in the entire TFX pipeline.
But it does serve quite a useful purpose.
It has one input, which is the blessing that you
got from the ModelValidator.
If you passed your validation,
then the pusher will copy your saved model into a filesystem
destination that you've specified.
And now you're ready to serve your model
and make it useful to the world at large.
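Put together, a Pusher might be wired up roughly like this. It's a sketch: argument names vary a little across TFX versions, and the destination directory is a placeholder.

```python
# Sketch of a Pusher: if the blessing is present, copy the SavedModel to a
# filesystem destination that a serving system can pick up.
from tfx.components import Pusher
from tfx.proto import pusher_pb2

pusher = Pusher(
    model=trainer.outputs['model'],
    model_blessing=model_validator.outputs['blessing'],  # gate from the ModelValidator
    push_destination=pusher_pb2.PushDestination(
        filesystem=pusher_pb2.PushDestination.Filesystem(
            base_directory='/serving_models/chicago_taxi')),  # placeholder path
)
```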
I'm going to talk about model deployment next.
So this is where we are.
We have a trained SavedModel.
A SavedModel is a universal serialization format
for TensorFlow models.
It contains your graph, your learned variable weights,
your assets like embeddings and vocabs.
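For concreteness, exporting and reloading a SavedModel in TF 2.x looks roughly like this; a tiny sketch with a toy model and placeholder paths.

```python
# Export a toy model as a SavedModel and load it back.
import tensorflow as tf

class Toy(tf.Module):
    def __init__(self):
        super().__init__()
        self.w = tf.Variable([[1.0], [2.0], [3.0]])   # learned weights

    @tf.function(input_signature=[tf.TensorSpec([None, 3], tf.float32)])
    def __call__(self, x):
        return tf.matmul(x, self.w)                   # the graph

module = Toy()
tf.saved_model.save(module, '/tmp/toy_saved_model',
                    signatures=module.__call__.get_concrete_function())
reloaded = tf.saved_model.load('/tmp/toy_saved_model')
print(list(reloaded.signatures.keys()))               # ['serving_default']
```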
But to you, this is just an implementation detail.
Where you really want to be is having an API:
a server that you can query
to get answers in real time and provide
those answers to your users.
We provide several deployment options.
And many of them are going to be discussed
at other talks in the session.
TensorFlow.js is optimized for serving in the browser
or on Node.js.
TensorFlow Lite is optimized for mobile devices.
We already heard a talk about how Google Assistant is using
TensorFlow Lite to support model inference on their Google Home
devices.
TensorFlow Hub is something new.
And Andre is going to come on in about five minutes
and tell you about that, so I'm not going to step on his toes.
I'm going to talk about TensorFlow Serving.
So if you want to put up a REST API that
serves answers for your model, you
would want to use TensorFlow Serving.
And why would you want to use this?
For one thing, TensorFlow Serving
has a lot of flexibility.
It supports multi-tenancy.
You can run multiple models on a single server instance.
You can also run multiple versions of the same model.
This can be really useful when you're
trying to canary a new model.
Say you have a tried and tested version of your model.
You've created a new one.
It's passed your evaluator.
It's passed your validation.
But you still want to do some A/B testing with real users
before you completely switch over.
TensorFlow Serving supports this.
We also support optimization with GPU and TensorRT.
And you can expose a gRPC or a REST API.
TensorFlow Serving is also optimized for high performance.
It provides low latency, request batching--
so that you can optimize your throughput
while still respecting latency requirements--
and traffic isolation.
So if you are serving multiple models on a single server,
a traffic spike in one of those models
won't affect the serving of the other.
And finally, TensorFlow Serving is production-ready.
This is what we used to serve many
of our models inside of Google.
We've served millions of QPS with it.
You can scale in minutes, particularly
if you use the Docker image and scale up on Kubernetes.
And we support dynamic version refresh.
So you can specify a version refresh policy
to either take the latest version of your model,
or you can pin to a specific version.
This can be really useful for rollbacks
if you find a problem with the latest version
after you've already pushed.
I'm going to go into a little bit more detail
about how you might deploy a REST API for your model.
We have two different options for doing this presented here.
The first one, the top command, uses Docker,
which is what we really recommend.
It requires a little bit of ramp up at the beginning,
but you will really save time in the long run
by not having to manage your environment
and not having to manage your own dependencies.
You can also run locally on your own host,
but then you do have to manage your environment
and dependencies yourself over the long term.
I'm going to go into a little bit more detail on the Docker
run command.
So you start with Docker run.
You choose a port that you want to bind your API to.
You provide the path to the saved model that
was generated by your trainer.
Hopefully, it was pushed by the pusher.
You provide the model name.
And you tell Docker to run the TensorFlow Serving binary.
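That command, expressed here as a Python subprocess call to keep the sketches in one language, might look roughly like this; the model path and name are placeholders.

```python
# Launch TensorFlow Serving in Docker with a SavedModel mounted in.
import subprocess

subprocess.run([
    'docker', 'run', '-d',                       # detach so we can query it below
    '-p', '8501:8501',                           # bind the REST API port
    '--mount', 'type=bind,source=/path/to/saved_model,target=/models/chicago_taxi',
    '-e', 'MODEL_NAME=chicago_taxi',             # the name the model is served under
    '-t', 'tensorflow/serving',                  # the TensorFlow Serving image
], check=True)
```

Once the server is up, the REST predict endpoint can be queried like this; the instance fields are illustrative, and the real payload depends on your model's serving signature.

```python
# Query the REST API exposed by TensorFlow Serving.
import requests

resp = requests.post(
    'http://localhost:8501/v1/models/chicago_taxi:predict',
    json={'instances': [{'trip_start_hour': 2, 'trip_miles': 3.5}]})  # illustrative features
print(resp.json())
```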
Another advantage of using Docker
is that you can easily enable hardware acceleration.
If you're running on a host with a GPU
and the Nvidia Docker image installed,
you can change a few tokens in this command line
and be running on accelerated hardware.
If you need even further optimization,
we now support optimizing your model
for serving using TensorRT.
TensorRT is a platform from Nvidia
for optimized deep learning inference.
The Chicago taxi example that we've been using here
probably wouldn't benefit from this.
But if you had, say, an image recognition model like a ResNet,
you could really get a performance boost
and some cost savings by using TensorRT.
We provide a command line that allows
you to convert the saved model into a TensorRT
optimized model.
So then again, a very simple change to that original command
line.
And you're running on accelerated GPU hardware
with TensorRT optimization.
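The talk shows this as a command-line conversion; as a rough Python-side equivalent, TensorFlow's TF-TRT converter can produce a TensorRT-optimized SavedModel. This is a sketch with placeholder paths, and it requires a GPU build of TensorFlow with TensorRT available.

```python
# Convert a SavedModel into a TensorRT-optimized SavedModel with TF-TRT.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(input_saved_model_dir='/path/to/saved_model')
converter.convert()                              # rewrite eligible subgraphs with TensorRT ops
converter.save('/path/to/tensorrt_saved_model')  # write the optimized SavedModel
```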
So to put it all together again, we
introduced TensorFlow Extended or TFX.
We showed you how the different components that TFX consists of
can work together to help you manage
the end-to-end lifecycle of your machine learning pipeline.
First, you have your data.
And we have tools to help you make sense of that
and process it and prepare it for training.
We then support training your model.
And after you train your model, we
provide tools that allow you to make sense
of what your model is doing, to make improvements,
and to make sure that you don't regress.
Then we have the pusher that allows
you to push to various deployment options
and make your model available to serve users in the real world.
To get started with TensorFlow Extended,
please visit us on GitHub.
There is also more documentation at TensorFlow.org/tfx.
And some of my teammates are running a workshop tomorrow.
And they'd love to see you there.
You don't need to bring a laptop.
We have machines that are set up and ready to go.
And you can get some hands-on experience
using TensorFlow Extended.
[MUSIC PLAYING]