CLEMENS MEWALD: Hi, everyone.
My name is Clemens.
I'm a product manager in Google Research.
And today I'm going to talk about TensorFlow Extended,
which is a machine learning platform that we built
around TensorFlow at Google.
And I'd like to start this talk with a block
diagram and the small yellow box, or orange box.
And that box basically represents
what most people care about and talk about when they
talk about machine learning.
It's the machine learning algorithm.
It's the structure of the network
that you're training, how you choose
what type of machine learning problem you're solving.
And that's what you talk about when you talk about TensorFlow
and using TensorFlow.
However, in addition to the actual machine learning,
and to TensorFlow itself, you have
to care about so much more.
And these are all of these other things
around the actual machine learning algorithm
that you have to have in place, and that you actually
have to nail and get right in order
to actually do machine learning in a production setting.
So you have to care about where you get your data from,
that your data are clean, how you transform them,
how you train your model, how to validate your model,
how to push it out into a production setting,
and deploy it at scale.
Now, some of you may be thinking, well,
I don't really need all of this.
I only have my small machine learning problem.
I can live within that small orange box.
And I don't really have these production worries as of today.
But I'm going to propose that all of you
will have that problem at some point in time.
Because what I've seen time and time
again is that research and experimentation
today is production tomorrow.
It's not like research and experimentation just ends there.
Eventually it will become a production model.
And at that point, you actually have to care about all
of these things.
Another side of this coin is scale.
So some of you may say, well, I do
all of my machine learning on a local machine, in a notebook.
Everything fits into memory.
I don't need all of these heavy tools to get started.
But similarly, small scale today is large scale tomorrow.
At Google we have this problem all the time.
That's why we always design for scale from day one,
because we always have product teams that say, well,
we have only a small amount of data.
It's fine.
But then a week later the product picks up.
And suddenly they need to distribute the workload
to hundreds of machines.
And then they have all of these concerns.
Now, the good news is that we built something for this.
And TFX is the solution to this problem.
So this is a block diagram that we published
in one of our papers that is a very simplistic view
of the platform.
But it gives you a broad sense of what
the different components are.
Now, TFX is a very large platform.
And it contains a lot of components
and a lot of services.
So the paper that we published, and also
what I'm going to discuss today, is only a small subset of this.
But building TFX and deploying it at Google
has had a profound impact on how fast product teams at Google
can train machine learning models
and deploy them in production, and how ubiquitous machine
learning has become at Google.
You'll see later I have a slide to give you some sense of how
widely TFX is being used.
And it really has accelerated all of our efforts
to being an AI first company and using machine learning
in all of our products.
Now, we use TFX broadly at Google.
And we are very committed to making all of this available to you through open sourcing it.
So the boxes that are just highlighted in blue
are the components that we've already open sourced.
Now, I want to highlight an important thing.
TFX is a real solution for real problems.
Sometimes people ask me, well, is this the same code that you
use at Google for production?
Or did you just build something on the side and open source it?
And all of these components are the same code base
that we use internally for our production pipelines.
Of course, there's some things that
are Google specific for our deployments.
But all of the code that we open source
is the same code that we actually
run in our production systems.
So it's really code that solves real problems for Google.
The second part to highlight is that, so far, we've only open sourced libraries, each of which you can use on its own.
But you still have to glue them together.
You still have to write some code
to make them work in a joint manner.
That's just because we haven't open
sourced the full platform yet.
We're actively working on this.
But I would say so far we're about 50% there.
So these blue components are the ones
that I'm going to talk about today.
But first, let me talk about some of the principles
that we followed when we developed TFX.
Because I think it's very informative
to see how we think about these platforms,
and how we think about having impact at Google.
The first principle is flexibility.
And there's some history behind this.
And the short version of that history
is that, as I'm sure is the case at other companies as well, there used to be problem-specific machine learning platforms.
And just to be concrete, so we had a platform
that was specifically built for large scale linear models.
So if you had a linear model that you
wanted to train at large scale, you
used this piece of infrastructure.
We had a different piece of infrastructure
for large scale neural networks.
But product teams usually don't have one kind of a problem.
And they usually want to train multiple types of models.
So if they wanted to train linear and deep models,
they had to use two entirely different technology stacks.
Now, with TensorFlow, as I'm sure you know,
we can actually express any kind of machine learning algorithm.
So we can train TensorFlow models
that are linear, that are deep, unsupervised and supervised.
We can train tree models.
And any single algorithm that you can think of either
has already been implemented in TensorFlow,
or is possible to be implemented in TensorFlow.
So building on top of that flexibility,
we have one platform that supports
all of these different use cases from all of our users.
And they don't have to switch between platforms just
because they want to implement different types of algorithms.
Another aspect of this is the input data.
Of course, also product teams don't only have image data,
or only have text data.
In some cases, they may even have both.
Right.
So they have models that take in both images and text,
and make a prediction.
So we needed to make sure that the platform that we built
supports all of these input modalities,
and can deal with images, text, sparse data
that you will find in logs, videos even.
And with a platform as flexible as this,
you can ensure that all of the users
can represent all of their use cases on the same platform,
and don't have to adopt different technologies.
The next aspect of flexibility is
how you actually run these pipelines
and how you train models.
So one very basic use case is you have all of your data
available.
You train your model once, and you're done.
This works really well for stationary problems.
A good example is a model that classifies whether there's a cat or a dog in an image.
Cats and dogs have looked the same for quite a while.
And they will look the same in 10 years,
or very much the same as today.
So that same model will probably work well in a couple of years.
So you don't need to keep that model fresh.
However, if you have a non-stationary problem where the data changes over time (recommendation systems have new types of products that you want to recommend, new types of videos that get uploaded all the time), you actually have to retrain these models to keep them fresh.
So one way of doing this is to train a model on a subset of your data. Once you get new data, you throw that model away and train a new model, either on the superset (the old and the new data) or only on the fresh data, and so on.
Now, that has a couple of disadvantages.
One of them being that you throw away
learning from previous models.
In some cases, you're wasting resources,
because you actually have to retrain over the same data over
and over again.
And because a lot of these models
are actually not deterministic, you
may end up with vastly different models every time. Because of the way they're initialized, you may end up in a different optimum every time you train these models.
So a more advanced way of doing this is to start training on your new data, but initialize your model with the weights from the previous model and continue training. We call that warm starting of models. That may seem trivial if you just say, well, this is just a continuation of your training run. You just added more data and you continue. But depending on your model architecture, it's actually non-trivial, because in some cases you may only want to warm start embeddings.
So you may only want to transfer the weights of the embeddings
to a new model and initialize the rest of your network
randomly.
So there's a lot of different setups
that you can achieve with this.
But with this you can continuously
update your models.
You retain the learning from previous versions.
You can even, depending on how you set it up,
bias your model more on the more recent data.
But you're still not throwing away the old data.
And always have a fresh model that's updated for production.
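To make this concrete, here is a minimal sketch of warm starting, assuming the TF 1.x tf.estimator API; the checkpoint path and the "terms" feature column are hypothetical placeholders. Only variables matching the embedding are transferred from the previous model, and the rest of the network is initialized randomly.

```python
import tensorflow as tf

# Hypothetical previous checkpoint and feature column, purely for illustration.
PREVIOUS_MODEL_DIR = "/tmp/previous_model"

terms = tf.feature_column.categorical_column_with_hash_bucket(
    "terms", hash_bucket_size=10000)
terms_embedding = tf.feature_column.embedding_column(terms, dimension=32)

# Warm start only the variables whose names match the embedding;
# the DNN layers are initialized from scratch.
warm_start = tf.estimator.WarmStartSettings(
    ckpt_to_initialize_from=PREVIOUS_MODEL_DIR,
    vars_to_warm_start=".*terms.*embedding.*")

estimator = tf.estimator.DNNClassifier(
    hidden_units=[64, 32],
    feature_columns=[terms_embedding],
    warm_start_from=warm_start)
```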
The second principle is portability.
And there's a few aspects to this.
The first one is obvious.
So because we rely on TensorFlow,
we inherit the properties of TensorFlow,
which means you can already train your TensorFlow
models in different environments and on different machines.
So you can train a TensorFlow model locally.
You can distribute it in a cloud environment.
And by cloud, I mean any setup of multiple clusters.
It doesn't have to be a managed cloud.
You can train or perform inferences with your TensorFlow
models on the devices that you care about today.
And you can also train and deploy them on devices
that you may care about in the future.
Next is Apache Beam.
So when we open sourced a lot of our components
we faced the challenge that internally we
use a data processing engine that
allows us to run these large scale
data processing pipelines.
But in the open source world and in all of your companies,
you may use different data processing systems.
So we were looking for a portability layer.
And Apache Beam provides us with that portability layer.
It allows us to express a data graph once with the Python SDK.
And then you can use different runners
to run those same data graphs in different environments.
The first one is a direct runner.
So that allows you to run these data graphs
on a single machine.
That's also the runner that's being used in notebooks.
So I'll come back to that later, but we
want to make sure that all of our tools
work in notebook environments, because we know that that's
where data scientists start.
Then there's a Dataflow runner, with which you can run these same pipelines at scale, on Cloud Dataflow in this case.
There's a Flink runner that's being developed
right now by the community.
There's a JIRA ticket that you can follow for status updates on this.
I'm being told it's going to be ready at some point
later this year.
And the community is also working on more runners
so that these pipelines are becoming more portable
and can be run in more different environments.
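As a minimal sketch of that portability idea (not TFX code, just the Beam Python SDK), the same pipeline definition can be handed to different runners; the commented-out project and bucket options are placeholders you would only need for the Dataflow runner.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run(runner="DirectRunner"):
    # The same pipeline graph runs locally (DirectRunner) or at scale
    # (e.g. DataflowRunner) just by changing the pipeline options.
    options = PipelineOptions([
        "--runner=%s" % runner,
        # For the DataflowRunner you would also pass, for example:
        # "--project=my-gcp-project",
        # "--temp_location=gs://my-bucket/tmp",
    ])
    with beam.Pipeline(options=options) as p:
        _ = (p
             | "Create" >> beam.Create(["tfx", "beam", "portability"])
             | "Length" >> beam.Map(len)
             | "Print" >> beam.Map(print))


if __name__ == "__main__":
    run()
```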
In terms of cluster management and managing your resources,
we work very well together with Kubernetes and the Kubeflow project, which actually is the next talk right after mine.
And if you're familiar with Kubernetes,
there's something called Minikube,
with which you can deploy your Kubernetes
setup on a single machine.
Of course, there's managed Kubernetes solutions
such as GKE.
You can run your own Kubernetes cluster
if you want to, on prem.
And, again, we inherit the portability aspects
of Kubernetes.
Another extremely important aspect is scalability.
And I've alluded to it before.
I'm sure many of you know the problem.
There's different roles in companies.
And very commonly, data scientists work on a sometimes down-sampled set of data on their local machines, maybe on their laptop, in a notebook environment.
And then there's data engineers or product software engineers
who actually either take the models that
were developed by data scientists
and deploy them in production.
Or they're trying to replicate what
data scientists did with different frameworks,
because they work with a different toolkit.
And there's this almost impenetrable wall
between those two.
Because they use different toolsets.
And there is a lot of friction in terms
of translating from one toolset to the other,
or actually deploying these things from the data science
process to the production process.
And if you've heard the term "throwing it over the wall," that usually does not have good connotations. But that's exactly what's happening.
So when we built TFX we paid particular attention
to make sure that all of the toolsets we build
are usable at a small scale.
So you will see from my demos, all of our tools
work in a notebook environment.
And they work on a single machine with small datasets.
And in many cases, or actually in all cases,
the same code that you run on a single machine
scales up to large workloads in a distributed cluster.
And the reason why this is extremely important
is there's no friction to go from experimentation
on a small machine to a large cluster.
And you can actually bring those different functions together, and have data scientists and data engineers work together with the same tools on the same problems, and not have that wall in between them.
The next principle is interactivity.
So the machine learning process is not a straight line.
At many points in this process you actually
have to interact with your data, understand your data,
and make changes.
So this visualization is called Facets.
And it allows you to investigate your data, and understand it.
And, again, this works at scale.
So sometimes when I show these screenshots,
they may seem trivial when you think
about small amounts of data that fit into a single machine.
But if you have terabytes of data,
and you want to understand them, it's less trivial.
And on the other side--
I'm going to talk about this in more detail later--
this is a visualization we have to actually understand how
your models perform at scale.
This is a screen capture from TensorFlow Model Analysis.
And by following these principles,
we've built a platform that has had a profound impact on Google
and the products that we build.
And it's really being used across many of our Alphabet
companies.
So Google, of course, is only one company
under the Alphabet umbrella.
And within Google, all of our major products
are using TensorFlow Extended to actually deploy machine
learning in their products.
So with this, let's look at a quick overview of the things that we've open sourced so far. I'm going to take questions later, if possible.
So this is the familiar graph that you've seen before.
And I'm just going to turn all of these boxes blue
and talk about each one of those.
So data transformation we have open sourced
as TensorFlow Transform.
TensorFlow Transform allows you to express your data
transformation as a TensorFlow graph,
and actually apply these transformations at training
and at serving time.
Now, again, this may sound trivial,
because you can already express your transformations
with a TensorFlow graph.
However, if your transformations require
an analyze phase of your data, it's less trivial.
And the easiest example for this is mean normalization.
So if you want to mean normalize a feature,
you have to compute the mean and the standard deviation
over your data.
And then you need to subtract the mean and divide
by standard deviation.
Right.
If you work on a laptop with a dataset that's a few gigabytes,
you can do that with NumPy and everything is great.
However, if you have terabytes of data,
and you actually want to replicate these
transformations in serving time, it's less trivial.
So Transform provides you with utility functions.
And for mean normalization there's one called scale_to_z_score that is a one-liner.
So you can say, I want to scale this feature such
that it has a mean of zero and a standard deviation of one.
And then Transform actually creates a Beam graph for you
that computes these metrics over your data.
And then Beam handles computing those metrics
over your entire dataset.
And then Transform injects the results of this analyze phase
as a constant in your TensorFlow graph,
and creates a TensorFlow graph that
does the computation needed.
And the benefit of this is that this TensorFlow graph that
expresses this transformation can now
be carried forward to training.
So at training time, you apply those transformations to your training data.
And the exact same graph is also applied to the inference graph,
such that at inference time the exact same transformations
are being done.
Now, that basically eliminates training/serving skew, because now you can be entirely sure that the exact same transformations are being applied.
It eliminates the need for you to have code in your serving system that tries to replicate this transformation, because usually the code paths that you use in your training pipelines are different from the ones that you use in your serving system, since that one has to be very low latency.
Here's just a code snippet of what such a preprocessing function can look like. I just spoke about scaling to the Z-score with scale_to_z_score; that's mean normalization. string_to_int is another very common transformation that does string-to-integer mapping by creating a vocabulary.
And bucketizing a feature, again,
is also a very common transformation
that requires an analyze phase over your data.
And all of these examples are relatively simple.
But just think about one of the more advanced use cases
where you can actually chain together transforms.
You can do a transform of your already transformed feature.
And Transform actually handles all of these for you.
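Since the slide's snippet isn't reproduced in this transcript, here is a minimal sketch of what such a preprocessing_fn can look like, using the tf.Transform utilities just mentioned; the feature names (fare, payment_type, trip_seconds) are placeholders in the spirit of the Chicago taxi example, and string_to_int is the name from the tf.Transform API of that era.

```python
import tensorflow_transform as tft


def preprocessing_fn(inputs):
    """Transforms that require an analyze phase over the full dataset."""
    outputs = {}
    # Mean normalization: Transform computes the mean and standard deviation
    # over all the data in a Beam analyze phase, then injects them as
    # constants into the output TensorFlow graph.
    outputs["fare_scaled"] = tft.scale_to_z_score(inputs["fare"])
    # String-to-integer mapping by building a vocabulary.
    outputs["payment_type_id"] = tft.string_to_int(inputs["payment_type"])
    # Bucketize a numeric feature into quantile buckets.
    outputs["trip_seconds_bucket"] = tft.bucketize(
        inputs["trip_seconds"], num_buckets=10)
    # Transforms can be chained: bucketize an already transformed feature.
    outputs["fare_bucket"] = tft.bucketize(
        outputs["fare_scaled"], num_buckets=10)
    return outputs
```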
So there's a few common use cases.
I've talked about scaling and bucketization.
Text transformations are very common.
So if you want to compute ngrams,
you can do that as well.
And the particularly interesting one
is actually applying a saved model. And applying a saved model in Transform takes an already trained or created TensorFlow model and applies it as a transformation.
So you can imagine if one of your inputs is an image,
and you want to apply an inception model to that image
to create an input for your model,
you can do that with that function.
So you can actually embed other TensorFlow models
as transformations in your TensorFlow model.
And all of this is available on TensorFlow/Transform on GitHub.
Next, we talk about the trainer. And the trainer is really just TensorFlow. We're going to talk about the Estimator API and the Keras API.
This is just a code snippet that shows you how
to train a wide and deep model.
A wide and deep model combines a deep part, a deep neural network, and a linear part together.
And in the case of this estimator, it's a matter of instantiating the estimator. And then the Estimator API is relatively straightforward. There's a train method that you can call to train the model.
And the estimators that are up here are the ones that are in core TensorFlow. So if you just install TensorFlow, you get DNNs, Linear, DNN and Linear combined, and boosted trees, which is a great gradient boosted tree implementation.
But if you do some searching in TensorFlow Contrib, or in other repositories under the TensorFlow organization on GitHub, you will find many, many more implementations of very common architectures with the estimator framework.
Now, the estimator has a method, currently in contrib but moving to the Estimator API with 2.0, for exporting saved models. And that actually exports a TensorFlow graph as a SavedModel, such that it can be used by TensorFlow Model Analysis and TensorFlow Serving.
This is just a code snippet from one
of our examples of how this looks.
For an actual example, in this case on the Chicago taxi dataset, we just instantiate the DNN and linear combined classifier, call train, and export the model for use by downstream components.
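A minimal sketch of that flow with the TF 1.x Estimator API is below; the feature columns, the tiny in-memory input function, and the export path are placeholders, and the separate TFMA eval-graph export step is omitted.

```python
import tensorflow as tf

# Placeholder feature columns in the spirit of the Chicago taxi example.
wide_columns = [tf.feature_column.categorical_column_with_identity(
    "trip_start_hour", num_buckets=24)]
deep_columns = [tf.feature_column.numeric_column("trip_miles")]


def train_input_fn():
    # Tiny in-memory dataset, purely for illustration.
    features = {"trip_start_hour": tf.constant([7, 18]),
                "trip_miles": tf.constant([1.2, 3.4])}
    labels = tf.constant([0, 1])
    return tf.data.Dataset.from_tensor_slices(
        (features, labels)).repeat().batch(2)


estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[100, 50])

estimator.train(input_fn=train_input_fn, steps=100)

# Export a SavedModel for downstream components (e.g. TensorFlow Serving).
feature_spec = tf.feature_column.make_parse_example_spec(
    wide_columns + deep_columns)
serving_input_fn = (
    tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec))
estimator.export_savedmodel("/tmp/serving_model", serving_input_fn)
```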
Using tf.Keras, it looks very similar.
So in this case, we used the Keras sequential API,
where you can configure the layers of your network.
And the Keras API is also getting a method called save_keras_model that exports the same format, the SavedModel, such that it can again be used by downstream components.
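Here is a minimal sketch of that Keras path, assuming the TF 1.x contrib-era helper tf.contrib.saved_model.save_keras_model that the talk refers to (the exact export helper has moved between TF versions); the model architecture and path are placeholders.

```python
import tensorflow as tf

# A small placeholder model built with the Keras sequential API.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])

# Export in the SavedModel format so downstream components
# (TFMA, TensorFlow Serving) can consume it.
tf.contrib.saved_model.save_keras_model(model, "/tmp/keras_saved_model")
```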
Model evaluation and validation are open sourced as TensorFlow Model Analysis.
And that takes that graph as an input.
So the graph that we just exported
from our estimator or Keras model flows as an input
into TFMA.
And TFMA computes evaluation statistics
at scale in a sliced manner.
So now, this is another one of those examples where
you may say, well, I already get my metrics from TensorBoard.
TensorBoard metrics are computed in a streaming manner during training, on mini-batches.
TFMA uses Beam pipelines to compute
metrics in an exact manner with one pass over all of your data.
So if you want to compute your metrics over a terabyte of data in exactly one pass, you can use TFMA.
Now, in this case, you run TFMA for that model and some dataset. And if you just call this method called render_slicing_metrics with the result by itself, the visualization looks like this.
And I pulled this up for one reason.
And that reason is just to highlight
what we mean by sliced metrics.
This is the metric that you may be
used to when someone trains a model and tells you, well,
my model has a 0.94 accuracy, or a 0.92 AUC.
That's an overall metric.
Over all of your data, it's the aggregate
of those metrics for your entire model.
That may tell you that the model is doing well on average,
but it will not tell you how the model is
doing on specific slices of your data.
So if you instead render those slices for a specific feature (in this case we actually sliced these metrics by trip start hour; again, this is from the Chicago taxicab dataset), you get a visualization in which, in this case, we look at a histogram of the metric. We filter for buckets that have at least 100 examples so that we don't get low-count buckets. And then you can actually see here how the model performs on different slices of feature values, for each specific trip start hour.
So this particular model is trained to predict
whether a tip is more or less than 20%.
And you've seen overall it has a very high accuracy, and very
high AUC.
But it turns out that on some of these slices,
it actually performs poorly.
So if the trip start hour is seven, for some reason the model doesn't really have a lot of predictive power for whether the tip is going to be good or bad.
Now, that's informative to know.
Because maybe that's just because there's
more variability at that time.
Maybe we don't have enough data during that time.
So that's really a very powerful tool
to help you understand how your model performs.
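A minimal sketch of what this looks like in a notebook with an early version of the TFMA API is below; the exact signatures have shifted between TFMA releases, so treat the names here as assumptions, and the model and data locations are placeholders.

```python
import tensorflow_model_analysis as tfma

# Placeholders: an eval SavedModel exported by the trainer and a file of
# tf.Examples to evaluate over.
EVAL_MODEL_DIR = "/tmp/eval_model"
EVAL_DATA = "/tmp/eval_data.tfrecord"

# Compute metrics in one exact pass over the data, sliced by trip_start_hour.
eval_result = tfma.run_model_analysis(
    model_location=EVAL_MODEL_DIR,
    data_location=EVAL_DATA,
    slice_spec=[
        tfma.slicer.SingleSliceSpec(),  # the overall slice
        tfma.slicer.SingleSliceSpec(columns=["trip_start_hour"]),
    ])

# In a notebook, render the interactive sliced-metrics visualization.
tfma.view.render_slicing_metrics(eval_result,
                                 slicing_column="trip_start_hour")
```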
Some other visualizations that are available in TFMA
are shown here.
We haven't shown that in the past.
So the calibration plot, which is the first one,
shows you how your model predictions
behave against the label.
And you would want your model to be well calibrated,
and not to be over or under predicting in a specific area.
The prediction distribution just shows you that distribution; precision-recall and ROC curves are commonly known.
And, again, this is the plot for overall.
So this is the entire model and the entire eval dataset.
And, again, if you specify a slice here,
you can actually get the same visualization only
for a specific slice of your features.
And another really nice feature is
that if you have multiple models or multiple eval sets
over time, you can visualize them in a time series.
So in this case, we have three models.
And for all of these three models,
we show accuracy and AUC.
And you can imagine that if you have long-running training jobs and, as I mentioned earlier, you keep your model fresh by training a new model every day for a year, you end up with 365 models, and you can see how the model performs over time.
So this product is called TensorFlow Model Analysis.
And it's also available on GitHub.
And everything that I've just shown you
is already open sourced.
So next is serving, which is called TensorFlow Serving.
So serving is one of those other areas where
it's relatively easy to set something up
that performs inference with your machine learning models.
But it's harder to do this at scale.
So one of the most important features of TensorFlow Serving is that it's able to deal with multiple models.
And this is mostly used for actually upgrading a model
version.
So if you are serving a model, and you want to update that model to a new version, the server needs to load the new version at the same time, and then switch over requests to that new version.
That's also where isolation comes in.
You don't want that process of loading a new model to actually
impact the current model serving requests, because that
would hurt performance.
There's batching implementations in TensorFlow Serving
that make sure that throughput is optimized.
In most cases when you have a high requests
per second service, you actually don't want to perform inference
on a batch of size one.
You can actually do dynamic batching.
And TensorFlow Serving is adopted, of course,
widely within Google, and also outside of Google.
There's a lot of companies that have started
using TensorFlow Serving.
What does this look like?
Again, the same graph that we've exported
from either our estimator or our Keras model
goes into the TensorFlow model server.
TensorFlow Serving comes as a set of libraries. So you can build your own server if you want, or you can use the libraries to perform inference.
We also ship a binary. And this is the command you would use to run the binary, telling it what port to listen on and what model to load.
And in this case, it will load that model
and bring up that server.
And this is a code snippet, again from our Chicago taxi example, of how you put together a request and make, in this case, a gRPC call to that server.
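Since the slide itself isn't shown here, the following is a minimal sketch of a gRPC classification request, in the spirit of the Chicago taxi example; the host, port, model name, and feature names are placeholders, and the server is assumed to have been started with something like tensorflow_model_server --port=9000 --model_name=chicago_taxi --model_base_path=/tmp/serving_model.

```python
import grpc
from tensorflow_serving.apis import classification_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

# Connect to the running model server (placeholder address).
channel = grpc.insecure_channel("localhost:9000")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

# Build a Classify request containing one tf.Example with placeholder features.
request = classification_pb2.ClassificationRequest()
request.model_spec.name = "chicago_taxi"
example = request.input.example_list.examples.add()
example.features.feature["trip_miles"].float_list.value.append(1.2)
example.features.feature["trip_start_hour"].int64_list.value.append(7)

response = stub.Classify(request, 10.0)  # 10 second timeout
print(response.result)
```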
Now, not everyone is using gRPC, for whatever reason.
So we built a REST API.
That was the top request on GitHub for a while.
And we built it such that the TensorFlow model server binary ships with both the gRPC and the REST API. And it supports the same APIs as the gRPC one.
So this is what the API looks like.
So you specify the model name.
And, as I just mentioned, it also supports classify,
regress, and predict.
And here are just two examples: an iris model with the classify API, and another model with the predict API.
Now, one of the things that this enables is that instead of Proto3 JSON, which is a little more verbose than most people would like, you can actually now use idiomatic JSON, which seems more intuitive to a lot of developers who are used to it.
And as I just mentioned, the model server ships
with this by default. So when you bring up
the TensorFlow model server, you just specify the REST API port.
And then, in this case, this is just
an example of how you can make a request to this model
from the command line.
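The command-line example on the slide isn't reproduced in this transcript; as a stand-in, here is a minimal Python sketch of the same REST call, assuming the server was started with a REST port (e.g. tensorflow_model_server --rest_api_port=8501 --model_name=iris --model_base_path=/tmp/iris_model). The model name and feature values are placeholders.

```python
import json
import requests

# The classify endpoint of the TensorFlow Serving REST API.
url = "http://localhost:8501/v1/models/iris:classify"

# Idiomatic JSON: a list of examples, each a flat feature map.
payload = {"examples": [{"sepal_length": 5.1, "sepal_width": 3.5,
                         "petal_length": 1.4, "petal_width": 0.2}]}

response = requests.post(url, data=json.dumps(payload))
print(response.json())
```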
The last time I spoke about this, earlier this year, I could only announce that it would be available. But we've since made it available.
So all of this is now in our GitHub repository
for you to use.
Now, what does that look like if we put all of this together?
It's relatively straightforward.
So in this case, you start with the training data.
You use TensorFlow Transform to express your transform graph, which will actually deal with the analyze phase to compute the required statistics.
It will output the transform graph itself.
And, in some cases, you can also materialize the transform data.
Now, why would you want to do that?
You pay the cost of materializing your data again.
In some cases, where throughput for the model at training time
is extremely important, namely when
you use hardware accelerators, you
may actually want to materialize expensive transformations.
So if you use GPUs or TPUs, you may want to materialize all of your transforms such that at training time, you can feed the model as fast as you can.
Now, from there you can use an estimator or Keras model,
as I just showed you, to export your eval graph
and your inference graph.
And that's the API that connects the trainer with TensorFlow
Model Analysis and TensorFlow Serving.
So all of this works today.
I'll have a link for you in a minute that
has an end to end example of how you use all of these products
together.
As I just mentioned earlier, for us
it's extremely important that these products
work in a notebook environment, because we really
think that that barrier between data scientists and product
engineers, or data engineers, should not be there.
So you can use all of this in a notebook,
and then use the same code to go deploy
it in a distributed manner on a cluster.
For the Beam runner, as I mentioned,
you can run it on a local machine in a notebook
and on the Cloud Dataflow.
The Flink runner is in progress.
And there's also plans to develop a Spark
runner so that you can deploy these pipelines on Spark as
well.
This is the link to the end to end example.
You will find it currently lives in the TensorFlow Model
Analysis repo.
So you will find it on GitHub there,
or you can use that short link that takes you directly to it.
But then I hear some people saying, wait.
Actually, we want more.
And I totally understand why you would want more, because maybe
you've read that paper.
And you've certainly seen that graph,
because it was in a lot of the slides that I just showed you.
And we just talked about four of these things.
Right.
But what about the rest?
And as I mentioned earlier, it's extremely important
to highlight that these are just some of the libraries
that we use.
This is far from actually being an integrated platform.
And as a result, if you actually use these together, you will see in the end to end example that it works really well. But it will be much, much easier once they're integrated, and once there actually is a layer that pulls all of these components together and makes it a good end to end experience.
So I've announced before that we will next release the components for data analysis and validation.
There's not much more I can say about this today other
than these will be available really, really soon.
And I'll leave it at that.
And then after that, the next phase
is actually the framework that pulls all of these components
together.
That actually will make it much, much easier
to configure these pipelines, because then there's
going to be a shared configuration
layer to configure all of these components
and actually pull all of them together, such
that they work as a pipeline, and not
as individual components.
And I think you get the idea.
So we are really committed to making
all of this available to the community, because we've
seen the profound impact that it has had
at Google and for our products.
And we are really excited to see what you
can do with them in your space.
So these are just the GitHub links of the products
that I just discussed.
And, again, all of the things that I showed you today
are already available.
Now, because we have some time, I
can also talk about TensorFlow Hub.
And TensorFlow Hub is a library that
enables you to publish, consume, and discover
what we call modules.
And I'm going to come to what we mean by modules,
but it's really reusable parts of machine learning models.
And I'm going to start with some history.
And I think a lot of you can relate to this.
I've actually heard a talk today that mentioned some of these aspects.
In some ways, machine learning and machine learning tools
are 10, 15 years behind the tools
that we use for software engineering.
Software engineering has seen rapid growth
in the last decade.
And as there was a lot of growth,
and as more and more developers started working together,
we built tools and systems that made collaboration much more
efficient.
We built version control.
We built continuous integration.
We built code repositories.
Right.
And machine learning is now going through that same growth.
And more and more people want to deploy machine learning.
But we are now rediscovering some of these challenges
that we've seen with software engineering.
What is the version control equivalent for these machine
learning pipelines?
And what is the code repository equivalent?
Well, the code repository is the one
that I'm going to talk to you about right now for TensorFlow
Hub.
So code repositories are an amazing thing,
because they enable a few really good practices.
The first one is, if, as an engineer, I want to write code,
and I know that there's a shared repository,
usually I would look first if it has already been implemented.
So I would search on GitHub or somewhere else
to actually see if someone has already implemented the thing
that I'm going to build.
Secondly, if I know that I'm going
to publish my code on a code repository,
I may make different design decisions.
I may build it in such a way that it's more reusable
and that's more modular.
Right.
And that usually leads to better software in general.
And in general, it also increases velocity
of the entire community.
Right.
Even if it's a private repository within a company,
if it's a public repository and open source, such as GitHub,
code sharing is usually a good thing.
Now, TensorFlow Hub is the equivalent
for machine learning.
In machine learning, you also have code.
You have data.
You have models.
And you would want a central repository
that allows you to share these reusable parts of machine
learning between developers, and between teams.
And if you think about it, in machine learning
it's even more important than in software engineering.
Because machine learning models are much,
much more than just code.
Right.
So there's the algorithm that goes into these models.
There's the data.
There's the compute power that was used to train these models.
And then there's the expertise of people
that built these models that is scarce today.
And I just want to reiterate this point.
If you share a machine learning model,
what you're really sharing is a combination of all of these.
If I spent 50,000 GPU hours to train an embedding,
and share it with TensorFlow Hub,
everyone who uses that embedding can benefit
from that compute power.
They don't have to go recompute that same model
and those same data.
Right.
So all of these four ingredients come together
in what we call a module.
And a module is the unit that we care about: it can be published on TensorFlow Hub, and it can then be reused by different people in different models.
And those modules are TensorFlow graphs.
And they can also contain weights.
So what that means is they give you
a reusable piece of TensorFlow graph
that has the trained knowledge of the data
and the algorithm embedded in it.
And those modules are designed to be composable so they have
common signatures such that they can be
attached to different models.
They're reusable.
So they come with the graph and the weights.
And importantly, they're also retrainable.
So you can actually back propagate
through these modules.
And once you attach them to your model, you can customize them to your own data and to your own use case.
So let's go through a quick example
for text classification.
Let's say I'm a startup and I want
to build a new model that takes restaurant reviews
and tries to predict whether they are positive or negative.
So in this case, we have a sentence.
And if you've ever tried to train some of these text
models, you know that you need a lot of data
to actually learn a good representation of text.
So in this case we would just want to put in a sentence.
And we want to see if it's positive or negative.
And we want to reuse the code in the graph.
We want to reuse the trained weights
from someone else who's done the work before us.
And we also want to be able to do this with less data than is usually needed.
An example of these text modules that are already published on TensorFlow Hub is the Universal Sentence Encoder.
There's language models.
And we've actually added more languages to these.
Word2vec is a very popular type of model as well.
And the key idea behind TensorFlow Hub,
similarly to code repositories, is that the latest research
can be shared with you as fast as possible, and as easily as possible.
So the Universal Sentence Encoder paper was published by some researchers at Google.
And in that paper, the authors actually
included a link to TensorFlow Hub
with the embedding for that Universal Sentence Encoder.
That link is like a handle that you can use.
So in your code now, you actually
want to train a model that uses this embedding.
In this case, we train a DNN classifier.
It's one line to say, I want to pull from TensorFlow Hub
a text embedding column with this module.
And let's take a quick look of what that handle looks like.
So the first part is just a TF Hub domain.
All of the modules that we publish,
Google and some of our partners publish,
will show up on TFHub.dev.
The second part is the author.
So in this case, Google published this embedding.
Universal Sentence Encoder is the name of this embedding.
And then the last piece is the version.
Because TensorFlow Hub modules are immutable.
So once they're uploaded, they can't change, because you wouldn't want a module to change underneath you; if a module changed and you retrained your model, that would not be good for reproducibility.
So if and when we upload a new version of the Universal
Sentence Encoder, this version will increment.
And then you can change to the new code as well.
But just to reiterate this point, this is one line to pull this embedding column from TensorFlow Hub and use it as an input to your DNN classifier.
And now you've just basically benefited
from the expertise and the research that
was published by the Google Research
team for text embeddings.
I just mentioned earlier that these modules are retrainable. So if you set trainable to true, the model will actually backpropagate through this embedding and update it as you train with your own data.
Because in many cases, of course,
you still have some small amount of data
that you want to train on, such that the model adapts
to your specific use case.
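Here is a minimal sketch of that usage with tensorflow_hub and the TF 1.x Estimator API; the module handle points at the published Universal Sentence Encoder (the version number may differ from the one shown on the slide), and the feature key, layer sizes, and input function are placeholders.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Pull the text embedding from TensorFlow Hub as a feature column.
# trainable=True lets gradients flow back into the module so it adapts
# to your own data.
embedded_text = hub.text_embedding_column(
    key="sentence",
    module_spec="https://tfhub.dev/google/universal-sentence-encoder/2",
    trainable=True)

estimator = tf.estimator.DNNClassifier(
    hidden_units=[100, 50],
    feature_columns=[embedded_text],
    n_classes=2)

# estimator.train(input_fn=train_input_fn)  # train_input_fn assumed elsewhere
```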
And if you take the same URL, the same handle, and type it
in your browser, you end up on the TensorFlow website,
and see that documentation for that same module.
So that same handle that you saw in the paper,
you can use in your code as a one liner
to use this embedding, and you can
put in your browser to see documentation
for this embedding.
So the short version of the story
is that TensorFlow Hub really is the repository
for reusable machine learning models and modules.
We have already published a large number of these modules.
So the text modules are just one example that I just showed you.
We have a large number of image embeddings, both cutting edge ones, such as a neural architecture search module that's available, and some modules for image classification that are optimized for devices, so that you can use them on a small device.
And we are also working hard to keep publishing
more and more of these modules.
So in addition to Google, we now have
some modules that have been published by DeepMind.
And we are also working with the community
to get more and more modules up there.
And, again, this is available on GitHub.
You can use this today.
And a particularly interesting aspect that we haven't highlighted so far, but that is extremely important, is that you can also use the TensorFlow Hub libraries to store and consume your own modules.
So you don't have to rely on the TensorFlow Hub platform
and use the modules that we have published.
You can internally enable your developers to write out modules to disk on some shared storage. And other developers can consume those modules.
And in that case, instead of that
handle that I just showed you, you
would just use the path to those modules.
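A minimal sketch of consuming a module from a path instead of a tfhub.dev handle is below; the path is a placeholder for a hypothetical text embedding module that a colleague exported to shared storage, and the TF 1.x session/initializer pattern is assumed.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Load a module from shared storage instead of tfhub.dev (placeholder path).
embed = hub.Module("/mnt/shared/modules/my_text_embedding")
embeddings = embed(["TensorFlow Hub modules can live on shared storage."])

with tf.Session() as sess:
    # Hub modules may carry variables and lookup tables that need initializing.
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(sess.run(embeddings))
```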
And that concludes my talk.
I will go up to the TensorFlow booth
to answer any of your questions.
Thanks.
[CLAPPING]