[MUSIC PLAYING]
JACQUES PIENAAR: Good afternoon, everybody.
I am Jacques, and I'll be filling in for Tatiana today,
presenting on MLIR, accelerating TensorFlow with compilers.
Now, I don't think I need to tell anybody in this room
that machine learning is everywhere.
There's a wide range of deployments happening
in the industry today--
inference and training happening on the cloud and to the edge.
We also have models getting larger and larger,
and the computational requirements for training
these models ever increasing.
We see near-exponential growth in the complexity, size,
and computational requirements
for training these models.
Now, if you combine the growth in different deployment
strategies with the growth in models, velocity is a must.
We need a faster, more scalable way
to build infra to keep up with these bigger complex models
and deployment scenarios.
So we need to build these ML systems faster.
We want to unify efforts for extensibility and reusability,
while allowing customization as needed.
So we want to be able to standardize representation
of some basic concepts such as operations and types.
What defines an operation?
How do you define an operation or a type?
We want to create a common framework of reusable passes
that you can combine to create your own solutions.
And also, we want to make it such
that it's fully customizable and extensible.
The deployment scenarios and models of five years ago
differ greatly from what we have today,
and so we want a system that's able to scale and adapt
for all the future needs.
With that, enter MLIR. We designed
MLIR, which stands for multi-level intermediate
representation.
It's an intermediate representation and compiler
framework for TensorFlow and beyond, as part
of the LLVM project.
So what is MLIR, and why do we believe
it's a compiler infrastructure for machine learning?
Well, for one, MLIR is state of the art compiler technology.
It's not just a serialization format,
and there's nothing like it.
MLIR is modular and extensible.
You can build different solutions using MLIR--
building blocks that suit your solution.
Importantly, MLIR is not opinionated.
MLIR does not try and force you into a box.
It allows you to create a solution for your problem
space.
MLIR is also fully customizable.
These different deployment scenarios
need different ways of integrating the components,
and with MLIR, we want to make it
easy for all of these different deployment scenarios to work.
Finally, MLIR is part of the LLVM project.
It's under LLVM governance and effectively
on the desk of many compiler developers all around the world
already.
And the industry agrees.
MLIR is strongly supported by our partners.
Some of our partners include the largest hardware vendors
in the world, covering 95% of data center
hardware, four billion mobile phones,
and countless IoT devices.
Importantly, MLIR is an open community
of academia and industry all working together
to solve this problem of compiling machine learning
models.
So what if we want to use MLIR for TensorFlow?
Well, we want to use it to build a better TensorFlow.
We want to build better user experience, as well as better
pluggable hardware support.
Now, if you're a user, we want to make it easier
for you to debug your model.
We want to make optimizations transparent
so you can see what's going on.
And if you get an error
message from your optimized model, we
want to be able to track it back to the original location
in your code, and MLIR's location tracking enables this.
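As a rough illustration of that experience (a hedged sketch using only standard TensorFlow APIs rather than MLIR directly), the snippet below triggers a shape error inside a traced function; the report points back to the line of user code that created the failing op, which is the kind of source-level tracking MLIR locations carry through the pipeline.

```python
import tensorflow as tf

# Minimal sketch: a traced function whose error report points back to the
# original source line. The exact error text varies by TensorFlow version.
@tf.function
def mismatched_add(x):
    y = tf.ones([3, 4])
    return x + y  # incompatible with a [2, 2] input; the error cites this line

try:
    mismatched_add(tf.ones([2, 2]))
except (ValueError, tf.errors.InvalidArgumentError) as e:
    # The message and traceback reference the user code location above, which
    # is the kind of source tracking that MLIR locations preserve end to end.
    print(e)
```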
And, of course, we want faster performance.
So going from writing a model to actually
being able to get good performance on your hardware
is essential.
And speaking of hardware, for our hardware partners,
we know it's an awesome time.
There are so many new generations of accelerators coming up,
and we want
to make it simpler and easier to integrate them with TensorFlow.
Because while these accelerators are great,
they're only really interesting when they're usable for our users.
And, of course, for researchers, we
want to provide the standard infrastructure for research.
So from being able to represent
different optimization passes
to running them in an end-to-end workflow on production
models, we want to make it easy for researchers
to try new approaches, see their effects, and, if they work well,
of course, contribute them.
So let's take a closer
look at MLIR, progressive lowering,
and the infrastructure around MLIR.
Now, you've seen this before in the TensorFlow architecture.
And if we zoom in a little bit, we
can expand the different components.
But let's focus on the parts where MLIR will be used.
So MLIR serves, as I mentioned before,
as the graph representation and optimization format
for these TensorFlow models, but it's also
used in compilation in particular.
From optimization and conversion
passes between different computing frameworks,
to compilation of modules, to
writing or generating AOT kernels,
or exploiting handwritten kernels,
MLIR will be involved in all of these different parts.
So as the previous slide showed, we
can and will be using MLIR for many tasks in TensorFlow,
from graph optimizations, operation rewrites
and lowerings, graph transformations,
and creating frameworks and components, to code generation.
So you can think of MLIR as a common graph representation
and legalization framework.
It's a common set of optimizations and conversion
passes, as well as a full code generation pipeline.
But importantly, as I mentioned, MLIR is modular,
so you can tailor it for your use case.
You can use what you need to solve your problems.
So, for example, you can configure MLIR
for graph rewriting,
and that's how we use it for the new TensorFlow
to TensorFlow Lite converter,
just using the parts we actually need to get the final product
that we want.
So let's talk a little bit
about progressive lowering.
The ML in MLIR stands for multi-level.
MLIR enables you to represent multiple different levels
of operations, all in the same IR.
So from a TensorFlow operation to XLA HLO to LLVM IR
all can be represented in MLIR.
You can lower progressively from one form to another,
and all of these can coexist together.
So, for example, you can have a function that
actually has TensorFlow ops, HLO ops, and LLVM IR.
This ability to mix and match these different levels
of abstractions and dialects gives great power
in actually modeling the problems
to suit what your hardware specialization needs.
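As a hedged Python sketch of peeking at one of these levels, the experimental tf.mlir.experimental.convert_graph_def API (its availability and exact signature may differ across TensorFlow releases) returns the textual MLIR for a graph in the tf dialect; further pass pipelines would progressively rewrite those ops into HLO-level and, eventually, LLVM-dialect operations within the same module.

```python
import tensorflow as tf

@tf.function
def square_add(x, y):
    return x * x + y

concrete = square_add.get_concrete_function(
    tf.TensorSpec([4], tf.float32), tf.TensorSpec([4], tf.float32))

# Experimental API: convert a GraphDef to textual MLIR in the `tf` dialect.
# 'tf-standard-pipeline' is the documented default pipeline; both the API and
# the pipeline name may change between TensorFlow releases.
mlir_text = tf.mlir.experimental.convert_graph_def(
    concrete.graph.as_graph_def(), pass_pipeline='tf-standard-pipeline')

# The module contains tf-dialect ops (e.g. tf.Mul, tf.AddV2); lowering passes
# would progressively replace them with HLO-level and LLVM-dialect operations.
print(mlir_text)
```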
But what about XLA?
So we're using what we learned from XLA to build MLIR.
XLA is a great acceleration tool for models with stable tensor
shapes.
And so, for example, the tf.function API in TensorFlow 2.2
enables great performance improvements by exploiting XLA,
and we've made sure that they work really well together.
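For instance, here's a minimal sketch of opting a function into XLA compilation via tf.function; in the TensorFlow 2.2 era the flag was spelled experimental_compile=True, which newer releases rename to jit_compile=True.

```python
import tensorflow as tf

# experimental_compile=True asks TensorFlow to compile this function with XLA
# (the flag is renamed jit_compile=True in newer releases).
@tf.function(experimental_compile=True)
def dense_relu(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal([8, 128])
w = tf.random.normal([128, 64])
b = tf.zeros([64])
print(dense_relu(x, w, b).shape)  # (8, 64)
```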
And we are working on ensuring that there's
full interoperability between MLIR and XLA.
And speaking of full interoperability,
we are working very hard to make MLIR
and all existing TensorFlow components
interact very well.
So whether you want to import or export
a graph, an XLA HLO proto, or a TF Lite FlatBuffer,
all of these are possible.
So you can mix and match your workflows with XLA.
Importantly, MLIR allows for open integration
at any level of the stack.
So you can start with a TensorFlow graph,
import it into MLIR, lower it to HLO, optimize the HLOs,
or go further and lower it to LLVM IR
and then do code generation. MLIR allows you to hook
into any part of this pipeline,
and in particular, MLIR does not require that you use only one
of these paths, so if your problem needs a combination
of them, that's possible.
So this makes it very easy to incrementally enable
MLIR in conjunction with your existing tools.
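As one hedged example of hooking in below the graph level, newer TensorFlow releases (roughly 2.4 and later) expose an experimental experimental_get_compiler_ir hook on XLA-compiled tf.functions that lets you inspect the HLO the pipeline produces for given inputs; the API is experimental and may change.

```python
import tensorflow as tf

@tf.function(experimental_compile=True)  # jit_compile=True in newer releases
def scale(x):
    return x * 2.0 + 1.0

x = tf.ones([4], tf.float32)

# Experimental hook (roughly TF 2.4+): returns a callable that emits the IR
# for these inputs at a chosen stage, such as 'hlo' or 'optimized_hlo'.
print(scale.experimental_get_compiler_ir(x)(stage='hlo'))
```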
Now, let's look at MLIR in action.
So we'll just take a look at the new TF Lite converter,
as well as the new features provided by MLIR there.
Now, the new TF to TF Lite converter
launched just in February this year.
Very excited about it.
So starting from a TensorFlow graph model,
we import it into MLIR, do all the optimizations
and legalizations, and then finally export
to a TF Lite FlatBuffer for the TensorFlow
Lite runtime to execute.
All of these with better error messages--
so being able to find out what went wrong during conversions
and give more actionable feedback.
With support for TensorFlow control flow,
you can finally deploy some of these models
with control flow on the edge, and there's also
a new unified quantization workflow.
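As a hedged end-to-end sketch of that flow from the Python side (option names assume a recent TF 2.x release, and the model and output path are just placeholders), here's the MLIR-based converter turning a small Keras model into a TF Lite FlatBuffer with the standard post-training quantization option enabled.

```python
import tensorflow as tf

# A tiny stand-in model; any Keras model or SavedModel goes through the same flow.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(2),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
# The MLIR-based converter imports the model, runs optimization and
# legalization passes, and exports a TF Lite FlatBuffer.
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # post-training quantization
tflite_flatbuffer = converter.convert()

with open('/tmp/model.tflite', 'wb') as f:  # placeholder output path
    f.write(tflite_flatbuffer)
```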
Now, looking ahead beyond the converter,
you'll see MLIR in action in a lot of different places
in TensorFlow.
In particular, I mentioned MLIR as being the graph
representation and optimization framework in TensorFlow,
so we'll be unifying the different graph optimization
infrastructure that we have, as well
as all the different converters using MLIR.
Another part that's very important for us
is partner integration and support for new hardware.
As I mentioned, new hardware is coming up every day.
We want to make it very easy for folks
to integrate with TensorFlow.
So especially if you're a partner,
please reach out to the TensorFlow team if you want to get
involved in this discussion.
And also, for code generation, we're enhancing MLIR.
We're looking at more advanced code generation, particularly
code generation with dynamic shapes.
And MLIR is also integrating very tightly
with the new TensorFlow runtime
for optimization and code gen.
So there's many different ways of getting involved.
Like I mentioned, MLIR is an open community.
We have open design meetings where everybody
can sign in and ask questions.
There are talks from the team and from other teams.
We have the TensorFlow MLIR special interest group.
And of course we have code on GitHub,
in the LLVM repo as well as the TensorFlow repo.
So feel free to send some PRs and fix some bugs
and add new features and get involved.
And with that, thank you.
[MUSIC PLAYING]