Name: TFモデル最適化ツールキットでモデルを最適化 (TF Dev Summit '20) (Optimize your models with TF Model Optimization Toolkit (TF Dev Summit '20))
Uploaded: 2021-01-14T10:37:36.000Z
Duration: 17 min 9 s
Description: VoiceTubeの動画で発音を聞きながら英語表現を覚えよう！学べる英語：

I'm a software engineer on the TensorFlow team.

be talking about the TensorFlow model optimization toolkit.

Model optimization means transforming your machine

learning models to make them efficient to execute.

That means faster computation as well as a lower memory,

And it is focused on inference instead of training.

And because of the above mentioned benefits,

Examples include speech recognition, face unlock,

object detection, music recognition, and many more.

The model optimization toolkit is a suite

that make it simple to optimize your model.

and across various hardware accelerators.

There are two major techniques in the toolkit, quantization

Quantization stimulates flow calculation in lower bits,

Today we are going to focus on quantization

Now let's take a closer look at quantization.

Quantization is a general term describing technologies

that reduce the numerical precision of static parameters

and execute the operations in lower precision.

Precision reduction makes the model smaller,

and a lower precision execution makes the model faster.

Now let's dig a bit more onto how we perform quantization.

In most cases, we are wasting most of the representation

If we can find a linear transformation that

maps the float value onto int8, we can reduce the model size

Then computations can be carried out between int8 values,

and that is where the speed up comes from.

So there are two main approaches to do quantization, post

Post training operates on a already trained model

and is built on top of TensorFlow Lite converter.

During training, quantization performs additional weight

fine-tuning, and since training is required,

it is a build on top of a TensorFlow Keras API.

The most easy to use technique is the dynamic range

quantization, which doesn't require any data.

There can be some accuracy loss but we get a two to three times

to run the model on hardware accelerators,

It runs a small set of unlabeled calibration data

to collect the min-max range on activation.

This removes the floating point calculation

in the computer graph, so there is a speed up on CPU.

But more importantly, it allows the model

to run on hardware accelerators such as DSP and TPU,

which are faster and more energy efficient than CPU.

use Quantization Aware Training to fine-tune the weights.

It has all the benefits of integer quantization,

Now let's have a operator level breakdown on the post training

Dynamic range quantization is fully supported

The missing piece is the recurrent neural network

such as speech and language where a context is needed.

To unblock those use cases, we have recently

added a recurrent neural network quantization

and built a turnkey solution through the post training API.

RNN model build with Keras 2.0 can be converted and quantized

We then set the post training optimization flags

After that, we are able to call the convert method to convert

This is the exact same API and workflow for models

without RNN, so there is no API change for the end users.

字幕リスト動画再生

TFモデル最適化ツールキットでモデルを最適化 (TF Dev Summit '20) (Optimize your models with TF Model Optimization Toolkit (TF Dev Summit '20))

technique

basically

completely

accurate