
  • QIUMIN XU: Hello, everyone.

  • I am Qiumin.

  • I am a software engineer at Google working

  • on TensorFlow Performance.

  • Today I'm very excited to introduce you

  • to our brand new TensorFlow 2 Performance Profiler.

  • We all like speed, and we want our models to run faster.

  • TensorFlow 2 Profiler can help you

  • improve your model performance like a professional player.

  • In this talk, we're going to first talk

  • about what's new in TF2 Profiler,

  • and then we'll show you a case study.

  • I'm a performance engineer, and this

  • is how I used to start my day.

  • In the morning, I would run a model and capture a trace of it.

  • I would gather the profiling results

  • in a spreadsheet to analyze the bottlenecks

  • and optimize the model.

  • We also have gigabytes of traces,

  • and processing all of them manually

  • is boring and time-consuming.

  • Then, after that, we run the model again

  • to check for performance.

  • If your performance is quite good,

  • hooray, we have done our job.

  • Go and grab coffee.

  • Otherwise, we will go back to step one--

  • recapture a profile, gather results,

  • and find out the reason, fix it, and try again.

  • Repeat this iteration n times until the performance is good.

  • This is a typical day of a performance engineer.

  • Can we make it more productive?

  • The most repeated work here is to gather the trace information

  • and analyze the result. We always want to work smarter.

  • At Google, we found a way to build

  • tools that automatically process all the traces,

  • analyze them, and provide automated performance guidance.

  • It does intensive trace analysis,

  • learns from how Google internal experts tune the performance

  • and automates it for non-expert users.

  • Here's the thing I'm very excited about.

  • We are releasing the most useful set of internal tools

  • today as the TF2 Profiler.

  • The same set of tools in TF2 Profiler

  • has been used extensively inside Google,

  • and we are making it available to the public.

  • Let me introduce you to the toolset.

  • Today, we will launch eight tools.

  • Four of them are common to CPU, GPU, and TPUs.

  • This enables consistent metrics and analysis

  • across different platforms.

  • The first tool is called Overview Page.

  • This tool provides an overview of the performance

  • of the workload running on the device.

  • The second tool is Input Pipeline Analyzer.

  • It is a very powerful tool to analyze the TensorFlow Input

  • Pipeline.

  • TensorFlow reads data from files in the input pipeline on demand.

  • And an inefficient input pipeline severely

  • slows down your application.

  • This tool presents an in-depth analysis of your model input

  • pipeline performance, based on various performance

  • data collected.

  • At the high level, this tool tells you

  • whether your program is input bound.

  • If that is the case, the tool can also

  • walk you through the device and the host-side analysis

  • to debug which stage of the pipeline is the bottleneck.

  • The third tool we released today is called TensorFlow Stats.

  • TensorFlow Stats presents TensorFlow op statistics

  • in charts and tables.

  • The fourth tool we released today is called Trace Viewer.

  • The Trace Viewer tool displays a detailed event timeline

  • for in-depth performance debugging.

  • We also provide four tools that are TPU or GPU specific.

  • They are all available today in TensorFlow.

  • Please check them out.

  • Now let's look at the case study.

  • Let's assume that we are running an unoptimized ResNet-50

  • model on a V100 GPU.

  • TF2 Profiler provides a number of ways to capture a profile.

  • In this talk, we will focus on Keras callback.

  • To check out other ways of profiling,

  • including sampling and programmatic profiling,

  • refer to TensorFlow docs for more details.
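As a rough sketch of the programmatic option, the `tf.profiler.experimental` API lets you trace an explicit region of code; the log directory below is a placeholder you would choose yourself, and the matmul loop is just a stand-in workload.

```python
import tensorflow as tf

# Placeholder log directory; point this at your own location.
logdir = "logs/manual_profile"

# Programmatic profiling: only the region between start() and stop()
# is traced and written to the log directory for TensorBoard.
tf.profiler.experimental.start(logdir)
for _ in range(3):
    result = tf.matmul(tf.random.uniform((64, 64)),
                       tf.random.uniform((64, 64)))
tf.profiler.experimental.stop()
```

The resulting trace shows up under the same Profile plugin in TensorBoard as the callback-based capture.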

  • Using Keras TensorBoard callback,

  • we simply need to add an additional line specifying

  • profiling range.

  • The argument profile_batch='150,160'

  • here indicates we start to profile from batch 150 to 160.

  • Run a model, launch TensorBoard, and go to the Profile plugin.
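A minimal sketch of that callback setup, assuming a placeholder log directory (`model` and `train_ds` would be your own model and dataset):

```python
import tensorflow as tf

# Placeholder log directory for TensorBoard.
logdir = "logs/resnet50_profile"

# The one additional line: profile_batch specifies the profiling
# range, here batches 150 through 160 of training.
tb_callback = tf.keras.callbacks.TensorBoard(
    log_dir=logdir, profile_batch="150,160")

# Then pass the callback to training, e.g.:
# model.fit(train_ds, epochs=2, callbacks=[tb_callback])
```

After training, `tensorboard --logdir logs/resnet50_profile` and the Profile plugin show the captured trace.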

  • Here's a Performance Overview.

  • Let's now look at the Performance Overview page.

  • It contains three sections--

  • Performance Summary, Step-time Graph,

  • and the Recommendation for the Next Step.

  • Let's zoom into each of them.

  • First, let's look at the performance summary.

  • It shows the average step-time and breaks

  • it down into the time spent on compilation,

  • input/output, kernel launches, and communication time.

  • The next is a step-time graph.

  • We can see the step-time is broken down

  • into compilation time, kernel launch,

  • compute, and communication as well,

  • and you can see how this breakdown changes

  • over a number of steps.

  • In this example, there's a lot of redness in this chart,

  • which indicates it is severely input bound.

  • The next is what I feel most excited about.

  • This is the recommendation provided by our tool.

  • It assesses: your program is highly input bound

  • because 81.4% of the total step-time sampled

  • is waiting for input.

  • Therefore, we should first focus on reducing the input time.

  • Overview page also provides a recommendation on which tool

  • you should check out next.

  • In this example, Input Pipeline Analyzer and the Trace Viewer

  • are the next tools to see.

  • In addition, this tool also suggests the related useful

  • resources to check out to improve the input pipeline.

  • Let's follow this recommendation and check out the Input

  • Pipeline Analyzer tool.

  • This is the host-side analysis breakdown

  • provided by the tool.

  • It automatically detects that most of the time

  • is spent on data preprocessing.

  • What should we do next?

  • Our tool actually tells you what can

  • be done next to reduce the data preprocessing.

  • This is what is recommended by our tool.

  • You may increase the number of parallel calls

  • in the dataset map or process the data offline.

  • If you follow the link on the dataset map,

  • you will see how to do that.

  • According to the guide, we change the sequential map

  • to use parallel calls.

  • We should also not forget to try the most convenient

  • AUTOTUNE option, which will tune the value

  • dynamically at runtime.

  • After this optimization, let's capture a new profile.

  • Now you can see the redness is all

  • gone in the step-time graph, and the model

  • is no longer input bound.

  • Checking the performance summary again, now you get 5x speedup.

  • Overview page now recommends differently.

  • It says your program is not input bound because only 0.1%

  • of the total step-time sample is waiting for input.

  • Therefore, you should instead focus on reducing other time.

  • Here's another thing we can do.

  • If you look at the other recommendations,

  • the model is using 32-bit floats everywhere.

  • If you replace them with 16-bit floats, you can get a 10x speedup.
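One way to make that switch is Keras mixed precision, sketched below; this assumes the TF 2.4+ `tf.keras.mixed_precision` API (older versions exposed it under `tf.keras.mixed_precision.experimental`), and the Dense layer is just an illustrative stand-in for the model's layers.

```python
import tensorflow as tf

# Compute in float16 while keeping variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

layer = tf.keras.layers.Dense(8)
out = layer(tf.zeros((1, 4)))

# The layer now computes and outputs float16, but its weights stay
# float32 for numeric stability.
```

On a V100, float16 compute also lets matrix multiplications use Tensor Cores, which is where most of the speedup comes from.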

  • This release is just the beginning,

  • and we have more features upcoming.

  • We are working on Keras-specific analysis

  • and the multiworker GPU analysis.

  • Stay tuned.

  • We also welcome your feedback, so please let us

  • know and contribute your ideas.

  • TensorFlow 2 Profiler is the tool

  • you need for investigating TF2 performance.

  • It works on CPU, GPU, and TPU.

  • Here are more things to read--

  • a tutorial, a guide, and the GitHub source code.

  • There are also two more related talks on performance

  • tuning this afternoon.

  • They are super exciting, so don't miss them.

  • Finally, I want to thank everyone

  • who worked on this project.

  • You are super amazing teammates.

Performance profiling in TF 2 (TF Dev Summit '20)