
  • >> Hi, everybody.

  • We have a big show for you today.

  • So, if you have -- now would be a great time to turn -- the exit behind you.

  • [ Applause ]

  • >> Hi, everybody.

  • Welcome.

  • Welcome to the 2018 TensorFlow Dev Summit.

  • We have a good day with lots of cool talks.

  • As you know, we are embarking on --

  • Air traffic controllers in Europe are using machine learning to project the trajectory of flights through the airspace of Belgium, Luxembourg, Germany and the Netherlands. This airspace handles more than 1.8 million flights and is one of the densest in the world.

  • And dairy farming.

  • We know that a cow's health is vital to the survival of the dairy industry. And Connecterra, a company in the Netherlands, wondered if they could use machine learning to track the health of cows and provide insights to farmers and veterinarians on the actions to take to ensure we have happy, healthy cows that are high yielding.

  • In California, and also in the Netherlands. And -- music, machine learning algorithms, neural networks --

  • >> And changed by machine learning: the popular Google Home, the Pixel, Search, YouTube, even Maps. Do you know what is fascinating in all of these examples? TensorFlow is at the forefront of them, making it all possible: a machine learning platform that can solve challenging problems for all of us.

  • Join us on this incredible journey to make TensorFlow powerful, scalable and the best machine learning platform for everybody. I now invite Rajat, who leads TensorFlow, to tell us more about this. Thank you.

  • >> So, let's take a look at what we have been doing over the last

  • few years. It's been really amazing.

  • There's lots of new progress, and we have seen the popularity of TensorFlow grow, especially over the last year. We focused on making TensorFlow easy to use, and new programming paradigms like eager execution really make that easier.

  • Earlier this year, we hit the milestone of 11 million downloads. We are really excited to see how many users are using this and how much impact it's had in the world.

  • Here's a map showing self-identified locations of folks on GitHub who have starred TensorFlow. It goes up and down through the day; in fact, TensorFlow is used in every time zone in the world.

  • An important part of any open source project is the contributors themselves, the people who make the project successful. I'm excited to see over a thousand contributors from outside Google who are making contributions not just by improving code, but also by helping the rest of the community by answering questions, responding to queries and so on.

  • Our commitment to this community is to share our direction and roadmap, have design discussions in the open, and focus on key needs like TensorBoard. We will be talking about this later this afternoon in detail.

  • Today we are launching a new TensorFlow blog. We'll be sharing work by the team and the community on this blog, and we would like to invite you to participate in this as well.

  • We're also launching a new YouTube channel for TensorFlow

  • that brings together all the great content for TensorFlow.

  • Again, all of these are for the community, to really help build and communicate. All day today we will be sharing a number of posts on the blog and videos on the channel. The talks you are hearing here will be made available there as well, along with lots of conversations and interviews with the speakers.

  • To make reuse and sharing easier, today we are launching TensorFlow Hub.

  • This library of reusable components is easily integrated into your models. Again, this goes back to really making things easy for you.

  • TensorFlow started with a focus on deep learning and neural networks, but it's a rich collection of machine learning algorithms. It includes items like regressions and decision trees, commonly used for many structured-data classification problems, and there's a broad collection of state-of-the-art tools for statistics and Bayesian analysis. You can check out the blog post for details.

  • As I mentioned earlier, one of the big focus points for us is to make TensorFlow easy to use, and we have been pushing on simpler, more intuitive APIs. At the lowest level, our focus is to consolidate a lot of the APIs we have and make it easier to build these models and train them. These low-level TensorFlow APIs are really flexible and let users build anything they want to, but the same APIs are now easier to use.

  • TensorFlow contains a full implementation of Keras, which offers lots of layers and ways to train them as well. Keras works with both eager and graph execution. For distributed execution, we provide estimators, so you can take models and distribute them across machines; you can also get estimators from Keras models. And finally, we provide premade estimators: a library of ready-to-go implementations of common machine learning algorithms.

  • So, let's take a look at how this works. First, you would often define your model. Keras gives you a nice and easy way to do that; this shows a convolutional model with just a few lines of code.

  • Now, once you've defined that, you often want to do some input processing. We have a great API for that, tf.data, introduced in TensorFlow 1.4, that makes it easy to process inputs and lets us do lots of optimizations behind the scenes. You will see a lot more detail on this later today as well.

  • Once you have the model and the input data, you can put them together by iterating over the dataset, computing gradients and updating the parameters yourself. You need only a few lines to put these together, and you can use your debugger to step through the code and resolve problems as well.

  • And, of course, you can do it in even fewer lines by using the predefined training loops we have in Keras. In this case, it executes the model as a graph with all the optimizations that come with it. This is great for a single machine or a single device.

  • Now, often, given the heavy computation needs of deep learning and machine learning, we want to use more than one accelerator. For that, we have estimators. With the same datasets you had, you can build an estimator and use it to train across a cluster, or across multiple devices on a single machine.

  • That's great. But why stop at a cloud cluster? Why not use a single powerful box if you can do it faster? That is where Cloud TPUs come in: they are built for training ML models at scale, and the focus is to let you take everything you have been doing and build a TPU estimator that allows you to scale the same model.
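
As a hedged sketch of the premade-estimator path just described (the DNNClassifier, feature column and synthetic data below are illustrative assumptions; TPU training additionally needs a TPU-specific estimator and run configuration, which is omitted here):

```python
import tensorflow as tf

# A premade estimator: a ready-to-go deep classifier.
feature_columns = [tf.feature_column.numeric_column("x", shape=[10])]
estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[32, 16],
    n_classes=2)

def input_fn():
    # The estimator consumes a tf.data pipeline through its input function.
    x = tf.random.uniform([1024, 10])
    y = tf.cast(tf.reduce_sum(x, axis=1) > 5.0, tf.int32)
    return (tf.data.Dataset.from_tensor_slices(({"x": x}, y))
            .shuffle(1024).batch(32).repeat())

estimator.train(input_fn=input_fn, steps=100)
```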

  • And finally, once you have trained that model, you can use that one line at the bottom for deployment itself. Deployment is important; you often do that in data centers, but more and more we are seeing the need to deploy on phones and on other devices as well.

  • And so, for that, we have TensorFlow Lite. It has a custom model format that's designed for devices: lightweight and really fast to get started with. Once you have that format, you can include it in your application, integrate TensorFlow Lite with a few lines, and you have an application that can do predictions and include ML for whatever task you want to perform.
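
A minimal sketch of that conversion step, assuming today's tf.lite.TFLiteConverter API; at the time of this talk the converter lived under a different name, so treat the exact call as an assumption:

```python
import tensorflow as tf

# Convert a trained Keras model into the lightweight TensorFlow Lite format.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(784,))])
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Write the flatbuffer out; this file is what ships inside the mobile app.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```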

  • So, TensorFlow runs not just on many platforms, but in many

  • languages as well.

  • Today I'm excited to add Swift to the mix. And it brings a

  • fresh approach to machine learning.

  • Don't miss the talk by Chris Lattner this afternoon that

  • covers the exciting

  • details of how we are doing this. JavaScript is a language

  • that's synonymous with the web development community.

  • I'm excited to announce TensorFlow.js, bringing TensorFlow to web developers. Let's take a brief look at this. You can write the same TensorFlow applications in JavaScript and call them as plain JavaScript code, with a full-fledged layers API on top, and with full support for importing TensorFlow and Keras models, so you can pick the best deployment for you.

  • And under the covers, these APIs are hardware-accelerated. We have Node.js support coming soon, which will give you the power to accelerate on CPUs and GPUs.

  • And I would like to welcome Megan Kacholia to talk about how TensorFlow does on performance.

  • [ Applause ] >> Thank you. All right. Thanks, Rajat. So, performance across all platforms is critical to TensorFlow's success. I want to take a quick step back

  • and talk about some of the things we think about when

  • measuring and assessing TensorFlow's performance. One

  • of the things we want to do is focus on real world data and

  • time to accuracy. We want to have reproducible benchmarks and

  • make sure they're realistic of the workloads and types of

  • things that users like you are doing on a daily basis. Another

  • thing, like Rajat talked about, is we want to make sure we have

  • clean APIs.

  • And we don't want to have a fast version and a pretty version.

  • The fast version is the pretty version.

  • All the APIs that we talked about that we're talking about

  • through various talks, these are the things you can use to get

  • the best performance out of TensorFlow. You don't have to

  • worry about what is fast or pretty, use the pretty one, it

  • is fast.

  • You'll hear about tf.data from Derek right after the keynote, as well as distribution strategy from Igor. These are great examples of things

  • we have been pushing on to ensure good performance and good

  • APIs. We want good performance, whether it's a large data center

  • like here, or maybe you're using something like on the image

  • here.

  • A GPU or CPU box under your desk.

  • Making use of a cloud platform or a mobile or embedded device.

  • We want TensorFlow to perform well across all of them. Now

  • the numbers, because what is a performance talk if I don't show

  • you slides and numbers. First, look at things on the mobile

  • side.

  • This is highlighting TensorFlow Lite performance. There's a talk later today by Sarah giving a lot more detail about how it works and the things we were thinking about when building it.

  • And you can see the speedup with quantized models -- and it's critical to have

  • strong performance regardless of the platform, and we're really

  • excited to see these gains in mobile. In looking past mobile,

  • just beyond, there are a number of companies in the

  • hardware space which continues to expand. The contributions

  • that come out of the collaborations that we have with

  • these companies, the contributions they give back to

  • TensorFlow and back to the community at large, are critical

  • to making sure that TensorFlow performs well on these specific

  • platforms for the users that each group really cares about.

  • One of the first ones I want to highlight is Intel.

  • So, the Intel MKL-DNN library, open

  • sourced and highly optimized for TensorFlow.

  • We have a 3X inference speedup on Intel platforms, as well as

  • great scaling efficiency on training. And this is one of

  • those things that highlights how important it is to have strong

  • collaborations with different folks in the community. And

  • we're excited to see things like this to go back to all the

  • users.

  • And I want to call out one of the collaborations with NVIDIA as well: TensorRT, an inference optimizer. We have been working on this for a long time; it's been around for a little while, but with the TensorFlow 1.7 release, we have native support built in. You can get low latency and high throughput.

  • You can see an inference speedup versus FP32 with standard TensorFlow. It's great to see the collaborations, the contributions, and the great numbers delivered by them. Looking past inference, let's go on to some of the training topics.

  • So, mixed-precision training is important. As faster and newer hardware comes out, using the mixed-precision support is how you get the best out of that hardware. One example is NVIDIA's Tesla V100, and we want the mixed-precision training support to get the best performance out of it.
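
For illustration, here is how mixed-precision training can be enabled with the Keras mixed-precision API in current TensorFlow. This particular API postdates the talk, so it is an assumption about how you would do it today rather than what was shown on stage:

```python
import tensorflow as tf

# Assumption: the modern Keras mixed-precision API (TF 2.4+).
tf.keras.mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="relu", input_shape=(256,)),
    # Keep the final layer in float32 for numerically stable outputs.
    tf.keras.layers.Dense(10, dtype="float32"),
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```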

  • You can see a training speedup.

  • This is on an 8x Tesla V100 box, and you can see the performance improvement moving to mixed-precision training versus standard TensorFlow.

  • Scaling efficiency is really important as well. Obviously, we want to make sure TensorFlow scales well beyond a single GPU and keeps going

  • regardless of what you throw at it. We want to make sure that,

  • again, looking at examples for real-world data as well as

  • synthetic data, it's great to

  • benchmark on synthetic data, but real-world needs to perform as

  • expected.

  • We see 90% scaling efficiency with real data and 95% with synthetic data. This is a V100 box going from 1 to 8 GPUs, and you can see the scaling here. This is something we care about, and you're going to hear more about scaling efficiency and the new distribution APIs later today.

  • Moving on to the cloud, I want to talk about Cloud TPUs. Cloud TPU launched in beta in February, just a month and a half ago. This is Google's second-generation TPU, available through Google's cloud platform, like I mentioned. It's exciting to look at the numbers here. This picture shows a single device, and on a single device you can get 180 teraflops of

  • computation. But it's not just about the power, it's

  • about what you can do on this. Doesn't matter if you can't run

  • the types of models that you want.

  • I want to highlight the reference models that we have

  • open sourced that are available today as well as a bunch more

  • types of models coming soon. Again, just to highlight the

  • breadth of things you can run on the hardware and get great

  • performance and great accuracy.

  • We have an internal team continually making sure that

  • these models perform well, they perform fast and they are also

  • training to accuracy in the expected amount of time.

  • It's not just about putting a model out and open sourcing it,

  • but it's making it work as the community expects it to work.

  • Again, some numbers; what good is a performance talk if I don't show numbers? One number I want to call out on this slide: the cost to train on ImageNet is under $85. It's exciting to see what you can achieve by making use of this platform. And if you want more numbers, you can look at the DAWNBench entry that was submitted for ImageNet training.

  • One final exciting thing to call out is the availability of Cloud TPU Pods later this year. So, what's a pod? A pod is actually 64 of the devices I showed earlier, all wired up together, and you get about 11.5 petaflops of computation in a pod. That is a lot of compute

  • power that is going to be available in this. What can you

  • do with that? The team has been pushing training ResNet-50 on

  • a pod to accuracy in less than 15 minutes. We're very excited

  • what can be done with this type of hardware. And just the

  • amazing speed that you can get that wasn't really possible

  • before.

  • So, Rajat talked about the APIs and just the ease of use and

  • things we're focusing on. I have given you some numbers.

  • But what happens when you put it together? What can TensorFlow

  • do? I want to invite Jeff Dean, the leader of the brain team,

  • to come up and talk a bit more about how TensorFlow addresses

  • real problems. [ Applause ]

  • >> Thanks, Megan. So, I think one of the really remarkable

  • things about machine learning is its capacity to solve real

  • problems in the world. And we've seen tremendous progress

  • in the last two years. In 2008, though, the U.S. National Academy of Engineering put out this list of grand engineering

  • challenges that they were hoping to be solved by the end of the

  • 21st century. It's 14 different challenges.

  • And I think it's a really nice list of things we should be

  • aspiring to work on as a society.

  • If we solved all these problems, our planet would be healthier,

  • people would

  • live longer, we would be happier and things would be better. And

  • I think machine learning is going to help us in all of

  • these. Some in small ways. Machine learning influencing our

  • understanding of chemical molecules. Some in major ways.

  • I'm going to talk about two today. But I think machine

  • learning is a key to tackling these areas.

  • The two are advancing health informatics and engineering of

  • tools for scientific discovery. Clearly machine learning is a

  • big

  • component, and TensorFlow itself you can think of as a tool for

  • helping us engineer some of these discoveries. But one of the

  • things that I think is really important is that there's a lot

  • more opportunity for machine learning than there is machine

  • learning expertise in the world.

  • The way you solve a machine learning problem today is, you

  • have some data.

  • You have some computation, maybe GPUs

  • or TPUs or CPUs, and then a machine learning expert.

  • Someone who has taken a graduate class in machine learning.

  • Downloaded TensorFlow and familiar enough to play with it.

  • But that's a small set of people in the world. And then you stir

  • all this together and you get a solution, hopefully.

  • So, that's -- the unfortunate thing about that is there's

  • probably tens of thousands of organizations in the world today

  • that are actually effectively using machine learning in

  • production environments and really making use of it

  • to solve problems. But there's probably tens of millions

  • of organizations in the world that have

  • data in a form that could be used for machine learning but

  • don't have the internal expertise and skills. How can

  • we make machine learning much easier to use so you don't need

  • nearly as much expertise to apply it?

  • Can we use computation to replace a lot of the need for

  • the machine learning expertise? We have been working on a suite

  • of techniques, AutoML. And neural architecture search is

  • one example of this. One of the things a machine learning expert

  • does is they sit down and decide for a particular problem what

  • kind of model structure they're going to use for the problem.

  • A ResNet-50 architecture, or a nine-layer CNN with these

  • filter sizes and so on.

  • It turns out that you can use machine

  • learning to optimize a controller that proposes machine

  • learning models. You can have the controller propose machine

  • learning models, train it on the problems you care about, see

  • which work well and which don't and use that feedback as a

  • reinforcement signal for the model-generating model. You can

  • steer it towards models

  • working well for particular problems and away from the space

  • where they're not working well. If you repeat this, you can get

  • powerful, high-quality models. And they look a little weird.

  • So, this is not something a human machine learning expert

  • would sit down and design.

  • But it has characteristics of things we know human machine

  • experts have discovered are helpful. If you think of the

  • ResNet architecture, it has skip connections that allow you to skip over layers.

  • These more organic-looking connections are the same

  • fundamental idea, which is you want to allow input data to flow

  • more to the output without going through as many

  • computational layers.

  • So, the interesting thing is, AutoML does quite well here.

  • Every dot here -- this is a graph

  • showing computational cost versus accuracy for ImageNet.

  • And every dot shows different kinds of tradeoffs. And

  • generally, as you expend more computation, you get higher

  • accuracy. But every dot here is sort of the work

  • of years of effort -- a cumulative effort by top machine

  • learning experts in the world.

  • And if you run AutoML, you get better accuracy and better

  • computational tradeoffs than all those models. That's true at

  • the high end, you care

  • about utmost accuracy and don't care about computational budget.

  • But at the low end, you also get lightweight models with low computational cost and high accuracy. That is exciting. I think this is a

  • real opportunity to use more computation to solve real

  • machine learning problems in a much more automated way so that

  • we could solve more problems more quickly. We have released

  • this in collaboration with the Cloud group at Google as a product that customers can use for solving their own computer vision problems.

  • And obviously the world has lots of other categories of problems. Okay. Advancing health informatics.

  • So, machine learning and health care is going to be a really

  • impactful combination. One of the areas that we have been

  • working on is a variety of different medical imaging problems, including one where you're trying to look at a retinal image and diagnose whether that image shows signs of diabetic retinopathy. 400 million people around the world are at risk of this. It's very treatable if it's caught in time, but if it's not, you can suffer vision loss.

  • So, in many parts of the world, there aren't enough

  • ophthalmologists to inspect these images. And so, we have

  • done work, and with work that we've published in the very end

  • of 2016, we showed that we had a model that was on par with board

  • certified ophthalmologists. And since then,

  • we have been continuing to work on this. And we've changed how

  • we sort of label our training data.

  • We've gotten retinal specialists to

  • label the training data rather than general ophthalmologists.

  • And we have it on par with retinal specialists, a higher

  • standard of care. We can bring this and deliver this to lots

  • and lots of places around the world. But more interestingly,

  • I'm going to tell you a tale of scientific discovery.

  • So, we had a new person join the retinopathy team.

  • And as a warmup exercise, Lily, who leads this work, said to this new person: hey, why don't you go off and see if you can predict age and gender from the images. Maybe age within a couple of decades, and gender you shouldn't be able to predict at all -- the AUC should be 0.5. They came back and said, I can predict gender with an AUC of 0.7. That must be wrong; go off and come back later. And they came back and said, my AUC is now 0.85. That got us

  • thinking. And we investigated what other kinds

  • of things we could predict from these retinal images. It turns out you can predict a variety of things indicative of cardiovascular health: your age and gender, which are themselves signals of cardiovascular risk, your hemoglobin level, lots of things like this.

  • We have a new, non-invasive test for cardiovascular health.

  • Normally, you have to draw blood and do lab tests, but we can do

  • this just from an image. Which is pretty cool.

  • We're also doing a bunch of work on predictive tasks for health

  • care given a patient's medical record, can we predict the

  • future? This is something doctors want to do. Understand

  • how your patient is going to progress. And you want to be

  • able to answer lots of kinds of questions. Will the patient be

  • readmitted if I release them now?

  • What are the most likely diagnosis I should be thinking

  • about? What tests for this patient? Lots of questions like

  • that. And we have been collaborating with several

  • health care organizations to work on de-identified health care records to predict these things. In January, we posted a many-author arXiv paper that looked at these tasks. I'm highlighting one here.

  • Predicting which patients are most at risk for mortality, and

  • using this,

  • we're able to predict which patients are most seriously at

  • risk 24 hours earlier than the clinical baselines that are

  • currently in use. That means that doctors get 24 hours of

  • advanced notice to pay attention to the patients critically ill

  • and need their close attention and close watching.

  • This is indicative of what machine learning can do.

  • The Google Brain team's mission is to make machines intelligent and improve people's lives. I'm going to close with a bit of a story.

  • So, when I was 5 years old, I lived in northwestern Uganda for

  • a year.

  • And the local crop there is a root called cassava. And I was

  • 5.

  • So, I liked to go out and help people pick cassava. But it

  • turns out that machine learning

  • and cassava have a kind of cool twist together. Please roll the

  • video.

  • >> Cassava is a really important crop. It provides for over 500

  • million Africans every day. >> When all other crops failed,

  • farmers

  • know they could rely on their cassava plants to provide them

  • food. >> We have several diseases that

  • affect cassava, and these diseases make the roots

  • inedible.

  • It is very crucial to actually control and manage these

  • diseases. >> We are using machine learning

  • to respond to the diseases.

  • >> And TensorFlow is the best foundation for our solutions.

  • The app can diagnose multiple diseases.

  • It's nuru, Swahili for light. You wave your phone over a leaf,

  • if it has a symptom, the box will pop up, you have this

  • problem. When we get a diagnosis, we have an option for

  • advice and learn about management practices.

  • >> The object detection we use through TensorFlow relies upon our team

  • annotating images. >> We have collected over 5,000

  • high-quality images of different cassava diseases throughout this

  • project.

  • We use a single model based on the MobileNet architecture. It's

  • able to make predictions in less than one second.

  • >> Instead of implementing thousands of lines of code,

  • TensorFlow has a library of functions to allow us to build

  • architectures in much less time. >> We need something that can be

  • deployed on a phone without any connection.

  • TensorFlow is able to shrink these neural networks.

  • >> The human input is critical. We're building something that

  • augments your experience and makes you better at your job.

  • >> So, with AI tools and machine learning, you can improve the

  • yields, you can protect your crops, and you can have a much

  • more reliable source of food.

  • >> AI offers the prospect to fundamentally transform the life

  • of

  • hundreds of millions of farmers around the world.

  • >> You can see a product that can actually make someone's life

  • better.

  • This is kind of revolutionary.

  • >> Cool. And I think we have some members of

  • the Penn State and IITA teams from Tanzania here today. If

  • you could all stand up or wave. And I'm sure they would be happy

  • to chat with you. [ Applause ]

  • I'm sure they would be happy to chat with you at the break about that

  • work. With that, I would like to introduce

  • Derek Murray who is going to talk to you about tf.data.

  • That's a way to describe an input pipeline. Derek. Thanks.

  • [ Applause ] >> Okay. Thank you, Jeff.

  • And, wow, it's amazing to see how people are using TensorFlow

  • to make the world a better place.

  • As Jeff said, my name is Derek Murray,

  • I'm thrilled to be here to tell you about tf.data.

  • It helps you get your data, from cat pictures to cassava pictures, into TensorFlow. Input pipelines are usually overshadowed by the more glamorous aspects of machine learning, like matrix multiplications and convolutions, but I would argue they're extremely important. Input data is the lifeblood of machine learning, and the current algorithms and hardware are so thirsty for data that we need a powerful input pipeline to keep up with them. There we go.

  • So, when I kicked off the tf.data project last year, TensorFlow had room to improve. You could feed in data from Python at each step, which was kind of slow, or set up queue runners to feed in data, which were challenging to use.

  • So, I decided to focus on three themes, the main focus of my

  • talk today. First is performance.

  • When we're training models on state of

  • the art accelerator hardware, we have a

  • moral imperative to keep them busy with new data at all times.

  • The second is flexibility. We want to handle any data in

  • TensorFlow itself. We don't want to have to rely on

  • different tools so you can experiment with different views.

  • And third, we have ease of use.

  • With TensorFlow, we want to open up machine learning to users of

  • all abilities. And it can be a big leap from following along

  • with your first tutorial

  • to training your first model on your own data.

  • And we can help smooth this transition. tf.data is the only library you need for getting data into TensorFlow. How do we do that? We took inspiration from the world of databases and designed tf.data around three phases: extract, transform, load. First, there are tools to extract data from a wide range of sources.

  • These can range from in-memory arrays to multi-terabyte files across a distributed system. Then we

  • have the tools to transform your data.

  • These enable you to extract features, perform data

  • augmentation and ultimately convert your raw data into the

  • tensors you will use to train your model.

  • And finally, loading into the accelerators. That is important

  • for performance. That's the high level pitch. What's it

  • look like in real code today?

  • This is the standard tf.data input pipeline for reading Example protos from TFRecord files. I bet that 90% of all TensorFlow input pipelines start out this way. It's so common that we have wrapped this pipeline up in a single utility, but for pedagogical reasons, it's important to start out here.

  • We get a list of files, on your local disk or in GCS or S3, and extract the TFRecords from them. We use functional transformations to pre-process the data. We took inspiration from C#, Java and Scala, which use method chaining to build up a pipeline of dataset objects, and higher-order functional operators like map and filter help you customize the behavior of that pipeline.

  • And finally, the load phase: telling TensorFlow how to get data out of the dataset. One of the easiest ways is to create an iterator, which, just like its namesake in Python, gives you sequential access to the elements. We will see ways to soup up this part of the pipeline later on.
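
Here is a minimal sketch of that standard extract-transform-load pipeline against the current tf.data API; the file pattern and feature names are illustrative assumptions:

```python
import tensorflow as tf

# Extract: read serialized Example protos from a set of TFRecord files.
files = tf.data.Dataset.list_files("/path/to/data/train-*.tfrecord")
dataset = tf.data.TFRecordDataset(files)

# Transform: parse each record, then shuffle and batch.
def parse_example(serialized):
    features = tf.io.parse_single_example(serialized, {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    })
    image = tf.io.decode_jpeg(features["image"], channels=3)
    return tf.image.resize(image, [224, 224]), features["label"]

dataset = dataset.map(parse_example).shuffle(10_000).batch(32)

# Load: iterate over the dataset to feed the training loop.
for images, labels in dataset.take(1):
    print(images.shape, labels.shape)
```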

  • So, that's the overview. Now I want to come back to the key themes and tell you how we have been advancing each of them. Let's start with

  • performance. So, you remember all those exciting performance

  • results that Megan told you about in the keynote? Well,

  • every one of them was measured using real input data and a tf.data input pipeline that you can download and use in your own programs. There's one benchmark that I personally like: it measures the performance of training an infinitely fast image model on real data, in order to tease out any bottlenecks in the input pipeline.

  • When we ran this last week on NVIDIA hardware, it processed over 30,000 ImageNet images per second. That's much faster than we can train on current hardware, but it's exciting for a couple of reasons: this throughput has more than doubled over the last eight months, which is a testament to the great job the team has done in optimizing TensorFlow performance.

  • And the accelerators are getting faster all the time. And we

  • have this extremely useful

  • benchmark that guides us as we continue to improve tf.data

  • performance. How do you get that kind of performance? Well, one option is to go to the TensorFlow benchmarks project on GitHub and use that pipeline in your program. You should probably just do that. But maybe you have a different problem to solve. We have recently launched a tf.data performance guide on TensorFlow.org, and it is full of useful theoretical and practical information that lets you put these optimizations into practice on your own pipelines. And to support this on the technical side, we have been adding a raft of new features to tf.data to achieve this performance.

  • One I particularly want to call out is part of TensorFlow 1.8, so you can start playing with it in the nightly builds right away. Up to this point, tf.data has been exclusively for code that runs on the CPU; this marks our first foray into running on GPUs as well. There's a lot more I could say on this topic as we develop these features, but let's go back and look again at the program from earlier on, and I'll show you how to put these techniques into practice.

  • First off, when dealing with a large dataset stored in GCS or S3, you can speed things up by reading multiple files in parallel, increasing the level of throughput into your model. You can turn this on with a single change to the TFRecordDataset call: the num_parallel_reads argument. Then you can improve performance further by switching to fused versions of various transformations: shuffle_and_repeat fuses the shuffle buffer with the repetition, and map_and_batch fuses the execution of the map function with the data transfer of each element into the batch. Together, these two optimizations get big speedups for models that consume a large volume of data.

  • And last, but not least, we have the GPU prefetching that I mentioned. This ensures that the next batch of input data is already in memory on the GPU when the GPU is ready to begin the next step. This is a crucial part of the CNN benchmarks, but achieving it used to mean manually staging buffers from CPU to GPU; the new prefetch-to-device API gives you the same performance and only requires you to add one line of code to your input pipeline to get the benefits. This is the Cliffs Notes version.
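
Putting those three optimizations together, here is a hedged sketch using today's API names (at the time of the talk the fused transformations lived under tf.contrib.data); the buffer sizes are illustrative, and parse_example refers to the parsing function from the earlier sketch:

```python
import tensorflow as tf

files = tf.data.Dataset.list_files("/path/to/data/train-*.tfrecord")

# Read multiple files in parallel to increase extraction throughput.
dataset = tf.data.TFRecordDataset(files, num_parallel_reads=8)

# Fused shuffle-and-repeat, and parallel map fused with batching.
dataset = dataset.apply(
    tf.data.experimental.shuffle_and_repeat(buffer_size=10_000))
dataset = dataset.apply(
    tf.data.experimental.map_and_batch(parse_example, batch_size=32,
                                       num_parallel_calls=8))

# Prefetch the next batch onto the GPU while the current step runs.
dataset = dataset.apply(
    tf.data.experimental.prefetch_to_device("/gpu:0"))
```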

  • I would encourage you to watch my colleague's talk later; he's going to show you a much more detailed approach, so do check it out. Now, let's switch gears and move on to the second theme: flexibility.

  • Originally, the flexibility in tf.data stemmed from the functional transformations, which allow you to put any TensorFlow graph code at any point in your pipeline. So, for example, if you have existing TensorFlow code for pre-processing images, you can stick it in a Dataset.map and start using it right away. The original version of tf.data traded on this and let you pass a list of tensors in and get a list of tensors out of these transformations. But we heard back from users that they had more sophisticated needs and more complicated structures.

  • So we added native support in tf.data for more complex structures, such as sparse tensors, which is useful for dealing with complex categorical data and training linear models.

  • So, at this point, if TensorFlow is doing everything you want to

  • do, you're all set. But one thing we have learned over last

  • few years is not everything is most naturally expressed in a

  • TensorFlow graph. We have been working to give you alternative

  • ways to build up tf.data pipelines. The first is Dataset.from_generator, which allows you to build a pipeline from a Python function that generates elements, so you can wrap existing Python code and benefit from the performance transformations like prefetching to GPUs.
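
A small sketch of the Dataset.from_generator path; the generator itself is a made-up example, and the output_signature argument is the current spelling (older versions used output_types and output_shapes):

```python
import tensorflow as tf

def gen():
    # Any existing Python code can feed the pipeline, one element at a time.
    for i in range(1000):
        yield {"id": i, "value": float(i) ** 0.5}

dataset = tf.data.Dataset.from_generator(
    gen,
    output_signature={"id": tf.TensorSpec([], tf.int64),
                      "value": tf.TensorSpec([], tf.float32)})

# Downstream, it behaves like any other dataset: batch, prefetch, iterate.
dataset = dataset.batch(32).prefetch(1)
```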

  • The other way might be more appealing to power users: we have opened up a backend API, so you can build dataset plugins in C++. I've heard from some of our partners that this is useful for custom data formats, and we're dogfooding this approach for some of our own implementations, like the recently added Kafka dataset. I'm looking forward to what some of you will build with this new API, and I encourage you to contribute back via pull requests. We're excited about the contributions from the community at this point in the project.

  • Okay, the final thing I want to cover is ease of use. If I were speaking only to folks who have used TensorFlow for a year or more and struggled with getting data into it, I wouldn't have to make much of a case. But users have high expectations, and there are people getting their first exposure to TensorFlow every day, so we continue to push hard on usability. I want to share a few highlights. First off, as Rajat told you in the keynote, eager execution is here, and it makes using tf.data a lot more pleasant.

  • Alex is going to tell you more, but from my admittedly biased perspective, you can start treating datasets just like any Python object. You can loop over them with a regular for loop, and there's no iterator required. What's neat is that this works together with tf.data features like GPU prefetching, so you can combine the efficiency of graph execution for your input pipeline with the flexibility of eager execution for your model code.

  • Next, returning to user feedback: power users like the composability and configurability of the dataset API, but many users just want an easy way to apply best practices. TensorFlow 1.8 will have new dataset utilities for Example protocol buffers and for CSV data, to make it easier to handle these formats and apply all the best practices from the performance guide.

  • So, let's go one last time back to the standard problem. As I promised in the beginning, the whole pipeline can be replaced by a single call, which performs all the parallel IO, shuffling, batching and prefetching for you, and gives you back a dataset that you can continue to transform using map, filter and the other transformations. If you have a large workload, use a binary format like TFRecords.

  • But those who tend to have smaller data often prefer something simple, and the CSV format fits that just fine. There are thousands of different CSV datasets available to download for free, and this snippet shows how to use an API for downloading them with just a couple of simple commands.

  • Once you have done that, you can use the new make_csv_dataset function in TensorFlow to get the data out of the downloaded files. In this case, it's a dataset of a million news headlines. What I particularly like about this new API is that it takes care of figuring out the column names and types, and it dramatically cuts down the boilerplate you have to write.
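
A hedged sketch of that CSV path with the current name, tf.data.experimental.make_csv_dataset (at the time of the talk this lived under tf.contrib.data); the file name is illustrative:

```python
import tensorflow as tf

# Column names and types are inferred from the CSV header automatically.
dataset = tf.data.experimental.make_csv_dataset(
    "headlines.csv",   # illustrative file name
    batch_size=32,
    num_epochs=1,
    shuffle=True)

for batch in dataset.take(1):
    # Each batch is a dict mapping column names to tensors.
    print({name: values.dtype for name, values in batch.items()})
```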

  • Finally, we have been working to improve the integration between tf.data and high-level APIs like estimators and Keras. The Keras support is still in the pipeline, if you'll excuse the pun. But if we want to use our CSV parsing code with estimators, it's a simple matter of returning the dataset from an estimator's input function; no more iterator required. Then we pass the input function to the estimator's train method and we're good to go.
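
That estimator pattern, sketched with illustrative names (the LinearClassifier, the feature column and the CSV columns are assumptions):

```python
import tensorflow as tf

def input_fn():
    # Returning the dataset from the input function is enough;
    # no iterator needs to be created by hand.
    return tf.data.experimental.make_csv_dataset(
        "train.csv", batch_size=32, label_name="label", num_epochs=1)

# Illustrative premade estimator; "feature" is an assumed column name.
estimator = tf.estimator.LinearClassifier(
    feature_columns=[tf.feature_column.numeric_column("feature")])
estimator.train(input_fn=input_fn)
```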

  • The road we're taking is to make datasets and the high-level integrations as natural as possible. Features like eager execution and the high-level APIs are making this easier, and the eventual goal is to make it seamless, so that tf.data is a natural extension of your TensorFlow program. Well, that is about all the time I have.

  • So, just to recap: I told you in the beginning that our mission for tf.data was to make a library for input processing that was fast, flexible, and easy to use, and I hope I have convinced you in the last 15 minutes that we have achieved these three goals. I hope you also take away that tf.data is the one library for all your input processing. If you want to find out more, there is a ton of documentation about tf.data on TensorFlow.org, covering the how-tos and the performance guidance I mentioned earlier, and the benchmarks and official models repositories contain examples of high-performance, readable input pipelines written with tf.data.

  • And with all of this information and knowing the creativity, I'm

  • really looking forward to seeing what all of you build with this

  • library. Thanks a lot for listening.

  • [ Applause ] >> Okay.

  • And now it is my great pleasure to introduce Alex Passos, who is going to tell you all about TensorFlow eager execution.

  • >> Hello. My name is Alex, and I'm here to tell you about eager execution. You have heard about it in the last two talks, but I'm here to tell you what it's really about:

  • This new imperative object-oriented way of using

  • TensorFlow. We're introducing it today as part of core TensorFlow. You probably know, because you're here or watching on the live stream, that TensorFlow has been this graph execution engine for machine learning that lets you run graphs at high scale and all sorts of other nice things. But why did we choose to go with graphs in the first place?

  • Since I'm going to tell you about eager execution, which moves beyond what we can achieve with graphs, it's a good idea to recap why we bothered with graphs at all. A really good reason to have your computation represented as a platform-independent graph is that, once you have that, it's easy to automatically differentiate the graph.

  • I went to grad school before all of this was standard in machine learning toolkits, and I do not wish that on anyone.

  • Life is much better now, trust me. And if you have a

  • platform-independent abstract representation of your

  • computation, you can just go and deploy it to pretty much

  • anything you want. You can run it on the TPU, you can run it on

  • the GPU, put it on a phone or a Raspberry Pi. There are all

  • sorts of cool deployments that you are going to hear about

  • today.

  • And this is -- it's really valuable to have this kind of

  • platform-independent view. Compilers work with data-flow graphs generally, and they know how to do nice optimizations

  • that rely on a global view of the computation.

  • That includes things like data layout, and some of these optimizations are deep-learning specific: we can choose how to properly lay out your channels, height and width so your convolutions are faster. And finally, a

  • key reason that's very important to us at Google and important to

  • you as well, I hope, once you have a platform-independent

  • representation, you can deploy it and distribute it

  • across hundreds of machines or a TPU like you saw earlier. And

  • this is a seamless process. Since graphs are so good, what

  • made us

  • think it's a good idea to move beyond them and do Eager

  • Execution? A good place to start: you don't have to give up automatic differentiation. Libraries like autograd let you differentiate dynamic code; you don't need a static representation of the computation to differentiate it. You can build up a trace as you go and walk back the trace to compute gradients.

  • Also, if you don't have to stop and build a platform-independent computational graph, you can iterate a lot more quickly. You

  • can play with your model as you build it, inspect it, poke and

  • prod at it. And this can let you just be more productive when

  • you're like, making all these things. Also, you can run your

  • model through debuggers and profilers and add all

  • sorts of, like, analysis to them to just really understand how

  • they're doing what they're doing. And finally,

  • if we don't force you to represent your computation in a

  • separate way than the host programming language you're

  • using,

  • you can just use machinery of your host programming language

  • to do control flow

  • and complicated data structures which for some models is key to

  • making the model work at all. So, I hope you're now wondering, how do I get to use this? The way to use it is super-easy: import TensorFlow and enable eager execution.

  • What happens then is that any time you run a TensorFlow operation, instead of building a graph that will run that matrix multiplication when it is later executed, we run it for you immediately and give you the result. You can print it, slice it, dice it, do whatever you want with it. And because things are happening immediately, you can have highly dynamic control flow that depends on the actual values of the computation you're executing. Here is a simple line-search example that I wrote; the details don't matter. It just has loops whose behavior depends on values computed along the way, and this runs just fine on whatever device you have.
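
A sketch of what that looks like; note that in the TF 1.x of this talk you would call tf.enable_eager_execution() right after the import, whereas in TF 2.x eager execution is simply the default:

```python
import tensorflow as tf

# TF 1.x: call tf.enable_eager_execution() here.
# TF 2.x: eager execution is already on by default.

# Operations run immediately and return concrete values.
x = tf.constant([[2.0, 0.0], [0.0, 3.0]])
print(tf.matmul(x, x))   # the actual result, not a symbolic graph node

# Control flow can depend on computed values, like a simple search loop.
while tf.reduce_sum(x) < 100.0:
    x = x * 2.0
print(x)
```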

  • And together with this enable-eager-execution switch, we're bringing a few new symbols to TensorFlow that make it easier for you to write code that works both when building graphs and when executing eagerly. We're bringing in a new way of doing gradients. You're familiar with how you do gradients in normal TensorFlow: you create the variable and the loss function -- I hope you can think of a better loss function than this one -- and you call tf.gradients to differentiate it.

  • But when you have eager execution, we try to be as efficient as we can. If you're going to differentiate a computation, you need to keep in memory information about what has happened so far, like your activations. But I don't want you to pay the cost of this tracking when you're not computing gradients: performance is the whole reason we're doing this, and we want to use these big, nice pieces of hardware to train models super-fast. So, when you want to compute gradients, you use this gradient tape context manager, which records the operations you execute so we can play them back. Otherwise, the API is the same.
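
A minimal sketch of the gradient tape pattern being described, using today's tf.GradientTape name; the toy loss is mine, not the speaker's slide:

```python
import tensorflow as tf

w = tf.Variable(3.0)

def loss_fn():
    return tf.square(w) - 2.0 * w  # toy loss; minimized at w = 1

with tf.GradientTape() as tape:
    loss = loss_fn()

# The tape recorded the operations, so it can play them back for gradients.
grad = tape.gradient(loss, w)
print(grad)  # dL/dw = 2w - 2 = 4.0 at w = 3
```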

  • Also, training loops in eager, as Derek pointed out, are very easy and straightforward. You can just use a Python for loop to iterate over your datasets, and datasets work in eager just fine, with the same high performance you get in the graph execution engine. Then you can do your

  • predictions, apply your gradients, and do the other things

  • you're used to doing. But really, the interesting thing

  • about eager execution is not just for writing code that is finished and that we already know works, but for when you're still developing. You want to do

  • things like debug. So, when eager execution is enabled, you can just take any model code -- I use my simple example here -- add breakpoints anywhere you want, and once you're in the debugger, you have the full power of the debugger available.

  • You can print the value of any tensor, change the value of any

  • tensor, run any operation you want on any tensor.

  • And this will hopefully empower you to really understand what's

  • going on in your models. And you'll be able to fix any

  • problems you have.
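
For instance, here is a hedged sketch of dropping into the stock Python debugger from the middle of eager model code; the tiny model is illustrative:

```python
import pdb
import tensorflow as tf

def model(x):
    h = tf.nn.relu(tf.matmul(x, tf.random.normal([4, 8])))
    pdb.set_trace()  # pause here: inspect h, run ops on it, change it
    return tf.reduce_sum(h, axis=1)

model(tf.ones([2, 4]))
```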

  • You can also take Eager Execution code and profile it

  • using whatever profiling tool you are most familiar and

  • comfortable with.

  • So, here I have a little dummy model that just does a matmul and an add, and let's pretend I don't know which one is going to be slower (this one is more expensive). You can run the code under the Python profiler and find out that the matmul is 15 times more expensive. Also, by the way, those examples were run on Google Colaboratory, which is a completely public, shared notebook environment hosted by Google. And I think we have a demo on eager that's

  • hosted there that you can play with later. If you're on the live

  • stream, you can play with it now if you can find the link. But

  • together with eager, we're bringing a lot of new APIs that make it easier to build graphs and execute models, and they are compatible with both eager execution and graph building. A long-requested feature is the ability to customize gradients in TensorFlow. I'm sure you're familiar with the usual tricks, like stop_gradient and tricks with functions, but we're introducing a new API that works in both eager and graph execution. What I like about this example is that it's something many, many people have asked how to do.

  • Say I want my forward pass to run normally, but in the backward pass I want to take the gradient of a particular tensor and clip it, keeping it small to prevent it from exploding.
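
A sketch of how that clipping can be written with tf.custom_gradient; this is a reconstruction of the idea, not the exact code on the slide:

```python
import tensorflow as tf

def clip_gradient_by_norm(x, clip_norm=1.0):
    @tf.custom_gradient
    def _identity_with_clipped_grad(t):
        def grad(dy):
            # Forward pass is the identity; backward pass clips the gradient.
            return tf.clip_by_norm(dy, clip_norm)
        return tf.identity(t), grad
    return _identity_with_clipped_grad(x)

w = tf.Variable([10.0, 10.0])
with tf.GradientTape() as tape:
    y = tf.reduce_sum(tf.square(clip_gradient_by_norm(w, clip_norm=1.0)))
print(tape.gradient(y, w))  # gradient norm is clipped to 1.0
```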

  • It just takes six lines of code to clip the gradient. And I

  • think this is cool. I look forward to seeing what you can

  • do with this when you're doing more than

  • six lines of code and solving all new and interesting research

  • problems. A big, big change when programming with Eager from

  • graph that I really want you to stop and think about is we're

  • trying to make everything as Pythonic and object-oriented as

  • possible.

  • So, variables in TensorFlow are usually a complicated thing to think about, but when eager execution is enabled, they're simpler: a variable is just a Python object. You can change the value, read the value, and when the last reference to it goes away, you get the memory back -- even if it's GPU memory. If you want to share variables, you just reuse those objects; you don't worry about variable scopes or any other complicated structure.
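
A small sketch of variables behaving as plain Python objects under eager execution:

```python
import tensorflow as tf

v = tf.Variable(1.0)
v.assign(2.0)        # change the value in place
v.assign_add(0.5)    # read-modify-write; it's just a Python object
print(v.numpy())     # 2.5

# Sharing a variable is just reusing the object -- no variable scopes needed.
w = v
w.assign(3.0)
print(v.numpy())     # 3.0, because w and v are the same object
```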

  • And because we have this object-oriented approach to variables, we can look at the APIs in TensorFlow and rethink them in a way that's object-oriented and easier to use. One example is an overhaul of the metrics API. We're introducing new tfe.metrics, where each metric has an update method that takes a value and a result method that gives you the result. Hopefully this is an API that everyone is going to find familiar. Please don't try to compare this to the other metrics API.
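
A sketch of that object-oriented metrics style, shown here with today's tf.keras.metrics equivalent, since that is what this design evolved into -- an assumption about the modern name, not what was on the slide:

```python
import tensorflow as tf

accuracy = tf.keras.metrics.SparseCategoricalAccuracy()

# Update the metric object with batches of (labels, predictions)...
accuracy.update_state([1, 2], [[0.1, 0.8, 0.1], [0.2, 0.2, 0.6]])
accuracy.update_state([0], [[0.9, 0.05, 0.05]])

# ...and ask it for the accumulated result.
print(float(accuracy.result()))  # 1.0
```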

  • We're giving you a way to do object-oriented saving of TensorFlow models. If you've tried looking at TensorFlow checkpoints, you know they depend on variable names, and variable names depend not just on the name you gave your variable, but on all the other variables present in the graph. This can make it hard to save and load subsets of the model and to really control what's in the checkpoint. So we're introducing a completely object-oriented, Python-object-based saving API: any model gets saved, you can save any subset of your model, and you can load any subset of your model. You can even use this tfe.Checkpoint object to build things you want to save that are more than a model.
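
A sketch of that object-based saving, using today's tf.train.Checkpoint name (at the time it was exposed as tfe.Checkpoint); the model and optimizer are illustrative:

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
model.build(input_shape=(None, 8))
optimizer = tf.keras.optimizers.Adam()
step = tf.Variable(0, dtype=tf.int64)

# The checkpoint tracks a graph of Python objects, not global variable names.
ckpt = tf.train.Checkpoint(model=model, optimizer=optimizer, step=step)
path = ckpt.save("/tmp/eager_demo/ckpt")

# Later, you can restore any subset of those objects -- here just the model.
fresh_model = tf.keras.Sequential([tf.keras.layers.Dense(4)])
fresh_model.build(input_shape=(None, 8))
tf.train.Checkpoint(model=fresh_model).restore(path).expect_partial()
```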

  • Here we have an optimizer and a global step. You can put

  • whatever you want in there. The object graph is something you

  • can save and load. You can save and load your discriminators and

  • generators separately, and take the discriminator and load it

  • back up as another network that you can use on another part of

  • the model. This should give you a lot more control to get a lot

  • more out of TensorFlow checkpoints. But a question that everybody asks me when I tell them to work with eager execution is: is it fast? Graphs have this high performance; how fast can it be if we run Python code all the time? We can make it fast enough. For models that are highly computationally intensive, you don't see any Python overhead, and we are as fast as graph TensorFlow -- sometimes slightly faster, for reasons that I don't fully understand.

  • Even for highly dynamic models, you

  • have comparable performance with anything else you can find.

  • And please don't get attached to these numbers.

  • We have many more benchmarks and we're optimizing Eager

  • performance aggressively. But I hope you know if your model can

  • keep it busy, you're doing large matrix computations, there's no

  • cost in experimenting and doing your research and model building

  • with Eager Execution turned on. When you're doing smaller

  • things, there are overheads. Don't get attached to them.

  • We're being aggressive about optimizing this.

  • If you run a single identity op, it takes about a microsecond. If you run it with eager execution turned on, there's an extra microsecond, and if you're tracing gradients, another three microseconds. But just enqueuing something on the GPU stream takes single-digit microseconds.

  • So, if you can execute enough computation to keep a GPU busy,

  • you're unlikely to see anything bad from using Eager Execution.

  • And, again, these numbers are improving very quickly. Please

  • don't get too attached to them.

  • But there is this large ecosystem of

  • TensorFlow code libraries, models, frameworks, check

  • points, that I don't think anyone wants to give up. And I

  • don't want you to have to give them up if you want to use eager execution. So, we're also thinking really hard about how you can interoperate between eager and graph. One way is to call into graphs from eager code, and you can do that with tfe.make_template: we build a graph for that little Python function, and you can use it, manipulate it, and call the graph from eager execution. We also have the reverse, which is how to call into eager from a graph.

  • Let's say you have a big graph and you understand everything in it, but there's a little chunk of your computation that you either don't know how to express, or don't want to bother expressing, in plain TensorFlow graph code. You can wrap it in an eager py_func, and you can run any eager TensorFlow code in there, including convolutions and other things not otherwise available, and you can look at the values and use dynamic control flow. I hope that with these two things together, you can reuse eager and graph code in both directions.

  • But the easiest way to get eager and graph compatibility is to write model code that can go both ways. Once the code is written, debugged and tested, there's nothing stopping you from using it to build a graph or to execute it eagerly: debug in eager, then import that same code into a graph, put it in an estimator, deploy it on a GPU, distribute it -- do whatever you want. This is what we've done in the example models, and there's going to be a link at the end of the presentation, so you don't need to worry about writing this down.

  • So, here is some practical advice for you. Write code that's going to work well both when executing eagerly and when building graphs. To do that, use the Keras layers: they're object-oriented and easy to understand, manipulate and play around with. Use the Keras Model, which will give you saving, loading, training and all sorts of things automatically if you want, but you're not forced to use those.

  • Use the contrib summaries, which will move into the core TensorFlow package soon -- if you're watching this on video, that has probably already happened. Use the tfe.metrics instead of tf.metrics; they are object-oriented, friendlier, and eager-compatible. And use the object-based saving, which is a much nicer user experience anyway.

  • So, you're going to want to do this all the time, it's how your

  • code is going to work well in Eager execution and graph

  • building. So, now, I would like to take some time to tell you

  • why you

  • should enable Eager execution. A really important reason that led us to build this in the first place is that being able to play with these objects and manipulate them directly is just a much nicer experience than having to build a graph and interact with it later in a session. It's a lot more intuitive, and it lets you understand what's going on better.

  • If you're new, play around with it.

  • >> Now I would like to point to a few things. Some of my

  • colleagues, they're going to be in the demo room during the

  • break

  • with laptops, with notebooks to let you type and try Eager mode

  • there. Please go and give it a try. Or if you're watching on

  • the live stream, type that short link.

  • Hopefully it will stay long enough for you to type it. And

  • play with it right now. It's really nice. We have a getting

  • started guide on TensorFlow that should be live now. That tells

  • you what you need to know about Eager execution and starting to

  • use TensorFlow using Eager Execution. We have a ton of

  • example models, ranging from RNNs to ResNet, that are available behind that link, and I encourage you to look at them and see how easy it is to write the models and how easy it is to reuse the same code from Eager to graphs to deployment. We have graph deployment for all the models except the highly dynamic ones, which are hard to write in graph form. Give it

  • a try. Let us know how it went. We're super-excited to share

  • with you and I hope you have a good time playing with this.

  • Thank you.

  • And now it's time for a treat. Introducing Nikhil and Daniel. They have a cool demo set up, but I don't want to spoil it.

  • >> Hi, everyone, my name is Daniel.

  • >> My name is Nikhil. >> We're from the Google Brain team. And today, we're delighted to talk about

  • JavaScript. Python has been one of the mainstream languages for

  • scientific computing. And it's been like that for a while.

  • And there's a lot of tools and libraries around Python.

  • But that's where it ends.

  • We're here today to talk -- to convince you that JavaScript and

  • the

  • browser have a lot to offer.

  • And TensorFlow Playground is a great example of that.

  • I'm curious, how many people have seen TensorFlow Playground

  • before? Oh, wow. Quite a few. I'm very glad.

  • Those of you who haven't seen it, check it out after the talk

  • at playground.tensorflow.org.

  • It's a visualization of a small neural network.

  • And it shows, in real time, the neural network as it's training.

  • And this is a lot of fun to make and had a huge educational

  • success. We have been getting emails from high schools and

  • universities that have been using this to teach students

  • about machine learning. After we launched playground, we

  • were wondering, why was it so successful? And we think one

  • big reason was because it was in the browser. And the browser is

  • this unique platform where you -- the things you build, you can

  • share with anyone with just the link. And those people that

  • open your app

  • don't have to install any drivers or any software. It

  • just works. Another thing is, it's -- the browser is highly

  • interactive. And so, the user is going to be engaged with

  • whatever you're building.

  • Another big thing is that browsers -- we didn't take

  • advantage of this in the

  • Playground, but browsers have access to sensors like the

  • microphone and the camera and the accelerometer. And all

  • these are behind standardized APIs that work on all browsers.

  • And the last and most important thing, is the data that comes

  • from these sensors doesn't ever have to leave the client. You

  • don't have to upload anything to

  • the server, which preserves privacy.

  • Now, the Playground that we

  • built is powered by a small neural

  • network, 300 lines of vanilla JavaScript that we built as a

  • one-off library. It doesn't scale. It's a simple loop and

  • wasn't engineered to be reusable.

  • But it was clear to us that if we were

  • to open the door for people to merge machine learning and the

  • browser, we had to build a library. And we did it.

  • We released Deep Learn JS.

  • A JavaScript library that is

  • GPU-accelerated and does that via WebGL, a standard in the browser that allows it to render graphics. And deeplearn.js allows you to both run inference and do training in the

  • browser. When we released it, we had an incredible momentum.

  • The community took deeplearn.js and ported models into the browser and built fun things with it. One example is style transfer. Another took the

  • character RNN and built a novel interface that allows you to

  • explore all the different possible endings of a sentence.

  • All generated by the model in real-time. Another

  • example is a font model -- there was a post about this one -- where the person who built it allowed users to explore the

  • hidden dimensions, the interesting dimensions in the

  • embedding space.

  • And you can see how they relate to boldness of the font. And

  • there was even education examples

  • like teachable machines that built this

  • fun little game that taught people how

  • computer vision models work so people could interact directly

  • with the webcam. Now, all the examples I showed you point to

  • the incredible momentum we have with deeplearn.js. And

  • building on that momentum, we're

  • very excited today to announce that deeplearn.

  • js is joining the TensorFlow family. And with that, we are

  • releasing a new

  • ecosystem of libraries and tools and

  • machine learning with JavaScript called TensorFlow.js. Now,

  • before we get into the details, I

  • want to go over three main use cases of how you can use

  • TensorFlow.js today.

  • The first use case is write models directly in the browser.

  • Huge implications. Think of the playground they

  • just showed. The second use case is -- a major use case --

  • is you can take a pre-trained model in Python, use a script,

  • and you

  • can import it into the browser to do inference.

  • And a related use case is that the same model you imported, you can re-train, potentially with private data that comes from those sensors of the browser -- in the browser itself.

  • Now, to give a schematic view, we have

  • the browser that uses WebGL to do fast linear algebra.

  • And two sets of APIs, the ops API, which was deeplearn.js, and

  • we worked hard to align with TensorFlow Python.

  • It is powered by an automatic differentiation library. And on

  • top of that, we have a high-level API, the Layers API,

  • that allows you to use best practices and high-level

  • building blocks to write models.

  • But what I'm also very excited to announce today is that we're releasing tools that can take an existing Keras model or TensorFlow SavedModel and port it automatically for execution in the browser.

  • Now, to show you an example of our

  • API, we're going to go over a small

  • program that tries to learn a quadratic function.

  • They're trying to learn A, B, and C from data.

  • So, we have our import of tf from TensorFlow.js. This is a standard ES6 import in JavaScript. We have the three coefficients, a, b, and c, and we mark them as variables, which means they are mutable and the optimizer can change them.

  • We have the function that does the computation. You can see tf.add and tf.square, just like TensorFlow. In addition to that API, we have a chaining API which allows you to call these math operations on the tensors themselves, and this leads to more readable code that is closer to how we write the math.

  • It is very popular in JavaScript world. That's that part of the

  • model.

  • Now, for the training part, we need a loss function. This is a

  • loss function, an error between the prediction and the label.

  • We have our optimizer, an SGD optimizer. And we train the model: we call optimizer.minimize for some number of iterations. And I want to emphasize, for those who have watched Alex's talk, that this is aligned with the Eager API in

  • Python. All right.

  • So, clearly, that's not how most people write machine learning.

  • Because that low-level linear algebra can be quite verbose. For that, we have our Layers API. To show you an example of that, we're going to build a recurrent neural network that learns to add numbers.

  • But the complicated part is that those

  • numbers, like 90 plus 10, are being fed character by character. And then the neural network has to maintain an internal state with an LSTM cell. And that gets passed into a decoder, and the decoder outputs the answer, 100, character by character. It's a sequence-to-sequence model.

  • This may sound complicated, but with the Layers API it's not that much code.

  • We have the import in TensorFlow.js.

  • We have the sequential model. Those familiar with Keras, this

  • API looks very familiar. We have the first two layers of the

  • encoder, the last three layers are the decoder. And that's our

  • model.

  • We then compile it with a loss, an optimizer, and a metric we

  • want to monitor, like accuracy.

  • And we call model.fit with our data. What I want to point out

  • here is the "Await" keyword.

  • This is an asynchronous call which means -- because in

  • practice, that can take 30-40 seconds in the browser. And in

  • those 30-40 seconds, you don't want the main UI thread of the

  • browser to be locked. And this is why you get a call back with

  • a history object after that's done. And in between the GPU is

  • going to do the work.

  • Now, the code I showed you is for when you want to write models directly in the browser.

  • But, as I said before, a major use case -- even with deeplearn.js -- was people importing models that were pre-trained and just wanted to run them in the browser. Before the details of that, I

  • want to

  • show you a fun little game that our

  • friends built that takes advantage of a pre-trained model imported automatically into the browser.

  • It's called emoji scavenger hunt.

  • I'm going to show you a real demo with the phone. It's in

  • the browser. Let's see. And you can see here. So, you can

  • see I have a Chrome

  • browser opened up on a Pixel phone. You can see it at the

  • top.

  • And the game uses the webcam and shows

  • me

  • an emoji and I have some number of seconds to find the emoji

  • before the time runs out.

  • Nikhil is going to help me identify the objects. Are you

  • ready? >> I'm ready.

  • >> All right. Let's go. All right. Watch.

  • >> Have a watch. >> Nice. Yay! We got that.

  • Let's see what our next item is.

  • Shoe. >> Shoe.

  • >> Help me out here, buddy. We got the shoe!

  • >> What's next? >> That's a banana.

  • >> Does anyone -- this guy's got a banana.

  • >> Come over here. Yay! >> All right.

  • >> All right. >> I'm ready.

  • >> We're going to have a high score here. Beer.

  • >> Beer. It's 10:30 in the morning, Daniel. Step out --

  • >> All right. All right. So, I'm going to jump into some of

  • the technical details of how we actually built that game. Stand

  • by, please.

  • So, what we did was we trained a model

  • in TensorFlow to be an object recognizer for the game. We

  • chose about 400 different classes that would be reasonable

  • for a game like this.

  • You know, watches and bananas and beer.

  • We used the TensorFlow for poets code lab.

  • And in that code lab, you take a pre-trained mobile net model.

  • If you don't know what MobileNet is, it's a state-of-the-art computer vision model for edge devices.

  • We took that model and retrained it.

  • Now we have an object detector in the pipeline. How to get

  • this into the browser? We provided this tool today to help

  • you do that.

  • Once it's in, you can build the game and make the computer talk and all that kind of fun stuff. Let's jump into how we convert

  • that model. As Daniel mentioned earlier, we support two types of

  • models: TensorFlow SavedModels -- we have a converter for that -- and a converter for Keras saved models. You define the model and export it as a SavedModel, the standard way to do that.

  • Similarly, this is the code for Keras. The next piece is that

  • we actually convert it to the web.

  • Today, we're releasing a package for TensorFlow.js: a script lets you point to the TensorFlow SavedModel and point to an output directory. That's where the static build artifacts go. Keras is the same

  • flow.

  • Point to the output and you have a directory.

  • Now, you statically host those on the website, simple static

  • hosting. And on the JavaScript side, we provide an API that

  • lets you load that model. So, this is what it looks like for

  • TensorFlow. One note on the TensorFlow SavedModel: we don't right now support continuing training of that model, while in the Keras case we actually let you continue training.

  • And we're working hard to let you keep these APIs alive in the

  • future. Under the cover, what are we actually

  • doing? Graph optimization.

  • Which essentially means we prune out nodes you don't need to make

  • the prediction. You don't need them.

  • We optimize the weights for browser caching. We pack them into 4-megabyte shards, which helps the browser cache them so it's quick the next time you load.

  • Today we support about 90 of the most commonly used TensorFlow

  • ops, and we're working hard to support control flow ops.

  • And we support 32 of the most commonly used Keras layers

  • today. And as I mentioned, we let you continue training for

  • Keras models and let you do evaluation as well as make

  • predictions. Okay.

  • So, obviously there's a lot you can do with just porting your

  • models to the web.

  • Since the beginning of deeplearn.js, we have made it a

  • high priority to train directly in the browser. This opens up

  • the door for education and interactive tools like we saw

  • with the playground.

  • As well as lets you train with data that never leaves your

  • client. This is huge.

  • To show off what you can do with something like this, we built

  • another little game.

  • The goal of the game is to play Pac-Man.

  • Daniel is much, much better at this game than I am. Say hi.

  • There's three phases of the game. Phase one, we're going to

  • collect frames from the webcam and associate those with up,

  • down, left and right. Daniel's going to move his head up, down,

  • left and right and he's going to play the game like that. And

  • you'll notice, as he's collecting frames, he's kind of

  • moving around a little bit, which helps the model see

  • different angles for that class and generalize a little bit

  • better. So, after he's done collecting these frames, we're

  • going to go and train our model. So, we're not actually training

  • from scratch here when we hit that "Train" button.

  • We're taking a pre-trained mobile net,

  • porting that to the web, and doing a re-training phase.

  • And we're using the Layers API to do this in the browser. He's going to hit the train button. The loss is going down; looks like we're

  • learning something. That's great. So,

  • as soon as we press that play button, what's going to happen is

  • we're going to make predictions from the webcam. Those are

  • going to get plugged into those controls and it's going to

  • control the Pac-Man game. Ready? All right.

  • So, you can see in the bottom right, it's highlighting the class that it's confident in. And as he moves his head around, you'll see the predicted class change. And he's off.

  • So -- so, all of this code is online and you can go fork it.

  • We invite you to do so. Obviously this is just a game.

  • But you can imagine, you know, other types of applications of

  • this, like make a browser extension that lets you control

  • a page for accessibility purposes. So, again, all this

  • code is online. Please go fork it and play and make something

  • else with it. Okay. Daniel, I know this is fun.

  • >> I got it.

  • >> Okay. So, let's talk a little bit about performance.

  • So, what we're looking at here is a benchmark of MobileNet 1.0

  • running with TensorFlow. TensorFlow classic, not with

  • TensorFlow.js. And I want to point out, this is a batch size

  • of one. This is important because we're

  • thinking about this in the context of an interactive

  • application. Maybe this Pac-Man game, feeding in webcam data,

  • you want to know the prediction time for one. Can't really

  • batch it.

  • In the top row, TF with CUDA running on a 1080 GTX, it's about 3 milliseconds. And the shorter the bar, the faster. In the second row, we have TensorFlow CPU running with AVX512 on a MacBook Pro here, and about 60 milliseconds for that.

  • Where does TensorFlow.js come into the picture? We're getting

  • about 11 milliseconds for this. Which is pretty good if you

  • think about this in the context of an interactive game. So, on

  • the laptop we just showed the game, we're getting about 100

  • milliseconds for that. And that's still pretty good. Like,

  • you can build a whole interactive game with what's

  • running on there. The web is only going to get faster and

  • faster. There's a whole new set of standards

  • coming,

  • like WebGPU to push the boundaries. But the browser has

  • limitations. You can only get access to the GPU

  • through WebGL on these APIs. How do we scale beyond that?

  • How do we scale beyond the limitations we have in the

  • browser?

  • There's a whole ecosystem of server-side JavaScript tools

  • using NodeJS that we would love to take advantage of.

  • So, today I'm really happy to tell you

  • that we're working on NodeJS binding to the TensorFlow C API.

  • That means you'll be able to write the same code -- the Eager-mode polynomial example, the Pac-Man example -- bind to the TensorFlow C API, and have your TensorFlow running hardware-accelerated with CUDA installed. Eventually we will run that same JS code on the TensorFlow ops backend. These bindings are under active development.

  • Stay tuned. All right. So, let's recap

  • some of the things we launched today and talked about.

  • This low-level Ops API, which does hardware-accelerated linear algebra, with Eager mode. This was previously known as deeplearn.js, rebranded today.

  • We released these high-level layers API. That mirrors

  • TensorFlow layers.

  • And we saw that with the addition RNN and the Pac-Man demo.

  • We also showed you how you can import

  • a TensorFlow SavedModel and Keras models for re-training in the browser.

  • We have released a bunch of demos and examples on GitHub.

  • These are not the only two. There is a whole repository that

  • can get you started.

  • They have live links and you can poke around and play. I invite

  • you to do that. We really want to see you get involved in this

  • project. We have a bunch of links here.

  • js.tensorflow.org is the website.

  • There's tutorials, documentation, et cetera.

  • Our code is open source under TensorFlow.js, play there too.

  • And we started a community mailing list today. That's the

  • short link here. And the community

  • mailing list is for people to post demos and ask questions.

  • So, this project was not just Daniel and myself. This is a

  • larger team effort between many of our amazing colleagues at

  • Google. We want to thank them. And we want to thank all of the

  • amazing open source contributors for deeplearn.js. And we're

  • really excited to build the

  • next chapter of machine learning in JavaScript with you. Thank

  • you. >> Thank you.

  • >> We can

  • >> Okay. We are now going to take a break. We have about a

  • little under half an hour. We are going to be back here at

  • 11:30 for a great talk on performance.

  • Head on over to the auditorium.

  • We have food, demos, we have the speakers there to ask questions.

  • Have fun.

  • Test. Test.

  • >> So, we've got a few more seating down over here. We can

  • help you out on that end. We have a few seats left over here.

  • >> Hi. >> A few seats over there.

  • >> I would like to get started again.

  • So, hi, my name is Brennan. I'm talking about training

  • performance today.

  • Now, this talk is centered around a user's guide to how to

  • improve performance.

  • So, performance is very complicated; there are a lot of internals to TensorFlow and things we do to optimize your training time. But I'm going to talk today about how

  • you can make the most of the TensorFlow you know and love to

  • converge faster. Now, before I go further, I want to

  • take a moment to acknowledge the great

  • work that's done not just by our engineering TensorFlow team and

  • not just by other teams at Google, but by our partner

  • teams, for example, the partner

  • team at NVIDIA, doing a lot of work to make TensorFlow work

  • fast, and I want to acknowledge that. With that, let's dig into motivation. Why do we need performance and how do we improve performance? Isn't it just fine today?

  • Some folks at Baidu put together research. If you want to

  • improve the quality of your models, just

  • train on larger data sets. These beautiful straight lines

  • are showing for multiple different models,

  • as you give more and more training data, you get linearly

  • more accurate.

  • Now, I'm being slightly facetious

  • here; if you look closely, the axes on the graph are logarithmic, not linear.

  • We don't need linearly increasing

  • amounts of data, we need exponentially more data. This

  • holds not just for one model class -- in this case, it was sequence-to-sequence-type work -- they found this applied to images and translation, to multiple different areas across multiple model types.

  • We're going to need to train on exponentially more data to

  • improve our model quality. Unfortunately, we have quite the

  • obstinate adversary, physics. Here's a graph of

  • microprocessor trend data over 40 years. And we can see that

  • clock frequency has hit a wall. And our performance is not getting that much faster compared to how it used to. We

  • are going to have to work a lot harder

  • to meet the challenges of today and tomorrow with performance.

  • Silicon itself is not going to get us

  • there without a little bit of cleverness.

  • These two forces coming together have resulted in a Cambrian explosion of hardware.

  • We have TPUs and other exciting things coming.

  • There are startups like Nervana, now part of Intel, and the IPU from Graphcore -- taking different points in the design space and trying different hardware methodologies and layouts to get the best machine learning

  • performance. This is going to be a very exciting, exciting

  • area coming forward as we think about performance in the future.

  • Now, before I dig into the meat of my

  • talk, I do want to acknowledge that this picture is from the Jurassic period and not the pre-Cambrian era. If you have a picture of trilobites that is properly licensed, send it my way. Across models and paradigms, training looks roughly as

  • follows. You have the training data. You need to load that in. This is phase one: read it from disk or generate it from a reinforcement learning environment.

  • Decompress it, parse it, and, if it's an image model, do image augmentations: random flips, color distortions. And

  • compute the forward pass, the

  • loss, the backwards pass, your gradients.

  • And after the gradients, update the things you're trying to

  • learn and repeat the cycle again.

  • Now, again, there's a wide variety of accelerators.

  • The phase one happens mostly on CPUs.

  • The accelerator takes over phase two and phase three. With that,

  • let's dig in.

  • In my experience, as people migrate to modern accelerators,

  • new TPUs, new

  • generation of GPUs, et cetera, phase one is actually where the

  • most performance problems are. Things everyone hits. There's

  • a problem with phase one. We're going to spend a bit of time

  • digging into input pipelines.

  • Now, you heard earlier from Derek about tf.data. This is

  • the far and away recommended

  • API and way to load data into TensorFlow. And if you're doing a simple image model, for example ResNet-50, it will start like this. Images are batched together into TFRecord files. You can shuffle and repeat. The parser function is mapped across every input image; it will do things like parse the tf.Examples and decode the JPEGs, and

  • you batch it up and return the dataset.
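  • As a rough sketch -- reconstructed, not the speaker's exact slide -- that starting-point pipeline looks something like this; the file pattern and feature keys are placeholders.

```python
import tensorflow as tf

# Hypothetical naive input pipeline for an image model.
def input_fn(file_pattern, batch_size=128):
    files = tf.data.Dataset.list_files(file_pattern)
    dataset = files.flat_map(tf.data.TFRecordDataset)
    dataset = dataset.shuffle(buffer_size=10000).repeat()

    def parser(record):
        # Parse the tf.Example, decode the JPEG, and resize it.
        features = tf.parse_single_example(record, {
            "image": tf.FixedLenFeature([], tf.string),
            "label": tf.FixedLenFeature([], tf.int64)})
        image = tf.image.decode_jpeg(features["image"], channels=3)
        image = tf.image.resize_images(image, [224, 224])
        return image, features["label"]

    dataset = dataset.map(parser)   # note: single-threaded by default
    dataset = dataset.batch(batch_size)
    return dataset
```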

  • Now, if you run this on a fancy-pants Cloud TPU, a modern accelerator, you get only about 150 images a second. This is nowhere near what you should expect.

  • Now, before you think, wow, cloud TPUs must be garbage, I'm

  • going back to what I was doing before, it behooves you to try

  • to optimize performance. When you're optimizing performance,

  • it's important to follow a methodology. You have to

  • measure your performance, find your bottleneck, optimize your

  • bottleneck and repeat.

  • What does that look like with a Cloud TPU? There are profiling tools in TensorFlow for GPUs and TPUs; for TPUs, you have capture_tpu_profile. You can run it, pointing it at your TPU -- in this case, the TPU's name is saeta -- and capture a profile into a log directory, the same log directory as with TensorBoard. With that, I would like to switch to the laptop where you can see what the profiling tools look like.

  • Here is a trace from actually the very same input pipeline, or

  • similar to the input pipeline I just showed. And you can see

  • here that your step time graph, you have this tiny bit of orange

  • at the bottom. And this is the compute time on the cloud TPU.

  • And everything up here in blue, that's actually waiting for data. That's input pipeline processing. So, our TPU is sitting idle, and this is telling you that's 92% of the time. Totally, totally not what we

  • want to be doing. So, let's dig in.

  • Now, what I recommend using -- we have a bunch of tools and

  • they're constantly improving and getting better. To really

  • understand what's going on underneath the hood, we're going

  • to use the trace viewer.

  • So, here I've loaded it up. One thing I should note, the trace

  • viewer is designed for power users. It

  • may be a little bit unapproachable. Let me walk you

  • through where to look in the trace viewer. In the top, you

  • have the TPU.

  • Now, one cloud TPU has eight compute cores. And these

  • operate independently, although typically they're operating on

  • the same sorts of data, just in parallel. So, these are on the

  • top you can see your step number. You can see the

  • TensorFlow ops that it's executing, and finally with the

  • XLA ops.

  • So, TPUs are programmed by XLA and you can see what's going on

  • underneath the hood. Below that, the CPU compute threads.

  • These are the general TensorFlow thread pool threads. The

  • iterator thread. And finally, a set of threads for in-feed and

  • out-feed.

  • These are managing DMAs to and from the TPU device. Now, it's

  • a little hard to see at the top, so, we're going to need to zoom

  • in. We're going to need to look in a little bit more depth.

  • In order to navigate around within the trace, the keyboard

  • shortcuts are the most useful. Keyboard shortcuts

  • are from the left hand. A and D move left and right. W and S

  • move in and out. So, this is a little bit like the

  • arrow on like a key pad, just like with your left hand on the

  • home row. There's a couple other keyboard shortcuts. So,

  • for example, if you click on something, you can see the

  • details about what this is. And if you press F, you'll focus in

  • on just that sort of element of the timeline.

  • So, you can zoom in and navigate around really easily. If you

  • want to see a little bit more about it, you can press M. This

  • marks it on the UI.

  • You can see that our step time, our training step took 5.4

  • seconds.

  • We go to the infeed queue, that was also 5.4 seconds.

  • So, let's dig into what's going on the

  • CPU since the TPU is sitting idle waiting for data. There's

  • a lot of things going on. We need to zoom in a lot farther.

  • Not just at the second range, but down to the millisecond

  • range. Here we can see each of those vertical bars, that's 5

  • milliseconds, okay?

  • And if we zoom in this far, we can see that our iterator is

  • running continuously, and the map function is what's taking

  • the longest amount of time. Okay? The map function,

  • there's a bunch of other little ops that are happening here that

  • are your batching or your repeating or whatnot.

  • But the map function is the bulk of the time. That's the focus of our optimization efforts. And the map function

  • runs the elements

  • of the map on your normal, standard TensorFlow thread pool.

  • And if we look closely and zoom in further, there are no two ops running at the same time. Even though we have multiple threads in the thread pool, this is processing single-threaded. That leads to the

  • first optimization.

  • I'm going to switch back to the slides. This is what you need

  • to do to use multiple threads for your input pipeline for your

  • map function.

  • Set num_parallel_calls to 64 and you'll be using up to 64 threads, and

  • because cloud TPUs are hooked up to a

  • powerful machine, you can use all of these threads

  • concurrently.
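  • In code, that change is roughly this -- a sketch, assuming the parser and file list from the earlier example; 64 is just the value used here.

```python
import tensorflow as tf

# Same pipeline shape as before, but the parser now runs on up to 64
# threads concurrently instead of a single one.
def make_dataset(files, parser, batch_size=128):
    dataset = tf.data.TFRecordDataset(files)
    dataset = dataset.shuffle(10000).repeat()
    dataset = dataset.map(parser, num_parallel_calls=64)
    return dataset.batch(batch_size)
```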

  • If you do this and rerun your model, you have a 4X

  • improvement, over 600 images a second. That's pretty great.

  • But, we're not done. An important part of the

  • performance methodology is step three. Repeat. You have to

  • repeat again. So, we take a new trace. I'm not going to do it

  • live on the laptop. Because we want to go through. We have a

  • lot of stuff to cover. We now see right here. We have a lot

  • more compute threads

  • going on, but we're still very much input-bound. And if we

  • zoom in a lot, you can actually see that the bottom element here, this TFRecord dataset, is waiting for data to load from the file system. We

  • process things in parallel quickly

  • and take a while to transfer them to the device over PCIE.

  • This presents a pipelining opportunity.

  • Now, to give you a bit of intuition for what I mean, input

  • pipelines you should mentally associate with ETL. Extract is

  • the first phase where you load the data from storage.

  • Transform is the phase where you prepare the data for training.

  • And finally load into the accelerator.

  • Not just in the API, but a useful mental model for

  • performance. To give you a bit of intuition for that, each of

  • the different phases of ETL use different hardware components in

  • your server system.

  • The extract phase is emphasizing the disk in the storage system, or the network link if you're reading from a remote storage system.

  • Transform typically happens on CPU and it's CPU-hungry. And

  • your load phase is emphasizing the DMA, your connections to

  • your accelerator.

  • This is true with a GPU, TPU or any other accelerators you might

  • be using. And so what's going on is if you map this out over

  • time, you're extracting and while you're extracting, you're

  • doing nothing with the CPU. And during a transform phase,

  • doing nothing with the connection between the CPU

  • memory and the accelerator. And training, the entire CPU and the

  • rest of the machine is sitting idle.

  • This is incredibly wasteful.

  • Because they're all using different components in the

  • system, you can actually overlap all of this in a

  • technique called software pipelining.

  • So, you are extracting for step five, transforming for step

  • four, loading data for step three and training for step two.

  • This is an efficient use of your compute resources. You can

  • train faster. You'll notice in a well-pipelined

  • model, your accelerator will be 100% utilized.

  • But it's possible that your CPU or disk will be a little bit

  • idle. That's okay.

  • Your accelerator is typically your most precious resource.

  • That's the bottleneck.

  • It's okay if the others are slightly faster than you need

  • them to be.

  • How do you enable software pipelining with datasets?

  • Set num_parallel_reads to 32. Underneath, it's using parallel interleave, a key dataset transformation that enables pipelining.

  • And you can add a prefetch right at the end. That means everything above it is pipelined with everything below: your extraction is pipelined all the way through to loading into your accelerator.
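  • A hedged sketch of that software-pipelined version -- not the exact slide; num_parallel_reads needs a recent TF 1.x build, and the values are illustrative.

```python
import tensorflow as tf

# Parallel file reads feed a parallel map, and a prefetch at the end
# overlaps the whole input pipeline with training on the accelerator.
def make_dataset(file_pattern, parser, batch_size=128):
    files = tf.gfile.Glob(file_pattern)
    dataset = tf.data.TFRecordDataset(files, num_parallel_reads=32)
    dataset = dataset.shuffle(10000).repeat()
    dataset = dataset.map(parser, num_parallel_calls=64)
    dataset = dataset.batch(batch_size)
    return dataset.prefetch(buffer_size=1)   # overlap producer and consumer
```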

  • Now, one thing I want to mention: when you set num_parallel_reads equal to 32, you're doing parallel reads. We have

  • conflated these in the API because we believe that

  • distributed storage is critical going forward for machine

  • learning workloads. Why is that? As the research shows,

  • we see that datasets are going to become larger and larger over

  • time. So, you're really going to need -- they just won't fit

  • on a single machine. You need to distribute them across a

  • cluster.

  • Additionally, when you have data disaggregated from your

  • accelerator

  • nodes, it means you can more efficiently share your

  • accelerator nodes. If you're training on them today or

  • tomorrow and in three minutes someone else wants to train,

  • you're not copying datasets around. It's easier to use.

  • And finally, it makes it a lot nicer doing large-scale

  • hyperparameter searches. You have one

  • cluster and a fungible pool of resources. We believe that

  • distributed storage is important and we have worked hard to make that fast with tf.data. So, what happens when you

  • do this?

  • It turns out that on the Cloud TPU, you'll get over 1,700 images a second with these optimizations.

  • So, we're now about 12 times faster than our initial input

  • pipeline with less than 60 characters worth of typing. So,

  • that's pretty good. But we can do better. If you capture the

  • trace, you'll actually see that our transform step is slightly

  • longer than our accelerator training step time. The TPU is

  • just too fast. And we need to break out some advanced

  • optimization techniques that are available today. One of the

  • most powerful ones is to

  • use these fused dataset operators -- map_and_batch, shuffle_and_repeat -- fusing together the operations to improve performance on your CPU. tf.data works hard to

  • ensure that the elements produced out of your dataset by

  • your iterator are in a deterministic

  • order. But if you give tf.data the opportunity to reorder, we

  • can enable performance optimizations. We can use

  • sloppy interleave

  • underneath the hood, working around variability in the storage system. There are other tweaks you can do. And applying

  • them together in this optimized input pipeline, we get over

  • 2,000 images a second and we are now accelerator bound.
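  • A rough reconstruction of that optimized pipeline -- my sketch, using the tf.contrib.data transformations of the TF 1.7/1.8 era, whose names have since moved; the parser and values are illustrative.

```python
import tensorflow as tf

# Fused, reorder-tolerant input pipeline.
def make_dataset(file_pattern, parser, batch_size=128):
    files = tf.data.Dataset.list_files(file_pattern)
    # Read many files concurrently; sloppy=True lets elements be reordered
    # to work around variability in the storage system.
    dataset = files.apply(tf.contrib.data.parallel_interleave(
        tf.data.TFRecordDataset, cycle_length=32, sloppy=True))
    dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(10000))
    # Fused map + batch reduces per-element overhead.
    dataset = dataset.apply(tf.contrib.data.map_and_batch(
        parser, batch_size, num_parallel_batches=4))
    return dataset.prefetch(1)
```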

  • Here is TensorBoard, and everything is 100% orange. We

  • are entirely accelerator bound and it's churning 100% of the

  • time. This is great. We can now start looking into

  • optimizations that we can do on the TPU to make the TPU faster.

  • We can see that our CPU is idle. We can see that there's some

  • overhead, reshape and copy that we might think

  • about optimizing away with some device-specific optimizations.

  • Which brings me to phase two. Now, as I mentioned before,

  • we're in

  • this sort of Cambrian explosion. We're still in the early days,

  • we're finding that a lot of the accelerators work differently.

  • Some chips are smaller, some chips are

  • bigger,

  • some chips use HBM2, for example TPUs and GPUs. Some do away with that entirely, like Graphcore's IPU, optimized for communication. And it's

  • hard to provide out of the box performance recommendations that

  • are going to apply to all of these different hardware

  • platforms. That said, there's a few common things if we peer

  • into the future, gaze into

  • our crystal balls, we think this is what we're going to see more

  • of.

  • What I expect to see a lot of is interesting numerical formats.

  • This is a 32-bit format we know and love.

  • Most models today have been trained in fp32.

  • There is great work at NVIDIA and

  • Baidu showing you can train with the master weights in fp32 but the activations in fp16. This is a big win on two dimensions. You can run a larger model because more layers fit in memory. But

  • additionally, your model tends to

  • run faster, because accelerators today, GPUs and TPUs are not

  • compute-bound.

  • They're actually memory bandwidth bound.

  • The memory is too slow.

  • So fp16 can unlock a lot of great performance on devices.
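  • As a rough illustration of that idea -- my sketch, not code from the talk -- one hand-rolled pattern is to keep the master weights in fp32 and cast to fp16 for the compute; the layer name and sizes are made up.

```python
import tensorflow as tf

# Illustrative only: master weights stay in float32 for stable updates,
# while the matmul and activations run in float16.
def dense_fp16(x, units, name):
    with tf.variable_scope(name):
        w = tf.get_variable("w", [int(x.shape[-1]), units], dtype=tf.float32)
        b = tf.get_variable("b", [units], dtype=tf.float32)
    y = tf.matmul(tf.cast(x, tf.float16), tf.cast(w, tf.float16))
    return tf.nn.relu(y + tf.cast(b, tf.float16))
```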

  • But there's one other floating point format, bfloat16. And this is different than fp16, even though it also uses just 16 bits: the range is the same as fp32, so you don't worry as much about vanishing or exploding gradients and NaNs as you might when using fp16. There are a number of other numerical formats, such as

  • Flexpoint, which folks from Intel presented at NIPS in a poster session. We are going to make it easier to use these numerical formats.

  • Stay tuned for APIs in this space. Another hardware trend is hardware optimized for matrix multiplication, especially at these reduced precisions. Volta GPUs have Tensor Cores. TPUs are built around a 128x128 matrix unit.

  • This is a systolic array, a transistor configuration that makes it easy to compute matrix multiplications and convolutions. And here is what a systolic array does. It's named after the heart, which pumps blood in cycles; it pumps the data through and you get fast matrix multiplication. What this means is that, because we're

  • seeing

  • hardware-supported matrix multiplication at different

  • sizes and scales, the way you lay out the data and implement

  • it, can make a huge difference on performance on these

  • accelerators.

  • Here I'm calling out two different models running on

  • GPUs.

  • If you use channels-last, you end up losing a fair bit of

  • performance compared to a channels-first implementation.

  • If you compare different LSTM cell implementations, the folks at NVIDIA worked really hard to make really fast kernels for LSTMs and make them available as part of the cuDNN package.

  • If you're using a GPU and want better performance, use the

  • optimized libraries for your platform. Likewise, use the latest version of TensorFlow: we're constantly working on performance improvements, along with the cuDNN and Intel MKL work we talked about today. And investigate

  • 16-bit numerical representations. We see a lot

  • of potential performance advantages there.

  • And for inference, we have talked about TensorRT, which is available for NVIDIA platforms and can quantize and make inference really fast. And as part of that technique, if you see, for example, that a particular computation is a bottleneck, you can substitute it with something that is computationally faster. You have to be careful, because you may change the quality of the model doing this. With that, I would like to move on to phase

  • three. Now, typically, when you use an accelerator, you're

  • actually using more than one accelerator. Even if you're using a single device, it may have different components that

  • operate in parallel.

  • Here is, for example, a picture of the

  • NVIDIA DGX-1, showing the connectivity between the GPUs. You have two groups of four, each with connectivity between them. If you don't take advantage of the topology -- if you do a naive gradient aggregation within the server, going via the CPU or the PCIe switches -- you will be at a significant disadvantage compared to a clever implementation using NCCL and NCCL2. We have an optimized implementation available as part

  • of the benchmarks, but

  • it's a little tricky to use. We are working on making this easy

  • for everyone to use in distribution strategies. And

  • you'll hear more about distribution strategies in just

  • a few minutes.

  • On TPUs, you also need to carefully aggregate your gradients, and for that we have the CrossShardOptimizer. You take your existing SGD optimizer and just wrap it with the TPU CrossShardOptimizer.

  • This will aggregate across the compute shards within a single

  • device. But the exact same code works all the

  • way up to a whole cloud TPU pod across 64 different devices.
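  • In code, that wrapping looks roughly like this -- a sketch using the tf.contrib.tpu API of that era, inside a TPU model_fn; the learning rate is illustrative.

```python
import tensorflow as tf

# Wrap the usual optimizer so gradients are aggregated across the TPU's
# compute shards (tf.contrib.tpu, circa TF 1.7).
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
# train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
```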

  • Now, I want to take one moment to actually talk a little bit

  • about measuring performance. I guess the saying goes, there's

  • lies, damn lies and statistics.

  • Well, I'm going to add a fourth one: performance benchmarks. The Internet is replete with shoddy benchmarks and misinformation. And this irks me to no end.

  • We have seen benchmarks that use synthetic data or measure only certain subsets. You have

  • incomplete comparisons. One benchmark is comparing the full

  • device. One is comparing only one part of the device. We've

  • seen bugs in the machine learning where they've optimized

  • things away, or done performance tricks that make it run faster,

  • but actually make it not converge to the same accuracy.

  • You've lost quality of your model.

  • Additionally, as we look forward, this is actually, to be

  • fair, a nuanced space. And it's harder and harder to give an apples-to-apples comparison between hardware.

  • We have different numerical formats and different algorithms

  • fit better on different hardware. Some chips have small

  • amounts of memory. If you have a very, very big model that

  • can't fit, that's a very unfair comparison. As a result, I

  • strongly encourage you, if you're trying to choose and

  • evaluate different hardware platforms, take your

  • workloads and measure them end to end

  • to the accuracy you want in the application. That's a fair

  • amount of work.

  • If you can't run your own workloads,

  • look to quality end to end benchmarks that measure

  • accuracy. And I think a good example is Stanford's DAWNBench.

  • There's nuance in the parameters of how they're set.

  • For example the dataset size, a small dataset size or a large

  • dataset size. And there's nuance in how to set the

  • accuracy threshold.

  • Despite these, it's a lot harder to perform well on an end to end

  • benchmark and not work on real work. You're less likely to be

  • misled looking at the end-to-end benchmarks. That said, while we're pushing for end-to-end benchmarks, there's a lot of utility in microbenchmarks to understand how fast the components are.

  • When I was optimizing ResNet-50 for the Cloud TPU case, how did I know it was slow? As Derek mentioned, input pipelines can run at over 13,000 images a second. That's using VGG pre-processing; this pre-processing is computationally cheaper than the ResNet or Inception pre-processing, but it shows we can go really, really fast.

  • ResNet-50 on a DGX-1: this is using mixed precision fp16, with TensorFlow nightly, so this is the performance you can expect in the near future. Now, if you want to test the performance of the GPUs themselves in isolation, that's about 6,100 synthetic images a second. You are excluding the cost of your input

  • pipeline.

  • For a cloud TPU we have a few other microbenchmarks. For

  • TensorFlow 1.7, which is available today, you can expect to achieve about 2,600 images a second, with mixed bfloat16.

  • You're streaming the data in with a

  • batch size of 32, and get over 76% accuracy in about 13 hours.

  • If you lop off the input pipeline and just test the

  • device performance, you're actually over 3200 images a

  • second, which is very cool with TensorFlow 1.7.

  • And with TensorFlow nightly, coming in TensorFlow 1.8, we have optimized the input pipeline performance, and you go from 2,600 images a second to over 3,000 images a second in TensorFlow 1.8. Very exciting. A lot of work happening

  • underneath the hood. As we stare into the future even

  • further, what is coming in TensorFlow with performance?

  • This is actually our optimized input pipeline, or something

  • very close to it, and you'll notice there's a lot of magic

  • numbers we have hand-tuned and picked out. How did we choose

  • them? We spent a lot of time playing around with it and

  • picking them out. Do you need to do that? There's no reason

  • you need to do that. We can autotune a lot of these values

  • and we're working on adding in smarts to

  • TensorFlow to tune your pipelines. This is true not

  • just for the magic

  • numbers, but also for these fused dataset operations: we'll be working on switching a naive, straightforward implementation to use the fused functions underneath the hood. If you give us

  • permission, we'll be

  • able to adjust things where we're not preserving necessarily

  • the order,

  • but we can do the right things for you.

  • Automatically tuning the prefetch buffer size, that last

  • line, that's available and coming in TensorFlow 1.8. We're

  • not just working on optimizing

  • the input pipelines, we're also working on optimizing on-device performance. There's XLA and Grappler, and they're rewriting the model to work well on different platforms. There's exciting

  • work here that we will be excited to share with you over

  • time.

  • There's a lot more reading and there's a huge amount of

  • literature on this. Check out some of these things.

  • If you want to learn about reduced precision training,

  • here are references. And I'll Tweet out this link shortly.

  • They'll be available and you can load them up into TensorBoard

  • and play around with them yourself. With that, thank you

  • very much for listening to me today.

  • [ Applause ]

  • Next up is Mustafa who is going to talk

  • about TensorFlow's high-level APIs.

  • >> Thank you, Brennan. Hello, everybody.

  • [ Applause ]

  • My name is Mustafa.

  • Today I'll talk about high-level

  • APIs, keeping practitioners like you in mind. With that in mind, we'll work through an example of increasing user happiness with the power of machine learning.

  • After defining the example project,

  • we'll use pre-made estimators to start our first experiment.

  • Then we'll experiment more with every

  • feature we have, using feature columns. And we'll introduce a couple of pre-made estimators that you can experiment with further. And we'll learn how you can experiment with other modeling ideas, too.

  • So, those are the topics we will cover in this talk; we'll also talk about how to scale it up and how you can use it in production. Let's talk about estimators. It's a library

  • that lets you focus on your experiment. There are thousands

  • of engineers.

  • This is not a small number.

  • And hundreds of projects in Google who use estimators. So,

  • we learned a lot from their experiments.

  • And we created our APIs so that the time from an idea to an

  • experiment will be as short as possible.

  • So -- and I'm really happy to share

  • all this experience with all of you. Whatever we are using

  • internally at

  • Google is the same as the open source.

  • So, you all have the same things. Estimator keeps the

  • model function. We'll talk about what the model

  • function is later, but it defines your network, how you train it, and what the behavior is during evaluation or at export time.

  • And it provides you some loops such as training, evaluation,

  • and it provides

  • you interfaces to integrate with the rest of TensorFlow. Also, Estimator manages sessions, so you don't need to learn what tf.Session is. It handles it for you.

  • But you need to provide data.

  • And as Derek mentioned, you can return a tf.data set from your

  • input function. So, let's define our project and start

  • with our experiment. I love hiking. This is one of the

  • pictures I took in one of my hikes.

  • And let's imagine there's a website,

  • hiking website, similar to IMDB, but it's for hiking.

  • And that website has information for each hike, and users are

  • labeling those hikes by saying, I like this hike, I don't like

  • this hike, this is my rating, and all this stuff. And we want

  • to use this data. Let's imagine you have this data from that

  • website.

  • To recommend hikes for users. How can we do that?

  • There are many ways of doing it.

  • Let's define one way machine learning can help us.

  • In this case, we want to predict probability of like.

  • Whether a given user will like a given hike or not.

  • What do you have? You have hike features and user features. And what can you learn from? The labeled data: whether users liked a given hike or not. So, what can we use to predict if they will like a hike?

  • You can use one of the pre-made estimators.

  • In this case, the DNNClassifier: it's a binary classification problem, and we designed pre-made estimators for exactly this kind of problem. This means you can use it as a black-box solution.

  • Pre-made estimators are surprisingly

  • popular within Google and in many projects. Why? The

  • engineers are using pre-made solutions instead of building

  • their own models.

  • I think, first of all, it works. It handles many

  • implementation details so you can focus on your experiment.

  • It has reasonable defaults for initialization, partitioning, or

  • optimization so you have a reasonable baseline as quick as

  • possible. And it is easy to experiment with new features.

  • So, we learn about that.

  • You can experiment with all of your

  • data by using the same estimator without changing it. So, let's

  • jump into our first

  • experiment so that we can have a

  • baseline that we will improve. I will talk about the details, but in this case, you are using hike_id -- it might be the hike name -- as an identifier in your model. And say you have hidden_units of one. What this will learn is the label for each hike id.

  • That may be a good baseline for your overall progress.

  • You need to say what is your evaluation data, what is your

  • training data?

  • Then you can call train_and_evaluate. With just this couple of lines of code, you should be able to run the experiment.
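  • Put together, a rough sketch of that first experiment might look like this -- the feature names, input functions, and model directory are assumptions based on the hiking example, not the talk's actual code.

```python
import tensorflow as tf

# Baseline: a hike_id embedding feeding a tiny DNN classifier.
hike_id = tf.feature_column.categorical_column_with_hash_bucket(
    "hike_id", hash_bucket_size=10000)

estimator = tf.estimator.DNNClassifier(
    feature_columns=[tf.feature_column.embedding_column(hike_id, dimension=8)],
    hidden_units=[1],                 # one unit: roughly learn a per-hike label
    model_dir="/tmp/hike_model")      # placeholder path

# train_input_fn / eval_input_fn are assumed to return tf.data Datasets.
tf.estimator.train_and_evaluate(
    estimator,
    tf.estimator.TrainSpec(input_fn=train_input_fn),
    tf.estimator.EvalSpec(input_fn=eval_input_fn))
```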

  • And you can see the results on the TensorBoard. For example,

  • you can see training and evaluation. Or how the metric

  • is moving. Since this is a classification problem, you will

  • see accuracy metric.

  • And since this is binary classification, you will see metrics like AUC. All of these

  • things are free and ready to be used. Let's experiment more.

  • Let's start with the data. Experimenting with the data

  • itself. We designed feature columns with the same mindset.

  • We want to make it -- make it easy to experiment with your

  • features, with your data.

  • And based on our experience -- internal experience -- it

  • reduces the lines of code and may improve the model.

  • There are a bunch of transformations you can handle

  • via feature columns.

  • These are bucketing, crossing, hashing and embedding.
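  • As a preview, here is a hedged sketch of what a few of these transformations look like with the hiking example; the feature names, vocabularies, and boundaries are made up.

```python
import tensorflow as tf

# Illustrative feature columns for the hiking example.
tags = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_vocabulary_list(
        "tags", ["kid_friendly", "dog_friendly", "birding"]))

elevation_gain = tf.feature_column.numeric_column(
    "elevation_gain", normalizer_fn=lambda x: (x - 1000.0) / 500.0)

distance_buckets = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column("distance"),
    boundaries=[2.0, 5.0, 10.0])

user_embedding = tf.feature_column.embedding_column(
    tf.feature_column.categorical_column_with_hash_bucket("user_id", 100000),
    dimension=16)

feature_columns = [tags, elevation_gain, distance_buckets, user_embedding]
```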

  • Each of these needs a careful explanation. Unfortunately I

  • don't have enough time

  • here, but you can check Magnus's tutorial and the video, they are

  • very good. Let's experiment with all the hike features we

  • have.

  • Each hike may have tags such as kid-friendly, dog-friendly, birding. You may choose an indicator column instead of an embedding column in this case, because you don't have a huge number of tags; you don't need the dimensionality reduction. And for a numeric column, such as

  • each hike may have elevation gain, you need to normalize so

  • that your optimization problem will be well-conditioned. And you can use a normalizer function here.

  • Or, you may choose bucketizing. In this case, we bucketize the distance of the hike, so the model will learn different things for different segments. You can think of it as a different kind of normalization, too. How can you

  • use all of these things together?

  • Just putting them into a list, that's it. Then your system

  • should work. So, let's experiment with personalization.

  • What we mean by personalization is, instead of recommending the same hikes to all users, let's recommend different hikes to different users based on their interests. And one way to do

  • that is using user features.

  • In this case, we are using user embedding by embedding_column.

  • So, this will let the model learn a vector for each user and put users closer together if their hike preferences are similar. And how can you use that? Again, it's just appending it to your list. And you also need to play with hidden_units, because you have more features now and you need to let your model learn different transformations. And the rest of the pipeline should

  • work.

  • You will hear this a lot during this talk because it's based on

  • that. The rest of the pipeline should work and you should be

  • able to analyze your experiments. Let's experiment

  • more.

  • We have a couple of pre-made solutions. I mentioned they are very popular, and I picked only two of them to show here. One is wide-n-deep; it's joint training of a deep neural network and a linear model.

  • You may like it or not. So let's start the experiment. You

  • need to define what are the

  • features you want to feed to the neural network.

  • Again, via feature column, it's the list.

  • And you need to define the features to feed into the linear

  • part.

  • In this case, the user id and their picks. For example, if a user always picks dog-friendly hikes, the model will learn this

  • feature.

  • And you can instantiate the DNNLinearCombinedClassifier, and the rest of the pipeline should work.
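  • Roughly, that instantiation looks like this -- a sketch where wide_columns and deep_columns stand in for feature-column lists you have already defined; the hidden-unit sizes are illustrative.

```python
import tensorflow as tf

# Wide-n-deep: linear columns for memorization, deep columns for generalization.
estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,   # assumed: e.g. id/tag columns or crosses
    dnn_feature_columns=deep_columns,      # assumed: e.g. embeddings, numeric columns
    dnn_hidden_units=[128, 64])
```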

  • Based on the 2017 Kaggle survey, gradient boosted trees are very popular, and we are introducing them as a pre-made estimator.

  • And you can experiment without changing your pipeline. Let's

  • start our experimentation.

  • In the current version, we only support bucketized columns, and we are working to support numeric columns and categorical columns too. Here are hike distance and hike elevation gain, and we bucketize them. Then you can instantiate the classifier, and the rest of the pipeline should work.
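  • A hedged sketch of that setup, assuming the BoostedTreesClassifier available around TF 1.8; the boundaries and tree parameters are illustrative.

```python
import tensorflow as tf

# Bucketized inputs feeding the pre-made boosted trees estimator.
distance = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column("distance"), boundaries=[2.0, 5.0, 10.0])
elevation = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column("elevation_gain"),
    boundaries=[250.0, 500.0, 1000.0])

estimator = tf.estimator.BoostedTreesClassifier(
    feature_columns=[distance, elevation],
    n_batches_per_layer=100,   # how many batches are used to grow one layer
    n_trees=50)
```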

  • We know that training trees is not as computationally expensive as training neural networks, and they fit into memory. So, by leveraging that, we provide you a utility so that you can train your model an order of magnitude faster than the usual way. And

  • the rest of the pipeline should work.

  • So, let's say these solutions are not enough for you and you want to experiment with more ideas. Let's talk about that.

  • Before delving into the high-level solutions you can use, let's look at a network in a supervised setting. In this case, you have a network to which you feed the features. And based on the output of the network and

  • the labels, you need to decide: what is the loss, the objective you want to minimize? And what are the metrics that you will use as success metrics for your evaluation?

  • And your predictions at serving time may be different than at training time. For example, if you have a large label space, you may want to use just the ranking of the classes instead of the probabilities at serving time. For that, you don't need to calculate the probabilities; you can use the logits to rank them. All of these things we factored out under the Head API. It expects you to give it the labels and the output of your network, and it provides these things for you.

  • You'll see it in action. A model function is an implementation of this head and a network together. We talked about DNNClassifier; DNNClassifier has a model function, which is a specific implementation of a head plus a network. Let's implement this with the head API. In this case, a DNN estimator. We can instantiate a head; in this case, it's a binary classification head because we are trying to predict whether a hike is liked or not. And why are we introducing this head, since these two lines are equivalent to DNNClassifier?

  • Why introduce a separate head? So that you can experiment with different ideas by combining different network architectures and different heads. For example, you can use the wide-n-deep estimator or the DNN estimator with a multi-label head. You can even combine different heads together; we introduced multi_head for that. It's one way of experimenting with multi-task learning: with a couple of lines of code you can try multi-task learning. Please check it out. If these architectures are not enough for you and you want to go further, you can write your own model function.

  • We strongly recommend that you use Keras layers to build your network. You can do whatever you want there; be as creative as possible. After you have the output of the network, you pick one of the optimizers available in TensorFlow. Then you can use one of the heads we mentioned, which will convert your network output and the labels into training ops, evaluation metrics, and export behaviors. Then you feed it to the Estimator. Again, the rest of the pipeline should work.
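  • A minimal sketch of a custom model function built with Keras layers and a head, under the assumption that the head factory lived in tf.contrib.estimator in the TF 1.x releases of this era (the feature column and optimizer choices are illustrative):

      import tensorflow as tf

      feature_columns = [tf.feature_column.numeric_column("hike_distance")]

      def model_fn(features, labels, mode):
          # Build the network however you like with Keras layers.
          net = tf.feature_column.input_layer(features, feature_columns)
          net = tf.keras.layers.Dense(64, activation="relu")(net)
          logits = tf.keras.layers.Dense(1)(net)

          # The head converts logits + labels into loss, train op, metrics
          # and export outputs.
          head = tf.contrib.estimator.binary_classification_head()
          optimizer = tf.train.AdagradOptimizer(learning_rate=0.05)
          return head.create_estimator_spec(
              features=features, mode=mode, labels=labels,
              logits=logits, optimizer=optimizer)

      estimator = tf.estimator.Estimator(model_fn=model_fn)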

  • A Keras model is another way of creating your model. It's very popular and very intuitive to use. For example, this is one of the Keras models you can build. So, how can you get an estimator out of it so that the rest of the pipeline still works? You can use model_to_estimator, which gives you an estimator so you can run your experiments without changing your pipeline.
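  • A minimal sketch of that conversion (the layer sizes and input shape are arbitrary placeholders):

      import tensorflow as tf

      # A small compiled Keras model, converted so it plugs into the same
      # Estimator-based pipeline.
      model = tf.keras.Sequential([
          tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
          tf.keras.layers.Dense(1, activation="sigmoid")])
      model.compile(optimizer="adam", loss="binary_crossentropy",
                    metrics=["accuracy"])

      estimator = tf.keras.estimator.model_to_estimator(keras_model=model)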

  • Transfer learning is another popular technique to experiment with. One way of doing it is using model A, which is already trained, to improve the predictions of model B. How can you do that? Surprisingly, just copying the weights from model A to model B works. It's that simple, but it works. And we provide that for you.

  • You can use warm starting. This one line says: transfer all of model A's weights into model B. Or you can define a subset of model A's variables to transfer from model A to model B.
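  • A hedged sketch of both variants (the checkpoint path, feature columns and variable pattern are placeholders for illustration):

      import tensorflow as tf

      feature_columns = [tf.feature_column.numeric_column("hike_distance")]

      # Transfer everything from model A's checkpoint:
      estimator = tf.estimator.DNNClassifier(
          feature_columns=feature_columns, hidden_units=[256, 128],
          warm_start_from="/tmp/model_a")

      # Or transfer only a subset of model A's variables, e.g. the input layer:
      ws = tf.estimator.WarmStartSettings(
          ckpt_to_initialize_from="/tmp/model_a",
          vars_to_warm_start=".*input_layer.*")
      estimator = tf.estimator.DNNClassifier(
          feature_columns=feature_columns, hidden_units=[256, 128],
          warm_start_from=ws)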

  • Now let's talk about image features. We have talked about embedding columns for categorical features. But what if you have image features? How can you use them in your pipeline with a couple of lines of code?

  • You could implement one of the image classifiers yourself, which is not a couple of lines of code. Or, thanks to TF-Hub, which you will learn about later, you can use one line from the hub to instantiate a feature column called image_embedding_column. In this case, you may remember Jeff mentioned AutoML; NASNet is one of the AutoML-discovered models, and it's really good, one of the top models you can use. Here you will use NASNet as a feature extractor: it feeds the output of NASNet into your feature column. How can you use it in the classifier? Just append it to your feature columns, and you're done.
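  • A minimal sketch, assuming a hypothetical "hike_photo" input and a NASNet feature-vector module on TF-Hub (check tfhub.dev for the exact module handle):

      import tensorflow as tf
      import tensorflow_hub as hub

      nasnet_features = hub.image_embedding_column(
          "hike_photo",
          "https://tfhub.dev/google/imagenet/nasnet_large/feature_vector/1")

      estimator = tf.estimator.DNNClassifier(
          feature_columns=[nasnet_features],
          hidden_units=[128, 64])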

  • You can experiment with it. Let's say you experimented and you found some models, but you need to scale up. Not all of you, but some of you may need to scale your training. You can use multi-GPU, meaning replication on different GPUs; you can learn about that after my talk. You don't need to change the estimator or model code; everything should work with just a single line of configuration change. Or you may want to distribute your training to multiple machines by saying these are workers, these are parameter servers, and so on. Same story: you don't need to change your estimator or model code. Everything should work based on the configuration.

  • Or you may want to use TPUEstimator for TPUs. There's a minimal change needed in the model function; hopefully later we will fix that too, but for now there's a minimal change you need to make in your model function. To use this in production, you need to export, or rather, you need to serve, and we recommend you use TF Serving. At serving time, instead of reading data from files, you have a request, and you need to define the receiver function, which defines how that request connects to the model, and the signature definition, which defines what the output of the model will be. So, again, with a couple of lines of code you can export your trained model for TF Serving. For example, if your request is a tf.Example, you can use this function to get your receiver function, and then call export_savedmodel so that the model can be used by TF Serving.
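  • A minimal export sketch; the feature columns and export path are placeholders, and in practice the estimator would be the trained one from your pipeline:

      import tensorflow as tf

      feature_columns = [tf.feature_column.numeric_column("hike_distance")]
      estimator = tf.estimator.DNNClassifier(
          feature_columns=feature_columns, hidden_units=[32])
      # (In practice, use the estimator you already trained.)

      # Parse incoming tf.Example requests into the same features the model saw.
      feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
      serving_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
          feature_spec)

      estimator.export_savedmodel("/tmp/exported_model", serving_input_fn)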

  • These are the modules I mentioned: tf.estimator and tf.feature_column. Don't use tf.learn; we are deprecating it.

  • And these are a couple that I picked. You can check it out.

  • Thank you. I hope some of you will improve your products with

  • the tools that we mentioned. Thank you.

  • And I'll introduce Igor.

  • Igor will talk about how you can do distribution with TensorFlow.

  • And he's coming. Yes. He's coming.

  • >> Hey.

  • Hello, everyone. My name is Igor. I'm going to -- I work in

  • the TensorFlow team and I'm going to talk to

  • you today about distributed TensorFlow. Well, why would you

  • care about distributed TensorFlow? Many of you know

  • the answer, probably.

  • But just in case, it's a way for your models to train faster and

  • be more parallel.

  • It's a way for you to get more things done, iterate quicker.

  • When you train models, it can take a long time. And when I say a long time, I mean weeks. With all the hardware available to you out there, scaling up to hundreds of CPUs or GPUs can really make a difference. How could you scale up?

  • Well, you could just add a GPU to your machine. In this case,

  • this is just plug and play.

  • You insert a GPU, TensorFlow handles all the details for you

  • and you see a nice bump in the training speed.

  • You could also insert multiple GPUs.

  • In this case -- in this case, you would have to write

  • additional code. You need to replicate

  • your model. You need to combine gradients from

  • every GPU and if you're using batchnorm layer, you have the

  • tricky question of what to do with the statistics on each GPU.

  • The point I'm trying to make is that you need to do additional

  • work to make this work. And you need to learn stuff that you

  • didn't plan on learning. You can also use multiple machines.

  • And this situation is similar to the one before. But in this

  • case, your bottleneck is probably going to be that

  • communication between the machines. You'll start thinking

  • about minimizing that communication and probably doing

  • more work locally.

  • For example, combining the gradients in the local GPUs

  • before exchanging them with the remote GPUs. Unless specialized network hardware is used, the coordination costs in this setup are probably going to limit your scaling. But there is an

  • approach -- there is a solution to this. This approach is

  • called parameter server.

  • Some hosts we call them parameter servers.

  • They're going to only hold training weights. Other hosts,

  • workers, they're going to have a copy of the TensorFlow graph.

  • They're going to get their own input, compute their own gradients, and then just go ahead and update the training weights without any coordination with the other workers.

  • So, this is an approach with low coordination between a large

  • number of hosts. And there you go. This scales well. And we

  • have been doing this at Google for a long time. But there is a

  • wrinkle with this approach: you give up synchronicity, and that has its costs as well as benefits. And if you think about it, the parameter server approach is an approach from the CPU era. With fast, reliable communication between GPUs, we can consider designs which have tighter coupling and more coordination between the workers.

  • One such approach is based on all-reduce. That's not a new idea. The general goal of all-reduce is to combine values from all the processes and distribute the result back to all of them. All-reduce is kind of tricky to explain on one slide, but you can think of its result as a reduce operation followed by a broadcast operation. Just don't think of it that way in terms of performance: it's a fused algorithm, and it's way more efficient than those two operations together.

  • In addition to that, hardware vendors provide specialized all-reduce implementations that TensorFlow can quietly use behind the scenes to help you. Alternative approaches typically send all the data to a central place. All-reduce doesn't have such a bottleneck, because it distributes the coordination between GPUs much more evenly. With every tick of all-reduce, each GPU sends and receives a part of the final answer.

  • So, how could all-reduce help us with our models? Well, let's say you have two GPUs. You copied the layers and the variables to every GPU and you performed the forward pass, nice and parallel. Then, during the backward pass, as the gradients become available, we can use all-reduce to combine those gradients with their counterparts from the other GPUs. In addition to that, gradients from the later layers are available before the gradients from the earlier layers, so we can overlap backpropagation computation with all-reduce communication. That gives you even more examples per second.

  • The bottom line is, when all -- when communication between GPUs

  • is reliable, all-reduce can be fast and allow you to scale

  • well.

  • How could you use all-reduce in TensorFlow? Well, so far in this talk I told you that to take advantage of multiple GPUs you need to write additional code, change your model, and learn stuff. Chances are you're following our advice of using the highest-level API that works for your use case, and that is probably Estimator. You give it a model function, and that model function has no knowledge about GPUs or devices. So, to have that model use multiple GPUs, you just need to add one line: you pass an instance of a new class called MirroredStrategy. It is one implementation of our new distribution strategy API, and it tells TensorFlow how to replicate your model.
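  • A minimal sketch of that one-line change, using the API names from the TF 1.x nightlies of this era (MirroredStrategy later moved under tf.distribute); my_model_fn and train_input_fn stand in for your existing code:

      import tensorflow as tf

      # The model function itself stays unchanged.
      distribution = tf.contrib.distribute.MirroredStrategy()
      config = tf.estimator.RunConfig(train_distribute=distribution)

      estimator = tf.estimator.Estimator(model_fn=my_model_fn, config=config)
      estimator.train(input_fn=train_input_fn)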

  • Another thing I want to say is that MirroredStrategy can take a number of GPUs or a list of GPUs to use. Or you can give it no arguments at all, and it will just figure out which GPUs to use. MirroredStrategy works exactly the way I described before: it replicates your model, and it uses all-reduce for communication. So, gradient updates from all GPUs are combined before updating the weights, and each copy of your model on a given GPU is part of a single TensorFlow graph. That means this is in-graph replication with synchronous training that uses all-reduce across many GPUs. Now, the last ten minutes were kind of a waste of time for you if this doesn't perform well.

  • And it does perform well. As

  • you add GPUs, this implementation scales well. We

  • have a team at TensorFlow that specifically works on fast

  • implementations of all-reduce for various machine

  • configurations. And this implementation gets 90% scaling

  • on 8 GPUs.

  • And, again, it didn't require any change to the user's model. It didn't require any change because we changed everything in TensorFlow that's not your model. Things like the optimizer, batch norm, summaries, everything that writes state, now need to become distribution-aware. That means they need to learn how to combine their state across GPUs.

  • And this is important because alternative APIs out there typically ask you to restructure your model, to supply optimizers, for example, separately, so that they can do all the state coordination behind the scenes.

  • And if you have some experience with training your models on multiple GPUs, you might be wondering: can I save my model on a computer with 8 GPUs and then evaluate it on a computer with, say, no GPUs? Typically this causes a problem. But with the distribution strategy API, we maintain compatibility at the checkpoint level. So, MirroredStrategy keeps multiple copies across the GPUs, but it saves only one copy, and then at restore time it restores that state onto the required number of GPUs.

  • So, this use case is supported.

  • Distribution strategy works with Eager mode as well. But we are

  • still fine-tuning the performance.

  • And distribution strategies are a very general API that I hope

  • in the future will support many use cases.

  • It's not tied to Estimator and we are

  • looking into ways of creating even more

  • -- better APIs based on distribution strategy.

  • In the future, pretty soon, we intend to support many kinds of distributed training: synchronous, asynchronous, multi-node, model parallelism, all of that as part of the distribution strategy API. Until then, for multi-node training, use tf.estimator.train_and_evaluate, or Horovod, which offers a multi-node solution.
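  • For reference, a minimal multi-node sketch with the Estimator API; my_estimator, train_input_fn and eval_input_fn are placeholders, and each machine reads its role (chief, worker, parameter server) from the TF_CONFIG environment variable:

      import tensorflow as tf

      train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn,
                                          max_steps=100000)
      eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn)
      tf.estimator.train_and_evaluate(my_estimator, train_spec, eval_spec)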

  • MirroredStrategy is available for you in our nightly builds, and we are very actively working on it. It's the product of the work of many people, and I would really encourage you to try it out and let us know what you think about it, on GitHub or by talking to us after the talk. All right. Thank you. Thanks for your attention. [ Applause ] Next up are Justine and Shanqing to tell you how to debug TensorFlow using TensorBoard.

  • >> Well, thank you, everybody, for being here today. We're going to be giving a talk about the new TensorFlow debugger, which comes included with TensorBoard. It's basically a debugger like you would see in an IDE, one that lets you set breakpoints in your models, step through them, and watch tensors. But before we do that, I would like to give you some background on TensorBoard and some of the other developments which happened in the last year.

  • Which we unfortunately don't have too

  • much time to go into.

  • But TensorBoard is basically a web application. It's a suite of web applications that was authored by about 20 people, and it's all packed into a 2-megabyte self-contained web server that works offline. And TensorBoard

  • can be used for many purposes with the different plugins baked

  • into it.

  • The one you're all most familiar with, for those who have used TensorBoard, is the scalars dashboard. You can plot anything you want: loss curves, accuracy, et cetera. These things help us understand whether or not our model is converging on optimal solutions.

  • And here is a really interesting, underutilized feature called the embedding projector. This was originally written by Google so we could do things like project our data into a 3D space and see how things cluster: if you're doing MNIST, the 7s cluster here and the 9s there. And recently, as you see on the screen, we got a really cool contribution from Francois at IBM Research.

  • He sent pull requests on the GitHub repository, since we are

  • in the open. He added interactive label editing, so you can go in there and change things as algorithms like t-SNE reveal the structure of your data. To learn more, search Google for interactive supervision with TensorBoard. This is another really amazing contribution that

  • we received from a university student named Chris Anderson.

  • It's called the Beholder plugin. It basically gives you a real-time visual glimpse into TensorFlow data structures while your training script is running. It's real-time and doesn't require a hard drive, though it doesn't work with something like GCS at this point in time. I think this would be a very useful tool going forward in terms of model explainability. Now, TensorBoard also has some

  • new plugins for optimization.

  • Cloud recently contributed a TPU profiling plugin. And TPU

  • hardware is a little different from what many of you might be

  • used to. And TensorBoard, with this plugin, can really help you get the most out of your hardware and ensure that it's being properly utilized.

  • Now, the TensorBoard ecosystem: part of the goal of this talk, before we get into the demo, is to attract more folks in the community to get involved with TensorBoard development. We use many of the tools you're familiar with, such as TypeScript and Polymer. We also use some tools you might not be familiar with, like Bazel, for good reasons.

  • You can go to the Readmes for the plugins we wrote originally.

  • Now, with TensorBoard, the reason this is just a little bit more challenging compared to some of the other web applications you may have used or written in the past is that we deal with very challenging requirements. This thing needs to work offline. It needs to be able to build regardless of corporate or national firewalls that may block certain URLs when it's downloading things. For example, one of the first things

  • I did when I joined the TensorBoard team wasn't actually

  • visualizing machine learning, but adding a contribution to

  • Bazel which allows downloads to be carrier grade

  • internationally. And there are a whole variety of challenges

  • like when it comes to an application like this.

  • But those burdens are things we've mostly solved for you, and

  • here is a concrete example.

  • Writing that toilsome thousand-line file was what it took to make TensorBoard load well anywhere in the world without having to ping external servers. That is one of the many burdens that the TensorBoard team carries on behalf of plugin authors.

  • Now, I want to give a quick introduction for Shanqing, who is the author of this TensorFlow debugger plugin, built with the help of Chi Zeng.

  • As I mentioned earlier, TensorBoard has been the flash

  • light that's giving broad overviews of what's

  • happening inside these black box models.

  • What the TensorFlow debugger does is turn that flashlight into an X-ray. Using this plugin, you can literally watch

  • the tensors as they flow in real-time while having complete

  • control over the entire process. This X-ray is what's going to

  • make it

  • possible for you to pinpoint problems we've previously found

  • difficult to identify.

  • Perhaps down to the tiniest NaN at the precise moment it happens. That's why we call it an X-ray.

  • It reveals the graph and math beneath the abstractions we love, like Keras or Estimator, or, as was announced today, Swift. Whatever tools you're using, this

  • could potentially be a very helpful troubleshooting tool. I

  • would like to introduce its author, Shanqing, who can show you a demo. >> Thank you very much. [ Applause ] Okay. So, in a moment the screencast will start. Great. Thank you, Justine, for the generous intro. I'm Shanqing, and I'm glad and honored to present the debugger plugin for TensorBoard, among the many plugins created for TensorBoard so far. For those that know the TensorFlow debugger, or tfdbg, it's only had a command line interface

  • until recently. Like the command line interface, the

  • debugger plugin allows you to look into internals of the run

  • in TensorFlow

  • model, but in a much more intuitive and richer environment

  • in the browser. In this talk I'm going to show two examples.

  • One example of how to use the tool to understand and probe and

  • visualize a working model that doesn't have any bugs in it.

  • I'm also going to show you how to use the tool to debug a model

  • with a bug in it, so you can see how to use the tool to catch the root cause of problems and fix them.

  • So, first, let's look at the first example. And that's the

  • example on the right part of the screen right now. It's a

  • simple TensorFlow program that does some regression using generated synthetic data. And if we run the program in the console, we can see a constant decrease in the loss value

  • during training. Even though the model works, we have no

  • knowledge how the model works. That's mainly because in graph

  • mode, the sessions are run as a black box. That wraps all the

  • computation in one single line of Python code. What if we want

  • to look in the model?

  • Look at the matrix multiplication in a dense layer

  • and have a gradient and so forth? The TensorFlow debugger

  • or TensorBoard debugger plugin as a tool can allow you to do

  • that.

  • To start the tool, we start the TensorBoard with the flat

  • debugger port. We specify the port to be 7,000.

  • Once it's running, we can navigate to our TensorBoard URL

  • in the browser. At startup, the plugin tells you it's waiting for a connection from a TensorFlow run. That's because we haven't started the program yet.

  • There are code snippets for sessions, estimators, and Keras models. In this model, we're using tf.Session. The first line is an import line, and the second line wraps the original Session object with a special wrapper that knows the host and port to connect to.
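  • For reference, a minimal sketch of that instrumentation, assuming a TensorBoard instance started with the debugger port (the log directory and port are placeholders):

      # Start TensorBoard with the debugger plugin listening on port 7000:
      #   tensorboard --logdir /tmp/logdir --debugger_port 7000

      from tensorflow.python import debug as tf_debug
      import tensorflow as tf

      sess = tf.Session()
      # Wrap the session so every run() streams tensor data to the plugin.
      sess = tf_debug.TensorBoardDebugWrapperSession(sess, "localhost:7000")

      # For Estimators or Keras there is an equivalent hook:
      # hook = tf_debug.TensorBoardDebugHook("localhost:7000")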

  • Now, with our program instrumented, we can start the program again. As soon as the program starts, we can see the graphical user interface in the browser switch to a mode that shows the graph of the session run in two ways: in a tree view on the left and in a graph on the right. In the bottom left corner, you can also see which session run is currently executing. The tree structure on the left corresponds to name scopes in your model. For example, the dense name scope corresponds to the dense layer. You can open the source code to look at the correspondence between the graph nodes and the lines of the Python program that created those nodes.

  • If you click a node, you can see which line of the Python source code is responsible for creating that node. In this case, it's the dense layer, as expected. If you click the loss tensor, you will see the corresponding node in the graph, and you can see where it's created in the Python source code: it's where we call mean squared error.

  • And the gradients, name scope corresponds to the back

  • propagation part of the model. You can click around, poke

  • around and

  • explore how a TensorFlow model does optimization and

  • propagation if you are interested. And these nodes are

  • created when we created the gradient descent optimizer. You can continue to any node of the graph and pause there. So, we have just continued to the MatMul node in the dense layer. We can also continue to the gradient of the MatMul, and we did. And you can see summaries of the tensor values: their data types, their shapes, and also the range of their values. In the so-called health pills, you can see how many of the values are zero, negative, positive, and so forth. If you hover over them, you get more information such as the mean and the standard deviation of the values in the tensor.

  • So, next, we can click these links to

  • open a detailed view of the tensors.

  • You can apply slicing to reduce the dimensionality so it's easy

  • to look at the values. We have reduced the dimension from 2 to

  • 1, looking at it as a curve. Now we continue to the loss tensor, which is a scalar. And yep, it's a scalar, and its shape is an empty list, as we can see here. We can switch to the full-history mode, so we can look at how the value changes as the model is being trained. With the full-history mode enabled, we can continue over the remaining session runs, like 50 of them, and see in real time how the loss value decreases and how the value of the MatMul changes. That's how you can use this as an X-ray machine for your models, to get a better understanding of how your model works. Next, let's look at a broken model.

  • That's the debug MNIST model we ship with TensorFlow. That's the only broken model we ship with TensorFlow as far as I know, and

  • I'm proud to be the author of it. We can see that the model

  • doesn't quite work. After two iterations of training, the

  • accuracy is stuck at about 10%.

  • We suspect there might be bad numerical values like not a

  • number or infinities. But we're not sure which nodes of the

  • graph are responsible for generating those infinities.

  • To answer that question, we can use the debugger tool. We do a

  • refresh in our browser.

  • And then we can start our debug MNIST example to connect to the

  • debugger plugin.

  • So, again, we're looking at the graph. Now, in order to find the nodes responsible for those infinities, we can set watch points and use the conditional breakpoints feature to continue running the model until any tensor contains infinities or NaNs. You are seeing a list of tensor values; that's the complete list of tensors involved in training the model. In a moment, the model stops because it hit an infinity in the tensor cross_entropy/Log. We can see that in the health pill, and in the detailed tensor view we see those orange lines showing you

  • the infinity values. Now, the question is, why do those

  • infinity values happen? So, we can go back to the source code and find the line of Python code where the tensor is created, and that's where we call tf.log. We can open up the graph view and see the inputs, so we can trace them. In this case, the input is the softmax tensor. We can expand it, highlight it, and look at the values of the input, which is the softmax. There are, indeed, zero values in this tensor, and the reason for the infinities is that we are taking the log of zero. With that knowledge, we can go back to the source code and fix it; we're not going to do that in this demo here. All right.

  • So, that's the TensorBoard debugger. I encourage you to use it and explore it, and hopefully it will help you understand your models better and fix bugs much more quickly. You can start it with a simple command line: TensorBoard with a special flag. With that, I would like to hand this back to Justine.

  • [ Applause ]

  • >> Well, thank you, Shanqing. I thought that was a really interesting demo, and it was a great leap forward for TensorBoard. It really shows that one of the things we have been doing recently is, rather than being a read-only reporting tool, exploring more interactive directions, as we have shown you today. These are things that folks who are productionizing TensorBoard, such as Kubeflow, should take into consideration.

  • We want to attract more contributors. We have two approaches for this. You can develop in the official repo and send us pull requests; we do our work in the open, though this does need approval on security footprint, et cetera. And there is an escape hatch if that doesn't work out.

  • You can independently develop plugins, you can create custom

  • static builds without anyone's approval. You can do whatever

  • you want. Because part of the goal on this team

  • is to liberate the tools.

  • With that said, I want to thank all of you for attending, and thank you for watching on YouTube. If you liked this talk, tweet about it or, you know, reach out. Thank you, again.

  • [ Applause ]

  • Test. Test.

  • >> Hi, everyone.

  • Hope everybody had a good lunch.

  • I'm Sarah Sirajuddin, an engineer on the TensorFlow Lite team, and this is my colleague, Andrew Selle, also on the same team. And we are excited to talk about the work we have been doing to bring machine learning to mobile devices.

  • So, in today's talk, we'll cover three areas.

  • First, how machine learning on devices is different and

  • important. Then we'll talk about all the work

  • that we have been doing

  • on TensorFlow Lite.

  • And then how you can use it in your apps. Let's talk about

  • your devices.

  • Usually a device means a mobile device, basically a phone. Our phones are with us all the time; these days they have lots of sensors giving rich data about the world around us; and lastly, we use our phones all the time.

  • Another category of devices is edge devices.

  • And this industry has seen huge growth in the last few years. By some estimates there are 23 billion connected devices: smart speakers, smart watches, smart sensors, what have you. And technology that was only available on the most expensive devices is now available on the cheaper ones.

  • So, this rapid increase in the availability of these more and

  • more

  • capable devices has now opened up many opportunities for doing

  • machine learning on device. In addition to that, though, there

  • are several other reasons why you may consider doing on device

  • machine learning. And probably the most important one is

  • latency.

  • So, if you're processing streaming

  • data such as audio or video, then you don't want to be making

  • calls back and forth to the server. Other reasons are that

  • your processing

  • can happen even when your device is offline. Sensitive data can

  • stay on device. It's more power-efficient because the

  • device is not sending data back and forth. And lastly, we are

  • in a position to

  • take advantage of all the sensor data that is already present on

  • the device. So, all that is great. But there's also a

  • catch.

  • And the catch is that on-device machine

  • learning is hard. And the reason it is hard is that many

  • of these devices have some pretty tight constraints.

  • Small batteries, low compute power, tight memory.

  • And TensorFlow wasn't a great fit for this. And that is the

  • reason we built TensorFlow Lite.

  • Which is a light-weight library and tools for doing machine

  • learning on embedded and small platforms. So, we launched

  • TensorFlow Lite late last year in developer preview. And since

  • then we have been working on adding more features and support

  • to it. So, I'll just walk you through the high-level design of

  • the system. We have the TensorFlow Lite format. This is

  • different from what TensorFlow uses and we had to do so for

  • reasons of efficiency. Then there's the interpreter, which

  • runs on device.

  • Then there is a set of optimized kernels. And there are interfaces that you can use to take advantage of hardware acceleration when it is available.

  • It's cross-platform, so, it supports Android and iOS. And

  • I'm really happy to say today that we also have support for

  • Raspberry Pi and pretty much most other devices which are

  • running Linux.

  • So, the developer workflow roughly is that you take a

  • trained TensorFlow model and then you convert it to the

  • TensorFlow Lite format using a converter. And then you update

  • your apps to

  • invoke the interpreter using the Java or C++ APIs. One other

  • thing that I want to call out here is that iOS developers have

  • another option.

  • They can convert the trained TensorFlow graph into the CoreML

  • graph

  • and use the CoreML run time directly.

  • And this TensorFlow to CoreML converter is something we worked

  • on

  • together with the folks that built CoreML. There are questions that come up every time we talk about TensorFlow Lite. The two most common: is it small in size, and is it fast? Let's

  • talk about the first one. Keeping TensorFlow Lite small

  • was a key goal for us when we started building this. So, the

  • size of our interpreter is only 75 kilobytes. When you include

  • all the supported ops, this is 400 kilobytes. Another thing

  • worth noting here is a feature called selective registration.

  • So, developers have the option to only include the ops that

  • their models need and link those. And thereby keep the

  • footprint small. So, how do we do this? First of all, we have been pretty careful in terms of which dependencies we include. And TensorFlow Lite uses FlatBuffers, which are more memory-efficient than protocol buffers. And

  • moving on to the next question, which is performance.

  • Performance, a super-important goal for us. And we made design

  • choices throughout the system to make it so.

  • So, let's look at the first thing, which is the TensorFlow

  • Lite format.

  • So, we use FlatBuffers to represent models. FlatBuffers is a cross-platform serialization library originally developed for games, and since used in other performance-sensitive applications. The advantage of using FlatBuffers is that we are able to access data without a heavyweight parsing or unpacking step for the large files which contain the weights. Another thing we do at conversion time is pre-fuse the activations and biases, which allows us to execute faster later on.

  • The TensorFlow Lite interpreter uses static memory and execution

  • plans which allows us to load up faster.

  • There is also a set of kernels which have been optimized to run fast on ARM CPUs using NEON. We wanted to build TensorFlow Lite

  • so

  • that we can take advantage of all the innovations that are

  • happening in silicon for these devices. So, the first thing

  • here is that

  • TensorFlow Lite supports the Android Neural Networks API. Qualcomm Hexagon HVX support is coming out soon, and MediaTek and others have announced their integration with the Android Neural Networks API, so we should be seeing those in the coming months as well. The second thing here is that we have also been working on adding direct GPU acceleration, using Metal on iOS.

  • So, quantization is the last bit that I want to talk about in the

  • context of performance.

  • So, roughly speaking, quantization is a technique to store numbers and perform calculations on them in representations that are more compact than 32-bit floating-point numbers. This is important for two reasons. One, the smaller the model, the better it is for these small devices.

  • Second, many processors have specialized instruction sets which process fixed-point operations much faster than they process floating-point numbers.

  • So, a very naive way to do quantization would be to shrink the weights and activations after you're done training, but that leads to suboptimal accuracy. So we have been working on doing quantization at training time, and we have recently released a script which does this. What we have seen is that for architectures like MobileNet and Inception, we are able to get accuracies that are similar to their floating-point counterparts, while seeing pretty impressive gains in latency. So, I've

  • talked about a bunch of different performance

  • optimizations. Now let's see what all of these translate to

  • together in terms of numbers.

  • So, these are two models that we benchmarked and we ran them on

  • the Android Pixel 2 phone. We were running these with four

  • threads and using all the four large cores of the Pixel 2.

  • And what you see is that we are seeing

  • that these quantized models run three times faster on TensorFlow

  • Lite than their floating point counterparts on TensorFlow.

  • So, I will move on now and talk about what's supported in TensorFlow Lite.

  • So, currently, it is limited to inference only, although we are

  • going to be working on supporting training in the

  • future. We support 50 commonly-used operations which

  • developers can use in their own models. In addition, they can

  • use any of these popular open source models that we support.

  • One thing to note here is that we have an extensible design, so if a developer is trying to use a model which has an op not currently supported, they have the option to implement it as what we call a custom op and use that. Later in this talk, we will show you some code snippets on how you can do that yourself.

  • So, this is all theory about TensorFlow Lite. Let me show

  • you a quick video of TensorFlow Lite in practice.

  • So, we took a simple MobileNet model and retrained it on some common objects that we could find around our office. And this is our demo classification app, which is already open sourced. As you can see, it is able to classify these objects. [ Laughter ] So, that was the demo. Now let's

  • talk about production stuff.

  • So, I'm very excited to say we have been working with other

  • teams in Google to bring TensorFlow Lite to Google apps.

  • So, the portrait mode on the Android camera, 'Hey Google' on Google Assistant, Smart Reply on Android Wear: these are going to be powered by TensorFlow Lite in the future.

  • And I'm going to hand it off to Andrew who can tell you how to

  • use TensorFlow Lite. >> Thanks for the introduction.

  • So, now that we see what TensorFlow is, or TensorFlow

  • Lite is, in particular, let's find out how to use it. Let's

  • jump into the code. So, the first step of the four-step

  • process is to get a model. You can download it off the Internet

  • or you can train it yourself. Once you have a model, you need

  • to convert it into TensorFlow Lite format using our converter.

  • There might be ops you want to specially optimize using intrinsics or hardware specific to your application, or ops that we don't support yet; you can write custom ops for that. Then you go into your app and call it using the client API of your choice. Let's look at the conversion process.

  • We support converting a SavedModel or a frozen GraphDef, and here we are showing the Python interface. We give it the directory of a SavedModel, and it gives us a FlatBuffer out.
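  • For reference, a hedged sketch of that conversion; the converter API has moved between releases (older versions exposed it under tf.contrib.lite), and in current TensorFlow it looks roughly like this, with the SavedModel path as a placeholder:

      import tensorflow as tf

      # Convert a SavedModel directory into a TensorFlow Lite FlatBuffer.
      converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/my_saved_model")
      tflite_model = converter.convert()

      with open("model.tflite", "wb") as f:
          f.write(tflite_model)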

  • Before the conversion, there are a few things you may need to do to make this work better. The first is that you need to use a frozen GraphDef; a lot of times a training graph has conditional logic or checks that are not necessary for inference.

  • Sometimes it's useful to create a special inference script.

  • Lastly, if you need to look at what the model is doing, the TensorFlow visualizer is good, but we also have a TensorFlow Lite visualizer, and comparing the two can help you. If you find issues, file them on GitHub and we will respond to them as they come in.

  • So, lastly, write custom operator. Let's see how to do

  • that.

  • Writing a custom op in TensorFlow Lite is relatively simple. The main function is Invoke. Here I've defined an operator that returns pi, a single scalar. Once you have done that, you need to register the new operation, and there are a number of ways to register operations. If you don't have custom ops and don't need overriding, you can use the built-in op resolver. But you might want to ship a binary that's much smaller, so you might want selective registration: you ship a resolver with only the needed ops, or include your custom ops in that same resolver. Once you have

  • the ops set, you just plug it into the interpreter. Okay. We

  • have talked about custom operations. Let's see how we

  • put this into

  • practice in the Java API. In Java, you construct an Interpreter, fill in the inputs and outputs, and call run, which will populate the outputs with the results of the inference. Really simple.
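  • The Python interpreter API follows the same pattern; a minimal sketch, assuming a model.tflite file produced by the converter above (the zero-filled input is just a stand-in for real data):

      import numpy as np
      import tensorflow as tf

      interpreter = tf.lite.Interpreter(model_path="model.tflite")
      interpreter.allocate_tensors()

      input_details = interpreter.get_input_details()
      output_details = interpreter.get_output_details()

      # Fill the input tensor with data of the expected shape and dtype.
      input_data = np.zeros(input_details[0]["shape"], dtype=np.float32)
      interpreter.set_tensor(input_details[0]["index"], input_data)

      interpreter.invoke()
      result = interpreter.get_tensor(output_details[0]["index"])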

  • Next, how do you include this? Do you have to compile a bunch of code? We're working hard to make it so you can use the converter from the TensorFlow pip package and never need to compile TensorFlow yourself. We provide an Android Gradle dependency so you don't need to compile anything for an Android app, and we have a similar thing for CocoaPods. Now that we know how to use TensorFlow Lite, let's look at the roadmap. As we move forward, we

  • are going to support more and more TensorFlow models out of

  • the box with more ops. Second, add on-device training and look at hybrid training: some of it on your server, some on your device, wherever it makes sense; it should be an option. And include tooling to analyze graphs better to do more optimizations.

  • We have more that we can talk about that we're working on, but

  • hope this will make you interested and excited to try

  • it. So, there's one remaining question

  • left, which is, should I use TensorFlow mobile or TensorFlow

  • Lite? TensorFlow Mobile is a stripped-down build of TensorFlow that uses a subset of the ops. We are going to keep improving TensorFlow Lite and its ability to map to custom hardware, so we recommend you target TensorFlow Lite as soon as it is possible for you. If there's functionality you need that's only in TensorFlow Mobile, let us know and we'll work to improve TensorFlow Lite in a commensurate way. Okay. Demo

  • time. So, nothing like a live demo, right? So, let's switch

  • over to the demo feed and we'll talk about it.

  • And, so, we saw some mobile phones. Mobile phones are

  • really exciting, you know, because everybody has them. But

  • another thing that's happening is these edge computing

  • devices.

  • One of the most popular ones for hobbyists is the Raspberry Pi.

  • I have built some hardware around the Raspberry Pi. As we zoom in, we have the Raspberry Pi board itself. This is a system on a chip, similar to a cell phone chip. One of the great things about the Raspberry Pi is that they're really cheap. Another great thing is that they can interface to hardware. Here we're interfaced to a microcontroller that allows us to control these motors. These are servo motors, common in RC cars.

  • And they allow us to move the camera left and right and up and

  • down.

  • Essentially it's a camera gimbal. And it's connected to

  • a Raspberry Pi compatible camera. What to do with this?

  • We showed the classification demo before.

  • Let's look at an SSD example. Single shot detection.

  • It can identify bounding boxes in an image. Given an image, I

  • get multiple bounding boxes. So, for example, we have an

  • apple here. And it identifies an apple.

  • Now, the really cool thing we can do with this, now that we have the motors, is tell it to center the apple. We turned on the motors, and they're active. And as I move it around, it's going to keep the apple as centered as possible.

  • If I go up, it will go up, if I go down, it will go down. So,

  • this is really fun. What could you use this for? Well, if

  • you're a person, it can identify you. Currently I have that

  • filtered out. So, if I stand back, it's going to center on

  • me. So, I could use this as sort of a virtual

  • videographer. Imagine a professor wants to tape their

  • lecture, but they don't have a camera person. This would be a

  • great way to do that. I'm sure that all the hobbyists around

  • can now use TensorFlow in a really

  • simple way, can come up with many better applications than

  • what I'm showing here. But I find it fun, and I hope you do too. And I'm not an electrical or mechanical engineer, so you can do this too. All right. Thanks for showing the demo.

  • Let's go back to the slides, please.

  • So, I had a backup video just in case it didn't work. It's

  • always a good plan. [ Laughter ]

  • But we didn't need it. So, that's great. So, let's

  • summarize.

  • So, what should you do? I'm sure you want to use TensorFlow

  • Lite. Where can you get it?

  • You can get it on GitHub, right inside the TensorFlow repository. You can find out more about it by looking at the TensorFlow Lite documentation on the tensorflow.org website.

  • And we have a mailing list, tflite@tensorflow.org, where you can tell us about issues and what you are using TensorFlow Lite for. I hope to hear from all of you. One at a time, please.

  • Thanks, everybody, thanks Sarah for her presentation. Thank

  • you, everybody around the world listening to this. And in

  • addition, this was work that we worked very hard with other

  • members of the Google team, lots of different teams. So, there's

  • a lot of work that went into this. So, thanks a lot.

  • [ Applause ]

  • So, with that -- we have our next talk. And where is our

  • next speaker?

  • Our next speaker will be Vijay.

  • And he's going to be talking about AutoML.

  • >> Thank you very much. Hi, everybody. My name is Vijay.

  • And today I'll be talking to you, or hopefully convincing

  • you, that when we try to apply machine learning to solving

  • problems, that we should really be thinking about designing

  • search spaces over solutions to those problems. And then we can

  • use automated machine learning techniques in order to evaluate

  • our ideas much more efficiently. I think a big reason why a lot

  • of us are here today is due to the incredible

  • impact that machine learning can have on practical problems. Two

  • often-cited reasons is that we

  • have increasing amounts of compute capability and access to

  • data to train on. But I think one other aspect is all of you,

  • right? There's so many more people involved in many machine

  • learning today that are contributing and publishing

  • ideas.

  • So, this graph tries to put this into perspective by measuring

  • how many machine learning papers are published on archive every

  • year since 2009. And plotting that

  • against a Moore's Law exponential growth curve. As

  • you can see here, we have been keeping up with Moore's law, 2X,

  • every year. And this is demonstrating how many new ideas

  • are being developed in the field. This is a great thing,

  • right?

  • So, one concrete way of looking at this is in the field of computer vision, where we have seen top-1 ImageNet accuracy start in the 50% range with the AlexNet architecture, which, by the way, revolutionized the field of image classification. And every year we have been getting better and better, up until 2017. Now, these improvements haven't

  • come just because we have been training bigger models, right?

  • These improvements have also come from the fact that we have

  • lots of great ideas, right?

  • Things like, batch normalization,

  • residual or skip connections and various regularization

  • techniques. Now, each of these points, like Jeff mentioned

  • earlier, is the result of years of research effort. And we

  • build on each other's ideas. But one of the challenging

  • things is how do we keep up with so much -- so many ideas that

  • are being produced? And Is want to zoom in a little bit in terms

  • of the complexity of some of these models.

  • So, we're going to zoom in a little

  • bit on InceptionV4 and look at the idea embedded in there.

  • These are models within the architecture.

  • Every one of the arrows and operations was designed by a

  • human. Somebody wrote some code in order to

  • specify all of these little details. Now, there are

  • high-level reasons why this kind of architecture might make

  • sense.

  • But our theory doesn't really explain with so much certainty

  • how every detail seems to matter. And as a field, I

  • think, we're definitely working on trying to improve the theory

  • behind this.

  • But for many of us, we're happy to use this kind of complexity

  • out of the box if we can. Because it really helps to solve

  • problems. Now, this isn't too surprising. We know that

  • because machine learning has had such an impact

  • on real products, that we're going to be willing to use

  • anything we possibly can. And even if we don't understand all

  • the little minor details. As long as it solves our problems

  • well

  • and hopefully are understandable.

  • So, given all these ideas, how can we harness this explosion of

  • ideas much more efficiently?

  • So, let's step back and kind of ask a

  • few questions that we might have heard when we were trying to

  • train machine learning models. Simple, but hard questions.

  • What learning rate should I apply for my optimization? If

  • I'm training a deep neural network model, what dropout rate

  • should I apply? How do we answer this question today? I

  • think we combine a few different types of benefits.

  • One of them is leveraging research or intuition and

  • engineering intuition. What this means is that we start with

  • code, or we ask our colleagues, hey,

  • what are good settings for these fields? If it were the case

  • that there was one setting that worked for

  • everybody, we wouldn't be looking at these parameters.

  • But it does matter. So, then, we move on to some trial and

  • error process. We try a certain setting and see how well it

  • works on our problem and we continue to iterate. And I

  • think the other aspect, which is

  • becoming more common, hopefully, is increasing access

  • to compute and data by which we can evaluate these ideas. So,

  • this combination is really ripe for automation, right? And not

  • surprisingly, this exists today.

  • It's called hyperparameter optimization.

  • In this kind of setup, we have a tuner giving out hyperparameter settings. We have a trainer that trains our model on our dataset and then gives some kind of signal about how good those settings were; it might give a validation accuracy of some value. And the tuner can

  • big field and there are existing systems

  • like those shown at the very bottom that can help you to do

  • this. But now let's ask a few more complicated or detailed

  • questions that I think people do often ask as well.

  • Why do you use batchnorm before relu? I switched the order and

  • it seems to work better. If you're trying to train a

  • completely new model, use one type of sub architecture or

  • another type of sub architecture? Now, if you think

  • about it, these questions aren't really that different from

  • hyperparameter settings.

  • So, if we think of hyperparameter optimization as

  • searching over a specific domain of ideas, then it seems possible

  • that maybe we can actually treat the decisions made in this type

  • of model as another form of searching over a domain of

  • ideas. And we can therefore think about deemphasizing any

  • specific decision that we make on our architectures.

  • And instead think about the surplus of ideas that we might

  • have.

  • So, let's take a concrete example of a search space design

  • that my colleague Barrett did where

  • he tried to design a search space for a convolutional cell.

  • I'll walk you through how you might design such a work space.

  • So, the first question is, you have to get your inputs. Might

  • say you have access to the previous input. And if you want

  • support for skip connections, you might have the previous,

  • previous input. So, the first job in the search space is to

  • define which inputs I'm going select. And then, once you have

  • those inputs selected, you want to then figure out what

  • operation should I apply to each of those inputs before summing

  • them together? So, I might select something like

  • three by three convolution or three by three maxpooling and

  • combine those together. We can then recursively turn that crank

  • and apply it several more times where we use different

  • operations for different inputs.

  • And we can even use the intermediate outputs of previous

  • decisions in our search. And then finally, you take all of

  • your

  • outputs that are unused and you concatenate them together.

  • And that is your convolutional cell.

  • And if you want to build your model,

  • like ResNet, stack them together. This is one point

  • from the search space of ideas.

  • There are a billion possible ways to

  • construct a cell like this in the search space.

  • Changing the list and the way the connections can be made.
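  • To make the idea of a search space concrete, here is a toy sketch of sampling one cell description at random; it is entirely illustrative (the operation names and block count are made up, and this is not the actual controller or search space from the work described):

      import random

      OPS = ["sep_conv_3x3", "sep_conv_5x5", "max_pool_3x3",
             "avg_pool_3x3", "identity"]

      def sample_cell(num_blocks=5):
          cell = []
          hidden_states = ["prev", "prev_prev"]        # the two cell inputs
          for i in range(num_blocks):
              in1, in2 = random.choice(hidden_states), random.choice(hidden_states)
              op1, op2 = random.choice(OPS), random.choice(OPS)
              cell.append((in1, op1, in2, op2, "add"))  # combine the two branches
              hidden_states.append("block_%d" % i)      # output becomes reusable
          return cell

      print(sample_cell())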

  • Now that we've designed our search

  • space, we go back to the hyperparameter tuning system.

  • We have a program generator on the left that generates samples

  • from this search space.

  • We then train and evaluate on the task at hand. Oftentimes a

  • proxy task. And iterate to quickly find what are the best

  • programs from our search space? And the system on the left, in

  • program generator, can optionally learn from feedback.

  • So, it might use something like reinforcement learning,

  • revolutionary

  • algorithms, or even search can work well in certain situations.

  • So, we applied this type of approach. We took this convolutional cell search space and trained on a proxy task to make quick progress on evaluating each idea. Then we took the best candidate cells found from that search, enlarged them in terms of the number of filters and the number of times we stacked them, and applied them to the ImageNet dataset. These are two of the cells found from the search. Looking at the results, you can see that

  • we were able to do better than the existing state-of-the-art models in terms of top-1 accuracy. This effort was an example where we took a class of models whose design decisions were pretty complex, and we honestly found another complex model that was better.

  • But next I'll show you an example where we can use this

  • general technique

  • to find even more interpretable outputs.

  • So, let's look at optimization update rules. Most of you are probably familiar with stochastic gradient descent, shown on the left: the gradient multiplied by the learning rate gives the weight delta. And then we have Adam and others; these can be expressed fairly concisely in terms of quantities like the moving average of the gradient and so forth.

  • But we really only have a handful of these type of

  • optimization update rules that we typically apply for deep

  • learning, for example. What if we, instead, treat these update

  • equation rules as part of a larger search space?

  • And so, you can take these expressions and turn them into a

  • data flow graph that computes the optimization update rule. We

  • can express the known rules using this simple tree, but also a lot of

  • other ideas. And so, you can then turn this crank

  • on this new search space and try to find

  • a better optimization update rule.

  • So, my colleagues ran that experiment. They took a fixed

  • convolutional model

  • and tried to search over the update rules. They found

  • update rules that did better than the ones I have shown you on

  • this particular task. One nice feature of this search space is that

  • the results are more interpretable. Take the fourth

  • update rule here.

  • It takes the gradient and multiplies it by an expression:

  • if the gradient and the moving average of

  • the gradient agree in direction, you should take a

  • bigger step in that direction, and if they disagree, you

  • should take a smaller step. This is actually a form of

  • momentum. And so, one thing we can take from this

  • is that maybe we should be designing search

  • spaces that have more of these notions encoded in the space of

  • ideas. We may be able to find even better results. So, so far

  • I have focused on techniques and search space ideas where we care

  • about accuracy.

  • But what's great about searching over many ideas is that we might have

  • the potential to search over more than just accuracy. For

  • example, a lot of us care about inference speed. We want to

  • take a model and deploy it

  • on real hardware, a real mobile platform. And we take a lot of

  • time and try to figure out how to take one idea and make it

  • fast enough. But what if we could, as part of the search space of

  • ideas, find ones that balance both speed and accuracy? So, we

  • tried to do this experiment

  • where we included the run time on a

  • real mobile device as part of this inner loop of the

  • evaluation.
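
  • One common way to fold a measured on-device latency into the search objective is to scale accuracy by a latency term; this particular formulation and its constants are an assumption for illustration, not necessarily the exact reward used in this work.

```python
def search_reward(accuracy, latency_ms, target_latency_ms=80.0, w=-0.07):
    # Scale accuracy by how far the measured on-device latency is from a
    # target: slower-than-target models are penalized, faster ones are
    # mildly rewarded. The constants and the exact form are illustrative.
    return accuracy * (latency_ms / target_latency_ms) ** w
```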

  • So, we tried to optimize for both accuracy as

  • well as inference speed. And as this process goes on over time,

  • the program generator is able to find faster models while also

  • figuring out how to

  • make those models even more accurate.

  • One interesting side effect of this is

  • that when you run searches over ideas, the output is actually

  • not just one model, it's a whole family of models that

  • implicitly encodes this tradeoff. This shows you we have points

  • along

  • the space that provide a tradeoff between inference speed

  • on a mobile platform and accuracy on the dataset that

  • we're trying to solve. Rather than manually engineering the

  • one point I want to get working, I can get a result that can

  • maybe be deployed on various types of platforms. So, I'll

  • emphasize this in maybe a slightly different way. Which

  • is that we could define a search space of ideas in TensorFlow,

  • and through this automatic machine learning

  • process, we could get models that meet a guaranteed runtime

  • performance target on a target platform or device. And one of the

  • nice things about

  • having an integrated ecosystem like TensorFlow is that you can just

  • use the

  • libraries that convert from one program representation to another, so

  • you can get this end-to-end pipeline working well together.

  • There's nothing extra required to specifically tune a model. Let

  • me conclude by returning to this process of evaluating ideas in

  • this world where we're trying to explore different ideas. The

  • first is that we designed search

  • spaces to try to test out a large set of possible ideas.

  • Note that when we designed the search space, that required

  • human intuition. There's a need for human ingenuity as part of

  • this process.

  • So, designing the search space

  • properly takes a lot of effort, but you can evaluate many more

  • ideas much more quickly. When it comes to trial and error, we

  • had to think about how software should be changed so that we can

  • permit this type of search process. So, for example, I

  • think many of us have probably written scripts where you take

  • things like learning rate and dropout rate as command line

  • flags.
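
  • For example, today a training script often looks like the first half of this sketch; the second half shows one purely illustrative direction (not an existing API) for describing tunable decisions declaratively so an outer search loop can sample them.

```python
import argparse

# Today: a training script that exposes only a couple of shallow knobs.
parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float, default=0.01)
parser.add_argument("--dropout_rate", type=float, default=0.5)
args = parser.parse_args()

# One possible direction (illustrative only): describe the tunable decisions
# declaratively, so an outer search process can explore them at every level of
# the program, not just at the flag level.
SEARCH_SPACE = {
    "learning_rate": {"type": "log_uniform", "min": 1e-4, "max": 1e-1},
    "dropout_rate": {"type": "uniform", "min": 0.0, "max": 0.7},
    "num_layers": {"type": "choice", "values": [2, 4, 8]},
    "activation": {"type": "choice", "values": ["relu", "elu", "swish"]},
}
```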

  • What if you wanted to test out deeper ideas in your programs?

  • How do you design a program that's much more tunable at all

  • levels of your program? I think this is a big question for us to

  • tackle. And lastly, we think these ideas will become

  • increasingly relevant as many of you get access to more and more

  • computation capabilities such as TPU pods. Imagine a world where

  • all you have to do is take your idea, submit it to an

  • idea bank, and you have a pod of TPUs crunching overnight to

  • figure out which of those ideas are the best,

  • and then waking up in the morning and having it tell you:

  • these were the good ideas, these were the bad ideas, and so forth.

  • I think part of the reason this excites me is that automatic

  • machine learning can keep these machines much more busy than we

  • can. We have to sleep.

  • But machines can keep on churning 24/7. So, with that,

  • thanks for listening. [ Applause ]

  • And next up is Ian who will be talking

  • to you about fusion plasmas.

  • So, I want to talk to you about

  • something that's very important to me.

  • And that's how will civilization power itself for the next

  • hundred years?

  • So, in 2100, the projected world population is 11.2

  • billion. If all 11.2 billion people want to enjoy the same

  • power usage that we do now in the United

  • States, that's going to require burning around

  • 0.2 yottajoules of energy over the next hundred years. That's a

  • whole lot.

  • So, to put that in perspective, if we wanted to do that with oil

  • alone, we

  • would have to ramp up oil production by a factor of 10 for

  • the next hundred years. There's no way that's going to happen.

  • Besides being infeasible, that would

  • contribute to catastrophic climate change.

  • If we want to keep climate change to a

  • not ideal, but reasonable, say, 2-degree

  • temperature increase, only 1.2% of that energy

  • can come from coal or oil.

  • Where does the rest come from?

  • One possible source would be nuclear fusion.

  • So, fusion involves pushing together two smaller nuclei.

  • What you get out is a whole lot of energy. And no greenhouse

  • gas.

  • So, right now the sun runs on nuclear fusion.

  • And the reaction is so energy-dense that

  • 0.2 yottajoules would require a trivial amount of boron.

  • So, so far it sounds like a miracle fuel. What's the catch?

  • Well, the difficulty is that people have been trying this for

  • 70 years and so far no one has gotten out more energy than they

  • put in.

  • So, to understand this, you have to imagine that the

  • reaction takes place inside of a plasma.

  • And the plasma is a million-plus-degree

  • swarm of charged particles. And these particles

  • don't want to stay in place.

  • The sun uses a gravitational force to keep everything in

  • place. We can't do that.

  • So, instead, we use magnets. Now, when you try to squeeze

  • the plasma with magnets, particles can pop out the ends. And you can get

  • little turbulent ripples. And what happens is the plasma

  • breaks up, it gets unstable. It gets

  • cooler. And then the reaction stops. And that's what's been

  • happening for 70 years. So, this is the kind of problem that

  • I like.

  • It combines physics, probability, computation,

  • mathematics.

  • And so, I was like, I want to work on this.

  • How can we accelerate progress?

  • Well, so, Google is not building a fusion reactor. What

  • we have done is we have partnered

  • with TAE technologies, the world's largest private fusion

  • energy company. And we have been working with them since

  • 2015.

  • So, pictured here is their fifth

  • generation plasma generation device. And this thing is huge.

  • It would fill up a large part of this room.

  • And then in the center is

  • where the plasma is kept.

  • This is an elongated toroid.

  • And the goal is to keep this in its place and prevent

  • turbulence. If it gets out of place, then the reaction stops.

  • So, there's magnets and neutral beams and a host of other

  • technologies to keep it in place.

  • Now, what's Google's job specifically?

  • Well, our goal is to take the measurements that come from this

  • experimental reactor. And every time the physicists do an

  • experiment, within five minutes, we want to tell them the plasma

  • density, temperature and magnetic field on a

  • three-dimensional grid. So, how hard is that?

  • Well, first of all, the plasma is very, very hot.

  • So, you can't just poke it with a thermometer like a turkey.

  • The thermometer would melt and you would disrupt the plasma and

  • ruin the experiment. What you do have are measurements along

  • the boundary. But there's only so many measurements

  • you can take, because you can't cut, you know,

  • that many holes

  • in the side of this device. So, let's look closely at one.

  • Let's look at measuring electron

  • density. That's done with a device known

  • as an interferometer: a laser shines through the plasma, and the measurement

  • is proportional to the average density along that ray.

  • So, we have 14 lasers shining through the center of the

  • plasma. We know the average density along

  • 14 lines.

  • And from that, we want to know the density everywhere. So,

  • clearly there's no one unique solution to this problem.

  • And instead we'll have a distribution over possible

  • solutions.

  • So, we do this in a Bayesian sense, and

  • the final output is a probability

  • density function for the density of the electrons given the

  • measurements.

  • And we can visualize that with a graph where you have a mean

  • and some error bars.

  • How does TensorFlow help with this? Well, so, the first place

  • is translating measurement physics into code. So, let's

  • consider the distribution for the camera measurement.

  • So, the cameras measure photons. And say we have some photons

  • being emitted from the plasma. The mean number of photons

  • reaching

  • the camera is given by a sparse tensor-dense matmul. But we

  • don't realize the mean.

  • Instead what we realize is a noisy mean. There's noise due

  • to a finite number of photons.

  • And we have discretization noise in space. The

  • TensorFlow Distributions library gives you access to this, so

  • this noisy flux is represented as a normal distribution. It has a

  • mean. You can draw samples.

  • You can compute the PDF and so on.

  • That's not all, though. We also have an analog-to-digital

  • conversion process that we model as passing this normal

  • distribution through a non-linear response curve and

  • digitizing it to 8 bits.

  • So, at the end, this digitized charge is another distribution

  • object that has the ability to take samples. You can compute

  • the probability mass function because it's discrete. And so

  • on.
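
  • A rough sketch of that measurement model using TensorFlow Probability distributions might look like the following; the shapes, the noise model, and the variable names are illustrative assumptions, not the actual reconstruction code.

```python
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Illustrative shapes: a sparse geometry matrix maps the discretized emission
# field to the mean photon flux seen by each camera pixel.
emission = tf.placeholder(tf.float32, shape=[1000, 1])           # unknown plasma state
geometry = tf.sparse_placeholder(tf.float32, shape=[64, 1000])   # line-of-sight weights

mean_flux = tf.sparse_tensor_dense_matmul(geometry, emission)

# The realized flux is noisy; model it as a Normal centered on the mean flux.
# (The noise model here is a placeholder, not the real detector model.)
noise_scale = 0.05 * mean_flux + 1.0
noisy_flux = tfd.Normal(loc=mean_flux, scale=noise_scale)

simulated = noisy_flux.sample()                  # draw simulated measurements
log_likelihood = noisy_flux.log_prob(simulated)  # density of observed measurements
```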

  • And since we want to be Bayesian, we

  • want to assemble a number of these distributions, giving us a

  • likelihood and

  • a prior and so on with the goal of producing a posterior.

  • And then we do Bayesian inference. To do inference, we

  • do this in two different ways.

  • The first way is variational

  • inference, which amounts to minimizing a loss

  • function

  • whose minimum approximates the true posterior. This is done like

  • any other TensorFlow minimization.

  • For example, we use the Adam optimizer.

  • The second way is using Hamiltonian Monte Carlo.

  • So the TensorFlow Probability library

  • gives you a number of Monte Carlo samplers,

  • and the Hamiltonian one uses gradients to take samples

  • faster. Notice that in both cases it's autodifferentiation doing the work,

  • whether we're taking gradients

  • of the loss or gradients for the Hamiltonian Monte Carlo

  • sampling. Popping up a level, you'll notice we're not doing

  • deep learning.

  • As I said, we're doing an inverse problem: measurements

  • given to us by the physicists are turned into a reconstruction of some physical

  • state. So, there are a few differences I want to highlight.

  • First of all, there are no labels that are given to us.

  • The natural label here would be a three-dimensional

  • image of the actual plasma. But we're the ones who are telling

  • people what the plasma looks like, so, we're the ones

  • actually producing the labels. So, given that there are no

  • labels, you

  • might be tempted to say, this is an

  • unsupervised learning technique like word clustering. Here

  • there really is a right answer. There really was a plasma out

  • there. And if the plasma doesn't fall within our error

  • bars, we have made a mistake. And also you'll notice that our

  • graph

  • here models physics rather than generic functions.

  • So, it's a bit more constrained than these deep neural networks.

  • But that allows us to get the right answer with no labels.

  • At the end of the day, despite this not being deep

  • learning, TensorFlow adds value with the TensorFlow

  • Distributions and Probability libraries. We have

  • autodifferentiation to do inference. And in order to

  • provide answers to

  • many measurements at once, GPUs and distributed computing are

  • very important. So, thank you very much.

  • [ Applause ]

  • And next up we have Cory talking about

  • machine learning and

  • genomics.

  • >> Hello, everyone. My name is Cory McLean and I'm an

  • engineer on the genomics team in Google Brain. And today I'm

  • excited to tell you about Nucleus, which is a library

  • we've

  • released today to make it easy to bring genomics data to

  • TensorFlow.

  • So, genomics is the study of the structure and function of

  • genomes. In every cell in your body, you have two copies of the

  • genome, one from each parent. And this is strings of DNA,

  • which is a four-letter alphabet. And about 3 billion letters in

  • the genome.

  • So, here is a picture of a snapshot of about 150,000 letters on

  • chromosome 1. What we can see is there's a number of known

  • things about this area.

  • One, there are functional elements, like the genes

  • depicted in that second row. Biological measurements allow us

  • to analyze what the different things are that are active in cells.

  • So, on that third row, we can see the amount of gene

  • expression across different tissue types is quantified

  • there.

  • And at the bottom, through sequencing many

  • people, we can identify places where there's variation across

  • individuals.

  • And there's many different computational algorithmic

  • challenges in developing that image.

  • This ranges from, on the experimental data generation

  • side:

  • can we better take the output of these physical measurements to

  • get accurate DNA readings, and can we reduce noise in the

  • experiments that quantify this expression?

  • Can we take the DNA sequence and interpret where functional

  • elements like these genes are? Or predict how active they are in

  • different tissue types?

  • And can we identify places where individuals vary compared to our

  • reference?

  • And how is that different in small variants versus say in

  • cancer?

  • And how do those changes influence human traits? So, one

  • thing that is really exciting

  • for us is, there are many opportunities

  • for deep learning in genomics. And a lot of that is driven by

  • the increase in the amount of data available. This graph

  • shows the dramatic reduction in cost to

  • sequence a million bases of DNA over the past decade.

  • But also, there's a lot of structure in these datasets that

  • is often complex

  • and difficult to represent with relatively simple models.

  • But much of this data displays convolutional or sequential

  • structure, so we can use techniques from image

  • classification as well as sequence models. And there have

  • been a number of proven successes of applying deep

  • learning to problems in genomics such as DeepVariant, which is a

  • tool our group

  • developed to identify small variants using convolutional

  • neural networks. So, our goals in genomics are multi-faceted.

  • One is to make it easy to apply TensorFlow to problems in

  • genomics. And do this by creating libraries to make it

  • easy to work with genomics data.

  • We're also interested in developing tools and pushing the

  • boundaries on some of these scientific questions using

  • those things that we've built.

  • And then want to make all of that

  • publicly available as tools that can be used by the

  • community. So, today, I'll focus on the first part of

  • making it easy to bring genomics data to TensorFlow.

  • So, what is a major problem?

  • One major difficulty is that there are

  • many different types of data that are generated for genomics

  • research.

  • You can see here on the right a subset of different types used.

  • And these different file formats have

  • varying amounts of support and in

  • general no uniform APIs. We have some concerns about

  • efficiency and language support, where we would like

  • to be able to express some manipulations

  • in Python, but need effective ways to efficiently go

  • through this data in a way that native Python wouldn't make

  • possible. So, to address these challenges, we

  • developed Nucleus, which is a C++ and Python library for

  • reading and writing

  • genomic data to make it easy to bring to

  • TensorFlow models and then feed through the tf.data API that

  • Derek talked about today

  • for training models for your particular task of interest.
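
  • For example, once the genomics records have been converted to TFRecords, a tf.data input pipeline can consume them directly; the feature names, shapes, and file name below are made up for illustration.

```python
import tensorflow as tf

# Feature names and shapes here are illustrative, not the actual schema.
def parse_fn(serialized_example):
    features = {
        "pileup_image": tf.FixedLenFeature([100, 221, 6], tf.float32),
        "label": tf.FixedLenFeature([], tf.int64),
    }
    return tf.parse_single_example(serialized_example, features)

dataset = (tf.data.TFRecordDataset("examples.tfrecord.gz", compression_type="GZIP")
           .map(parse_fn)
           .shuffle(10000)
           .batch(32)
           .repeat())
```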

  • And we support the reading of many of the most

  • common data

  • formats in genomics and provide a unified API across the

  • different data types. So, we're able to iterate through

  • the different records of these different types and be able to

  • query on specific

  • regions of the genome to access the data there.

  • The way that we developed this uses

  • protocol buffers under the hood so that we can implement all the

  • general parsing

  • in C++ and then make those available to

  • other languages like Python. And for those of you familiar

  • with

  • genomics, we end up using

  • htslib, which is the canonical parser for the high-throughput

  • sequencing data formats for reads and variants, and then wrap that to

  • generate the protocol buffers, and then use CLIF on top of this

  • to make the data available to Python.

  • And finally, we use some of the TensorFlow core libraries so we

  • can

  • write out this data as TFRecords so they can be ingested

  • by the tf.data API. The data types we currently support are the

  • following, ranging from general genome

  • annotations to reference genomes and different sequencer reads,

  • whether they're direct off the

  • sequencer or mapped, as well as genetic variants.

  • So, to give an example of the reading API, it's quite

  • straightforward. So, this is kind of a toy example, but it is

  • essentially similar to what is used for DeepVariant, where we

  • want to

  • train a model to identify actual genome

  • variations based on mapped sequence reads and a reference

  • genome. So, you have three different data types that we

  • need. We import the different reader types.

  • And then say, for this region of the genome

  • that we're interested in, we can issue queries to each of the

  • different reader

  • types and then have

  • iterables that we can manipulate and turn into TensorFlow

  • examples. On the writing side, it's similarly straightforward.

  • So, if we have a list of variants for

  • the common VCF format, we'll have an associated header

  • which provides metadata about this, and then open a writer

  • with that header and then just loop

  • through the variants and write them.

  • And note that we support writing to

  • block format which is for the subsequent indexing by other

  • tools.

  • We can directly write to TFRecords, and we have methods to

  • write out sharded data, which we found helps

  • avoid certain hot spots in the genome,

  • using a very similar API.
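
  • A rough sketch of that reading and writing API is shown below; the module paths, function names, and call signatures are reconstructed from this description and may differ from the released library.

```python
# Module paths and signatures below are reconstructed from the description
# above and may not match the released Nucleus library exactly.
from nucleus.io import sam, vcf
from nucleus.util import ranges

region = ranges.make_range("chr1", 10000000, 10100000)

# Reading: query each reader for the region of interest.
with vcf.VcfReader("variants.vcf.gz") as variants_reader:
    header = variants_reader.header
    variants = list(variants_reader.query(region))

with sam.SamReader("reads.bam") as reads_reader:
    reads = list(reads_reader.query(region))
    # ...combine reads, variants, and the reference into tf.train.Examples...

# Writing: open a writer with the header and loop over the records.
with vcf.VcfWriter("output.vcf.gz", header=header) as writer:
    for variant in variants:
        writer.write(variant)
```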

  • Finally, we have been working with the Google Cloud team which

  • has some tools for analyzing variant data. And so, they have

  • developed a tool

  • called Variant Transforms, which allows

  • you to load the variant files into

  • BigQuery using Apache Beam.

  • And you can do queries over that data.

  • We're working to have Nucleus under the hood providing

  • the parsing of the variants, and to learn more about that

  • tool you can go to the link below.

  • So, to summarize, we have developed Nucleus, which is a

  • C++ and Python

  • library to make it easy to bring genomics data to TensorFlow to

  • train your models of interest for genomic problems.

  • And we have the ability to interoperate

  • with Cloud Genomics, and Nucleus is being integrated into Variant

  • Transforms at the moment. And this ended up being the

  • foundation

  • of our CNN-based variant caller which is available at the link

  • below. So, with that, I would like to thank you all for your

  • attention today.

  • [ Applause ]

  • Next up we'll have Edd to talk about open source

  • collaborations.

  • >> Thank you. Hi, everyone.

  • Now, I was going to talk to you about

  • my plans to reanimate dinosaurs with TensorFlow, but I don't

  • want to steal those guys' thunder. Actually, I'm here to

  • talk about open source collaboration in the TensorFlow

  • project. That's my job at Google. To work on growing the

  • participation and the collaboration in the project in

  • the whole community. So, you guys all here and everybody

  • watching on the livestream are a huge part of this already. If

  • you saw a slide like this at the beginning of the day in the

  • keynote, you can see the numbers have ticked up. In the five days I

  • have been monitoring this slide, the numbers keep increasing.

  • The amount of participation is staggering. As an open source

  • project, it blew my mind when I came to work on it. And so much

  • of that is due to the participation of everybody here

  • in the community. There are parts of TensorFlow that

  • wouldn't exist, many of them, without that collaboration. For

  • instance, whether it's Spark connectors, whether it's support

  • for particular architectures and accelerators, or maybe certain

  • language bindings. We not only benefit from a huge amount of

  • adoption by being open source, but as the adoption grows, this

  • is the way we sustain and grow the project.

  • You saw this map earlier as well.

  • This is just some of the GitHub stars that gave their locations

  • that we could map.

  • And it was as far north as Norway and as far south as the

  • Antarctic.

  • And what's obvious right now, although there's a large team at

  • Google

  • developing TensorFlow, there are far

  • more people in far more places using it. And with

  • open source projects, more adoption means more demand.

  • There's so much we can do together to grow TensorFlow.

  • Now, you remember that thing where you turn up to a party and

  • everyone is having a good time and they all seem to know what

  • they're doing and why it's such fun.

  • But who am I going to talk to, and what are they looking

  • at? And sometimes a large open source project can be a bit like

  • that. You want to get involved and contributing to TensorFlow,

  • but where do you start?

  • You think this module or this feature is something you want

  • to work on. How do you find the right person to talk to? How do

  • you learn what we're thinking about the direction for this?

  • We have heard some of those things and we recognize that we

  • want to improve our openness, our transparency and our

  • participation. We're trying to work to make it easier to get

  • involved in TensorFlow. We have already, for instance, refreshed

  • our roadmap which you can find on the TensorFlow website about

  • the general direction of a lot of the code. And we'll do that

  • a lot more regularly.

  • But I want to talk about four initiatives that we have going

  • that will enable us to work together more effectively and

  • faster. The first of these is simple.

  • It's a central community for everyone who is working on and

  • contributing to TensorFlow.

  • GitHub has so much energy going on. There's

  • so much great debate in all the issues. Look in there, in the

  • pull requests,

  • really thoughtful conversation conversations and contributions.

  • Thank you for being part of that.

  • But what we don't have is a central place you can

  • collaborate.

  • We have a mailing list,

  • developers@tensorflow.org, where

  • we can work together as a community

  • to get feedback and coordinate. Many of the projects

  • have mailing lists that you can find at tensorflow.

  • org/community.

  • Whether it's TF Lite or TensorFlow.js.

  • So, that's collaboration. Now, we talked about the fact there

  • are many use cases outside of Google that the core team don't

  • see. Many more.

  • Much more happens outside than inside the core team.

  • So, we want to make it possible for people with shared interests

  • in projects to work together. This is where the beauty of open

  • source comes in.

  • How do we do that? We're setting up a structure for

  • groups to work together. Special interest

  • groups. We have been piloting the first of these for a few

  • months now.

  • This is called SIG Build. It's about building, packaging and

  • distributing TensorFlow. If you're familiar with TensorFlow, you

  • know we build it in a certain way. Guess what?

  • Not every architecture or application finds that the best

  • way for them.

  • For instance, Linux distributions want to build against the shared

  • libraries in the distribution. That's not something we do. So,

  • we brought together a bunch of

  • stakeholders across Linux distributions, companies like

  • Red Hat, IBM, NVIDIA, and Intel, to collaborate in a group to

  • look at the build and make it work effectively for more

  • people in the future. That's just one group. The pilot.

  • But, we want to pave the cow paths. Where there is energy

  • and people collaborating on a particular thing, that's a great

  • candidate to bring a special interest group together.

  • We're also bringing online a group for TensorBoard where key

  • stakeholders of the TensorBoard ecosystem

  • can work together.

  • And a group for language bindings, which are built completely by the community for

  • TensorFlow. And each will have a different way of working, a

  • different community. But the common thing is, we're going to

  • provide forums: if you have a shared interest in a particular

  • area, we can focus on it.

  • Now, I'd like to talk about the design of TensorFlow. You know,

  • one of the most amazing things and the benefits of TensorFlow

  • is that the code that we release is the

  • code that Google uses on a daily basis. It's kind of remarkable

  • for an open source project. And so, we're really cautious about

  • changes. We're really careful about design. We have a

  • commitment, obviously, to the API through the 1.X series

  • release. And we have design reviews internally. So, when

  • things change, we have proposals, we get feedback. But

  • by now you are thinking, well, you just said that so many use

  • cases and so many users are outside of Google. Yet you're

  • having these design reviews inside.

  • So, what we're going to do is open up a public feedback phase

  • to our design process so we can engage much more broadly with

  • every contributor and user about how that might affect their

  • needs and what their opinions are.

  • Keep an eye on the developers@tensorflow.org mailing list.

  • That's where we'll announce it coming online in the next couple of

  • months. My hope is this process will be a way that everybody can

  • communicate and discuss about the future direction of

  • TensorFlow. Whether you're in the core team at

  • Google or in the broader community.

  • So, contributing to TensorFlow isn't just about issues or pull

  • requests. In fact, I reckon that there's much

  • more energy going into blogging, running meetups, doing

  • presentations, teaching, doing courses. So many universities

  • around the world.

  • And we want to amp up and support the amount of content

  • that educates and highlights TensorFlow. We're really

  • excited already that so many of you do such amazing

  • jobs. We would like to be able to point everybody in the

  • TensorFlow community to the work that you're doing. So, there's

  • a couple things that we launched to support this. The first you

  • probably already heard. We now have a blog for TensorFlow, at

  • blog.tensorflow.org.

  • One of the things I'm most excited

  • about with this blog is that as well as important announcements

  • and education, we're setting it up from the beginning to involve

  • content from around the web and the community. That's one

  • of the reasons we're using

  • the Medium platform: to make it easy to integrate content from around

  • the web and give you the credit for the work you have done.

  • So, we would really like to hear from you. If you have a blog

  • post to get into the TensorFlow publication, get in touch.

  • Secondly, and if you're on the livestream watching this, you've

  • kind of found out about this, we have a YouTube channel that's

  • launched today.

  • Now, one of the things I'm most excited

  • about in this is a show called TensorFlow Meets. We are

  • able to get out into the world

  • of contributors and users and highlight some of the use cases.

  • Highlight the work of everybody. This is a chance for you too.

  • We would love to meet you and chat with you about what you're

  • up to and have you featured on the YouTube channel. Again,

  • reach out to us. We would love you to be a part of it. There

  • is one URL to get involved in all these things that I

  • mentioned, TensorFlow. org/community.

  • So, if anyone's mentioned a mailing

  • list or group to you today, please go to that URL and you

  • will find resources there. It's my hope that TensorFlow is going

  • to continue to be a party, but maybe one you can find yourself

  • part of a lot sooner and have more fun. Please, feel free to

  • reach out to me.

  • There's my email address, ewj@google.com. Come talk to me about

  • your experiences collaborating around open source and

  • TensorFlow. I would love to hear about it. Thank you so

  • much. [ Applause ]

  • Now, our next speaker is Chris Lattner. He's going to talk

  • about a first principles approach to machine learning.

  • >> All right. Thank you, Edd.

  • Hi there, everyone. I'm excited to introduce a new project

  • that we have been working on that takes a new approach to

  • improving the usability of TensorFlow. And we care so much about

  • usability here that we're going all the way back to first

  • principles of the computation that we're performing. But

  • first, why usability?

  • I hope that everyone here agrees that productivity in machine

  • learning is critical, because it leads to a faster pace of

  • innovation and progress in our field. And, of course, we

  • just want to build beautiful things for TensorFlow users

  • since that's a big piece of it as well. But if you look at

  • machine learning frameworks, there's two major approaches.

  • The most familiar are the graph

  • building approaches where you explicitly

  • define a graph and execute it to run a computation. It's great

  • for performance, but not always for usability. In the define-by-run

  • approach, eager execution, you don't always get the best

  • performance, but it's easier to use. And both approaches are

  • really about allowing Python to understand the difference

  • between the tensor computation of your code and all the

  • other non-tenser stuff like command line processing and

  • visualization and what you do. I think it's interesting to look

  • at how these actually work. In the case of Eager Execution, you

  • write the model and Python parses it.

  • And then it feeds one statement at a time to the

  • interpreter. If it's a tensor operation, it hands

  • it to TensorFlow, which takes care of the tensor computation;

  • otherwise Python runs it.

  • The key thing about eager execution is that it's designed

  • within the constraints of a Python library. With a

  • compiler and language involved,

  • there's a whole other set of approaches that can be applied

  • to solving this problem. That's what we're doing. The

  • cool thing about a compiler, after you parse your code, the

  • compiler can see the entire program and all the tensor ops

  • in it.

  • We're adding a new stage to the

  • compiler that takes the tensor operations out, and because

  • it's a standard TensorFlow graph, you can get access to all

  • the things that TensorFlow can do, including the devices. You

  • get the power and flexibility of TensorFlow, but you get the

  • usability of your execution as well. But there's a catch.

  • There's always a catch, right? The catch here is that

  • we can't do this with Python. At least not with the

  • reliability we expect, because it doesn't support the kind of

  • compiler analysis we need. And what do we mean by that?

  • Well, the compiler has to be able to reason about values.

  • Has to reason about control flow and function calls.

  • And about

  • variable aliasing. And the things we have come to love about

  • Python, including using all the standard Python APIs, make this kind of analysis hard. I know

  • what you're thinking. Does this mean we're talking about doing a

  • new language? Well, that's definitely an approach to solve

  • the technical requirements we want. With a new language, we

  • can build all the nice things we want into it. But this comes at

  • a cost.

  • It turns out we would be foregoing the benefits of a

  • community. That includes tools and libraries, but also things

  • like books, which some people still use. And even more

  • significantly, this would take years of time to get right. And

  • machine learning just moves too fast. No, we think it's better

  • to use an existing language. But here we have to be careful.

  • Because to do this right, we have to

  • make significant improvements to the compiler and the language

  • and do it in a reasonable amount of time. And so, of course,

  • this brings us to the Swift programming language. Now, I

  • assume that most of you are not very familiar with

  • Swift, so, I'll give you a quick introduction. Swift is designed

  • with a lightweight syntax. It's geared towards being easy to use

  • and learn. Swift draws together best practices from lots of

  • different places, including

  • things like functional programming and generics.

  • Swift builds on LLVM, it has an interpreter and scripting

  • capabilities as well. Swift is great in notebook environments.

  • These are really awesome when you're interactively developing

  • in real-time. Swift is also open source. It's part of lots

  • of platforms. It has a big community of people. But the

  • number one thing that's most

  • important to us is it has a fully open design environment

  • called Swift Evolution.

  • It allows us to propose machine learning

  • and compiler features directly for integration

  • into Swift. When you bring all of this together, I'm happy to

  • introduce Swift for TensorFlow. Swift for TensorFlow gives you

  • the full performance of Graphs. You can use native language

  • control flow.

  • It has built-in automatic differentiation. You can detect

  • errors without running your code.

  • And you get full interoperability with Python APIs.

  • I would like to welcome Richard Wei to tell you about it now.

  • >> Thank you, Chris. [ Applause ]

  • I'm thrilled to show you Swift for TensorFlow. Swift is a

  • high-performance, modern programming language. And

  • today, for the very first time, Swift has a full powered

  • TensorFlow built right in.

  • I'm going to walk through three major styles of programming.

  • Scripting, interpreting, and notebooks. So, first let me

  • show you the Swift interpreter.

  • This is a Swift interpreter. When I type some code, swift

  • evaluates it and prints a result. Just like Python.

  • Now, let's import TensorFlow.

  • I can create a tensor from some scalars.

  • Now, I can do any TensorFlow operation directly and see the

  • result.

  • Just like I would with Eager Execution.

  • For example, A plus A, or A's matrix product with itself.

  • Of course, loops just work. I can print the result.

  • Now, interpreter is a lot of fun to work with.

  • But I like using TensorFlow in a more interactive environment

  • just like a Jupyter notebook. So, let's see how they work.

  • This is a Swift notebook. It shows

  • all the results on the right.

  • So, here's some more interesting code. Fun with functions.

  • So, here I have a sigmoid function inside a loop.

  • Now, as I click on this button, it shows a trace of all values

  • produced by this function over time. Now, as a machine

  • learning developer, I often like to differentiate functions.

  • Now, well, since we were able to improve

  • the programming

  • language, we built first class automatic differentiation right

  • into Swift.

  • Now, when I ask for the gradient of the function, it shows the gradients.

  • Swift computes the gradient

  • automatically and gives me the result.

  • So, here is the gradient in the sigmoid.

  • Now, let's look at some Python code. Let's think about Python.

  • Well, as a machine learning developer, I have been using

  • Python a lot.

  • And I know there are many great Python libraries. Just today my

  • colleague, Dan, sent me

  • a dataset in pickle format.

  • Well, I can directly use Python APIs to load it.

  • All I have to do is just type in "import

  • Python,"

  • and Swift uses a Python API, pickle, to be specific, to load

  • the data. In here, you can see the data right in the Swift

  • notebook.

  • Now, -- so, here's a Swift notebook.

  • Now, some people like to run training scripts directly in

  • command line. So, let me show you how to train a

  • simple model from command line.

  • So, here is a simple model.

  • I'm using the TensorFlow Dataset API to load the data.

  • I have the forward pass and the backward pass defined in the

  • training loop. Now, I usually like to work on the go,

  • so, this code has been working on the CPU on my laptop. But

  • when I want to get more performance, what do I do?

  • Well, why don't I just enable cloud TPU?

  • So, all I have to do is add one line

  • to enable TPU execution.

  • When I save this file, open the

  • terminal to run this training script. It's initializing TPU.

  • And the Swift compiler automatically

  • partitions this program into a host program and a TensorFlow graph.

  • And TensorFlow is sending this graph to

  • the XLA compiler for TPU execution. Now, it's

  • running. And we're waiting for the TPU to give the result.

  • Look! Loss is going down. All right.

  • So, why don't we simply

  • open TensorBoard and see the training curve?

  • So, now I can see the entire training history in TensorBoard.

  • So, this is looking great!

  • Now, this is Swift for TensorFlow.

  • It's an interactive programming experience with super-computing

  • performance at your fingertips. Back to you, Chris.

  • >> Thanks, Richard.

  • [ Applause ] All right.

  • To recap quickly, Richard showed you that Swift has an

  • interpreter. And it works just like you would expect. Now, I

  • know that it's super-frustrating to be working on a

  • program and two hours into a training run, get a shape error

  • or a type mismatch.

  • Swift catches it early. We built catching mistakes right

  • into Swift for TensorFlow.

  • And you can use APIs in other languages from Swift,

  • giving you full access to any of the Python APIs that you love

  • to use. Swift is generating standard TensorFlow graphs,

  • including control flow. Which give you the full performance of

  • the session API. Of course, graphs are also awesome because

  • they give you access to everything that TensorFlow can

  • do, including devices spanning the range from the tiniest

  • Raspberry Pi all the way up to a TPU supercomputer.

  • You may wonder, what does this mean? This is an early stage

  • project. But we're looking forward to our open source

  • release next month. And not only are we releasing the code,

  • but technical white papers and

  • documents to explain how it works and moving our design

  • discussions out into the public on the Google group so everyone

  • can participate. We're not done yet. We have basic support for

  • automatic

  • differentiation built right into the compiler and the language.

  • But we want to handle exotic cases like recursion and data

  • structures.

  • Compatibility issues are super-frustrating. Especially

  • if you use an op or D-type not supported by your device. Swift

  • has great support for detecting issues like this, and we are

  • looking forward to wiring this into supporting TensorFlow. We

  • are interested in high-level APIs.

  • We have some prototypes now, but we

  • would like to design multiple approaches and experiment and

  • settle on the best one based on real-world experience.

  • This has been a super-quick tour of Swift for TensorFlow.

  • Swift for TensorFlow combines the power and flexibility of

  • TensorFlow with a whole new standard of usability. We think

  • it's

  • going to take usability through the roof. It's an early stage

  • project. We would like you to get interested and help us to

  • build this future. Thank you.

  • [ Applause ]

  • >> Hello, everyone. Welcome back. Welcome back. I'm

  • Jeremiah, and this is Andrew.

  • We are here from the TensorFlow Hub team.

  • We are based in Zurich, Switzerland, and we're excited

  • to share TensorFlow Hub today. So, this first slide is actually

  • one that I stole.

  • I took it from a colleague, Noah Fiedel who leads TensorFlow

  • serving.

  • And Noah uses this slide to tell a personal story. It kind of

  • shows the growth of

  • the types of tools that we use to do software engineering.

  • And it shows how they mature over time. He connects this to

  • a similar thing happening, the tools we use to do machine

  • learning. And he draws these connections.

  • We're rediscovering things as we grow our machine learning tools.

  • Things like the machine learning

  • equivalent of source control. Machine learning equivalent of

  • continuous integration.

  • And Noah makes the observation that this is lagging

  • behind the software engineering side by 15 to 20 years. So, this

  • creates a really interesting opportunity, right? We can look

  • at software engineering. We can look at some of the things that

  • have happened there. And think about what kind of impact

  • they may have on machine learning. Right?

  • So, looking at software engineering, there's something

  • so fundamental, it's almost easy to skip over. That's this idea

  • of sharing code. Shared repositories.

  • On the surface, this makes us immediately more productive. We

  • can search for code, download it, use it. But has really

  • powerful second order effects, right? This changes the way we

  • write code. We refactor our code. We put it in libraries.

  • We share those libraries.

  • And this really makes people even more productive. And it's

  • the same dynamic that we want to create for machine learning with

  • TensorFlow Hub.

  • TensorFlow Hub lets you build, share

  • and use pieces of machine learning. So, why is this

  • important? Well, anyone who has done machine learning from

  • scratch knows you need a lot to do it

  • well. You need an algorithm. You need data. You need compute

  • power and expertise. And if you're missing any of these,

  • you're out of luck.

  • So, TensorFlow Hub lets you distill

  • all these things down into a reusable

  • package called a module. They can be easily reused.

  • So, you'll notice I'm saying "module" instead of "Model." It

  • turns out that a model is a little bit too big to encourage

  • sharing. If you have a model, you can use that model if you

  • have the exact inputs it wants and you expect the exact outputs

  • it provides. If there are any little differences, you're

  • kind of out of luck.

  • So, modules are a small piece, right? If you think of a model,

  • like a

  • binary, think of a module like a library.

  • So, on the inside, a module is actually a saved model.

  • So this lets us package up the algorithm in the form of a

  • graph, package up the weights, and include things like

  • initialization and assets. And our libraries make it very easy to

  • instantiate these in your TensorFlow code. So, you

  • can compose these in interesting ways. This makes things very

  • reusable. You can produce one of these and share it. These

  • are also retrainable. Once you patch it into your bigger

  • program, you can back propagate through it just like normal.

  • And this is really powerful because, if you do happen to

  • have enough data,

  • you can customize the tf.hub module for your own application.

  • And to tell us a little bit more about

  • some of those applications, I'll hand it over to Andrew.

  • >> Thanks, Jeremiah.

  • Let's look at a specific example of using a

  • tf.hub module for image retraining.

  • Say we're going to make an app to classify rabbit breeds from

  • photos.

  • We have a couple hundred examples, not enough to train an

  • entire image classification model from scratch.

  • But we could start from an existing general purpose

  • classification model. Most of the high-performing ones are

  • trained on millions of examples and they can easily classify

  • thousands of categories.

  • So, we want to reuse the architecture and the trained

  • weights of that model without the classification layers, and

  • in that way, we can add our own rabbit classifier on top and we

  • can train it on our own rabbit examples.

  • And keep the reused weights fixed.

  • So, since we're

  • using TensorFlow Hub, our first stop is TensorFlow.org/hub. We

  • can find the list of the newly released, state of the art, and

  • also the well-known image modules. Some of them include

  • the classification layers. And

  • some of them remove them, just providing a feature vector as

  • output. And so, we'll choose one of the feature vector ones

  • for this case.

  • Let's use NASNet, a state of the art image module created by a

  • neural network architecture search. You paste the URL of

  • the module. And TensorFlow Hub downloads the graph and all of

  • its weights and imports it into your model. In that one

  • line, you're ready to use the module like any function. So,

  • here we just provide a batch of inputs and we get back our

  • feature vectors.

  • We add a classification layer on top and output our predictions.
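
  • In code, that one line plus a small classifier head might look roughly like this; the module handle, the input size, and the number of classes are example values.

```python
import tensorflow as tf
import tensorflow_hub as hub

# Example module handle; any image feature-vector module from tfhub.dev works.
module_url = "https://tfhub.dev/google/imagenet/nasnet_large/feature_vector/1"

images = tf.placeholder(tf.float32, shape=[None, 331, 331, 3])  # NASNet-large input size
module = hub.Module(module_url)   # downloads the graph and its trained weights
features = module(images)         # [batch, feature_dim] feature vectors

# Our own classifier head for, say, a handful of rabbit breeds.
logits = tf.layers.dense(features, units=5)
predictions = tf.nn.softmax(logits)
```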

  • But in that one line, you get a huge amount of value. In this

  • particular case, more than

  • 62,000 hours of GPU time went into

  • finding the best architecture for NASNet and training the

  • result.

  • All the expertise, the testing, the research that the author put

  • into that, that's all built into the module. Plus that module

  • can be fine-tuned with your model. So, if you have enough

  • examples, you can potentially get better

  • performance if you use a low learning rate, if you set

  • the trainable parameter to true, and

  • if you use the training version of the graph.
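
  • A short sketch of what that fine-tuning setup might look like; the exact arguments are assumptions based on this description.

```python
import tensorflow_hub as hub

# Assumed arguments based on the description above: instantiate the module as
# trainable and request the training variant of its graph.
module = hub.Module(
    "https://tfhub.dev/google/imagenet/nasnet_large/feature_vector/1",
    trainable=True,
    tags={"train"})
# Then train the whole model with a low learning rate so the pre-trained
# weights are refined rather than destroyed.
```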

  • So, NASNet is available in a large size as well as a mobile

  • size module. And then there's also the new progressive NASNet.

  • And then a number of new MobileNet modules for doing

  • on-device image classification, as well as some industry

  • standard ones like Inception and ResNet. That list is at

  • tensorflow.org/hub. All those modules are pre-trained using

  • the TF-Slim checkpoints and ready to be used for

  • classification or as feature vector inputs to your own model.

  • Okay. Let's look at another example. In this case, doing a

  • little bit of text classification.

  • So, we'd like to know whether a

  • restaurant review is a positive or negative sentiment.

  • So, as Jeremiah mentioned, one of the

  • great things about TensorFlow Hub is it packages the graph

  • with the data. For our text modules, that means that all the

  • pre-processing, things like normalizing and tokenizing operations, is included.

  • So, we can use a pre-trained sentence embedding module to map

  • a full sentence to an embedding vector. So, if we want to

  • classify some restaurant reviews, then we just take one

  • of those sentence embedding modules, we add our own

  • classification layer on top. And then we train with our

  • reviews. We keep the sentence module's weights

  • fixed.

  • And just like for the image modules, TensorFlow.org/hub

  • lists a number of different text modules.

  • We have neural network language models

  • that are trained for English, Japanese,

  • Spanish, and we have word2vec trained on Wikipedia.

  • And ELMo, which looks at how words are used across contexts. And

  • something really new, today, you may have seen a new paper this

  • morning from the team, this is the universal sentence encoder.

  • It's a sentence-level embedding module and enables a variety of

  • tasks, in other words, universal.

  • Some of the things it's good for,

  • semantic similarity, custom text classification, clustering and

  • semantic

  • search. But the best thing is how little training is required

  • to adapt to your problem. That sounds great in our particular

  • case. Let's try it on the restaurant review task.

  • So, we just paste that URL from the

  • paper, and like before, TensorFlow Hub downloads the

  • module and inserts it into your graph.

  • But this time we're using the text

  • embedding column to feed into a classifier. And this module can

  • be fine-tuned with your model by setting trainable to true.
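
  • A hedged sketch of that estimator setup follows; the feature names and the tiny example DataFrame are made up for illustration.

```python
import pandas as pd
import tensorflow as tf
import tensorflow_hub as hub

# A tiny, made-up dataset of restaurant reviews for illustration.
train_df = pd.DataFrame({
    "review": ["great food and friendly staff", "cold soup and slow service"],
    "sentiment": [1, 0],
})

# The sentence embedding module becomes a feature column.
review_column = hub.text_embedding_column(
    key="review",
    module_spec="https://tfhub.dev/google/universal-sentence-encoder/1",
    trainable=False)  # set True (with a low learning rate) to fine-tune

classifier = tf.estimator.DNNClassifier(
    feature_columns=[review_column],
    hidden_units=[64],
    n_classes=2)  # positive vs. negative sentiment

train_input_fn = tf.estimator.inputs.pandas_input_fn(
    x=train_df[["review"]], y=train_df["sentiment"],
    num_epochs=None, shuffle=True)
classifier.train(input_fn=train_input_fn, steps=100)
```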

  • Of course, you have to lower the learning rate so that you don't

  • ruin the existing weights that are in there, but it's something

  • worth exploring if you have enough data. Now, let's take a

  • closer look at that URL. As Jeremiah mentioned, a module is

  • a program. So, make sure what you're executing is from a

  • location that you trust.

  • In this case, the module is from tfhub.dev.

  • That's our new home for Google-provided modules like

  • NASNet and the universal sentence encoder. We would like to make a place where

  • you can publish the modules that you create. In this case,

  • Google is the publisher. And universal-sentence-encoder is

  • the name of the module.

  • And finally, the version number is 1.

  • So, TensorFlow Hub considers modules to be immutable. That

  • way you don't have to worry about the weights changing

  • between training sessions.

  • So, that module URL, and all of the module URLs on tfhub.dev,

  • include a version number.

  • And you can take that URL and paste it into your browser and

  • see the complete documentation for any module that's hosted on

  • tfhub.dev. Here's the particular one for the universal

  • sentence encoder. And then we also have modules for other

  • domains besides text classification and image

  • retraining, like a generative image module that

  • contains a progressive GAN that was trained on CelebA.

  • And another module based on the deep local

  • features (DELF) network that can identify landmark images.

  • Both have great co-lab notebooks on TensorFlow.org/hub. The

  • images here were created from them. And we're adding more

  • modules over time for tasks like audio and video over the next

  • few months. But most importantly, we're really

  • excited to see what you build with TensorFlow Hub.

  • Use the hashtag #tfhub, and visit tensorflow.org/hub

  • for example tutorials, interactive notebooks

  • and code labs and

  • our new discussion mailing list.

  • For everyone from our team in Zurich, I want to thank you so

  • much.

  • [ Applause ] Okay.

  • Next up, Clemens and Raz will tell you about TensorFlow

  • Extended. >> Thank you. Hello, everyone.

  • First, thanks, everyone, for coming to the TensorFlow Dev

  • Summit. And second, thanks for staying around this long. I

  • know it's been a long day and there's a lot of information

  • that we have been throwing at you.

  • But we have much, much more and many more announcements.

  • Stick with me. My name is Clemens, and this is Raz.

  • And we are going to talk about TensorFlow Extended today. I'm

  • going to do a quick survey. How many of you do machine learning

  • in a research or academic setting? Okay. Quite a big

  • number. And how many of you do machine learning in a

  • production setting? Okay. That looks about half and half.

  • Obviously, also, a lot of overlap. For those who do

  • machine learning in a production setting, how many of you agree

  • with this statement? Yeah? Some? Okay. I see a lot of

  • hands coming up. So, everyone that I speak with who is doing

  • machine learning in production agrees with this statement.

  • Doing machine learning in production is hard. And it's

  • too hard.

  • Because after all, we want to

  • democratize machine learning and allow people to deploy machine

  • learning in their products. One of the reasons it's still hard

  • is

  • that in addition to the actual machine learning, the small,

  • orange box where you use TensorFlow, you may use Keras to

  • put the layers in the model, you need to worry about so much

  • more. There's all of these other things you

  • need to learn about to actually deploy machine learning in a

  • production setting and serve it in your product.

  • This is exactly what TensorFlow Extended is about. It's a

  • Google machine learning platform that allows the users to go from

  • data to a production serving machine learning models as fast

  • as possible.

  • Now, before we introduced TFX, we saw that going through this

  • process of writing some of these components, some

  • of them didn't exist before, gluing them together and

  • actually getting to a launch took anywhere between six and

  • nine months, sometimes even a year.

  • Once we deployed TFX and allowed developers to use it, most can

  • use it and get up and running in a day and get to a deployable

  • model in production in a matter of weeks or just a month.

  • Now, TFX is a very large system and

  • platform that consists of a lot of components and

  • services, so I can't talk about it all in the next 25 minutes.

  • We can only cover a small part. But talking about the things we

  • have open sourced and made available to you. First, we're

  • going to talk about TensorFlow transform. And how to apply

  • transformations on your data consistently between training

  • and serving. Next, Raz is going to introduce you to

  • a new product that we are open sourcing called TensorFlow Model

  • Analysis. We're going to give a demo of how all of this works

  • together end-to-end. And then we can make a broad announcement

  • for our plans for TensorFlow Extended and sharing it with the

  • community. So, let's jump into TensorFlow transform first. A

  • typical ML pipeline that you may see in the wild is: during

  • training, you have a distributed data pipeline that

  • applies transformations to your data. Because usually you train

  • on a large amount of data. This needs to be distributed.

  • And you run that pipeline and sometimes materialize the output

  • before you actually put it into your trainer.

  • Now, at serving time, you need to somehow replay those exact

  • transformations online.

  • As a new request comes in, it needs to be sent to your model.

  • Now, there's a couple of challenges with this. The first

  • one is, usually those two things are very different code paths,

  • right? The data distribution systems that you would use for

  • batch processing are very different from the libraries and

  • tools that you would use to transform data in real time to

  • make a request to your model, right?

  • So, now, you have two different code paths.

  • Second, in many cases, it's very hard to keep those two in sync.

  • And I'm sure a lot of you have seen this. You

  • change your batch processing pipeline and add a new feature

  • or change how it behaves. And you need to make sure that the

  • code in the production system is changed at the same time and

  • kept in sync. And the third problem is, sometimes

  • you actually want to deploy your TensorFlow machine learning

  • model in many different environments. So, you want to

  • deploy it on a mobile device, on a server, maybe you want to put

  • it on a car. Now suddenly you have three different

  • environments where you want to apply these transformations, and

  • maybe different languages for those and it's very hard to keep

  • those in sync. And this introduces something that we

  • call training/serving skew: the transformations at training time

  • might be different from those at serving time, which

  • leads to bad quality in your served model.

  • TensorFlow transform addresses this by helping you write your

  • data processing logic at training time. It helps you create those

  • data pipelines, and at the same time it emits a TensorFlow graph that can be

  • inlined into your training and serving model.

  • Now, what this does is, it actually hermetically seals the

  • model.

  • And your model takes a raw data request as input. And all of

  • the transformations are actually happening within the TensorFlow

  • graph. Now, this has a lot of advantages. And one of them is

  • that you no longer

  • have any code in your serving environment that does these

  • transformations, because they're all being done in the TensorFlow

  • graph. Another one is wherever you deploy this TensorFlow

  • model, all of those transformations are applied in a

  • consistent way, no matter where this graph is being evaluated.

  • Let's see how that looks. This is a code snippet of a

  • pre-processing function that you would write with tf.transform.

  • I'm going to walk you through what happens here and what we

  • need to do for this. So, the first thing we do is normalize

  • this feature. And as all of you know, in order to

  • normalize a feature, we need to compute the mean and the

  • standard deviation.

  • And to apply this transformation, we need to

  • subtract the mean and divide by the standard deviation. For the input

  • feature X, we have to compute the statistics.

  • Which is a trivial task if the data fits into a single machine.

  • You can do it easily.

  • It's a non-trivial task if you have a gigantic training dataset

  • and you have to compute these statistics efficiently. Once we

  • have these statistics, we can apply this transformation to the

  • feature. This is to show you that the output of

  • this transformation can then be, again,

  • multiplied with another tensor, which is a regular TensorFlow

  • transformation. And in order to bucketize a feature, you also,

  • again, need to compute the bucket boundaries to actually

  • apply this transformation. And, again, this requires a distributed

  • data job to compute those boundaries. Another benefit is that you can compute them over the result of an

  • already-transformed feature, to

  • then actually apply this

  • transformation. The next examples show you that in the same

  • function you can apply any other

  • tensor-in, tensor-out function. And there's also some

  • of what we call

  • "mappers" in tf.transform for which we don't have to

  • run a data pipeline to compute anything.
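
A minimal sketch of such a preprocessing function, assuming two numeric features 'x' and 'y' (hypothetical names); the analyzer and mapper names follow the tensorflow_transform library, though exact signatures can vary between releases:

```python
# A minimal tf.Transform preprocessing_fn sketch; feature names are hypothetical.
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    x, y = inputs['x'], inputs['y']

    # Normalizing x needs its mean and standard deviation over the whole
    # dataset; tft.scale_to_z_score runs that analysis as a distributed job
    # and the resulting constants are injected into the TensorFlow graph.
    x_normalized = tft.scale_to_z_score(x)

    # The output of one transformation can be combined with other tensors
    # using regular TensorFlow ops ("mappers"); no extra data pipeline runs.
    x_times_y = x_normalized * y

    # Bucketizing needs dataset-wide quantile boundaries, so this is again
    # backed by a distributed analysis pass over the data.
    y_bucketized = tft.bucketize(y, num_buckets=10)

    return {
        'x_normalized': x_normalized,
        'x_times_y': x_times_y,
        'y_bucketized': y_bucketized,
    }
```

In a Beam pipeline, a function like this would typically be handed to something like tft_beam.AnalyzeAndTransformDataset, which runs the analyzers over the data and emits both the transformed examples and the transform graph.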

  • Now, what happens is the orange boxes are what we call

  • analyzers. We realize those as actual data pipelines to compute

  • over your data.

  • They're implemented using Apache Beam. We can talk about this

  • more later. But this allows us to actually run this distributed

  • data pipeline in different environments. There's different

  • runners for Apache Beam. And all of the transforms are just

  • single instance-to-instance transformations using pure

  • TensorFlow code. And what happens when you run

  • TensorFlow transform is that we actually

  • run these analyze phases, compute the results of the

  • analyze phases, and inject the result as a constant in the

  • TensorFlow graph. So, this is on the right. And the result

  • is a hermetic TensorFlow graph that applies all of the

  • transformations. And it can be inlined in your serving graph.

  • Now your serving graph has the transform graph as part of it

  • and can play through these transforms wherever you want to

  • deploy this TensorFlow model. So, what can be done with

  • TensorFlow Transform? At training time, for the batch

  • processing, really anything that you can do with a distributed

  • data pipeline. So, there's a lot of flexibility here, what

  • types of statistics you can compute. We provide a lot of

  • utility functions for you.

  • But you can also write custom data pipelines. And at serving

  • time, because we generate a TensorFlow graph that applies

  • these transformations, you're limited to what you can do with

  • a TensorFlow graph. But as all of you who work in TensorFlow know,

  • there's a lot of flexibility in there as well.

  • And so anything that you can do in a

  • TensorFlow graph you can do with tf.transform. So, some

  • of the common use cases, the ones

  • on the left I just spoke about.

  • You can scale a continuous value to a z-score, or to a value between

  • zero and one.

  • You can bucketize a continuous value. If you have text, you

  • can apply a bag of words or N-grams. Or for feature crosses,

  • you can cross

  • those strings and then generate the result of those crosses.

  • And as mentioned before, tf.transform is extremely

  • powerful in being able to chain these transforms. You

  • can apply a transform on the result of another transform as well.

  • And another particularly interesting transform is

  • applying another TensorFlow model. You have heard about the

  • saved model before. If you have a saved model that you can apply

  • as a transformation, you can use it with tf.transform.

  • Say you want to apply an Inception model and combine it with

  • another feature or

  • use it as an input feature to the model. You can use any

  • TensorFlow model

  • that's inlined in your graph and in your serving graph. So,

  • all of this is available today. And you can go check it out on

  • GitHub at TensorFlow/transform. I'm going to hand it over to Raz

  • who is going to talk about TensorFlow model analysis.

  • >> All right, thanks, Clemens. Hi, everyone.

  • I'm really excited to talk about TensorFlow model analysis today.

  • We're going to talk little bit about metrics. Let's see. Next

  • slide. All right. So, we can already get metrics today,

  • right? We use TensorBoard. TensorBoard's awesome.

  • You saw an earlier presentation about it today. It's a great

  • tool. While you're training you can watch your metrics, right?

  • And if your training isn't going well, you can save

  • yourself a couple of hours of your life, right? Terminate the

  • training, fix some things. But let's say you have a trained

  • model already. Are we done with metrics? Is that it? Is there

  • any more to be said about metrics when we're done training?

  • Well, of course there is. We want to know how well our trained model

  • actually does for our target population. Right? And I would

  • argue that we want to do this in a distributed fashion over the

  • entire dataset. Now, why wouldn't we just sample? Why

  • wouldn't we just save more hours of our lives, right?

  • Just sample, make things fast and easy, right? Start with a

  • large dataset. You're going to slice that dataset. I'm going

  • to look at people at noon. Noontime, right? That's a

  • feature. From Chicago. My hometown. Okay? Running on

  • this particular device.

  • Well, each of these slices reduces the

  • size of your evaluation dataset by a factor, right? This is an

  • exponential decline. By the time you're looking at the

  • experience for a particular, you know, set of users, you're not

  • left with very much data, right? And the error bars on your

  • performance measures, they're huge. I mean, how do you know

  • that the noise doesn't exceed your signal by that point,

  • right? So, really, you want to start with your larger dataset

  • before you start slicing. All right. So, let's talk about a

  • particular metric. I'm not sure, you know, who here has heard of

  • the ROC Curve?

  • It's kind of an unknown thing in machine learning these days.

  • Okay. So, we have our ROC Curve. And I'm going to talk

  • about a concept that you may or may not be familiar with, which

  • is ML fairness. Okay.

  • What is fairness?

  • Fairness is a complicated topic.

  • Fairness is basically how well does our machine learning model

  • do for

  • different segments of our population?

  • You don't have one ROC Curve, you have

  • an ROC Curve for every segment and group of users.

  • Who here would run their business based on their top line

  • metrics? No one, right? That's crazy.

  • You have to slice your metrics.

  • You have to dive in and find out how things are going.

  • That lucky user, black curve on the top, great experience.

  • The unlucky user, the blue curve, not such a great

  • experience.

  • So, when can our models be unfair to various users, okay?

  • Well, one instance is if you simply don't have a lot of data

  • from which to draw your inferences, right?

  • So, we use stochastic optimizers, and if we retrain

  • the model, it does something slightly

  • different every time. And you get high variance for some users

  • because you don't have a lot of data there. We have been

  • incorporating data from a lot of data sources. Some data sources

  • are more biased than others. Some users get the short end of

  • the deal. Other users get the ideal experience. Our labels

  • could be wrong, right? All of these things can happen. And

  • here's TensorFlow model analysis. You're looking here

  • at the UI hosted within a Jupyter notebook. On the X axis

  • we have our loss. And you can see there's some natural

  • variance in the metrics, you know? And we're not always

  • going to get spot on the same precision and recall for every

  • segment of population. But sometimes you'll see, you know,

  • what about those guys at the top there? Experiencing the highest

  • amount of loss. Do they have something in common? We want to

  • know this.

  • The users that get the

  • poorest experience, they're sometimes our most vocal users,

  • right? We all know this.

  • I'd like to invite you to come visit ml-fairness.com. There's

  • a deep literature about the

  • mathematical side of ML Fairness.

  • Once you have figured out how to measure fairness,

  • how does TensorFlow model analysis give you the sliced

  • metrics?

  • How do you go about getting the metrics? Today, you export a saved model

  • for serving. That's a familiar thing.

  • TensorFlow model analysis is similar.

  • You export a saved model for evaluation. Why are these

  • models different? Why export two?

  • Well, the eval graph that we serialized as a saved model has

  • some additional annotations that allow our evaluation batch job

  • to find the features, to find the prediction, to find the

  • label. And we don't want those things, you know, mixed in with

  • our serving graph. So, we export a second one. So, this

  • is the GitHub. We just opened it I think last night at 4:30

  • p.m. Check it out. We have been using it internally for

  • quite some time now. Now it's available externally as well.

  • The GitHub has an example that kind of puts it all together.

  • So, that you can try all these components that we're talking

  • about from your local machine. You don't have to get an account

  • anywhere. You just git clone it and run the scripts and run the

  • code lab. This is the Chicago taxi example.

  • So, we're using publicly available data

  • to determine

  • which riders will tip their driver and

  • which riders, shall we say, don't have

  • enough money to tip today, right? What does

  • fairness mean in this context?

  • So, our model is going to make some predictions. We may want

  • to slice these predictions by time of day. During rush hour,

  • we're going to have a lot of data. So, hopefully, our model

  • is going to be fair if that data is not biased. At the very

  • least, it's not going to have a lot of variance. But how is it

  • going to do at 4 a.m. in the morning? Maybe not so well.

  • How is it going to do when the bars

  • close? An interesting question. I don't know yet. But I

  • challenge you to find out. All right. So, this is what you can

  • run using your local scripts. We start with our raw data. We

  • run tf.transform.

  • tf.transform emits a transform function and our transformed

  • examples. We train our model.

  • Our model, again, emits two saved

  • models, one for serving and one for eval. You can try this

  • locally. Run scripts and play with this stuff.

  • Clemens talked a little bit about transform.

  • We want to take our dense features and scale them to a

  • particular Z-score. We don't want to do that batch by batch.

  • The mean for each batch is going to differ; there may be fluctuations.

  • We want to normalize these things

  • across the entire dataset. We've built a vocabulary, we

  • bucketize for the wide part of our model, and we emit our transform

  • function, and into the trainer we go, right?

  • You heard earlier today about TensorFlow estimators. And here is a wide

  • and deep estimator that takes our transformed

  • features and emits two saved models. And now we're in

  • TensorFlow model analysis, which reads in the eval saved model. And

  • runs it against all of the raw data. We call render slicing

  • metrics from the Jupyter notebook. And you see the UI.
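
As a rough illustration of that notebook call, here is a minimal sketch; the module paths, argument names, and the output directory are assumptions that may differ between TensorFlow Model Analysis releases:

```python
# A minimal sketch of loading evaluation results and rendering sliced metrics
# in a Jupyter notebook; paths and the slicing column are assumptions.
import tensorflow_model_analysis as tfma

# Assume an eval saved model was exported during training and a Beam
# evaluation run has already written its results to this hypothetical path.
eval_result = tfma.load_eval_result(output_path='tfma_output_dir')

# Render the interactive UI, sliced by a feature such as the hour the taxi
# trip started (column name borrowed from the Chicago taxi example).
tfma.view.render_slicing_metrics(eval_result, slicing_column='trip_start_hour')
```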

  • And the thing to notice here is that

  • this UI is immersive, right? It's not just a static picture

  • that you can look at and go, huh, and walk away from. It

  • lets you see your errors broken down by bucket or broken down by

  • feature. And it lets you drill in and ask

  • questions and be curious about how your models are actually

  • treating various subsets of your population.

  • Those subsets may be the lucrative subsets. Right? You

  • really want to drill in. And then you want to serve your

  • model.

  • So, our demo, our example, has a

  • one-liner here that you can run to serve your model.

  • And make a client request.

  • The thing to notice here is we're making a gRPC request

  • to that server.

  • We're taking our feature tensors,

  • sending them to the server, and back

  • comes a probability, right? That's not quite enough. We

  • have heard a little bit of feedback about this server. And

  • the thing we have heard is that

  • gRPC is cool, but REST is really cool.

  • This is one of the top feature requests on GitHub for model serving.

  • You can now pack your tensors into a JSON object. Send that

  • JSON object to the server

  • and get a response back via HTTP. Much more convenient.
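
For illustration only, a hypothetical sketch of what such a JSON-over-HTTP request could look like from Python; the port, endpoint path, model name, and feature names are all assumptions, not the exact API being announced:

```python
# Hypothetical REST request to a served model; endpoint, port, model name
# and the feature names in the payload are assumptions.
import json
import requests  # third-party HTTP client

payload = {"instances": [{"trip_start_hour": 17, "trip_miles": 2.4}]}

response = requests.post(
    "http://localhost:8501/v1/models/chicago_taxi:predict",
    data=json.dumps(payload))

print(response.json())  # e.g. {"predictions": [[0.83]]}
```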

  • And I'm very excited to say that it will be released very soon.

  • Very soon.

  • I see the excitement out there. Back to the end-to-end. So,

  • yeah.

  • So, you can try all of these pieces, end-to-end, all on your

  • local machine.

  • Because they're using Apache Beam's direct runner. And

  • direct runners allow you to take your distributed jobs and run

  • them all locally.

  • If you swap in Apache Beam's Dataflow runner, you can run

  • against the entire dataset in the cloud. The example also

  • shows you how to run the big job against the cloud version as

  • well.
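
A minimal sketch of that runner swap; the same Beam pipeline body runs unchanged, and only the pipeline options differ (the GCP project and bucket names below are placeholders):

```python
# Switching an Apache Beam pipeline between the local DirectRunner and the
# Dataflow runner; project and bucket names are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

local_options = PipelineOptions(['--runner=DirectRunner'])

cloud_options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-gcp-project',            # placeholder
    '--temp_location=gs://my-bucket/tmp',  # placeholder
])

# The pipeline body (for example a transform analysis or an evaluation job)
# stays the same; pass cloud_options instead to run it on Dataflow.
with beam.Pipeline(options=local_options) as p:
    _ = (p
         | beam.Create([1, 2, 3])
         | beam.Map(lambda x: x * x))
```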

  • We're currently working with the

  • community to develop a runner for Apache Flink, a runner for

  • Spark. Stay tuned to the TensorFlow blog and to our

  • GitHub.

  • And you can find the example at TensorFlow/model-analysis. And

  • back to Clemens. >> Thank you, Raz.

  • [ Applause ]

  • All right.

  • We have heard about transform and how to use model analysis

  • and how to serve them. You say you want more. Is that enough?

  • You want more? All right. You want more.

  • And I can think of why you want more.

  • Maybe you read the paper we published

  • and presented last year about TensorFlow Extended. We laid

  • out a broad vision of how this platform works within Google and

  • the

  • features and impact we have using it. And figure one of that paper,

  • which shows all these boxes, described what

  • TensorFlow Extended is.

  • It's overly simplified, but it's much more than what we've shown today. We spoke about

  • these four components of TensorFlow Extended.

  • This is not yet an end-to-end machine learning platform.

  • This is just a very small piece.

  • These are the libraries we have open sourced for you to use, but

  • we haven't yet released the entire platform. We are working

  • hard on this. We have seen the profound impact it had

  • internally. How people could start using this platform and

  • deploying machine learning in production

  • using TFX. And we are working hard to make more

  • of those components available to you. And we are

  • looking at the data components, where you can analyze the data,

  • visualize the distributions and detect anomalies. That's an

  • important part of any machine learning pipeline. To detect

  • changes and shifts in your data and anomalies.

  • After this, looking at the horizontal

  • pieces that help tie all of these components together. If

  • they're all single libraries, you have to get them together.

  • You have to use them individually. They have

  • well-defined interfaces, but you have to combine them by

  • yourself. Internally, we have the shared configuration

  • framework that allows you to configure the entire pipeline.

  • And a nice front end allows you to monitor the

  • status of these pipelines and see progress and inspect the

  • different artifacts that have been produced by all of the

  • components. So, this is something we're also looking

  • to release later this year. And I think you get the idea.

  • Eventually we want to make all of this available to the community.

  • Because internally, hundreds of teams use this to improve our

  • products.

  • We really believe that this will be as transformative to the

  • community as it is at Google.

  • And we're working very hard to release

  • these technologies in the platform to see what you can do

  • with them in your products and companies.

  • Keep watching the TensorFlow blog for our future plans.

  • And as mentioned, you can use some of these

  • today: Transform is released, Model Analysis was released yesterday, and

  • Serving is released.

  • And the end-to-end example is available under this link. You

  • can find it under the model analysis repo. Thank you from

  • myself and Raz. I'm going to ask you to join me in welcoming

  • a special external guest, Patrick Brandt, joining us from

  • Coca-Cola who is going to talk about applied AI at Coca-Cola.

  • Thank you. [ Applause ]

  • >> Hey, great job. Thanks, Clemens. All right. So, yes,

  • I'm Patrick.

  • I'm a solutions strategist for Coca-Cola. I'm going to share

  • with you how we're using TensorFlow to support some of

  • our

  • largest and most popular digital marketing programs in

  • North America. We are going off on a marketing tangent before we

  • come back. As background, what is proof of purchase and the

  • relationship to marketing? As an example, back in the day,

  • folks could clip the bar codes off their cereal boxes and mail

  • the bar codes back into the cereal company to receive a

  • reward. Some kind of coupon or prize back through the mail.

  • And this is basic loyalty marketing. Brands, in this

  • case, the cereal company, rewarding consumers who

  • purchase. And at the same time, opening up a line of

  • communication between the brand and the consumer.

  • Now, over the last 15-some odd years

  • of marketing digitization, this concept has evolved into digital

  • engagement marketing.

  • Engaging consumers in the moment, in real time,

  • through web and mobile channels. But proof of purchase is still

  • an important component of the experience.

  • We have an active digital marketing

  • program at Coca-Cola.

  • Consumers can earn a magazine subscription

  • or a chance to win.

  • We printed these 14-character product pincodes.

  • And these are what our consumers enter into our promotions. You

  • can enter them in by hand. But on your mobile device, you can

  • scan them. This had been the holy grail of

  • marketing IT at Coke for a long time. We looked at commercial

  • and open source optical character recognition

  • software, OCR, but it could never read these codes very

  • well. And the problem has to do with the code.

  • These are 4x7, dot-matrix-printed characters.

  • The print head is an inch from the cap, and the caps are

  • flying under the printer at a rapid rate.

  • This creates visual artifacts, things that normal OCR can't handle

  • well. We knew if we wanted to unlock this experience, we were

  • going to have to build something from scratch. When I look at

  • these codes, a couple of characteristics jump out at me.

  • We're using a small alphabet. Let's say ten characters.

  • And there's a decent amount of variability in the presentation

  • of these characters.

  • This reminds me of MNIST, the online database of 60,000

  • handwritten digit images.

  • And convolutional neural networks are

  • extremely good at extracting this kind of text.

  • I'm probably going to tell you something you know, they work by

  • breaking it down into smaller pieces and looking for edges and

  • textures and colors.

  • And these very granular feature activations are pooled up,

  • often, into a more general feature layer. And then that's

  • filtered. And those activations are pooled up and so on until

  • the output of the neural net is run through a function which

  • creates a probability distribution of the likelihood

  • that a set of objects

  • exists within the image.

  • But they have a nice property, handling the nature of images

  • well. From our perspective, they can handle

  • the tilt and twist of a bottle cap held in someone's hand.

  • It's perfect. So, this is what we're going to use. We're going

  • to move forward.

  • Now we need to build our platform. That begins with

  • training. The beating heart of any applied AI solution. And we

  • knew we needed high quality images with accurate labels of

  • the codes

  • and we likely needed a lot of them.

  • We started with a synthetic dataset of

  • random strings superimposed over blank

  • backgrounds. This was a base for transfer learning once we

  • created our real-world data set.

  • We did a production run of caps and fridge packs and

  • distributed those to

  • multiple third parties along

  • with custom tools to scan the cap and label it with the

  • pincode. But an important component was an existing

  • pincode validation service we have had in production for a

  • long time to support our programs. So, any time a

  • trainer labeled an image, we would send that label through

  • our validation service, if it

  • was a valid pincode, we knew we had an accurate label. This

  • gets the model trained and now we need to release it to the

  • wild. We had some aggressive performance requirements.

  • We wanted 1 second average processing time. 95% accuracy

  • at launch. And host the model remotely, for the web, and embed

  • it natively on mobile devices to support mobile apps. So, this

  • means that our model has to be small.

  • Small enough to support over the air updates as the model

  • improves over time. And to help us improve that model over time,

  • we created an active learning UI, a user interface, that

  • allows our consumers to train the model once it's in

  • production. And that's what this looks like.

  • So, if I as a consumer scan a cap,

  • and the model cannot infer a valid pincode, it sends back a

  • per-character

  • confidence for every character at every position. It can render a

  • screen like what you see here.

  • I, as a user, am only directed to address the particularly

  • low-confidence characters. I see the

  • characters, tap the red ones to bring up the keyboard, fix

  • them, and I'm entered into the promotion. It's a good user

  • experience for me.

  • I scan a code and I'm only a few taps

  • away from being entered into a promotion. But we have

  • extremely valuable data for training. Because we have the

  • image that created

  • the invalid inference as well as the user-corrected label they

  • needed to correct to get into the promotion. We can throw

  • this into the hopper for future rounds of training to improve

  • the model. All right. When you put it all together, this is

  • what it looks like.

  • User takes a picture of a cap, the

  • image is normalized, then sent into our convolutional model. The

  • output of which is a character probability matrix.

  • This is the per-character confidence of every character at

  • every position. That is further analyzed to create a top ten

  • prediction. Each one of those is fed into our pincode

  • validation service. The first one that's valid, often the

  • first one on the list, is entered into the promotion.
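
To make that decoding step concrete, here is a minimal sketch, with a hypothetical alphabet and function names, of turning a per-character probability matrix into a top-10 list of candidate codes; each candidate would then be checked against the pincode validation service:

```python
# Hypothetical decoding of a [num_positions, alphabet_size] confidence matrix
# into the top-k most likely codes (a simple beam search).
import heapq
import numpy as np

ALPHABET = list("ACDEFGHJKL")  # hypothetical 10-character alphabet

def top_k_codes(char_probs, k=10, per_position=3):
    """char_probs[i, j] is the model's confidence that position i is ALPHABET[j]."""
    candidates = [("", 0.0)]  # (partial code, log-probability)
    for row in char_probs:
        best = np.argsort(row)[-per_position:]  # keep only the likeliest characters
        candidates = heapq.nlargest(
            k,
            ((code + ALPHABET[j], logp + np.log(row[j] + 1e-12))
             for code, logp in candidates
             for j in best),
            key=lambda c: c[1])
    return [code for code, _ in candidates]

# Example with a random 14 x 10 confidence matrix.
probs = np.random.dirichlet(np.ones(len(ALPHABET)), size=14)
print(top_k_codes(probs))
```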

  • And if none are valid, our user sees the active learning

  • experience. So, our model development effort went through

  • three big iterations. In an effort to keep the model size

  • small up front, the data team used

  • binary normalization of the images, and it didn't

  • produce enough information for the model. They switched, and the

  • model size was

  • too large to support over-the-air updates. So they started over.

  • They just completely rearchitected the net using SqueezeNet,

  • designed to reduce the size by reducing the number of learnable

  • parameters within the model.

  • After making this move, we had a problem.

  • We started to experience internal covariate shift, the

  • result of reducing the number of learnable parameters.

  • That means that very small changes to upstream parameter

  • values cascaded to

  • huge gyrations in downstream parameter values. This slowed

  • our training process.

  • We had to grind through this covariate shift to get the

  • model to

  • converge, if it would converge at all. We introduced batch

  • normalization, which sped up training. It got the model to

  • converge. And now we're exactly where we want to be.
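
As a generic illustration of that fix (not the team's actual architecture), inserting batch normalization between a convolution and its activation looks like this in Keras:

```python
# A generic sketch of batch normalization in a small convolutional block;
# layer sizes are arbitrary and not the model described in the talk.
import tensorflow as tf

def conv_block(x, filters):
    x = tf.keras.layers.Conv2D(filters, kernel_size=3, padding='same',
                               use_bias=False)(x)
    # Normalizing activations per batch damps the internal covariate shift
    # that made training slow to converge.
    x = tf.keras.layers.BatchNormalization()(x)
    return tf.keras.layers.ReLU()(x)

inputs = tf.keras.Input(shape=(96, 96, 1))   # hypothetical cap-image size
x = conv_block(inputs, 32)
x = conv_block(x, 64)
outputs = tf.keras.layers.GlobalAveragePooling2D()(x)
model = tf.keras.Model(inputs, outputs)
model.summary()
```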

  • We have a model with a 25-fold size decrease from where we started, with

  • accuracy greater than 95%. And the results are impressive.

  • These are some screen grabs from a test site that I built. And

  • you can see across the top row how the model handled different

  • types of occlusion. It handles translation, tilting the cap,

  • rotation, twisting the cap. And camera focus issues. So, you

  • can try this out for yourself.

  • I'm going to pitch the newly-launched Coca-Cola USA

  • app.

  • It hit Android and iPhone app stores a couple days ago. It

  • does many things, and you can scan a code. You can go on with

  • the mobile browser, take a picture of a cap code to be

  • entered into a promotion. Quick shoutouts.

  • I can't not mention these folks.

  • Quantiphi built our model.

  • Ellen Duncan spearheaded this from the

  • marketing side.

  • And my people in IT, my colleague,

  • Andy Donaldson, shepherded this into production. Thank you.

  • It's been a privilege to speak with you.

  • I covered a lot of ground in ten short minutes.

  • There's a lot of stuff I didn't talk about. Please feel free to

  • reach out on

  • Twitter, @patrickbrandt. And on LinkedIn at wpb.is/linkedin. You can read an article I published

  • last year on this solution on the Google Developers blog.

  • And you can get there at wpb.is/tensorFlow. Thank you.

  • Next up is Alex.

  • And Alex is going to talk to us about applied ML with robotics.

  • [ Applause ]

  • >> All right. Hi, everybody. I'm Alex from the brain robotics

  • team. And in this presentation I'll be

  • talking about how we use simulation and adaptation in

  • some of our real-world robot learning problems.

  • So, first, let me start by introducing robot learning. The

  • goal of robot learning is to use

  • machine learning to learn robotic skills that work in

  • general environments. What we have seen so far is that if you

  • control your environment a lot, you can get robots to do pretty

  • impressive things.

  • And where the techniques break down

  • is when you try to apply these same techniques to more general

  • environments. And the thinking is if you use machine learning,

  • this can help learn from the environment and address the

  • generalization issues. So, as a step in this direction, we have

  • been looking at the problem of robotic grasping. This is a

  • project we have been working on in collaboration with people at

  • X.

  • And to explain the problem a bit, we have the real robot arm

  • learning to pick up objects out of a bin. There

  • is going to be a camera looking down over the shoulder of the

  • arm into the bin.

  • And from this RGB image, we're going to train a neural network

  • to learn what commands it should send to the robot to

  • successfully pick up objects. Now, we want to try to solve

  • this task using as few assumptions as possible. So,

  • importantly, we're not going to

  • give any information about the geometry

  • of what kinds of objects we're going to pick up. And no

  • information about the depth of the scene. So, in order to

  • solve the task, the

  • model needs to learn hand-eye coordination or see where it is

  • in the camera image and then figure out where in the scene it

  • is and then combine these two to figure out how it can move

  • around.

  • Now, in order to train this model, we're going to need a lot

  • of data because it's a pretty large-scale image model. And

  • our solution at the time for this was to simply use more

  • robots.

  • So this is what we called the arm farm. These are six robots

  • collecting data in parallel. And if you have six robots, you

  • can collect data a lot faster than if you only have one robot.

  • So, using these robots, we were able to collect over a million

  • attempted

  • grasps over many robot-hours in total. And we were able

  • to successfully train models to pick up objects.

  • Now, this works, but it still took a lot of time to collect

  • this dataset. This motivated looking into ways to reduce the

  • amount of real-world data needed to learn these behaviors. One

  • approach for doing this is simulation. So, in the left

  • video here, you can see the images that are going into our

  • model in our real-world setup.

  • And on the right here you can see our simulated recreation of

  • that setup. Now, the advantage of moving things into

  • simulation, is that simulated robots are easier to scale. We

  • have been able to spin up thousands

  • of simulated robots grasping various objects. And we were

  • able to collect millions of grasps in just over eight hours

  • instead of the weeks that were required for the original

  • dataset. Now, this is good for getting a lot of data, but

  • unfortunately, models trained in simulation tend not to transfer

  • to the actual real-world robot. There are a lot of systemic

  • differences between the two.

  • One big one is the visual appearance of different

  • things.

  • And another big one is just physical

  • differences between our real-world physics and our

  • simulated physics. What we did was we were able to train a

  • model in simulation to get to around 90% grasp success. We

  • then deployed to the real robot and it succeeds just over 20% of

  • the time, which is a very big performance drop. So, in order

  • to get good performance, you need to do something a bit more

  • clever.

  • This motivated looking into sim-to-real transfer, using

  • simulated data to improve your real-world sample

  • efficiency. Now, there are a few different ways you can do

  • this.

  • One way is adding randomization into the simulator. You can

  • change around the textures you apply to

  • objects, changing their colors, changing how lighting is

  • interacting with your scene. And you can play around with

  • changing the geometry of what kinds of objects you're trying

  • to pick up. Another way of doing this is domain

  • adaptation.

  • Which is a set of techniques for

  • learning when you have two domains of data that have

  • related structure but are different.

  • The two domains are the simulated and real robot data.

  • And there are feature-level and pixel-level ways of doing this.

  • In this work, we tried all of these approaches. And I'm going

  • to focus primarily on the domain adaptation side of things.

  • So, in feature-level domain adaptation, what we're going to

  • do is take our simulated data, take our real data, train the

  • same model on both datasets.

  • But then

  • at an intermediate feature layer, we add a similarity loss.

  • That's going to push the distribution of features to be

  • the same across both domains.

  • One approach for doing this is domain adversarial networks.

  • This is implemented as a small neural net that tries to predict

  • the domain based on the input

  • features, and the rest of the model is trying to confuse the

  • domain classifier as much as possible.

  • Now pixel methods look at it from a different point of view.

  • Instead of features, we're going to

  • transform the data at the pixel level to look more realistic.

  • We take a generative adversarial network. We feed it an image

  • from our simulator. It's going to output an image that looks

  • more realistic.

  • And then we're going to use the generator to train whatever task

  • model that we want to train. Now, we're going to train both

  • the generator and the task model at the same time. We found that

  • in practice this was useful because it helps ground the

  • generator output to be useful for training your downstream

  • task. All right. So, taking a step back.

  • Feature-level methods can learn

  • domain-invariant features when you

  • have data from related domains that aren't identical.

  • Meanwhile, pixel-level methods can transform your data to look

  • more like your real-world data, but in practice, they don't work

  • perfectly and there are small artifacts and inaccuracies in the generator output.

  • You can use both methods. The pixel-level method doesn't get you all the way there, but

  • you can attach a feature-level method to close the rest of the reality gap.

  • And we combine them in what we call GraspGAN, a

  • combination of pixel-level and feature-level adaptation. In the left half

  • of the video, a simulated grasp. In the right half, the output of

  • the generator. You can see it's learning cool things in terms of

  • drawing what the tray should look like. Drawing more

  • realistic textures on the arm. Drawing shadows. It's learned

  • how to draw shadows as the arm is moving around in the scene.

  • It's not perfect. There are still odd splotches of color

  • around, but it's definitely learning something about what it

  • means for an image to look realistic. Now, this

  • is good for getting a lot of pretty images, but what matters

  • for our problem is whether these images are actually useful for

  • reducing the amount of real-world data required. And

  • we find that it does. So, to explain this chart a bit, on

  • the X axis is the number of real-world samples used.

  • And we compared the performance of different methods as we vary

  • the amount of real-world data given to the model. The blue

  • bar is the performance with only simulated data.

  • The red bar is the performance when we use only real data, and

  • the orange bar is the performance when we use both

  • simulated and real data and the domain adaptation methods I have

  • been talking about.

  • When we use just 2% of the original real world data set,

  • we're able to get the same level of performance. This reduces

  • the number of real-world samples we needed by up to 50 times,

  • which is really exciting in terms of not needing to run

  • robots for a long amount of time to learn these grasping

  • behaviors. Additionally, we found that even when we give all

  • of the real-world data to the model, when we give simulated

  • data as well, we're still able to see improved performance.

  • So, that implies that we haven't hit data capacity limits for

  • this grasping problem. And finally, there's a way to train

  • this setup without having real-world labels.

  • And when we train the model in this setting, we found that we

  • were still able to get pretty good performance on the

  • real-world robot.

  • Now, this was a work of a large team across both brain as well

  • as X. I would like to thank all of my collaborators. Here is a

  • link to the original paper. And I believe there is also a

  • blog post if people are interested in hearing more

  • details. Thanks.

  • [ Applause ] All right. And lastly, we have

  • Sherol who is going to talk to you about art and music with

  • machine learning. >> Amazing. Just amazing.

  • I'm really just in awe of what machine learning is capable of

  • and how we can extend human capabilities.

  • And we want to think more than just about, you know,

  • discovering new approaches and new ways of using the

  • technology. We want to see how it's being used and how

  • it impacts the human creative process.

  • So, imagine you need to find or compose a drum pattern.

  • You have some idea of a drum beat that you would like to

  • compose.

  • And all you need to do now is go to a website where there's a

  • pre-trained

  • model of drum patterns sitting online. You just need a

  • web browser. You give it some human input and you can generate

  • a space of expressive variations. You can tune and

  • control the type of outputs that you're getting from this

  • generative model. And if you don't like it, you can continue

  • going through it exploring this generative space. So, this is

  • the type of work that project Magenta focuses on.

  • To give you a bird's eye view of what Project

  • Magenta is about, it basically is a group of researchers and

  • developers and creative technologists that engage in

  • generative models research.

  • So, you'll see this work

  • published at machine learning conferences, a lot of

  • research contributions from Magenta. And you'll see the

  • code, after it's been published, put into an open source

  • repository on GitHub in the Magenta repo. And from there,

  • we'll see ways of thinking and designing creative tools

  • that can enhance and extend the human expressive creative

  • process.

  • And eventually ending up in the hands of artists and

  • musicians. Inventing new ways we can create. And inventing

  • new types of artists.

  • So, I'm going to give three brief overviews of the

  • highlights of some of our recent work.

  • So, this is Performance RNN. How many have seen this? This

  • is one of the demos earlier today. A lot of people have

  • seen and heard of

  • this kind of work. This is what people think of as a generative

  • model. How can we build a computer that has the kind of

  • intuition

  • to know the qualities of

  • melody and harmony and expressive dynamics? It's more

  • interesting to explore this in the browser enabled by

  • TensorFlow.js. This is a demo we have running online. We have

  • the ability to tune and control some of the output that we're

  • getting. So, in a second, I'm going to show you this video of

  • what that looks like. You would have seen it out on the demo

  • floor. But we'll show you, and all of you watching online. We

  • were able to bring it even more

  • alive by connecting a baby grand piano

  • that is also a MIDI controller, and we have the ability to

  • perform alongside the generative model, reading in the inputs

  • from the human playing the piano.

  • So, let's take a look.

  • So, this is trained on classical music data from actual live

  • performers.

  • This is from a dataset that we got

  • from a piano competition.

  • [piano playing] -- I don't know if

  • you noticed, this is Nikhil from earlier today. He's a talented

  • young man. He

  • helped build out the browser version.

  • [piano playing] and so, we're thinking of ways that we take

  • bodies of

  • work, we train a model off of the data, then we create these

  • open source tools

  • that enable new forms of interaction, of creativity and

  • of expression. And all of these points of engagement are

  • enabled by TensorFlow. The next tool I want to talk about that

  • we have been working on is variational autoencoders. How

  • many people are familiar with latent space interpolation?

  • Quite a few of you. If you're not, it's quite simple. You

  • take human inputs and you train it through a neural network,

  • compressing it down to an embedding space. You compress

  • it down to some dimensionality and then you reconstruct it.

  • So, you're comparing the reconstruction with the

  • original. And trying to train -- build a space around that.

  • And what that does, is that creates the ability to

  • interpolate from one point to another, touching on the

  • intermediate points where a human may not have

  • given input.
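
A minimal sketch of that interpolation idea, assuming a trained autoencoder exposed through encode and decode functions (a hypothetical API standing in for any such model):

```python
# Latent-space interpolation sketch; the encode/decode API is hypothetical.
import numpy as np

def interpolate(encode, decode, input_a, input_b, num_steps=8):
    """Blend two inputs by walking a straight line between their latent
    codes, decoding points the model was never explicitly shown."""
    z_a, z_b = encode(input_a), encode(input_b)
    outputs = []
    for alpha in np.linspace(0.0, 1.0, num_steps):
        z = (1.0 - alpha) * z_a + alpha * z_b
        outputs.append(decode(z))
    return outputs

# Toy usage with identity functions standing in for a real model.
steps = interpolate(lambda x: np.asarray(x, dtype=float),
                    lambda z: z,
                    input_a=[0.0, 0.0],
                    input_b=[1.0, 2.0])
print(steps[0], steps[-1])
```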

  • So, the machine learning model may have never seen an example

  • it's able to generate. It's building an intuition off of

  • these examples. You can imagine if you're an animator, there's

  • so many ways of going from cat to pig. How would you animate

  • that? There's an intuition the artist would have in creating

  • that sort of morphing from one to the other. We're able to

  • have the machine learning model also do this now. We can also

  • do this with sound, right? This technology actually carries over

  • to multiple domains. So, this is NSynth.

  • And we've released this I think sometime last year. And what it

  • does, it takes that same idea of moving from one input to another.

  • Let's take a look. You'll get a sense of it.

  • Piccolo to electric guitar.

  • [sound moving back and forth] So, rather than crossfading, or

  • fading from one sound to the other, what we're actually able

  • to do is we're able to

  • find these intermediary, recomposed sound samples and

  • produce those. So, you know, there's a lot of

  • components to

  • that, there's a WaveNet decoder. But really it's the

  • same technology

  • underlying the encoder, decoder. When we think about the types of

  • tools that musicians use, we think less about training

  • machine learning models. We think about drum pedals. Not

  • drum pedals. Guitar pedals. They are used to refine sound to

  • cultivate the art and flavor the musician is looking for. We

  • don't think about parameter flags

  • or trying to write lines of Python code to create the sort

  • of art, you know, in general.

  • So, what we've done, not just are we interested in finding and

  • discovering new things, we're also interested in how those

  • things get used in general. Used by practitioners. Used by

  • specialists. And so, we've created a piece of hardware. We have

  • taken the machine learning model

  • and put it into a box where a

  • musician can plug in and explore this latent space in

  • performance. Let's look at what musicians feel

  • and think in this process.

  • [synth music] >> It feels like we're turning a

  • new corner of new possibilities. It could generate a sound that

  • might inspire us. >> The fun part is you think you

  • know what you're doing, there's a weird interaction happening

  • that could give you something totally unexpected.

  • >> I mean, it's great research, and it's

  • really fun and it's amazing to discover new things, but even

  • more amazing to see how it gets used and how people think to create

  • alongside it. And so, what's even better is it's just

  • released. And it's in collaboration with the Creative Lab in London.

  • NSynth Super. It's open source. All the specs are on GitHub,

  • everything from the touch interface to the code and what

  • hardware it's running on. This is all available to everyone

  • today. You can go online and check it out yourself. Now,

  • music is more than just sound, right? It's actually a sequence

  • of things that goes on.

  • So, when we think about the -- this

  • idea of what it means to have a generative music space, we think

  • also about melodies. And so, just like we have cat to pig,

  • what is it like to go from one melody to the next?

  • And moreover, once we have that technology, how does it -- what

  • does it look like to create with that? You have this expressive

  • space of variations. How do we design an expressive tool

  • that takes advantage of it? And what will we get out of it?

  • This is another tool that's developed by another team at

  • Google to make use of melodies in a latent space.

  • It shows how interpolation works and then builds a song or a composition

  • with it. Take a listen.

  • Say you have two melodies.

  • [twinkle, twinkle little star --]

  • And the middle.

  • [melody is morphing]

  • And extended.

  • [melody

  • is becoming more complex] And we really are just

  • scratching the surface of what's possible. How do we continue to

  • have the machine learn and have a better intuition for what

  • melodies are about? So, again, to bring it back full

  • circle: using different compositions and musical works, we're

  • able to train a variational autoencoder to create an

  • embedding space, build tools on it, and enable open source

  • communities to design creative artist tools. To look at new

  • ways of pushing the expressive boundaries that we currently

  • have. This is, again, just released. It's on our blog.

  • All the code is open source. And made available to you. And

  • also enabled by TensorFlow. In addition to all these other

  • things,

  • including Nikhil here, enabled by this type of work and

  • creativity and expressivity. And so, in wrapping up, I want

  • to take us back to this demo that we saw. Now, the most

  • interesting and maybe the coolest thing about this demo

  • was that we didn't even know that it was being built until

  • it was tweeted by Tero, a developer from Finland. And the

  • fact of the matter is, this just -- we're barely scratching the

  • surface. There's so much to do, so much to engage in and so much

  • to discover. And we want to see so much more of this. We want

  • to see more developers, more people sharing things and more

  • people getting engaged. Not just developers, but artists and

  • creatives as well. We want to explore and invent and imagine

  • what we can do with machine learning together as an

  • expressive tool.

  • And so, go to our website,

  • g.co/magenta, where you'll find our publications and these demos.

  • You can experience it yourself and more.

  • And you can also join our discussion group. So, here's

  • g.co/magenta. Join our discussion group. Become part of the

  • community and share the things that you're building so we can

  • do this together, alongside one another. Thank you so much.

  • [ Applause ] So, that's it for the talks

  • today. We've had an amazing, amazing show. Amazing spread of

  • speakers and topics. Now let's take a look at a highlight

  • review of the day.

  • [Music playing]

  • >> Earlier this year, we hit the milestone of 11 million

  • downloads.

  • We are really excited to see how much users are using

  • this and the impact it's had in the world.

  • >> We're very excited today to announce

  • that we are joining the TensorFlow

  • family, deeplearn.js. >> TensorFlow is an early stage

  • project,

  • we would like you to help build this future.

  • >> I told you at the beginning, the mission for tf.data was to

  • make a library that was fast, flexible and easy to use.

  • >> So, I'm very excited to say that we have been working with

  • other teams in Google to bring TensorFlow Lite to Google apps.

  • >> In general, the Google brain team's

  • mission is to make machines intelligent and use that ability

  • to improve people's lives.

  • I think those are good examples of where there's real opportunity

  • for this. [ Applause ]

  • >> So, hold on just a minute.

