>> Hi, everybody. We have a big show for you today. So, if you need to step out, now would be a great time -- the exit is behind you. [ Applause ]
>> Hi, everybody. Welcome. Welcome to the 2018 TensorFlow Dev Summit. We have a good day ahead with lots of cool talks. Air traffic controllers in Europe are using machine learning to project the trajectories of flights through the airspace of Belgium, Luxembourg, Germany and the Netherlands. That airspace handles more than 1.8 million flights and is one of the densest in the world. And dairy farming: we know that a cow's health is vital to the survival of the dairy industry. A connected-cow company in the Netherlands wondered if they could use machine learning to track the health of cows and provide insights to farmers and veterinarians on actions to take, so that we have happy, healthy, high-yielding cows. In California, and also in the Netherlands, people are exploring what machine learning algorithms and neural networks can do with music.
>> So much has been changed by machine learning: the popular Google Home, the Pixel, Search, YouTube, even Maps. Do you know what is fascinating about all of these examples? TensorFlow is at the forefront of them, making it all possible -- a machine learning platform that can solve challenging problems for all of us. Join us on this incredible journey to make TensorFlow powerful, scalable and the best machine learning platform for everybody. I now invite Rajat, who leads TensorFlow, to tell us more about this. Thank you.
>> So, let's take a look at what we have been doing over the last few years. It's been really amazing. There's lots that's new, and we have seen the popularity of TensorFlow grow, especially over the last year. We focused on making TensorFlow easy to use, and new programming paradigms like eager execution really make that easier. Earlier this year, we hit the milestone of 11 million downloads. We are really excited to see how much users are using it and how much impact it's had in the world. Here's a map showing the self-identified locations of folks on GitHub who have starred TensorFlow. It spans the globe; in fact, TensorFlow is used in every time zone in the world. An important part of any open source project is the contributors themselves, the people who make the project successful. I'm excited to see over a thousand contributors from outside Google who are making contributions not just by improving code, but also by helping the rest of the community -- answering questions, responding to issues and so on. Our commitment to this community includes sharing our direction in the roadmap, opening up the design process, and focusing on key needs like TensorBoard; we will be talking about this in detail later this afternoon. Today we are launching a new TensorFlow blog. We'll be sharing work by the team and the community on this blog, and we would like to invite you to participate as well. We're also launching a new YouTube channel that brings together all the great content for TensorFlow. Again, all of these are for the community, to really help build and communicate. All day today we will be sharing a number of posts on the blog and videos on the channel. The talks you are hearing here will be made available there as well, along with lots of conversations and interviews with the speakers. And to make reuse and sharing easier, today we are launching TensorFlow Hub.
This library of reusable components is easily integrated into your models. Again, it goes back to really making things easy for you. While the focus is on deep learning and neural networks, TensorFlow is also a rich collection of other machine learning algorithms: it includes things like regressions and decision trees, commonly used for many structured-data classification problems, and there's a broad collection of state-of-the-art tools for statistics and Bayesian analysis. You can check out the blog post for details.

As I mentioned earlier, one of the big focus points for us is to make TensorFlow easy to use, and we have been pushing on simpler, more intuitive APIs. Our focus is to consolidate a lot of the APIs we have and make it easier to build and train models. At the lowest level, the TensorFlow APIs are really flexible and let users build anything they want, and we're making those same APIs easier to use. TensorFlow contains a full implementation of Keras: it offers lots of layers and ways to train them, and Keras works with both graph and eager execution. For distributed execution, we provide estimators, so you can take models and distribute them across machines; you can also get estimators from Keras models. And finally, we provide premade estimators, a library of ready-to-go implementations of common machine learning algorithms.

So, let's take a look at how this works. First, you define your model. Keras gives you a nice and easy way to do that; this shows a convolutional model defined in just a few lines. Once you've defined that, you often want to do some input processing. We have the tf.data API, introduced in 1.4, which makes it easy to process inputs and lets us do lots of optimizations behind the scenes; you will see a lot more detail on this later today as well. Once you have those -- the model and the input data -- you can put them together by iterating over the dataset, computing gradients and updating the parameters. You need just a few lines to put these together, and you can use your debugger to step through it and solve problems as well. And, of course, you can do it in even fewer lines by using the predefined training loops we have in Keras. In this case, it executes the model as a graph, with all the optimizations that come with that. This is great for a single machine or a single device. But often, given the heavy computation needs of deep learning, we want to use more than one accelerator. For that we have estimators: with the same datasets you already had, you can build an estimator and use it to train across a cluster, or across multiple devices on a single machine. That's great, but why set up a whole cluster if a single box can do it faster? Cloud TPUs are built for training ML models at scale, and the focus is to let you take everything you have been doing and build a TPU estimator that scales the same model. And finally, once you have trained that model, that one line at the bottom exports it for deployment. Deployment is important; you often do that in data centers, but more and more we are seeing the need to deploy on phones and other devices as well. For that, we have TensorFlow Lite: a custom format that's designed for devices, lightweight, and really fast to get started with. Once you have that format, you can include it in your application, integrate TensorFlow Lite with a few lines, and you have an application that runs ML and makes predictions for whatever task you want to perform.
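To make the workflow Rajat describes more concrete, here is a minimal sketch -- not code from the talk -- that defines a small Keras model, feeds it with a tf.data pipeline, and wraps it in an estimator. The data, layer sizes and names are hypothetical, and it assumes a 1.x-era tf.keras/Estimator API.

```python
import numpy as np
import tensorflow as tf

# Placeholder data standing in for a real input source.
images = np.random.rand(1024, 28, 28, 1).astype(np.float32)
labels = np.random.randint(0, 10, size=(1024,)).astype(np.int64)

# 1. Define a small convolutional model with tf.keras.
inputs = tf.keras.Input(shape=(28, 28, 1), name='image')
net = tf.keras.layers.Conv2D(32, 3, activation='relu')(inputs)
net = tf.keras.layers.Flatten()(net)
outputs = tf.keras.layers.Dense(10, activation='softmax')(net)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# 2. Build the input pipeline with tf.data: shuffle, batch, repeat.
def input_fn():
    dataset = tf.data.Dataset.from_tensor_slices(({'image': images}, labels))
    return dataset.shuffle(1024).batch(32).repeat()

# 3. Wrap the Keras model in an Estimator to train on one machine, a cluster,
#    or (via a TPU estimator) Cloud TPUs, then export for serving or TF Lite.
estimator = tf.keras.estimator.model_to_estimator(keras_model=model)
estimator.train(input_fn=input_fn, steps=100)
```

The same model object could also be trained directly with model.fit, the predefined Keras training loop mentioned above.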
So, TensorFlow runs not just on many platforms, but in many languages as well. Today I'm excited to add Swift to the mix; it brings a fresh approach to machine learning. Don't miss the talk by Chris Lattner this afternoon that covers the exciting details of how we are doing this. JavaScript is a language that's synonymous with the web development community, and I'm excited to announce TensorFlow.js, bringing TensorFlow to web developers. Let's take a brief look at it. You can write the same kinds of TensorFlow programs in JavaScript and call them as plain JavaScript code, with a full-fledged layers API on top, and full support for importing TensorFlow and Keras models so you can pick the best deployment for you. Under the covers, these APIs are accelerated, and we have Node.js support coming soon, which will give you the power to accelerate on CPUs and GPUs as well. And now I would like to welcome Megan Kacholia to talk about TensorFlow's performance. [ Applause ]

>> Thank you. All right. Thanks, Rajat. So, performance across all platforms is critical to TensorFlow's success. I want to take a quick step back and talk about some of the things we think about when measuring and assessing TensorFlow's performance. One of the things we want to do is focus on real-world data and time to accuracy. We want to have reproducible benchmarks and make sure they're realistic of the workloads and the kinds of things that users like you are doing on a daily basis. Another thing, like Rajat talked about, is we want to make sure we have clean APIs. We don't want to have a fast version and a pretty version: the fast version is the pretty version. All the APIs we're talking about through the various talks are the things you can use to get the best performance out of TensorFlow. You don't have to worry about which one is fast or pretty -- use the pretty one, it is fast. You'll hear about tf.data from Derek right after the keynote, as well as distribution strategies from Igor, and these are great examples of things we have been pushing on to ensure good performance and good APIs. We want good performance whether it's in a large data center like the one shown here, or a GPU or CPU box under your desk, or a cloud platform, or a mobile or embedded device -- we want TensorFlow to perform well across all of them.

Now on to the numbers, because what is a performance talk if I don't show you slides with numbers? First, let's look at the mobile side. This is highlighting TensorFlow Lite performance; there's a talk by Sarah later today giving a lot more detail on how it works and the things we were thinking about when building it. We've seen a real speedup with quantized models, and it's critical to have strong performance regardless of the platform, so we're really excited to see these gains on mobile. Looking beyond mobile, there are a number of companies in the hardware space, which continues to expand. The contributions that come out of the collaborations we have with these companies -- the contributions they give back to TensorFlow and to the community at large -- are critical to making sure TensorFlow performs well on these specific platforms for the users that each group really cares about. One of the first ones I want to highlight is Intel. The Intel MKL-DNN library is open sourced and highly optimized for TensorFlow. We have a 3X inference speedup on Intel platforms, as well as great scaling efficiency on training.
And this is one of those things that highlights how important it is to have strong collaborations with different folks in the community, and we're excited to see things like this go back to all the users. I also want to call out a few of the collaborations with NVIDIA. TensorRT is an inference optimizer that we have been working with for a long time -- it's been around for a little while -- but with the TensorFlow 1.7 release we have native support built in. You can get low latency and high throughput, and you can see an inference speedup versus native float32 with standard TensorFlow. It's great to see the collaborations, the contributions, and the great numbers they deliver.

Looking past inference and on to training: mixed-precision training is important. As faster hardware comes out, using the mixed-precision support is how you get the best out of that hardware. One example is NVIDIA's Tesla V100, and we want the mixed-precision training support to get the best performance out of it. You can see a training speedup here: this is on an 8x Tesla V100 box, and you can see the improvement moving to mixed-precision training versus standard TensorFlow. Scaling efficiency is really important as well. Obviously we want TensorFlow to perform well on, say, a single GPU, but we also want it to keep going regardless of what you throw at it. Again, we look at examples with real-world data as well as synthetic data: it's fine to benchmark on synthetic data, but real-world workloads need to perform as expected. We see 90% scaling efficiency with real data and 95% with synthetic data on a V100 box, scaling from 1 to 8 GPUs. This is something we care about, and you're going to hear more about scaling efficiency with the new distribution APIs later today.
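Mixed-precision training is usually implemented with fp32 "master" weights, fp16 compute, and loss scaling. The sketch below is only a conceptual illustration of that pattern in graph-mode TensorFlow 1.x, not the setup behind the benchmark numbers above; the shapes and scale factor are arbitrary.

```python
import tensorflow as tf

# fp32 "master" weights; the heavy math runs in fp16.
weights = tf.get_variable("w", shape=[1024, 1024], dtype=tf.float32)
x = tf.placeholder(tf.float32, [None, 1024])
labels = tf.placeholder(tf.float32, [None, 1024])

# Cast to fp16 for the matmul so the hardware's fast fp16 units can be used.
y = tf.matmul(tf.cast(x, tf.float16), tf.cast(weights, tf.float16))
loss = tf.reduce_mean(tf.square(tf.cast(y, tf.float32) - labels))

# Scale the loss so small fp16 gradients don't underflow, then unscale
# before applying the update to the fp32 master weights.
loss_scale = 128.0
grads = tf.gradients(loss * loss_scale, [weights])
grads = [g / loss_scale for g in grads]
optimizer = tf.train.GradientDescentOptimizer(0.01)
train_op = optimizer.apply_gradients(zip(grads, [weights]))
```

The scaling keeps small gradient values representable in fp16 during the backward pass, while the weight update itself still happens in full precision.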
Moving on to Cloud TPUs. Cloud TPU launched in beta in February, just a month and a half ago. This is Google's v2 TPU, and it's available through Google Cloud Platform, like I mentioned. It's exciting to look at the numbers here: this picture shows a single device, and on a single device you get 180 teraflops of computation. But it's not just about the raw power, it's about what you can do with it -- it doesn't matter if you can't run the kinds of models you want. So I want to highlight the reference models we have open sourced and made available today, with a bunch more coming soon, to show the breadth of things you can run on this hardware with great performance and great accuracy. We have an internal team continually making sure these models perform well, run fast, and train to accuracy in the expected amount of time. It's not just about putting a model out and open sourcing it; it's about making it work the way the community expects it to work. Again, some numbers -- what good is a performance talk if I don't show numbers? One I want to call out on this slide: the cost of training to ImageNet accuracy is under $85. It's exciting to see what you can achieve by making use of this platform, and if you want more numbers, you can look at the DAWNBench entry that was submitted for ImageNet training. One final exciting thing to call out is the availability of Cloud TPU pods later this year. So, what's a pod? A pod is 64 of the devices I showed earlier, all wired up together, and you get about 11.5 petaflops of computation in a pod. That is a lot of compute power. What can you do with it? The team has been pushing on training ResNet-50 on a pod to accuracy in less than 15 minutes. We're very excited about what can be done with this kind of hardware, and the amazing speed you can get that wasn't really possible before.

So, Rajat talked about the APIs and the ease of use we're focusing on, and I've given you some numbers. But what happens when you put it all together? What can TensorFlow do? I want to invite Jeff Dean, the leader of the Brain team, to come up and talk a bit more about how TensorFlow addresses real problems. [ Applause ]

>> Thanks, Megan. So, I think one of the really remarkable things about machine learning is its capacity to solve real problems in the world, and we've seen tremendous progress in the last few years. Back in 2008, the U.S. National Academy of Engineering put out this list of grand engineering challenges that they were hoping would be solved by the end of the 21st century. It's 14 different challenges, and I think it's a really nice list of things we should be aspiring to work on as a society. If we solved all these problems, our planet would be healthier, people would live longer, we would be happier, and things would be better. I think machine learning is going to help us with all of these -- some in small ways, like machine learning improving our understanding of chemical molecules, and some in major ways. I'm going to talk about two today, but I think machine learning is key to tackling these areas. The two are advancing health informatics and engineering the tools of scientific discovery. Clearly machine learning is a big component, and you can think of TensorFlow itself as a tool for helping us engineer some of these discoveries.

One thing I think is really important is that there's a lot more opportunity for machine learning than there is machine learning expertise in the world. The way you solve a machine learning problem today is: you have some data, you have some computation -- maybe GPUs or TPUs or CPUs -- and then you have a machine learning expert, someone who has taken a graduate class in machine learning, downloaded TensorFlow, and is familiar enough to play with it. But that's a small set of people in the world. You stir all this together and, hopefully, you get a solution. The unfortunate thing is that there are probably tens of thousands of organizations in the world today that are effectively using machine learning in production environments and really making use of it to solve problems, but there are probably tens of millions of organizations that have data in a form that could be used for machine learning and don't have the internal expertise and skills. How can we make machine learning much easier to use, so you don't need nearly as much expertise to apply it? Can we use computation to replace a lot of the need for machine learning expertise? We have been working on a suite of techniques we call AutoML, and neural architecture search is one example. One of the things a machine learning expert does is sit down and decide, for a particular problem, what kind of model structure to use: a ResNet-50 architecture, say, or a nine-layer CNN with these filter sizes, and so on. It turns out that you can use machine learning to optimize a controller that proposes machine learning models.
You have the controller propose machine learning models, train them on the problem you care about, see which work well and which don't, and use that feedback as a reinforcement signal for the model-generating model. You can steer it towards models that work well for particular problems and away from the parts of the space where they don't. If you repeat this, you can get powerful, high-quality models. And they look a little weird -- this is not something a human machine learning expert would sit down and sketch out -- but they have characteristics of things we know human experts have found helpful. If you think of the ResNet architecture, it has skip connections that let you skip over layers; these more organic-looking connections are the same fundamental idea, which is that you want to allow input data to flow toward the output without going through as many computational layers.

The interesting thing is that AutoML does quite well here. This is a graph showing computational cost versus accuracy on ImageNet, and every dot shows a different tradeoff: generally, as you expend more computation, you get higher accuracy. But every dot here represents years of cumulative effort by top machine learning experts in the world, and if you run AutoML, you get better accuracy at better computational tradeoffs than all of those models. That's true at the high end, where you care about the utmost accuracy and don't care as much about computational budget, and at the low end, where you want lightweight models with low computational cost and still-high accuracy. That is exciting: I think this is a real opportunity to use more computation to solve machine learning problems in a much more automated way, so that we can solve more problems more quickly. We have released this, in collaboration with the Cloud group at Google, as a product that customers can use for solving their own computer vision problems, and obviously vision is just one of many categories of problems.

Okay, advancing health informatics. Machine learning and health care is going to be a really impactful combination. One of the areas we have been working on is a variety of medical imaging problems, including one in ophthalmology, where you're trying to look at an image and diagnose whether it shows signs of diabetic retinopathy. 400 million people around the world are at risk of this. It's very treatable if it's caught in time, but if it's not, you can suffer vision loss, and in many parts of the world there aren't enough ophthalmologists to inspect these images. So we have done work here: in work we published at the very end of 2016, we showed we had a model that was on par with board-certified ophthalmologists. Since then, we have continued to work on this. We've changed how we label our training data -- we've had retinal specialists label it rather than general ophthalmologists -- and now the model is on par with retinal specialists, a higher standard of care. We can bring this and deliver it to lots and lots of places around the world.

But more interestingly, I'm going to tell you a tale of scientific discovery. We had a new person join the retinopathy team, and as a warm-up exercise, Lily, who leads this work, said to this new person: hey, why don't you go off and see if you can predict age and gender from these images. We figured maybe you could get age within a couple of decades, and gender shouldn't be predictable at all -- the AUC should be .5.
They came back and said: I can predict gender with an AUC of .7. That must be wrong -- go off and come back later. And they came back and said: my AUC is now .85. That got us thinking, and we investigated what other kinds of things we could predict from these retinal images. It turns out you can predict a variety of things indicative of cardiovascular health -- your age and gender are themselves signals of cardiovascular health -- your hemoglobin level, lots of things like this. So we have a new, non-invasive test for cardiovascular health: normally you have to draw blood and do lab tests, but we can do this just from an image, which is pretty cool.

We're also doing a bunch of work on predictive tasks for health care: given a patient's medical record, can we predict the future? This is something doctors want to do -- understand how a patient is going to progress -- and you want to be able to answer lots of kinds of questions. Will the patient be readmitted if I release them now? What are the most likely diagnoses I should be thinking about? What tests should I order for this patient? Lots of questions like that. We have been collaborating with several health care organizations, working on de-identified health care records to predict these things. In January, we posted a many-author arXiv paper that looked at these tasks. Highlighting one here: predicting which patients are most at risk of mortality. Using this, we're able to predict which patients are most seriously at risk 24 hours earlier than the clinical baselines currently in use. That means doctors get 24 hours of advance notice to pay attention to the patients who are critically ill and need their close attention and close watching. This is indicative of what machine learning can do. The Google Brain team's mission is to make machines intelligent and improve people's lives.

I'm going to close with a bit of a story. When I was 5 years old, I lived in northwestern Uganda for a year. The local crop there is a root called cassava, and since I was 5, I liked to go out and help people pick cassava. It turns out that machine learning and cassava have a kind of cool twist together. Please roll the video.

>> Cassava is a really important crop. It provides food for over 500 million Africans every day.
>> When all other crops fail, farmers know they can rely on their cassava plants to provide them food.
>> We have several diseases that affect cassava, and these diseases make the roots inedible. It is very crucial to actually control and manage these diseases.
>> We are using machine learning to respond to the diseases.
>> And TensorFlow is the best foundation for our solutions. The app can diagnose multiple diseases. It's called Nuru, Swahili for light. You wave your phone over a leaf, and if it has a symptom, a box pops up: you have this problem. When we get a diagnosis, we have an option for advice, to learn about management practices. The object detection we do through TensorFlow relies upon our team annotating images.
>> We have collected over 5,000 high-quality images of different cassava diseases throughout this project. We use a single model based on the MobileNet architecture. It's able to make predictions in less than one second.
>> Instead of implementing thousands of lines of code, TensorFlow has a library of functions that allows us to build architectures in much less time.
>> We need something that can be deployed on a phone without any connection. TensorFlow is able to shrink these neural networks.
>> The human input is critical.
We're building something that augments your expertise and makes you better at your job.
>> So, with AI tools and machine learning, you can improve yields, you can protect your crops, and you can have a much more reliable source of food.
>> AI offers the prospect of fundamentally transforming the lives of hundreds of millions of farmers around the world.
>> You can see a product that can actually make someone's life better. That is kind of revolutionary.

>> Cool. I think we have some members of the Penn State and IITA teams from Tanzania here today -- if you could stand up or wave. [ Applause ] I'm sure they would be happy to chat with you at the break about that work. With that, I would like to introduce Derek Murray, who is going to talk to you about tf.data, the way to describe input pipelines in TensorFlow. Derek, thanks. [ Applause ]

>> Okay. Thank you, Jeff. And, wow, it's amazing to see how people are using TensorFlow to make the world a better place. As Jeff said, my name is Derek Murray, and I'm thrilled to be here to tell you about tf.data, the part of TensorFlow that helps you get your data -- from cat pictures to cassava pictures -- into your models. Input pipelines are usually overshadowed by the more glamorous aspects of machine learning, like matrix multiplications and convolutions, but I would argue they're extremely important. Input data is the lifeblood of machine learning, and the current algorithms and hardware are so thirsty for data that we need powerful input pipelines to keep up with them.

So, when I kicked off the tf.data project last year, TensorFlow's input processing had room to improve. You could feed in data from Python at each step, which is kind of slow, or set up queue runners to feed in data, which were challenging to use. So I decided to focus on three themes, and they're the main focus of my talk today. The first is performance: when we're training models on state-of-the-art accelerator hardware, we have a moral imperative to keep them busy with new data at all times. The second is flexibility: we want to handle any kind of data processing in TensorFlow itself, so you don't have to rely on different tools to experiment with different inputs. And the third is ease of use: with TensorFlow we want to open up machine learning to users of all abilities, and it can be a big leap from following along with your first tutorial to training your first model on your own data, so we want to help smooth that transition. We want tf.data to be the only input library you need for TensorFlow.

How do we do that? We took inspiration from the world of databases and designed tf.data around three phases: extract, transform, and load. First, there are tools to extract data from a wide range of sources, from in-memory arrays to multi-terabyte files spread across a distributed file system. Then there are tools to transform your data: these let you extract features, perform data augmentation, and ultimately convert your raw data into the tensors you will use to train your model. And finally, there's loading the data onto accelerators, which is important for performance.

That's the high-level pitch -- what does it look like in real code? This is the standard tf.data input pipeline for reading tf.Example protos from TFRecord files. I bet that 90% of all TensorFlow input pipelines start out this way. It's so common that we've wrapped this pipeline up in a single utility, but for pedagogical reasons it's useful to start here. We take a list of files on your local disk, or in GCS or S3, and extract the TFRecords. We then use functional transformations to pre-process the data.
We took inspiration from languages like C#, Java and Scala, which use method chaining to build up a pipeline of dataset objects, and higher-order functional operators like map and filter to let you customize the behavior of that pipeline. Finally, in the load phase, you tell TensorFlow how to get data out of the dataset, and one of the easiest ways is to create an iterator. Just like its namesake in Python, an iterator gives you sequential access to the elements of the dataset, and we'll see ways to soup up this part of the pipeline later on.

Now that I've given you an overview, I want to come back to the three themes and tell you how we have been advancing each of them. Let's start with performance. Remember all those exciting performance results that Megan told you about in the keynote? Every one of them was measured using real input data and a tf.data input pipeline that you can download and use in your own programs. There's one benchmark that I personally like: it measures the performance of training an infinitely fast image model on real data, in order to tease out any bottlenecks in the input pipeline. When we ran this last week on an NVIDIA GPU system, it processed over 30,000 ImageNet images per second. That's much faster than we can train on current hardware, but it's exciting for a couple of reasons: this throughput has more than doubled over the last eight months, which is a testament to the great job the team has done optimizing TensorFlow performance, and since the accelerators are getting faster all the time, we have an extremely useful benchmark that guides us as we continue to improve tf.data performance.

How do you get that kind of performance? One option is to go to the TensorFlow benchmarks project on GitHub and use that code in your program -- you should probably just do that. But maybe you have a different problem to solve. We have recently published a tf.data performance guide on TensorFlow.org, full of useful theoretical and practical information that lets you put these optimizations into practice on your own pipelines. And to support this on the technical side, we've been adding a raft of new features to tf.data. One I particularly want to call out ships as part of TensorFlow 1.8, so you can start playing with it in the nightly builds right away: up to this point, tf.data has been exclusively for code that runs on the CPU, and this marks our first foray into running on GPUs as well. There's a lot more I could say on this topic as we develop these features, but let's go back to the program from earlier on and see how to put the techniques into practice.

First, when dealing with a large dataset in a store like GCS or S3, you can speed things up by reading multiple files in parallel to increase the effective throughput into your model; you turn this on by adding the num_parallel_reads argument to the TFRecordDataset call. Next, you can improve performance by switching to fused versions of various transformations: a fused shuffle-and-repeat keeps the shuffle buffer useful across epoch boundaries, and a fused map-and-batch combines the execution of the map function with the transfer of each element into the batch. Together, these two optimizations give big speedups for models that consume a large volume of data. And last, but not least, there's the GPU prefetching I mentioned, which ensures that the next batch of input data is already in memory on the GPU when the next step is ready to begin. This is a crucial part of the CNN benchmarks, but achieving it used to involve manually staging buffers from the CPU to the GPU; the new prefetch-to-device API gives you the same performance and only requires you to add one line of code to your input pipeline.
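A rough sketch of what that optimized TFRecord pipeline might look like, assuming the 1.8-era tf.data and tf.contrib.data APIs; the file pattern, feature spec and tuning values are hypothetical.

```python
import tensorflow as tf

def parse_fn(serialized):
    # Hypothetical tf.Example feature spec, for illustration only.
    parsed = tf.parse_single_example(serialized, {
        "image": tf.FixedLenFeature([784], tf.float32),
        "label": tf.FixedLenFeature([], tf.int64),
    })
    return parsed["image"], parsed["label"]

files = tf.gfile.Glob("/path/to/data/train-*.tfrecord")

# Read multiple files in parallel.
dataset = tf.data.TFRecordDataset(files, num_parallel_reads=8)
# Fused shuffle + repeat keeps the shuffle buffer useful across epochs.
dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=10000))
# Fused map + batch parses records and assembles batches in one step.
dataset = dataset.apply(tf.contrib.data.map_and_batch(
    parse_fn, batch_size=128, num_parallel_batches=4))
# Keep the next batch staged in GPU memory while the current step runs.
dataset = dataset.apply(tf.contrib.data.prefetch_to_device("/gpu:0"))

iterator = dataset.make_one_shot_iterator()
images, labels = iterator.get_next()
```

The parallel reads, the two fused transformations, and the device prefetch correspond to the three tweaks described above.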
This is the CliffsNotes version; my colleague's talk later today takes a more systematic approach to input-pipeline performance, and I'd really encourage you to check it out.

Now let's switch gears and move on to the second theme: flexibility. Originally, the flexibility in tf.data stemmed from the functional transformations, which let you run any TensorFlow graph code at any point in your pipeline. For example, if you have existing TensorFlow code for pre-processing images, you can stick it in a Dataset.map() and start using it right away. The original version of tf.data built on this and let you pass a list of tensors in and get a list of tensors out of these transformations. Then we heard back from users who had more sophisticated needs and more complicated structures, so we've added native support for those in TensorFlow, which is useful for dealing with complex categorical data and training models on it. So, at this point, if TensorFlow can do everything you want, you're all set. But one thing we have learned over the last few years is that not everything is most naturally expressed as a TensorFlow graph, so we have been working to give you alternative ways to build up tf.data pipelines. The first is Dataset.from_generator(), which lets you build a pipeline from a Python function that generates elements; you can wrap existing Python code and still benefit from performance features like prefetching to GPUs. The other way might be more appealing to power users: we have opened up a backend API so you can build dataset plugins in C++. I've heard from some of our partners that this is useful for custom data formats, and we're dogfooding this approach for some of our own implementations, like the recently added Kafka dataset. I'm looking forward to seeing what some of you build with this new API, and I encourage you to contribute back via pull requests; we're excited about contributions from the community at this point in the project.

Okay, the final thing I want to cover is ease of use. If I'm speaking to folks who have used TensorFlow for a year or more and struggled to get data into it, I don't have to make much of a case. But new users have high expectations, and there are people getting their first exposure to TensorFlow every day, so we continue to push hard on usability. I want to share a few highlights. First off, as Rajat told you in the keynote, eager execution is here, and it makes using tf.data a lot more pleasant. Alex is going to tell you more, but from my admittedly biased perspective: you can start treating datasets just like any Python object. You can loop over them with a regular for loop, with no explicit iterator required. What's neat is that this works together with tf.data features like GPU prefetching, so you can combine the efficiency of graph execution for your input pipeline with the flexibility of eager execution for your model code. Next, we've listened to user feedback. Power users like the composability and configurability of the tf.data API, but many users just want an easy way to pull in best practices. TensorFlow 1.8 will include new utilities for reading Example protos and CSV data, to make it easier to handle these formats and apply all the best practices from the performance guide. So let's go, one last time, back to the standard pipeline. As I promised in the beginning, it can be replaced by a single call, which performs all the parallel I/O, shuffling, batching and prefetching for you.
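A sketch of what that single-call replacement might look like for the TFRecord case, assuming the tf.contrib.data utility that shipped around TensorFlow 1.8; the file pattern and feature spec are hypothetical.

```python
import tensorflow as tf

dataset = tf.contrib.data.make_batched_features_dataset(
    file_pattern="/path/to/data/train-*.tfrecord",
    batch_size=128,
    features={
        "image": tf.FixedLenFeature([784], tf.float32),
        "label": tf.FixedLenFeature([], tf.int64),
    },
    label_key="label",   # split the label out of the feature dict
    shuffle=True)
```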
It gives you back a dataset that you can continue to transform using map, filter and the other transformations. If you have a large workload, you should use a binary format like TFRecord, but people with smaller datasets often prefer something simpler, and the CSV format fits the bill. There are thousands of different CSV datasets available to download for free, and this snippet shows how to use the Kaggle API to install it and download one with just a couple of simple commands. Once you've done that, you can use the new CSV dataset utility in TensorFlow to get the data out of the downloaded files -- in this case, a dataset of a million news headlines. What I particularly like about this new API is that it takes care of figuring out the column names and types, which dramatically cuts down on the boilerplate you have to write.

Finally, we have been working to improve the integration between tf.data and high-level APIs like estimators and Keras. The Keras support is still in the pipeline, if you'll excuse the pun. But if we want to switch our CSV parsing code over to estimators, it's a simple matter of returning the dataset from an estimator's input function -- no more iterator required -- then passing the input function to the estimator's train method, and we're good to go. The road we're taking is to make working with datasets and iterators as natural as possible; features like eager execution and the high-level APIs are making this easier, and the eventual goal is to make it seamless, so that tf.data feels like a natural extension of your TensorFlow program.

Well, that's about all the time I have. Just to recap: I told you at the beginning that our mission for tf.data was to build a library for input processing that is fast, flexible, and easy to use. I hope I've convinced you in the last 15 minutes that we have achieved these three goals, and that tf.data is the one library you need for all input processing in TensorFlow. If you want to find out more, there is a ton of documentation about tf.data on TensorFlow.org, covering the how-tos and the performance guidance I mentioned earlier, and the benchmarks and official models repositories contain examples of high-performance, readable input pipelines written with tf.data. With all of this information, and knowing the creativity of this community, I'm really looking forward to seeing what all of you build with this library. Thanks a lot for listening. [ Applause ]

>> Okay. And now it is my great pleasure to introduce Alex Passos, who is going to tell you all about TensorFlow eager execution.

>> Hello. My name is Alex, and I'm here to tell you about eager execution. You've heard it mentioned in the last two talks, but I'm here to tell you what it's really about: a new imperative, object-oriented way of using TensorFlow that we're introducing today as part of TensorFlow core. Because you're here, or watching on the live stream, I hope you know that TensorFlow has been this graph execution engine for machine learning that lets you run graphs at high scale and do all sorts of other nice things. But why did we choose to go with graphs in the first place? Since we're now moving beyond what we can achieve with graphs alone, it's a good idea to recap why we bothered with them. A really good reason to have your computation represented as a platform-independent graph is that, once you have that, it's easy to differentiate it. I went to grad school before all of this was standard in machine learning toolkits, and I do not wish that on anyone. Life is much better now, trust me.
And if you have a platform-independent, abstract representation of your computation, you can go and deploy it to pretty much anything you want: you can run it on a TPU, run it on a GPU, put it on a phone or a Raspberry Pi. There are all sorts of cool deployments that you're going to hear about today, and it's really valuable to have this platform-independent view. Compilers generally work with dataflow graphs, and they know how to do nice optimizations that rely on a global view of the computation, like choosing data layouts. Some of these are deep-learning specific: we can choose how to properly lay out your channels, height and width so your convolutions are faster. And finally -- a key reason that's very important to us at Google, and I hope important to you as well -- once you have a platform-independent representation, you can distribute the computation across hundreds of machines, or a TPU pod like you saw earlier, and this is a fairly seamless process.

So if graphs are so good, what made us think it's a good idea to move beyond them and do eager execution? A good place to start: you don't have to give up automatic differentiation. Libraries like Python's autograd let you differentiate dynamic code. You don't need a static representation of the computation to differentiate it; you can build up a trace as you go and walk back the trace to compute gradients. Also, if you don't have to stop and build a computational graph first, you can iterate a lot more quickly: you can play with your model as you build it, inspect it, poke and prod at it, and this lets you be more productive when you're building all these things. You can also run your model under standard debuggers and profilers and add all sorts of analyses to really understand what it's doing. And finally, if we don't force you to represent your computation in a way that's separate from the host programming language, you can just use the machinery of your host language for control flow and complicated data structures, which for some models is key to making them work at all.

So, I hope you're now wondering: how do I get to use this? It's super-easy: import TensorFlow and enable eager execution. What happens is that any time you run a TensorFlow operation, instead of building a graph that will run that matrix multiplication later when executed, we run the multiplication for you immediately and give you the result. You can print it, slice it, dice it, do whatever you want with it. And because things happen immediately, you can have highly dynamic control flow that depends on the actual values of the computation you're executing. Here's a simple line search example I wrote -- the details don't matter; it's just a loop whose behavior depends on the values being computed -- and it runs just fine on whatever device you have.
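As a minimal sketch of what that looks like, assuming the 1.x-era tf.enable_eager_execution() entry point; the values are arbitrary.

```python
import tensorflow as tf

tf.enable_eager_execution()

# Operations run immediately and return concrete values -- no session needed.
x = tf.constant([[2.0, 0.0], [0.0, 3.0]])
y = tf.matmul(x, x)
print(y)  # tf.Tensor([[4. 0.] [0. 9.]], shape=(2, 2), dtype=float32)

# Control flow can depend on the actual values of the computation.
total = tf.reduce_sum(y)
while total > 1.0:
    total = total / 2.0
print(total)
```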
Together with enabling eager execution, we're bringing a few new symbols to TensorFlow that make it easier to write code that works both when building graphs and when executing eagerly. One is a new way of doing gradients. You're familiar with how you do gradients in normal TensorFlow: create a variable and a loss function -- I hope you can think of a better loss function than this one -- and call tf.gradients to differentiate it. But with eager execution, we try to be as efficient as we can. If you're going to differentiate a computation, we need to keep track of information about what has happened so far, like your activations, but I don't want you to pay the cost of that tracking when you're not computing gradients -- performance is the whole reason we're doing this; we want to use these big, nice pieces of hardware to train models super-fast. So when you want to compute gradients, you use this gradient tape context manager, which records the operations you execute so we can play them back. Otherwise, the API is the same.

Also, training loops in eager mode, as Derek pointed out, are very easy and straightforward: you can just use a Python for loop to iterate over your datasets, and datasets work in eager just fine, with the same high performance you get from the graph execution engine. Then you can make your predictions, apply your gradients, and do the other things you're used to doing. But really, the interesting thing about eager execution is not writing the code that's finished, that we already know works -- it's while you're still developing, when you want to do things like debugging. When eager execution is enabled, you can take any model code -- I use my simple example here -- and drop a breakpoint anywhere you want; once you're in the debugger, you have its full power available. You can print the value of any tensor, change the value of any tensor, and run any operation on any tensor. This will hopefully empower you to really understand what's going on in your models and fix any problems you have. You can also take eager execution code and profile it using whatever profiling tool you're most familiar and comfortable with. Here I have a little dummy model that does a matrix multiplication and an addition; let's pretend I don't know which one is more expensive. You can run the code under the Python profiler and find out that the matmul is 15 times more expensive. By the way, those examples were run on Google Colaboratory, a publicly shared notebook environment hosted on Google infrastructure, and I think we have an eager demo notebook hosted there that you can play with later. If you're on the live stream, you can play with it now if you can find the link.

Together with eager, we're bringing a lot of new APIs that make it easier to build models, and they're compatible with both eager execution and graph building. A long-requested feature is how to customize gradients in TensorFlow. I'm sure you're familiar with the tricks involving stop_gradient and custom functions, but we're introducing a new API that works in both eager and graph execution. What I like about this example is that it's something many, many people have asked how to do: keep the forward pass as it is, but in the backward pass, take the gradient of a particular tensor and clip it, keeping it small to prevent it from exploding. It takes just six lines of code to clip the gradient, and I think this is cool. I look forward to seeing what you do with this when you're writing more than six lines of code and solving new and interesting research problems.
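A sketch of that kind of gradient clipping with the custom-gradient decorator, in the spirit of the six-line example described here, though not necessarily the exact code shown on the slide.

```python
import tensorflow as tf

@tf.custom_gradient
def clip_gradient_by_norm(x, norm):
    # Forward pass: identity. Backward pass: clip the incoming gradient.
    y = tf.identity(x)
    def grad_fn(dy):
        return [tf.clip_by_norm(dy, norm), None]  # no gradient w.r.t. `norm`
    return y, grad_fn
```

Used inside a model, y = clip_gradient_by_norm(x, 1.0) behaves like the identity on the forward pass, while any gradient flowing back through y has its norm clipped to 1.0.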
A big change when programming eagerly, compared to graphs -- one I really want you to stop and think about -- is that we're trying to make everything as Pythonic and object-oriented as possible. Variables in TensorFlow are usually a complicated thing to think about, but when eager execution is enabled they're simpler: a variable is just a Python object. You can change its value, read its value, and when the last reference to it goes away, you get the memory back, even if it's GPU memory. If you want to share variables, you just reuse those objects; you don't worry about variable scopes or any other complicated structure. And because we have this object-oriented approach to variables, we can look at other APIs in TensorFlow and rethink them in a way that's object-oriented and easier to use.

One is an overhaul of the metrics API. We're introducing new tfe.metrics, where each metric has a method that updates its value and a method that gives you the result. Hopefully this is an API everyone will find familiar -- and please don't try to compare it to the other metrics API. We're also giving you a way to do object-oriented saving of TensorFlow models. If you've looked at TensorFlow checkpoints, you know they depend on variable names, and variable names depend not just on the name you gave your variable but on all the other variables present in the graph. This can make it hard to save and load subsets of a model and really control what's in the checkpoint. So we're introducing a completely object-oriented, Python-object-based saving API: everything reachable from your model gets saved, you can save any subset of your model, and you can load any subset of your model. You can even use this tfe.Checkpoint object to group things you want to save beyond a single model -- here we have an optimizer and a global step, and you can put whatever you want in there. The object graph is what gets saved and loaded, so you can save and load your discriminator and generator separately, or take the discriminator and load it back up as another network used in another part of a model. This should give you a lot more control and let you get a lot more out of TensorFlow checkpoints.

But there's a question everybody asks me when I tell them to work with eager execution: is it fast? Graphs have all this high performance -- how fast can it be if it's running Python code all the time? We can make it fast enough. For models that are highly computationally intensive, you don't see any Python overhead, and we match graph-mode TensorFlow -- sometimes we're even slightly faster, for reasons I don't fully understand. Even for highly dynamic models, you get competitive performance with anything else you can find. And please don't get attached to these numbers: we have many more benchmarks, and we're optimizing eager performance aggressively. But I hope the takeaway is that if your model can keep a GPU busy with large matrix computations, there's no cost in experimenting, doing your research and building your models with eager execution turned on. When you're doing smaller things, there are overheads -- again, don't get attached to them, because we're being aggressive about optimizing this. If you run an identity op, it takes about a microsecond; running it with eager execution turned on adds an extra microsecond, and tracing gradients adds another three microseconds. But just enqueuing something on the GPU stream takes single-digit microseconds, so if you can execute enough computation to keep a GPU busy, you're unlikely to see anything bad from using eager execution. And again, these numbers are improving quickly. Still, there is a large ecosystem of TensorFlow code out there -- libraries, models, frameworks, checkpoints -- that I don't think anyone wants to give up, and I don't want you to have to give it up if you want to use eager execution.
So we're also thinking really hard about how you can interoperate between eager and graphs. One way is to call into graph code from eager code, which you can do with tfe.make_template: we build a graph for that little Python function, and you can manipulate it and call that graph from eager execution. We also have the reverse, which is calling into eager from a graph. Say you have a big graph and you understand everything in it, but there's a little chunk of your computation that you either don't know how to express, or don't want to bother expressing, as raw TensorFlow graph code. You can wrap that chunk so it executes eagerly from inside the graph, run any TensorFlow operations in there -- including convolutions and other things -- look at the values, and use dynamic control flow. With these two pieces together, I hope you can reuse eager and graph code in both directions.

But the easiest way to get eager and graph compatibility is to write model code that can go both ways. Once the code is written, debugged and tested, there's nothing tying it to building a graph or executing eagerly: debug it in eager, then import that same code into a graph, put it in an estimator, deploy it on a TPU, distribute it -- do whatever you want. This is what we've done in the example models, and there's going to be a link at the end of the presentation, so you don't need to worry about writing this down. So here is some practical advice for writing code that works well both when executing eagerly and when building graphs. Use the Keras layers: they're object-oriented and easy to understand, manipulate and play around with. Use the Keras Model class, which gives you saving, loading, training and all sorts of things automatically if you want them -- though you're not forced to use those. Use the tf.contrib.summary methods; they will move into the core TensorFlow package soon, and if you're watching this on video, that has probably already happened. Use tfe.metrics instead of tf.metrics, since they're object-oriented, friendlier, and eager-compatible. And use the object-based saving, which is a much nicer user experience anyway. If you do these things, your code will work well both under eager execution and when building graphs.

Now I'd like to take a moment to tell you why you should enable eager execution. A really important reason -- the one that led us to build this in the first place -- is that being able to play with these objects and manipulate them directly is just a much nicer experience than building a graph and interacting with it later through a session. It's a lot more intuitive and lets you understand what's going on better; if you're new to TensorFlow, it's a great way to play around. Now I'd like to point you to a few things. Some of my colleagues are going to be in the demo room during the break with laptops and notebooks so you can try eager mode there -- please go give it a try. Or, if you're watching on the live stream, type in that short link (hopefully it will stay up long enough for you to type it) and play with it right now; it's really nice. We have a getting-started guide on TensorFlow.org that should be live now, which tells you what you need to know about eager execution and how to start using TensorFlow with it. We have a ton of example models, from RNNs to image models, available behind that link, and I encourage you to look at them to see how easy it is to write a model this way and how easy it is to reuse the same code from graphs to deployment. We have graph deployment for all of those models except the highly dynamic ones, which are hard to write in graph form.
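A minimal sketch of model code written in that dual-mode style, following the advice above (Keras layers inside a Keras Model subclass); the layer sizes are arbitrary and this is not one of the official example models.

```python
import tensorflow as tf

class TinyModel(tf.keras.Model):
    """A small model that runs unchanged in eager or graph mode."""

    def __init__(self):
        super(TinyModel, self).__init__()
        self.dense1 = tf.keras.layers.Dense(64, activation='relu')
        self.dense2 = tf.keras.layers.Dense(10)

    def call(self, inputs):
        return self.dense2(self.dense1(inputs))

model = TinyModel()

# With eager execution enabled, this call runs immediately and returns
# concrete values; without it, the same call builds graph tensors that you
# can run in a session or hand to an estimator.
logits = model(tf.zeros([8, 32]))
```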
Give it a try, and let us know how it went. We're super-excited to share this with you, and I hope you have a good time playing with it. Thank you. And now it's time for a treat: introducing Nikhil and Daniel. They have a cool demo set up, but I don't want to spoil it.

>> Hi, everyone, my name is Daniel.
>> My name is Nikhil.
>> We're from the Google Brain team, and today we're delighted to talk about machine learning in JavaScript. Python has been one of the mainstream languages for scientific computing, and it's been like that for a while; there are a lot of tools and libraries around Python. But that's where it ends. We're here today to convince you that JavaScript and the browser have a lot to offer, and TensorFlow Playground is a great example of that. I'm curious, how many people have seen TensorFlow Playground before? Oh, wow. Quite a few -- I'm very glad. Those of you who haven't seen it, check it out after the talk at playground.tensorflow.org. It's a visualization of a small neural network, and it shows, in real time, the neural network as it's training. It was a lot of fun to make and it's been a huge educational success: we've been getting emails from high schools and universities that have been using it to teach students about machine learning.

After we launched Playground, we wondered why it was so successful, and we think one big reason was that it was in the browser. The browser is this unique platform where the things you build can be shared with anyone with just a link, and the people who open your app don't have to install any drivers or any software -- it just works. The browser is also highly interactive, so the user is going to be engaged with whatever you're building. Another big thing is that browsers -- and we didn't take advantage of this in the Playground -- have access to sensors like the microphone, the camera and the accelerometer, all behind standardized APIs that work in every browser. And the last, most important thing: the data that comes from those sensors never has to leave the client. You don't have to upload anything to a server, which preserves privacy.

Now, the Playground is powered by a small neural network library: about 300 lines of vanilla JavaScript that we wrote as a one-off. It doesn't scale -- it's a simple loop and wasn't engineered to be reusable. But it was clear to us that if we wanted to open the door for people to merge machine learning and the browser, we had to build a proper library. And we did: we released deeplearn.js, a JavaScript library that is GPU-accelerated via WebGL, the standard technology the browser uses to render graphics, and it lets you run both inference and training in the browser. When we released it, there was incredible momentum: the community took deeplearn.js, ported models into the browser, and built fun things with it. One example is a style-transfer demo. Another took a character-level RNN and built a novel interface that lets you explore all the different possible endings of a sentence, all generated by the model in real time. Another example is a font-generation model -- there was a post about this one -- where the person who built it let users explore the interesting dimensions of the embedding space, and you can see how they relate to the boldness of the font.
There were even educational examples, like Teachable Machine, a fun little game that taught people how computer vision models work by letting them interact directly with the webcam. All the examples I've shown you point to the incredible momentum we have with deeplearn.js, and building on that momentum, we're very excited today to announce that deeplearn.js is joining the TensorFlow family. With that, we are releasing a new ecosystem of libraries and tools for machine learning in JavaScript, called TensorFlow.js.

Before we get into the details, I want to go over the three main ways you can use TensorFlow.js today. The first use case is writing models directly in the browser, which has huge implications -- think of the Playground we just showed. The second, and a major one, is that you can take a model pre-trained in Python, run a script, and import it into the browser to do inference. And a related use case is that you can take that same imported model and re-train it, potentially with private data coming from the sensors of the browser, in the browser itself.

To give you a schematic view: we have the browser, which uses WebGL to do fast linear algebra, and on top of that, two sets of APIs. There's the Ops API, which was deeplearn.js and which we've worked hard to align with TensorFlow Python; it's powered by an automatic differentiation library. On top of that, there's a high-level Layers API that lets you use best practices and high-level building blocks to write models. I'm also very excited to announce today that we're releasing tools that can take an existing Keras model or TensorFlow SavedModel and port it automatically for execution in the browser.

Now, to show an example of our API, we're going to go over a small program that tries to learn a quadratic function, learning the coefficients a, b and c from data. We import tf from TensorFlow.js -- a standard ES6 import in JavaScript. We have the three coefficients a, b and c, and we mark them as variables, which means they are mutable and the optimizer can change them. We have the function that does the computation; you can see tf.add and tf.square, just like in TensorFlow. Note that in addition to that API, we have a chaining API that lets you call these math operations on the tensors themselves, which leads to more readable code that's closer to how we write math -- chaining is very popular in the JavaScript world. That's the model part. Now, for the training part, we need a loss function -- here it's the error between the prediction and the label -- and we have an optimizer, a standard SGD optimizer, and we train the model by calling optimizer.minimize repeatedly. And I want to emphasize, for those who watched Alex's talk, that this API is aligned with the eager API in Python.

All right. Clearly, that's not how most people write machine learning, because that low-level linear algebra can be quite verbose. For that, we have our Layers API. To show you an example of it, we're going to build a recurrent neural network that learns to sum two numbers, with the complication that the input, like "90+10", is fed in character by character. The network has to maintain an internal state with an LSTM cell; that state gets passed into a decoder, and the decoder has to output "100", character by character. It's a sequence-to-sequence model. This may sound complicated, but with the Layers API it's not that much code. We have the import of TensorFlow.js, and we have a sequential model.
Those familiar with Keras will find this API very familiar. The first two layers are the encoder, the last three layers are the decoder, and that's our model. We then compile it with a loss, an optimizer, and a metric we want to monitor, like accuracy, and we call model.fit with our data. What I want to point out here is the "await" keyword. model.fit is an asynchronous call, because in practice it can take 30 to 40 seconds in the browser, and during those 30 to 40 seconds you don't want the main UI thread of the browser to be locked. That's why you get a callback with a history object after the fit is done; in between, the GPU does the work. Now, the code I've shown is for when you want to write models directly in the browser. But, as I said before, a major use case — even with deeplearn.js — was people importing models that were pre-trained and just running them in the browser. Before we get into the details of that, I want to show you a fun little game our friends built that takes advantage of an automatically converted, pre-trained model imported into the browser. It's called Emoji Scavenger Hunt. I'm going to show you a real demo with a phone — it's running in the browser. You can see I have a Chrome browser open on a Pixel phone. The game uses the webcam, shows me an emoji, and I have some number of seconds to find the real object before the time runs out. Nikhil is going to help me identify the objects. Are you ready? >> I'm ready. >> All right. Let's go. Watch. >> Here's a watch. >> Nice. Yay! We got that. Let's see what our next item is. Shoe. >> Shoe. >> Help me out here, buddy. We got the shoe! >> What's next? >> That's a banana. >> Does anyone -- this guy's got a banana. >> Come over here. Yay! >> All right. >> I'm ready. >> We're going to have a high score here. Beer. >> Beer. It's 10:30 in the morning, Daniel. Step out -- >> All right. So, I'm going to jump into some of the technical details of how we built that game. What we did was train a model in TensorFlow to be the object recognizer for the game. We chose about 400 classes that would be reasonable for a game like this — you know, watches and bananas and beer — and we used the TensorFlow for Poets codelab. In that codelab, you take a pre-trained MobileNet model — if you don't know MobileNet, it's a state-of-the-art computer vision model for edge devices — and you retrain it. Now we have an object recognizer in the Python pipeline. How do we get it into the browser? We're providing a tool, released today, to help you do that. Once the model is in the browser, you wire up the game, make the computer talk, and all that kind of fun stuff. Let's jump into how we convert that model. As Daniel mentioned earlier, we support two types of models: TensorFlow SavedModels, for which we have a converter, and Keras saved models, for which we also have a converter. You define your model and export it as a SavedModel in the standard way; this is the corresponding code for Keras. The next piece is converting it for the web. Today we're releasing a package, tensorflowjs, with a converter script that lets you point to a TensorFlow SavedModel and to an output directory — that's where the static build artifacts go. Keras is the same flow: point it at the model and the output, and you get a directory. Then you statically host those files on your website — simple static hosting.
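As a rough illustration of the Keras-side conversion flow described above, the tensorflowjs pip package exposes a Python helper along these lines; the package usage and paths here are best-effort assumptions and may differ by version:

```python
import tensorflow as tf
import tensorflowjs as tfjs  # the converter package mentioned in the talk

# Build (or load) whatever trained Keras model you want to ship to the browser.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='softmax', input_shape=(784,)),
])
model.compile(optimizer='sgd', loss='categorical_crossentropy')

# Write the static artifacts (model topology JSON plus sharded weight files)
# into a directory you can host on any static web server.
tfjs.converters.save_keras_model(model, '/tmp/tfjs_model')
```

The resulting directory is exactly the kind of statically hosted artifact the talk describes loading from the JavaScript side next.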
On the JavaScript side, we provide an API that lets you load that model. This is what it looks like for a TensorFlow SavedModel; note that we don't currently support continuing to train a converted SavedModel, while in the Keras case we do let you continue training, and we're working hard to keep these APIs aligned going forward. Under the covers, what are we actually doing? Graph optimization, which essentially means we prune out nodes you don't need to make the prediction. We also optimize the weights for browser caching: we shard them into 4-megabyte chunks, which helps the browser cache them so the next page load is quick. Today we support about 90 of the most commonly used TensorFlow ops, and we're working hard to support the control flow ops. We support 32 of the most commonly used Keras layers, and as I mentioned, for Keras models we let you continue training, do evaluation, and make predictions. Okay. Obviously there's a lot you can do just by porting your models to the web. But since the beginning of deeplearn.js, we have made it a high priority to let you train directly in the browser. This opens the door for education and interactive tools like we saw with the Playground, and it lets you train with data that never leaves the client. This is huge. To show off what you can do with this, we built another little game. The goal of the game is to play Pac-Man — Daniel is much, much better at this game than I am. Say hi. There are three phases. In phase one, we collect frames from the webcam and associate them with the classes up, down, left, and right; Daniel moves his head up, down, left, and right, and he's going to play the game like that. You'll notice, as he's collecting frames, he's moving around a little bit, which helps the model see different angles for each class and generalize a little better. After he's done collecting frames, we train our model. We're not actually training from scratch when we hit that "Train" button: we take a pre-trained MobileNet ported to the web and do a re-training phase, using the Layers API, right in the browser. Hit the train button — the loss is going down, so it looks like we're learning something. That's great. As soon as we press the play button, we start making predictions from the webcam; those get plugged into the controls, and that drives the Pac-Man game. Ready? You can see in the bottom right that it highlights the class being predicted, and as he moves his head around, you'll see the predicted class change. And he's off. So, all of this code is online and you can go fork it — we invite you to do so. Obviously this is just a game, but you can imagine other kinds of applications, like a browser extension that lets you control a page for accessibility purposes. Again, all this code is online; please go fork it, play, and make something else with it. Okay. Daniel, I know this is fun. >> I got it. >> Okay. So, let's talk a little bit about performance. What we're looking at here is a benchmark of MobileNet 1.0 running with TensorFlow — classic TensorFlow, not TensorFlow.js. And I want to point out that this is a batch size of one. That's important because we're thinking about this in the context of an interactive application.
In something like the Pac-Man game, feeding in webcam data, you want to know the prediction time for a single example — you can't really batch it. In the top row, TensorFlow with CUDA running on a 1080 GTX takes about 3 milliseconds — and the shorter the bar, the faster. In the second row we have TensorFlow on CPU with AVX512 on a MacBook Pro, at about 60 milliseconds. Where does TensorFlow.js come into the picture? We're getting about 11 milliseconds, which is pretty good if you think about it in the context of an interactive game. On the laptop where we just showed the game, we're getting about 100 milliseconds, and that's still pretty good — you can build a whole interactive game with what's running there. And the web is only going to get faster: there's a whole new set of standards coming, like WebGPU, to push the boundaries. But the browser has limitations. You can only get access to the GPU through WebGL and these standardized APIs. How do we scale beyond those limitations? There's a whole ecosystem of server-side JavaScript tools using Node.js that we would love to take advantage of. So, today I'm really happy to tell you that we're working on Node.js bindings to the TensorFlow C API. That means you'll be able to write the same code — the eager-style polynomial example, the Pac-Man example — bind to the TensorFlow C library, and have it run with hardware acceleration, for example with CUDA installed. Eventually that same JavaScript code will run against the full TensorFlow ops backend. These bindings are under active development — stay tuned. All right, let's recap what we launched and talked about today. There's the low-level Ops API, which gives you hardware-accelerated linear algebra and eager-style execution; this was previously known as deeplearn.js and is being rebranded today. We released the high-level Layers API, which mirrors TensorFlow layers, and we saw that with the addition RNN and the Pac-Man demo. We also showed how you can import TensorFlow SavedModels and Keras models, and re-train Keras models in the browser. We have released a bunch of demos and examples on GitHub — those two aren't the only ones; there's a whole repository that can get you started, with live links you can poke around and play with. I invite you to do that. We really want to see you get involved in this project. We have a bunch of links here: js.tensorflow.org is the website, with tutorials, documentation, et cetera. Our code is open source under the TensorFlow.js repositories — go play there too. And we started a community mailing list today — that's the short link here — for people to post demos and ask questions. This project was not just Daniel and myself; it's a larger team effort with many of our amazing colleagues at Google, and we want to thank them, as well as all of the amazing open source contributors to deeplearn.js. We're really excited to build the next chapter of machine learning in JavaScript with you. Thank you. >> Thank you. >> Okay. We are now going to take a break — a little under half an hour. We'll be back here at 11:30 for a great talk on performance. Head on over to the auditorium; we have food, demos, and the speakers are there to answer your questions. Have fun. >> So, we've got a few more seats down over here; we can help you out on that end. We have a few seats left over here. >> Hi. >> A few seats over there. >> I would like to get started again. So, hi, my name is Brennan, and I'm talking about training performance today.
Now, this talk is organized as a user's guide to improving performance. Performance is very complicated — there are a lot of internals in TensorFlow that work to optimize your training time — but today I'm going to talk about how you can make the most of the TensorFlow you know and love to converge faster. Before I go further, I want to take a moment to acknowledge the great work done not just by our TensorFlow engineering team and other teams at Google, but by our partner teams — for example the team at NVIDIA — who do a lot of work to make TensorFlow fast. With that, let's dig into motivation: why do we need to improve performance — isn't it fine today? Some folks at Baidu put together research showing that if you want to improve the quality of your models, you can just train on larger datasets. These beautiful straight lines show, for multiple different models, that as you provide more and more training data, you get linearly more accurate. Now, I'm being slightly facetious: if you look closely, the axes on the graph are logarithmic, not linear. So we don't need linearly increasing amounts of data — we need exponentially more data. And this holds not just for one model class — in this case it was sequence-to-sequence work — they found it applied to images and translation as well, multiple areas across multiple model types. We're going to need to train on exponentially more data to improve our model quality. Unfortunately, we have quite the obstinate adversary: physics. Here's a graph of microprocessor trend data over 40 years. Clock frequency has hit a wall, and single-threaded performance is not improving the way it used to. We're going to have to work a lot harder to meet the performance challenges of today and tomorrow — silicon by itself is not going to get us there without a bit of cleverness. The result of these two forces coming together has been a Cambrian explosion of hardware. We have TPUs and other exciting things coming; there are startups like Nervana, now part of Intel, and the IPU from Graphcore — each taking different points in the design space and trying different hardware methodologies and layouts to get the best machine learning performance. This is going to be a very exciting area as we think about performance in the future. Now, before I dig into the meat of the talk, I should acknowledge that this picture is actually from the Jurassic period and not the Cambrian — if you have a nicely licensed picture of trilobites, send it my way. Across models and paradigms, training looks roughly as follows. Phase one: you load the training data — read it from disk or generate it from a reinforcement learning environment, decompress it, parse it, and, for an image model, apply augmentations like random flips and color distortions. Phase two: you compute the forward pass, the loss, the backward pass, and your gradients. Phase three: with the gradients, you update the parameters you're trying to learn, and then you repeat the cycle. Again, there's a wide variety of accelerators; phase one happens mostly on CPUs, and the accelerator takes over phases two and three. With that, let's dig in. In my experience, as people migrate to modern accelerators — new TPUs, new generations of GPUs, and so on — phase one is where most of the performance problems actually are. It's the thing everyone hits: there's a problem with phase one.
We're going to spend a bit of time digging into input pipelines. You heard earlier from Derek about tf.data; this is far and away the recommended API for loading data into TensorFlow. If you're training a simple image model — say ResNet-50 — your pipeline will probably start out looking like this: images batched together into TFRecord files, a shuffle and a repeat, a parser function that you map across every input record to parse the example and do the JPEG decode, and then you batch it up and return the dataset. Now, if you run this on a fancy-pants Cloud TPU, a modern accelerator, you get only about 150 images a second — nowhere near what you should expect. Before you think, "wow, Cloud TPUs must be garbage, I'm going back to what I was doing before," it behooves you to try to optimize performance. When you're optimizing performance, it's important to follow a methodology: measure your performance, find your bottleneck, optimize that bottleneck, and repeat. What does that look like with a Cloud TPU? There are profiling tools in TensorFlow for both GPUs and TPUs; for TPUs there's a tool called capture_tpu_profile. You run it pointing at your TPU — in this case the TPU is named "seta" — and capture a profile into a log directory, the same directory you point TensorBoard at. With that, I'd like to switch to the laptop so you can see what the profiling tools look like. Here is a trace from essentially the same input pipeline I just showed. In the step time graph you see a tiny bit of orange at the bottom — that's the compute time on the Cloud TPU — and everything above it, in blue, is waiting for data: input pipeline processing. Our TPU is sitting idle, and the tool is telling us it's idle 92% of the time. Totally, totally not what we want. So, let's dig in. To really understand what's going on underneath the hood — and we have a bunch of tools that are constantly improving — we're going to use the trace viewer. One thing I should note: the trace viewer is designed for power users and may be a little unapproachable, so let me walk you through where to look. At the top you have the TPU. One Cloud TPU has eight compute cores, and they operate independently, although typically they're working on the same sorts of data in parallel. For each core you can see the step number, the TensorFlow ops it's executing, and finally the XLA ops — TPUs are programmed through XLA, so you can see what's happening underneath the hood. Below that are the CPU compute threads: the general TensorFlow thread-pool threads, the iterator thread, and finally a set of threads for infeed and outfeed, which manage the DMAs to and from the TPU device. It's a little hard to see from the top, so we're going to need to zoom in and look in a bit more depth. To navigate around within the trace, the keyboard shortcuts are the most useful, and they're laid out for the left hand on the home row: A and D move left and right, W and S zoom in and out. There are a couple of other shortcuts too. For example, if you click on something, you can see the details of what it is, and if you press F, you'll focus in on just that element of the timeline.
So you can zoom in and navigate around really easily. If you want to measure something, you can press M, which marks it on the UI; we can see that our training step took 5.4 seconds, and if we look at the infeed queue, that was also 5.4 seconds. So let's dig into what's going on on the CPU, since the TPU is sitting idle waiting for data. There's a lot going on, and we need to zoom in a lot farther — not at the second range, but down to the millisecond range. Here, each of those vertical bars is 5 milliseconds. Zoomed in this far, we can see that our iterator is running continuously and that the map function is what's taking the longest amount of time. There are a bunch of other little ops here — the batching, the repeat, and so on — but the map function is the bulk of the time, so that's the focus of our optimization efforts. The map function runs its elements on the normal, standard TensorFlow thread pool, and if you look closely — zoom in further — no two ops are ever running at the same time. Even though there are multiple threads in the thread pool, this processing is single-threaded. That leads to the first optimization. I'm going to switch back to the slides. To use multiple threads for your map function, set num_parallel_calls to 64; you'll then use up to 64 threads, and because Cloud TPUs are attached to a powerful host machine, you can use all of those threads concurrently. If you make this change and rerun the model, you get a 4x improvement — over 600 images a second. That's pretty great, but we're not done. An important part of the performance methodology is step three: repeat. So we take a new trace — I'm not going to do it live on the laptop, because we have a lot to cover. We now see a lot more compute threads going on, but we're still very much input-bound. If we zoom in a lot, we can see that the bottom element here, the TFRecord reader, is waiting for data to load from the file system: we process things in parallel quickly, but it takes a while to read the data and transfer it to the device over PCIe. This presents a pipelining opportunity. To give you some intuition, you should mentally associate input pipelines with ETL. Extract is the first phase, where you load the data from storage; Transform is where you prepare it for training; and finally you Load it into the accelerator. That's not just how the API is structured — it's a useful mental model for performance. Each of the phases of ETL uses different hardware components in your server. The extract phase emphasizes the disk and the storage system, or the network link if you're reading from remote storage. The transform phase typically happens on CPU and is CPU-hungry. And the load phase emphasizes the DMA — your connection to the accelerator. This is true whether you're using a GPU, a TPU, or any other accelerator. So if you map this out over time: while you're extracting, you're doing nothing with the CPU; during the transform phase, you're doing nothing with the connection between CPU memory and the accelerator; and while you're training, the CPU and the rest of the machine sit idle. This is incredibly wasteful.
Because they're all using different components in the system, you can overlap all of this with a technique called software pipelining: you extract for step five while transforming for step four, loading data for step three, and training on step two. This is a much more efficient use of your compute resources, and you can train faster. You'll notice that in a well-pipelined model, your accelerator will be close to 100% utilized, while your CPU or disk may be a little bit idle. That's okay: your accelerator is typically your most precious resource and your bottleneck, and it's fine if the other components are slightly faster than they need to be. How do you enable software pipelining with datasets? Set num_parallel_reads to 32 — underneath, this uses parallel_interleave, a key dataset transformation that enables parallel, pipelined reads — and add a prefetch right at the end. That prefetch means everything above it is pipelined with everything below it: your extraction and transformation are overlapped with loading into the accelerator. One thing I want to mention: when you set num_parallel_reads to 32, you're reading from multiple files in parallel. We deliberately combined these in the API because we believe distributed storage is critical going forward for machine learning workloads. Why? As the research shows, datasets are going to become larger and larger over time, to the point where they just won't fit on a single machine — you'll need to distribute them across a cluster. Additionally, when your data is disaggregated from your accelerator nodes, you can share those accelerator nodes more efficiently: if you're training on them today and in three minutes someone else wants to train, you're not copying datasets around. It's easier to use, and it makes large-scale hyperparameter searches much nicer — you have one cluster and a fungible pool of resources. We believe distributed storage is important, and we've worked hard to make it fast with tf.data. So, what happens when you do this? On the Cloud TPU you'll get over 1,700 images a second with these optimizations — about 12 times faster than our initial input pipeline, with less than 60 characters' worth of typing. That's pretty good, but we can do better. If you capture another trace, you'll see that our transform step is now slightly longer than the accelerator's training step — the TPU is just too fast — and we need to break out some advanced optimization techniques that are available today. One of the most powerful is the fused dataset operators: map_and_batch and shuffle_and_repeat fuse operations together to improve performance on your CPU. tf.data works hard to ensure that the elements produced by your iterator come out in a deterministic order, but if you give tf.data permission to reorder elements, it can enable further optimizations — it can use a sloppy interleave underneath the hood to work around variability in distributed storage systems. There are other tweaks you can apply as well, and putting them together in this optimized input pipeline, we get over 2,000 images a second and we are now accelerator-bound. Here is the profile in TensorBoard: everything is orange, we are entirely accelerator-bound, and the TPU is churning 100% of the time. This is great. We can now start looking at optimizations on the TPU itself to make it faster, and we can see that our CPU is comparatively idle.
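Pulling the tf.data changes described above together, a hedged sketch of the optimized pipeline might look roughly like this in the TF 1.x style of the talk. The file pattern, the parse_fn parser, and the tuning constants are illustrative assumptions, and the tf.contrib.data symbols later moved under tf.data.experimental:

```python
import tensorflow as tf

def parse_fn(serialized_example):
    # Placeholder parser: parse the tf.Example and decode/resize the JPEG.
    features = tf.parse_single_example(
        serialized_example,
        {'image/encoded': tf.FixedLenFeature([], tf.string),
         'label': tf.FixedLenFeature([], tf.int64)})
    image = tf.image.decode_jpeg(features['image/encoded'], channels=3)
    image = tf.image.resize_images(image, [224, 224])
    return image, features['label']

def input_fn(batch_size=1024):
    files = tf.data.Dataset.list_files('/path/to/train-*.tfrecord')
    # Read many TFRecord files in parallel; sloppy=True allows reordering.
    dataset = files.apply(tf.contrib.data.parallel_interleave(
        tf.data.TFRecordDataset, cycle_length=32, sloppy=True))
    # Fused shuffle + repeat, then fused map + batch across many threads.
    dataset = dataset.apply(tf.contrib.data.shuffle_and_repeat(buffer_size=10000))
    dataset = dataset.apply(tf.contrib.data.map_and_batch(
        parse_fn, batch_size=batch_size, num_parallel_batches=8))
    # Prefetch pipelines everything above with the accelerator consuming below.
    return dataset.prefetch(buffer_size=2)
```

The exact parallelism values are the kind of "magic numbers" the speaker later says tf.data should eventually be able to autotune for you.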
We can see there's some overhead — a reshape and a copy — that we might think about optimizing away with device-specific optimizations. Which brings me to phase two. As I mentioned, we're in a sort of Cambrian explosion, still in the early days, and the different accelerators work quite differently. Some chips are smaller, some are bigger; some use HBM2 — TPUs and modern GPUs, for example — and some do away with external memory entirely, like Graphcore's IPU, which is optimized around on-chip memory and communication. So it's hard to give out-of-the-box performance recommendations that apply to all of these hardware platforms. That said, if we peer into the future and gaze into our crystal balls, there are a few common things I think we'll see more of. One is interesting numerical formats. fp32 is the 32-bit format we know and love, and most models today have been trained in fp32. But there is great work from NVIDIA and Baidu on mixed precision: you keep the master weights in fp32 but keep the activations in fp16. This is a big win on two dimensions. You can run a larger model, because more layers fit in memory, and your model tends to run faster, because accelerators today — GPUs and TPUs — are not compute-bound, they're memory-bandwidth-bound: the memory is too slow. So fp16 can unlock a lot of performance on these devices. There's one other 16-bit floating point format worth knowing: bfloat16. It's different from fp16 — even though it also uses just 16 bits, its range is the same as fp32, so you don't have to worry as much about vanishing or exploding gradients and NaNs the way you might with fp16. And there are other numerical formats, such as Flexpoint, which folks from Intel presented in a poster session at NIPS. We're going to make it easier to use these numerical formats — stay tuned for APIs in this space. Another hardware trend is built-in support for matrix multiplication, especially at reduced precision. Volta GPUs have Tensor Cores, and TPUs are built around a 128x128 matrix unit. That unit is a systolic array — a transistor configuration that makes it very efficient to compute matrix multiplications and convolutions. Here's an illustration of what a systolic array does; it's named after the heart, which pumps blood in cycles — the array pumps data through in the same way, and you get fast matrix multiplication. Because we're seeing hardware-supported matrix multiplication at different sizes and scales, the way you lay out your data and implement your model can make a huge difference in performance on these accelerators. Here I'm calling out two examples on GPUs. If you use a channels-last layout, you end up losing a fair bit of performance compared to a channels-first implementation. And if you compare different LSTM cell implementations, the folks at NVIDIA have worked really hard to make very fast LSTM kernels and make them available as part of their libraries. So if you're using a GPU and want better performance, use the optimized libraries for your platform — and that's true in general. Use the latest version of TensorFlow: we're constantly landing performance improvements, along with cuDNN and the Intel MKL work we talked about today. Investigate 16-bit numerical representations; we see a lot of potential performance advantages there. And for inference, we've talked about the TensorRT integration, which is available for NVIDIA platforms and can quantize your model and make inference really fast.
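As a small, hedged illustration of the channels-first point above — not code from the talk — a toy tf.keras model can opt into the NCHW layout through the data_format argument; the shapes and layer choices here are arbitrary:

```python
import tensorflow as tf

# Inputs are declared as (channels, height, width); the batch dim is implicit.
inputs = tf.keras.Input(shape=(3, 224, 224))
x = tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu',
                           data_format='channels_first')(inputs)
x = tf.keras.layers.MaxPooling2D(data_format='channels_first')(x)
x = tf.keras.layers.GlobalAveragePooling2D(data_format='channels_first')(x)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)
```

On NVIDIA GPUs with cuDNN, the channels-first layout generally maps more directly onto the fast convolution kernels, which is the performance gap the slide refers to.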
As part of that technique, if you see that a particular computation is a bottleneck, you can substitute something that's computationally cheaper — but you have to be careful, because you may change the quality of the model by doing so. With that, let's move on to phase three. Typically, when you use an accelerator, you're actually using more than one; even a single server may have several devices that operate in parallel. Here, for example, is a picture of the NVIDIA DGX-1 showing the connectivity between the GPUs: two groups of four, with fast links between them. If you don't take advantage of that topology — if you do a naive gradient aggregation within the server that goes via the CPU or the PCIe switches — you'll be at a significant disadvantage compared to a clever implementation that uses NCCL 2. We have an optimized implementation available as part of the benchmarks repository, but it's a little tricky to use, and we're working on making it easy for everyone through distribution strategies — you'll hear more about those in just a few minutes. On TPUs, you also need to aggregate your gradients carefully, and for that we have the CrossShardOptimizer: you take your existing SGD optimizer and just wrap it with the TPU CrossShardOptimizer. That aggregates across the compute shards within a single device — and the exact same code works all the way up to a whole Cloud TPU pod across 64 devices. Now, I want to take a moment to talk about measuring performance. The saying goes that there are lies, damned lies, and statistics; I'm going to add a fourth: performance benchmarks. The Internet is replete with shoddy benchmarks and misinformation, and this irks me to no end. We've seen benchmarks that use synthetic data or measure only certain subsets of the system. We've seen incomplete comparisons — one benchmark measuring the full device, another measuring only part of the device. We've seen bugs in the machine learning itself, where performance tricks make the model run faster but it no longer converges to the same accuracy — you've lost model quality. And, to be fair, this is a nuanced space: hardware is getting harder and harder to compare apples to apples. There are different numerical formats, different algorithms fit better on different hardware, and some chips have small amounts of memory — if you have a very big model that doesn't fit, that's an unfair comparison. As a result, I strongly encourage you: if you're trying to choose between and evaluate hardware platforms, take your own workloads and measure them end to end, to the accuracy you need in your application. That's a fair amount of work; if you can't run your own workloads, look to quality end-to-end benchmarks that measure accuracy. A good example is Stanford's DAWNBench. There's nuance in how its parameters are set — the dataset size, the accuracy threshold — but despite that, it's a lot harder to perform well on an end-to-end benchmark without performing well on real work, so you're much less likely to be misled by end-to-end results. That said, while we push for end-to-end benchmarks, there's still a lot of utility in microbenchmarks for understanding how fast the individual components are.
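As a hedged sketch of the optimizer wrapping mentioned a moment ago — the network, learning rate, and loss here are placeholders, and the tf.contrib.tpu paths reflect the TF 1.x layout of the era:

```python
import tensorflow as tf

def model_fn(features, labels, mode, params):
    logits = my_network(features)   # my_network is an assumed, user-defined model
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)
    # The wrapper sums gradients across all TPU shards before applying them.
    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
    train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())
    return tf.contrib.tpu.TPUEstimatorSpec(mode=mode, loss=loss, train_op=train_op)
```

The same model_fn is what scales from a single Cloud TPU up to a pod, which is the point the speaker is making.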
When I was optimizing ResNet-50 for the Cloud TPU case, how did I know the input pipeline was slow? As Derek mentioned, input pipelines can exceed 13,000 images a second — that's using the VGG preprocessing, which is computationally cheaper than the ResNet or Inception preprocessing, but it shows we can go really, really fast. For ResNet-50 on a DGX-1, using mixed precision fp16 with TensorFlow nightly, this is the performance you can expect going forward. If you want to test the performance of the GPUs themselves in isolation, that's about 6,100 synthetic images a second — you're excluding the cost of the input pipeline. For Cloud TPU we have a few other microbenchmarks. With TensorFlow 1.7, which is available today, you can expect to achieve about 2,600 images a second with mixed bfloat16 precision, streaming the data in with a batch size of 32, and you get over 76% accuracy in about 13 hours. If you lop off the input pipeline and just test the device performance, you're at over 3,200 images a second with TensorFlow 1.7, which is very cool. And with TensorFlow nightly — coming in TensorFlow 1.8 — we've optimized the input pipeline further, so you go from 2,600 images a second to over 3,000 images a second end to end. Very exciting; there's a lot of work happening underneath the hood. Staring even further into the future, what's coming for performance in TensorFlow? This is our optimized input pipeline, or something very close to it, and you'll notice there are a lot of magic numbers we hand-tuned and picked out. How did we choose them? We spent a lot of time playing around with them. Do you need to do that? There's no reason you should: we can autotune a lot of these values, and we're adding smarts to tf.data to tune your pipelines for you. That's true not just for the magic numbers but also for the fused dataset operations — we'll work on automatically switching a naive, straightforward pipeline to use the fused functions under the hood. And if you give us permission to not strictly preserve element order, we can do the right thing for you there as well. Automatically tuning the prefetch buffer size — that last line — is coming in TensorFlow 1.8. We're not just optimizing input pipelines; we're also optimizing on-device performance, with XLA and Grappler rewriting the model to work well on different platforms. There's exciting work here that we'll be excited to share with you over time. There's a lot more reading on this — a huge amount of literature. If you want to learn about reduced-precision training, here are some references, and I'll tweet out this link shortly. The traces will be available too, so you can load them into TensorBoard and play around with them yourself. With that, thank you very much for listening today. [ Applause ] Next up is Mustafa, who is going to talk about TensorFlow's high-level APIs. >> Thank you, Brennan. Hello, everybody. [ Applause ] My name is Mustafa. Today I'll talk about high-level APIs, keeping practitioners like you in mind. With that in mind, we'll work through an example project whose goal is to increase user happiness with the power of machine learning. After defining the example project, we'll use pre-made estimators to run our first experiment. Then we'll experiment with every feature we have by using feature columns. And we'll introduce a couple more pre-made estimators you can experiment with.
We'll also see how you can experiment with other modeling ideas. Those are the topics we'll cover in this talk, along with how to scale it up and how to use it in production. Let's talk about Estimators. It's a library that lets you focus on your experiment. There are thousands of engineers — not a small number — and hundreds of projects at Google using Estimators, so we've learned a lot from their experience, and we designed our APIs so that the time from an idea to an experiment is as short as possible. I'm really happy to share that experience with all of you, and what we use internally at Google is the same as the open source — you all have the same things. An Estimator keeps the model function — we'll talk about what a model function is later, but it defines your network and its behavior during training, evaluation, and export. The Estimator provides the training and evaluation loops and the interfaces to integrate with the rest of TensorFlow. It also manages the session, so you don't need to learn what a tf.Session is — it handles that for you. You do need to provide data, and as Derek mentioned, you can return a tf.data Dataset from your input function. So, let's define our project and start our experiment. I love hiking — this is a picture I took on one of my hikes. Imagine there's a hiking website, similar to IMDB but for hikes. The website has information about each hike, and users rate the hikes: "I liked this hike," "I didn't like this hike," "here's my rating," and so on. Let's imagine you have this data from the website, and we want to use it to recommend hikes to users. There are many ways of doing that; here is one way machine learning can help. We want to predict the probability of a like: whether a given user will like a given hike. What do you have? Hike features and user features. What can you learn from? The labeled data: whether users liked a hike or not. So what can we use to predict whether they'll like a hike? One of the pre-made estimators. This is a binary classification problem, and we designed pre-made estimators for exactly this kind of problem, which means you can use one as a black-box solution. Pre-made estimators are surprisingly popular within Google, across many projects. Why do engineers use pre-made solutions instead of building their own models? First of all, it works. It handles many implementation details so you can focus on your experiment. It has reasonable defaults for initialization, partitioning, and optimization, so you get a reasonable baseline as quickly as possible. And it's easy to experiment with new features: you can experiment with all of your data using the same estimator without changing it. So, let's jump into our first experiment and establish a baseline to improve on. In this case we use hike_id — it could just be the hike name — as an identifier fed to the model, and we keep hidden_units small, so the model essentially learns the average label for each hike id. That may be a good baseline to measure overall progress against. You say what your training data and evaluation data are, and then you call train and evaluate. With just this couple of lines of code, you should be able to run the experiment.
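A hedged sketch of that first experiment, with the column names, input functions, and sizes as placeholder assumptions rather than the slide's exact code:

```python
import tensorflow as tf

# hike_id is treated purely as an identifier; the embedding lets the model
# learn a per-hike representation (hash bucket size is an arbitrary choice).
hike_id = tf.feature_column.categorical_column_with_hash_bucket(
    'hike_id', hash_bucket_size=10000)
hike_embedding = tf.feature_column.embedding_column(hike_id, dimension=8)

estimator = tf.estimator.DNNClassifier(
    feature_columns=[hike_embedding],
    hidden_units=[10],                      # a deliberately tiny baseline network
    model_dir='/tmp/hike_model')

# train_input_fn / eval_input_fn are assumed to yield (features, labels).
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
```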
And you can see the results on TensorBoard — for example, the training and evaluation curves, and how each metric is moving. Since this is a classification problem, you'll see an accuracy metric, and since it's binary classification you'll see the binary-classification metrics as well. All of these come for free, ready to be used. Let's experiment more, starting with the data itself. We designed feature columns with the same mindset: we want to make it easy to experiment with your features, with your data. Based on our internal experience, this reduces lines of code and may improve the model. There are a number of transformations you can apply via feature columns: bucketizing, crossing, hashing, and embedding. Each of these deserves a careful explanation, and unfortunately I don't have enough time here, but check out Magnus's tutorial and video — they're very good. Let's experiment with all of the hike features we have. Each hike may have tags such as kid-friendly, dog-friendly, or birding. You might choose an indicator column instead of an embedding column here, because you don't have a huge number of tags and you don't need to reduce the dimensionality. For a numeric column — say each hike has an elevation gain — you should normalize it so that the optimization problem is well-conditioned, and you can pass a normalizer function for that. Or you may choose bucketizing: here we bucketize the distance of the hike so the model can learn different things for different segments; you can think of that as another kind of normalization. How do you use all of these together? Just put them into a list — that's it — and your pipeline should work. Now let's experiment with personalization. By personalization we mean that instead of recommending the same hikes to everyone, we recommend different hikes to different users based on their interests. One way to do that is to use user features — in this case a user embedding via embedding_column. This lets the model learn a vector for each user and place users closer together if their hike preferences are similar. How do you use it? Again, just append it to your list. You'll also want to revisit hidden_units, because you have a richer set of features now and you need to let the model learn more complex transformations. And the rest of the pipeline should work — you'll hear that a lot in this talk, because the API is built around that idea: the rest of the pipeline should work, and you should be able to analyze your experiments. Let's experiment more. We have several pre-made solutions — I mentioned they're very popular — and I've picked two to show here. One is wide-n-deep, which jointly trains a neural network and a linear model. To run that experiment, you define the features you want to feed to the neural network — again, a list of feature columns — and the features to feed to the linear part, in this case the user id and the tags: for example, if a user always picks dog-friendly hikes, the model will learn that. Then you instantiate the combined linear-and-DNN classifier, and the rest of the pipeline should work. Also, based on the 2017 Kaggle survey, tree models are very popular, so we are introducing gradient boosted trees as a pre-made estimator, and you can experiment with them without changing your pipeline.
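A hedged sketch of the feature-column transformations just described; the column names, vocabularies, boundaries, and scaling constants are illustrative assumptions, and the resulting list would be passed as feature_columns to the estimator from the previous sketch:

```python
import tensorflow as tf

# Indicator column for a small tag vocabulary (no embedding needed).
tags = tf.feature_column.indicator_column(
    tf.feature_column.categorical_column_with_vocabulary_list(
        'tags', ['kid_friendly', 'dog_friendly', 'birding']))

# Numeric column with a normalizer_fn so optimization stays well-conditioned.
elevation_gain = tf.feature_column.numeric_column(
    'elevation_gain', normalizer_fn=lambda x: (x - 1000.0) / 500.0)

# Bucketized distance lets the model learn different behavior per segment.
distance = tf.feature_column.bucketized_column(
    tf.feature_column.numeric_column('distance_km'),
    boundaries=[2.0, 5.0, 10.0, 20.0])

# Per-user embedding for personalization.
user_embedding = tf.feature_column.embedding_column(
    tf.feature_column.categorical_column_with_hash_bucket('user_id', 10000),
    dimension=16)

feature_columns = [tags, elevation_gain, distance, user_embedding]
```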
Let's start that experiment. In the current version we only support bucketized columns — we're working to support numeric and categorical columns too. Here we take the hike distance and the hike elevation gain and bucketize them; then you instantiate the boosted-trees classifier, and the rest of the pipeline should work. We know that training trees is not as computationally expensive as training neural networks, and the data often fits in memory, so by leveraging that we provide a utility that lets you train the model an order of magnitude faster than the usual path. And the rest of the pipeline should work. Now, let's say these solutions are not enough for you and you want to experiment with more ideas. Before delving into the higher-level pieces you can use, let's look at a network in a supervised setting. You have a network to which you feed the features. Based on the output of the network and the labels, you need to decide: what is the loss — the objective you want to minimize? What are the metrics you'll use to judge evaluation? And your predictions at serving time may differ from training time: for example, you may want to serve just the ranking of the classes rather than their probabilities, in which case you don't need to compute the probabilities at all — you can rank using the logits directly. All of these concerns are factored out under the head API: it expects you to give it the labels and the output of your network, and it provides the rest for you. You'll see it in action. A model function is an implementation of a head and a network together. We talked about DNNClassifier: DNNClassifier has a model function with a specific head-and-network implementation baked in. Let's build the same thing with the head API — a DNN estimator plus a head we instantiate ourselves, in this case a binary classification head, because we're predicting like versus not-like. Why introduce this head when these two lines do the same thing as DNNClassifier? Because you can experiment by combining different network architectures with different heads. For example, you can use wide-n-deep or a DNN estimator with a multi-label head. You can even combine several heads together — we've introduced multi_head for that — which gives you one way to experiment with multi-task learning in a couple of lines of code. Please check it out. If those architectures are still not enough and you want to go further, you can write your own model function. We strongly recommend using tf.keras.layers to build your network — you can be as creative as you want there. After you have the output of the network, pick one of the optimizers available in TensorFlow and use one of the heads we mentioned, which will convert your network output and the labels into training behavior, evaluation metrics, and export behavior. Then you feed that model function to the Estimator, and again, the rest of the pipeline should work.
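A hedged sketch of a custom model function built around a pre-made head, following the pattern described above; the layer sizes and optimizer are arbitrary, and the tf.contrib.estimator path reflects the TF 1.x layout of the time:

```python
import tensorflow as tf

def model_fn(features, labels, mode, params):
    # Build the network from feature columns with ordinary layers.
    net = tf.feature_column.input_layer(features, params['feature_columns'])
    net = tf.layers.dense(net, 64, activation=tf.nn.relu)
    logits = tf.layers.dense(net, 1)

    # The head turns logits + labels into loss, metrics, and export outputs.
    head = tf.contrib.estimator.binary_classification_head()
    optimizer = tf.train.AdagradOptimizer(learning_rate=0.05)
    return head.create_estimator_spec(
        features=features, mode=mode, labels=labels, logits=logits,
        train_op_fn=lambda loss: optimizer.minimize(
            loss, global_step=tf.train.get_global_step()))

estimator = tf.estimator.Estimator(
    model_fn=model_fn, params={'feature_columns': feature_columns})
```

Swapping the head (multi-label, multi-head, and so on) while keeping the same network is the kind of experimentation the talk is advocating.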
A Keras model is another way of creating your model, and it's very popular and intuitive to use — for example, this is one of the Keras models you can build. How can you get an estimator from it so the rest of the pipeline keeps working? You can use model_to_estimator, which gives you an estimator so you can run your experiments without changing your pipeline. Transfer learning is another popular technique to experiment with. One way of doing it is to use model A, which is already trained, to improve the predictions of model B. How? Surprisingly, just copying and transferring the weights from model A to model B works — it's that simple, and we provide it for you. You can use warm starting: one line says "transfer all of model A into model B," or you can define a subset of model A's variables to transfer. Let's talk about image features. We've talked about embedding categorical columns, but what if you have image features — how can you use them in your pipeline with a couple of lines of code? You could implement an image classifier yourself, which is not a couple of lines of code. Or, thanks to TF-Hub, which you'll learn about later, you can use one line to instantiate a feature column called an image embedding column. You may remember Jeff mentioned AutoML: NASNet is one of the AutoML-derived models, and it's really good — one of the top models you can use. Here you use NASNet as a feature extractor, feeding its output into your model as a feature. How do you use it in the classifier? Just append it to your feature columns, and you're done — you can experiment with it. Now let's say you've experimented, found a model you like, and need to scale it up. Not all of you, but some of you will need to scale your training. You can use multiple GPUs — replicating the model across GPUs — and you can learn more about that after my talk; you don't need to change the estimator or model code, everything should work with a single line of configuration change. Or you may want to distribute training across multiple machines, declaring which hosts are workers and which are parameter servers; again, you don't need to change your estimator or model code — everything is driven by configuration. Or you may want to use TPUEstimator for TPUs; there's a minimal change required in the model function — hopefully we'll remove even that later, but for now there's a small change you need to make. To use this in production, you need to export your model and serve it, and we recommend TF Serving. At serving time, instead of reading data from files, you receive a request, so you need to define a serving input receiver function, which describes how the request is connected to the model, and the output of the model is defined by a signature definition. So, again, with a couple of lines of code you export your trained model for TF Serving: for example, if your request is a tf.Example, you use this helper to build your receiver function, and then you call export_savedmodel so the result can be used by TF Serving. These are the modules I mentioned: tf.estimator and tf.feature_column. Don't use tf.contrib.learn — we are deprecating it. And here are a couple of resources I picked out; check them out. Thank you — I hope some of you will improve your products with the tools we covered. Thank you.
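To make that export step concrete, here is a hedged sketch using the helpers Mustafa refers to; feature_columns and estimator are assumed to be the ones from the earlier sketches, and the export path is arbitrary:

```python
import tensorflow as tf

# Build a parsing spec from the same feature columns used for training, so the
# serving signature accepts serialized tf.Example protos.
feature_spec = tf.feature_column.make_parse_example_spec(feature_columns)
serving_input_fn = (
    tf.estimator.export.build_parsing_serving_input_receiver_fn(feature_spec))

# Writes a SavedModel that TF Serving can load directly.
estimator.export_savedmodel('/tmp/hike_model/export', serving_input_fn)
```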
And now I'll introduce Igor, who will talk about how you can do distributed training with TensorFlow. He's coming — yes, he's coming. >> Hey. Hello, everyone. My name is Igor. I work on the TensorFlow team, and I'm going to talk to you today about distributed TensorFlow. Why would you care about distributed TensorFlow? Many of you probably know the answer, but just in case: it's a way for your models to train faster and more in parallel — a way for you to get more things done and iterate quicker. Training models can take a long time, and when I say a long time, I mean weeks. With all the hardware available to you out there, scaling up to hundreds of GPUs can really make a difference. How could you scale up? You could just add a GPU to your machine. That case is plug and play: you insert a GPU, TensorFlow handles all the details for you, and you see a nice bump in training speed. You could also install multiple GPUs, but in that case you have to write additional code. You need to replicate your model, you need to combine the gradients from every GPU, and if you're using a batch norm layer, you have the tricky question of what to do with the statistics on each GPU. The point I'm trying to make is that you need to do extra work, and learn things you didn't plan on learning. You can also use multiple machines. That situation is similar to the one before, but now your bottleneck is probably the communication between the machines. You'll start thinking about minimizing that communication, probably by doing more work locally — for example, combining the gradients on the local GPUs before exchanging them with the remote GPUs. Unless specialized networking hardware is used, the coordination costs in this setup are probably going to limit your scaling. But there is a solution: the parameter server approach. Some hosts — we call them parameter servers — only hold the training weights. Other hosts, the workers, each have a copy of the TensorFlow graph; they get their own input, compute their own gradients, and then just go ahead and update the training weights without any coordination with the other workers. It's an approach with low coordination between a large number of hosts, and it scales well — we've been doing this at Google for a long time. But there's a wrinkle with this approach: you give up synchronicity, and synchronicity has its benefits. If you think about it, the parameter server approach comes from the CPU era. With fast, reliable communication between GPUs, we can consider designs with tighter coupling and more coordination between the workers. One such approach is based on all-reduce. That's not a new idea: the general goal of all-reduce is to combine values from all the processes and distribute the result back to all of them. All-reduce is a bit tricky to explain in one slide; you can think of its result as a reduce operation followed by a broadcast operation, but don't think of it that way in terms of performance — it's a fused algorithm, and it's far more efficient than those two operations run separately. On top of that, hardware vendors ship specialized all-reduce implementations that TensorFlow can quietly use behind the scenes to help you. Alternative approaches typically send all the data to one central place; all-reduce doesn't have that bottleneck, because it distributes the coordination between GPUs much more evenly. With every tick of the all-reduce, each GPU sends and receives a part of the final answer. So how could all-reduce help with our models?
Well, consider two GPUs. You copy the layers and the variables onto every GPU and perform the forward pass, nicely in parallel. Then, during the backward pass, as the gradients become available, we can use all-reduce to combine each gradient with its counterparts from the other GPUs. And because the gradients of the later layers become available before the gradients of the earlier layers, we can overlap the backward-pass computation with the all-reduce communication, which gives you even more throughput. The bottom line is: when communication between GPUs is reliable, all-reduce can be fast and allows you to scale well. So how do you use all-reduce in TensorFlow? So far in this talk I've told you that to take advantage of multiple GPUs you need to write additional code, change your model, and learn new things. Chances are you're following our advice of using the highest-level API that works for your use case, and that's probably Estimator. Its model function has no knowledge of GPUs or devices. To have that model use multiple GPUs, you just need to add one line: you pass in an instance of a new class called MirroredStrategy, which is one implementation of our new DistributionStrategy API, and it tells TensorFlow how to replicate your model. Another thing I want to mention is that MirroredStrategy can take a number of GPUs or a list of GPUs to use, or you can give it no arguments at all and it will figure out which GPUs to use. MirroredStrategy works exactly as I described before: it replicates your model and uses all-reduce for communication, so the gradient updates from all GPUs are combined before the weights are updated, and each copy of your model, one per GPU, is part of a single TensorFlow graph. That gives you in-graph replication with synchronous training that uses all-reduce across many GPUs. Now, the last ten minutes would have been a waste of your time if this didn't perform well — and it does perform well. As you add GPUs, this implementation scales nicely. We have a team in TensorFlow that works specifically on fast all-reduce implementations for various machine configurations, and this implementation gets about 90% scaling efficiency on 8 GPUs. And again, it didn't require any change to the user's model — because we instead changed everything in TensorFlow that isn't your model. Things like the optimizer, batch norm, summaries — everything that writes state — now needs to become distribution-aware, which means it needs to know how to combine its state across GPUs. This matters because alternative APIs out there typically ask you to rephrase your model — to supply the optimizer separately, for example — so that they can do the state coordination behind the scenes. And if you have some experience training models on multiple GPUs, you might be wondering: can I save my model on a machine with 8 GPUs and then evaluate it on a machine with, say, no GPUs? Typically that causes problems, but with the DistributionStrategy API we maintain backward compatibility at the checkpoint level. MirroredStrategy keeps multiple copies of the variables, one per GPU, but it saves only one copy, and at restore time it restores that state onto however many GPUs are required. So, this use case is supported.
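A hedged sketch of that one-line change: in the TF 1.x nightlies of this era the class lived under tf.contrib.distribute, and model_fn / train_input_fn are assumed to be an ordinary Estimator setup like the ones above.

```python
import tensorflow as tf

# Replicate the model across 2 GPUs with synchronous, all-reduce training.
distribution = tf.contrib.distribute.MirroredStrategy(num_gpus=2)
config = tf.estimator.RunConfig(train_distribute=distribution)

estimator = tf.estimator.Estimator(model_fn=model_fn, config=config)
estimator.train(input_fn=train_input_fn, steps=1000)
```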
DistributionStrategy works with Eager mode as well, though we're still fine-tuning the performance there. DistributionStrategy is a very general API that I hope will support many use cases in the future. It's not tied to Estimator, and we're looking into building even better APIs on top of it. Pretty soon we intend to support many kinds of distributed training — synchronous, asynchronous, multi-node, model parallelism — all as part of the DistributionStrategy API. Until then, for multi-node training, use tf.estimator's train_and_evaluate, or Horovod, which offers a multi-node solution. MirroredStrategy is available for you in our nightly builds, and we're working on it very actively — it's the product of the work of many people. I'd really encourage you to try it out and let us know what you think, on GitHub or by talking to us after the talk. All right. Thank you. Thanks for your attention. [ Applause ] Next up are Justine and Shanqing to tell you how to debug TensorFlow using TensorBoard. >> Well, thank you, everybody, for being here today. We're going to be giving a talk about the new TensorFlow debugger, which comes included with TensorBoard. It's basically a debugger like you'd find in an IDE: it lets you set break points in models, step through them, and watch tensors. But before we do that, I'd like to give you some background on TensorBoard and some of the other developments from the last year, which we unfortunately don't have much time to go into. TensorBoard is basically a web application — a suite of web applications, authored by about 20 people, all packed into a 2-megabyte command-line web server that works offline. TensorBoard can be used for many purposes through the different plugins baked into it. The one you're all most familiar with, if you've used TensorBoard, is the scalars dashboard. You can plot anything you want — loss curves, accuracy, and so on — and these help us understand whether our model is converging toward a good solution. Here's a really interesting, underutilized feature called the embedding projector. It was originally written at Google so we could project our data into a 3D space and see how things cluster — with MNIST, the 7s over here and the 9s over here. What you see on the screen is a really cool recent contribution from Francois at IBM Research. He sent pull requests on the GitHub repository, since we develop in the open, and added interactive label editing, so you can go in and change labels as algorithms like t-SNE reveal the structure of your data. To learn more, search Google for "interactive supervision with TensorBoard." Here's another amazing contribution, from a university student named Chris Anderson: the Beholder plugin. It gives you a real-time visual glimpse into TensorFlow data structures as your training script is running — it's real time, it doesn't require going through the hard drive, though it doesn't work with something like GCS at this point in time. I think this will be a very useful tool going forward for model explainability. TensorBoard also has some new plugins for optimization: the Cloud team recently contributed a TPU profiling plugin.
And TPU hardware is a little different from what many of you might be used to. And TensorBoard, with this plugin, can really help you get the most out of your hardware and ensure that it's being properly utilized. Now, the TensorBoard ecosystem -- part of the goal of this talk, before we get into the demo, is that I want to attract more folks in the community to get involved with TensorBoard development. We use many of the tools you're familiar with, such as TypeScript and Polymer. We also use some tools you might not be familiar with, like Bazel, for good reasons. You can go to the READMEs for the plugins we wrote originally. Now, the reason TensorBoard is a little bit more challenging compared to some of the other web applications you may have used or written in the past is that we deal with very challenging requirements. This thing needs to work offline. It needs to be able to build regardless of corporate or national firewalls that may block certain URLs when it's downloading things. For example, one of the first things I did when I joined the TensorBoard team wasn't actually visualizing machine learning, but contributing a change to Bazel which allows downloads to be carrier grade internationally. And there are a whole variety of challenges like that when it comes to an application like this. But those burdens are things we've mostly solved for you, and here is a concrete example. Writing that toilsome thousand-line file was what it took to make TensorBoard look good anywhere in the world without having to ping. That is one of the many burdens that the TensorBoard team carries on behalf of plugin authors. Now, I want to give a quick introduction for Shanqing, who is the author of this TensorFlow debugger, built with the help of Che Zhang. As I mentioned earlier, TensorBoard has been the flashlight that gives broad overviews of what's happening inside these black-box models. What the TensorFlow debugger does is turn that flashlight into an X-ray. Using this plugin, you can literally watch the tensors as they flow in real time while having complete control over the entire process. This X-ray is what's going to make it possible for you to pinpoint problems we've previously found difficult to identify -- perhaps down to the tiniest NaN at the precise moment it happens. That's why we call it an X-ray. It reveals the graph and math beneath the abstractions we love, like Keras or Estimator, or, as was announced today, Swift. Whatever tools you're using, this could potentially be a very helpful troubleshooting tool. I would like to introduce its author, Shanqing, who can show you a demo. >> Thank you very much. [ Applause ] Okay. So, in a moment the screencast will start. Great. Thank you, Justine, for the generous intro. I'm Shanqing, and I'm glad and honored to present the debugger plugin for TensorBoard, among the many plugins created for TensorBoard so far. For those who know the TensorFlow debugger, or tfdbg, it has only had a command line interface until recently. Like the command line interface, the debugger plugin allows you to look into the internals of a running TensorFlow model, but in a much more intuitive and richer environment in the browser. In this talk I'm going to show two examples. One example of how to use the tool to understand, probe and visualize a working model that doesn't have any bugs in it.
I'm also going to show you how to use the tool to debug a model with a bug in it, so you can see how to use the tool to catch the root cause of problems and fix them. So, first, let's look at the first example. That's the example on the right part of the screen right now. It's a simple TensorFlow program that does some regression using generated synthetic data. And if we run the program in the console, we can see a constant decrease in the loss value during training. Even though the model works, we have no knowledge of how it works. That's mainly because in graph mode, the session runs as a black box that wraps all the computation in one single line of Python code. What if we want to look inside the model? Look at the matrix multiplication in a dense layer, or at a gradient, and so forth? The TensorFlow debugger, or TensorBoard debugger plugin, is a tool that allows you to do that. To start the tool, we start TensorBoard with the debugger port flag. We specify the port to be 7000. Once it's running, we can navigate to our TensorBoard URL in the browser. At startup, the plugin tells you it's waiting for connections from TensorFlow runtimes. That's because we haven't started the program yet. There are code snippets for tf.Session, Estimators and Keras models. In this model, we're using tf.Session. The first line is an import line, and the second line wraps the original Session object with a special wrapper that has the information about where to connect -- the host and port number. Now, with our program instrumented, we can start the program again. As soon as the program starts, we can see the graphical user interface in the browser switch to a mode that shows you the graph of the session being run in two ways: in a tree view on the left and in a graph on the right. In the bottom left corner, you can also see which session run is currently executing. The tree structure corresponds to the name scopes in your model. For example, the dense name scope corresponds to the dense layer. You can open the source code to look at the correspondence between the graph nodes and the lines of the Python program that created those nodes. If you click a node, you can see which line of the Python source code is responsible for creating that node. In this case, it's the dense layer, as expected. If you click the loss tensor, you will see the corresponding node in the graph, and you can see where it's created in the Python source code -- it's where we call mean squared error. And the gradients name scope corresponds to the back-propagation part of the model. You can click around, poke around and explore how a TensorFlow model does optimization and back propagation if you are interested. And these nodes are created when we created the gradient descent optimizer. You can continue to any node of the graph and pause there. So, we have just continued to the MatMul node in the dense layer. And we can continue to the gradient of the MatMul. And we did. And you can see the summaries of the tensor values. You can look at their data types, their shapes and also the range of their values. In the so-called health pills, you can look at how many of those values are zero, negative, positive and so forth. If you hover over them, you can get more information, such as the mean and the standard deviation of the values in the tensor. So, next, we can click these links to open a detailed view of the tensors. You can apply slicing to reduce the dimensionality so it's easier to look at the values.
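For reference, the instrumentation described a moment ago (start TensorBoard with the debugger port, then wrap the session) looks roughly like this; the port and log directory are arbitrary, and Estimator or Keras users would use tf_debug.TensorBoardDebugHook instead of the session wrapper.

    # Start TensorBoard separately with:
    #   tensorboard --logdir /tmp/logdir --debugger_port 7000
    import tensorflow as tf
    from tensorflow.python import debug as tf_debug

    sess = tf.Session()
    # Wrap the session so every run() streams tensor data to the debugger plugin.
    sess = tf_debug.TensorBoardDebugWrapperSession(sess, "localhost:7000")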
We have reduced the dimensions from 2 to 1, so we can look at it as a curve. Now we continue to the loss tensor, which is a scalar. And yep, it's a scalar, and its shape is an empty list, as we can see here. We can switch to the full-history mode, so we can look at how the value changes as the model is being trained. With the full-history mode enabled, we can continue over a number of session runs -- say, 50 of them. We can see in real time how the loss value decreases and how the value of the MatMul changes. That's how you can use the tool as an X-ray animator for your models, to get a better understanding of how your model works. Next, let's look at a broken model. That's the debug_mnist model we ship with TensorFlow. It's the only broken model we ship with TensorFlow as far as I know, and I'm proud to be the author of it. We can see that the model doesn't quite work. After two iterations of training, the accuracy is stuck at about 10%. We suspect there might be bad numerical values like NaNs or infinities, but we're not sure which nodes of the graph are responsible for generating them. To answer that question, we can use the debugger tool. We do a refresh in our browser, and then we start our debug_mnist example to connect to the debugger plugin. So, again, we're looking at the graph. Now, in order to find the nodes responsible for the infinities, we can set watch points and use the conditional breakpoints feature to continue running the model until any tensor contains a NaN or infinity. You are seeing a list of tensor values -- a complete list of the tensors involved in training the model. In a moment, the model stops because it hit an infinity in the tensor cross_entropy/Log. We can see it in the health pill, and in the detailed tensor view we see those orange lines showing the infinity values. Now, the question is, why do those infinity values happen? We can go back to the source code and find the line of Python code where it's created, and that's where we call tf.log. And we can open up the graph view and see the inputs, so we can trace the inputs. In this case, the input is the softmax tensor. We can expand and highlight it and look at the value of the input, which is the softmax. There are, indeed, zero values in this tensor, and the reason for the infinity is that we are taking the log of zero. With that knowledge, we can go back to the source code and fix it. We're not going to do that in this demo. All right. So, that's the TensorBoard debugger. I encourage you to use it and explore it. Hopefully it will help you understand your models better and help you fix bugs much more quickly. You can start it with a simple command line: tensorboard with one special flag. With that, I would like to hand this back to Justine. [ Applause ] >> Well, thank you, Shanqing. I thought that was a really interesting demo, and it was a great leap forward for TensorBoard. It really shows that one of the things we have been doing recently is, rather than being a read-only reporting tool, we're trying to explore more interactive directions, as we have shown you today. These are things that folks who are productionizing TensorBoard, such as Kubeflow, should take into consideration. We want to attract more contributors. We have two approaches for this. You can develop in the official repo and send us pull requests; we do our work in the open. This does need approval on security, footprint, et cetera. And there is an escape hatch if that doesn't work out.
You can independently develop plugins, and you can create custom static builds without anyone's approval. You can do whatever you want, because part of the goal on this team is to liberate the tools. With that said, I want to thank all of you for attending, and thank you for watching on YouTube. If you like this talk, tweet the hashtag or, you know, reach out. Thank you, again. [ Applause ] >> Hi, everyone. Hope everybody had a good lunch. I'm Sarah Sirajuddin, an engineer on the TensorFlow Lite team, and this is my colleague, Andrew Selle, also on the same team. We are excited to talk about the work we have been doing to bring machine learning to mobile devices. So, in today's talk, we'll cover three areas. First, how machine learning on devices is different and important. Then we'll talk about all the work that we have been doing on TensorFlow Lite. And then how you can use it in your apps. Let's talk about devices first. Usually a device is a mobile device -- basically a phone. Our phones are with us all the time. These days they have lots of sensors, giving rich data about the world around us. And lastly, we use our phones all the time. Another category of devices is edge devices, and this industry has seen huge growth in the last few years. By some estimates there are 23 billion connected devices: smart speakers, smart watches, smart sensors, what have you. And technology that was only available on the most expensive devices is now available on the cheaper ones. So, this rapid increase in the availability of more and more capable devices has opened up many opportunities for doing machine learning on device. In addition to that, though, there are several other reasons why you may consider doing on-device machine learning. Probably the most important one is latency. If you're processing streaming data such as audio or video, you don't want to be making calls back and forth to a server. Other reasons are that your processing can happen even when your device is offline; sensitive data can stay on device; it's more power-efficient because the device is not sending data back and forth; and lastly, we are in a position to take advantage of all the sensor data that is already present on the device. So, all of that is great. But there's also a catch. And the catch is that on-device machine learning is hard. The reason it is hard is that many of these devices have some pretty tight constraints: small batteries, low compute power, tight memory. And TensorFlow wasn't a great fit for this. That is the reason we built TensorFlow Lite, which is a lightweight library and set of tools for doing machine learning on embedded and small platforms. So, we launched TensorFlow Lite late last year in developer preview, and since then we have been working on adding more features and support to it. I'll just walk you through the high-level design of the system. We have the TensorFlow Lite format. This is different from what TensorFlow uses, and we had to do that for reasons of efficiency. Then there's the interpreter, which runs on device. Then there is a set of optimized kernels. And there are interfaces which you can use to take advantage of hardware acceleration when it is available. It's cross-platform, so it supports Android and iOS. And I'm really happy to say today that we also have support for Raspberry Pi and pretty much most other devices which are running Linux.
So, the developer workflow, roughly, is that you take a trained TensorFlow model and then you convert it to the TensorFlow Lite format using a converter. And then you update your apps to invoke the interpreter using the Java or C++ APIs. One other thing that I want to call out here is that iOS developers have another option. They can convert the trained TensorFlow graph into the CoreML format and use the CoreML runtime directly. This TensorFlow-to-CoreML converter is something we worked on together with the folks who built CoreML. There are two questions that come up every time we talk about TensorFlow Lite. The two most common: is it small in size? And is it fast? Let's talk about the first one. Keeping TensorFlow Lite small was a key goal for us when we started building it. The size of our interpreter is only 75 kilobytes; when you include all the supported ops, it is 400 kilobytes. Another thing worth noting here is a feature called selective registration. Developers have the option to include only the ops that their models need and link those, and thereby keep the footprint small. So, how do we do this? First of all, we have been pretty careful in terms of which dependencies we include. And TensorFlow Lite uses FlatBuffers, which are more memory-efficient than protocol buffers. Moving on to the next question, which is performance. Performance was a super-important goal for us, and we made design choices throughout the system to make it so. Let's look at the first thing, which is the TensorFlow Lite format. We use FlatBuffers to represent models. FlatBuffers is a cross-platform serialization library originally developed for game performance, and since used in other performance-sensitive applications. The advantage of using FlatBuffers is that we are able to access data without going through heavyweight parsing of the large files which contain the weights. Another thing we do at conversion time is pre-fuse the activations and biases, which allows us to execute faster later on. The TensorFlow Lite interpreter uses static memory allocation and a static execution plan, which allows it to load faster. There is a set of kernels which have been optimized to run fast on NEON on ARM platforms. We wanted to build TensorFlow Lite so that we could take advantage of all the innovation that is happening in silicon for these devices. So, the first thing here is that TensorFlow Lite supports the Android Neural Networks API. Qualcomm HVX support is coming out soon, and MediaTek and others have announced their integration with the Android Neural Networks API, so we should be seeing those in the coming months as well. The second thing is that we have also been working on adding direct GPU acceleration, using Metal on iOS. So, quantization is the last bit that I want to talk about in the context of performance. Roughly speaking, quantization is a technique to store numbers and perform calculations on them in representations that are less exact than 32-bit floating point numbers. This is important for two reasons. One, the smaller the model, the better it is for these small devices. Second, many processors have specialized instruction sets which process fixed-point operations much faster than they process floating point numbers. A very naive way to do quantization would be to shrink the weights and activations after you're done training, but that leads to suboptimal accuracy. So, we have been working on doing quantization at training time.
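As a rough sketch of what training-time quantization looks like through the TF 1.x tf.contrib.quantize API (the script the team released may differ, and build_model_and_loss below is a hypothetical stand-in for your own model code):

    import tensorflow as tf

    g = tf.Graph()
    with g.as_default():
        loss = build_model_and_loss()  # hypothetical helper: inputs, model, loss

        # Rewrite the graph with fake-quantization ops so the model learns weights
        # that stay accurate under 8-bit fixed-point inference.
        tf.contrib.quantize.create_training_graph(input_graph=g, quant_delay=2000)

        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

    # At export time, the eval graph is rewritten with
    # tf.contrib.quantize.create_eval_graph() before converting to TensorFlow Lite.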
And we have recently released a script which does this. What we have seen is that for architectures like Inception, we are able to get accuracies that are similar to their floating point counterparts, while seeing pretty impressive gains in latency. So, I've talked about a bunch of different performance optimizations. Now let's see what all of these translate to together in terms of numbers. These are two models that we benchmarked, running on the Android Pixel 2 phone with four threads, using all four large cores of the Pixel 2. And what you see is that these quantized models run three times faster on TensorFlow Lite than their floating point counterparts on TensorFlow. So, I will move on now and talk about what's supported on TensorFlow Lite. Currently, it is limited to inference only, although we are going to be working on supporting training in the future. We support 50 commonly used operations which developers can use in their own models. In addition, they can use any of the popular open source models that we support. One thing to note here is that we have an extensible design, so if a developer is trying to use a model which has an op not currently supported, they have the option to write what we call a custom op and use that. Later in this talk, we will show you some code snippets on how you can do that yourself. So, this is all theory about TensorFlow Lite. Let me show you a quick video of TensorFlow Lite in practice. We took a simple MobileNet model and retrained it on some common objects that we could find around our office. And this is our demo classification app, which is already open sourced. As you can see, it is able to classify these objects. [ Laughter ] So, that was the demo. Now let's talk about production. I'm very excited to say we have been working with other teams in Google to bring TensorFlow Lite to Google apps. The portrait mode on the Android camera, 'Hey Google' on Google Assistant and Smart Reply on Wear OS are going to be powered by TensorFlow Lite in the future. And I'm going to hand it off to Andrew, who can tell you how to use TensorFlow Lite. >> Thanks for the introduction. So, now that we've seen what TensorFlow Lite is, let's find out how to use it. Let's jump into the code. The first step of the four-step process is to get a model. You can download it off the Internet or you can train it yourself. Once you have a model, you need to convert it into the TensorFlow Lite format using our converter. There might be ops you want to specially optimize, using intrinsics or hardware specific to your application, or ops we don't support; you can write custom ops for that. Then you go to the app and write it using the client API of your choice. Let's look at the conversion process. We support SavedModel or frozen GraphDef, and we are showing the Python interface here. We give it a directory containing a SavedModel, and it gives us a FlatBuffer out. Before that's done, there might be some things you need to do to make this work better. The first thing is that you need to use a frozen GraphDef. A lot of times a training graph has conditional logic or checks that are not necessary for inference, so sometimes it's useful to create a special inference script. Lastly, if you need to look at what the model is doing, the TensorFlow graph visualizer is good, but we also have the TensorFlow Lite visualizer, and comparing the two can help you.
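A minimal sketch of that conversion, plus the fill-inputs/invoke/read-outputs pattern that the Java API section below walks through, using the current tf.lite Python module paths (the module paths shown at the talk have since moved); the file paths are made up for illustration.

    import numpy as np
    import tensorflow as tf

    # Convert a SavedModel directory into a TensorFlow Lite FlatBuffer.
    converter = tf.lite.TFLiteConverter.from_saved_model("/tmp/my_saved_model")
    with open("model.tflite", "wb") as f:
        f.write(converter.convert())

    # Load the FlatBuffer and run one inference.
    interpreter = tf.lite.Interpreter(model_path="model.tflite")
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    input_data = np.zeros(input_details[0]["shape"], dtype=np.float32)
    interpreter.set_tensor(input_details[0]["index"], input_data)
    interpreter.invoke()
    result = interpreter.get_tensor(output_details[0]["index"])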
If you find issues, file them on GitHub and we will respond to them as we see the need. So, lastly, writing a custom operator. Let's see how to do that. Writing one in TensorFlow Lite is relatively simple. The main function is Invoke. Here I've defined an operator that returns Pi -- a single scalar Pi. Once you have done that, you need to register the new operation. There are a number of ways to register operations. If you don't have custom ops and don't need any overriding, you can use the built-in op resolver. But you might want to ship a binary that's much smaller -- you might want selective registration. In that case you would ship a needed-ops resolver, or include your custom ops in that same resolver. Once you have the op set, you just plug it into the interpreter. Okay. We have talked about custom operations. Let's see how we put this into practice in the Java API. In Java, you create the interpreter, fill in the inputs and outputs, and call run, which will populate the outputs with the results of the inference. Really simple. Next, how do you include this? Compile a bunch of code? We're working hard to make it so you can use the TensorFlow pip package to do the conversion, so you don't need to compile TensorFlow. We provide an Android Gradle dependency, so you don't need to compile anything for an Android app, and we have a similar thing with CocoaPods for iOS. Now that we know how to use TensorFlow Lite, let's look at the roadmap. As we move forward, we are going to support more and more TensorFlow models out of the box, with more ops. Second, we want to add on-device training and look at hybrid training, where some of it happens on your server and some on your device, wherever it makes sense -- that should be an option. And we want to include tooling to analyze graphs better and do more optimizations. We have more that we're working on that we could talk about, but I hope this makes you interested and excited to try it. So, there's one remaining question, which is: should I use TensorFlow Mobile or TensorFlow Lite? TensorFlow Mobile is a stripped-down build of TensorFlow that uses a subset of the ops. We are going to keep improving TensorFlow Lite and its ability to map to custom hardware. We recommend you target TensorFlow Lite as soon as possible, if it's possible. If there's functionality you need that's only in TensorFlow Mobile, let us know and we'll work to improve TensorFlow Lite in a commensurate way. Okay. Demo time. Nothing like a live demo, right? So, let's switch over to the demo feed and we'll talk about it. So, we saw some mobile phones. Mobile phones are really exciting, you know, because everybody has them. But another thing that's happening is these edge computing devices. One of the most popular ones for hobbyists is the Raspberry Pi. I have built some hardware around the Raspberry Pi. As we zoom in, we have the Raspberry Pi board. This is a system on chip, similar to a cell phone chip. One of the great things about the Raspberry Pi is that they're really cheap. Another great thing is that they can interface to hardware. Here we're interfaced to a microcontroller that allows us to control these motors. These are servo motors, common in RC cars, and they allow us to move the camera left and right and up and down. Essentially it's a camera gimbal, and it's connected to a Raspberry Pi-compatible camera. What to do with this? We showed the classification demo before. Let's look at an SSD example -- single shot detection. It can identify bounding boxes in an image. Given an image, I get multiple bounding boxes. So, for example, we have an apple here, and it identifies an apple.
Now, the really cool thing we can do with this, now that we have the motors, is we can tell it to center the apple. We turned on the motors, and they're active. And as I move it around, it's going to keep the apple as centered as possible. If I go up, it will go up; if I go down, it will go down. So, this is really fun. What could you use this for? Well, if you're a person, it can identify you. Currently I have that filtered out. So, if I stand back, it's going to center on me. I could use this as sort of a virtual videographer. Imagine a professor wants to tape their lecture, but they don't have a camera person. This would be a great way to do that. I'm sure that all the hobbyists out there, who can now use TensorFlow in a really simple way, can come up with many better applications than what I'm showing here. But I find it fun, and I think you will too. And I'm not an electrical or mechanical engineer, so you can do this too. All right, thanks. Let's go back to the slides, please. So, I had a backup video just in case it didn't work. It's always a good plan. [ Laughter ] But we didn't need it. So, that's great. Let's summarize. What should you do? I'm sure you want to use TensorFlow Lite. Where can you get it? You can get it on GitHub, right inside the TensorFlow repository. You can find out more by looking at the TensorFlow Lite documentation on the tensorflow.org website. And we have a mailing list, tflite@tensorflow.org, where you can tell us about issues and what you use TensorFlow Lite for. I hope to hear from all of you -- one at a time, please. Thanks, everybody; thanks, Sarah, for her presentation; thank you, everybody around the world listening to this. In addition, this was work that we did together with lots of other teams at Google, so there's a lot of work that went into this. Thanks a lot. [ Applause ] So, with that, we have our next talk. Where is our next speaker? Our next speaker will be Vijay, and he's going to be talking about AutoML. >> Thank you very much. Hi, everybody. My name is Vijay. And today I'll be talking to you, or hopefully convincing you, that when we try to apply machine learning to solving problems, we should really be thinking about designing search spaces over solutions to those problems. Then we can use automated machine learning techniques to evaluate our ideas much more efficiently. I think a big reason why a lot of us are here today is the incredible impact that machine learning can have on practical problems. Two often-cited reasons are that we have increasing amounts of compute capability and access to data to train on. But I think one other aspect is all of you, right? There are so many more people involved in machine learning today who are contributing and publishing ideas. So, this graph tries to put that into perspective by measuring how many machine learning papers have been published on arXiv every year since 2009, and plotting that against a Moore's Law exponential growth curve. As you can see here, we have been keeping up with Moore's Law growth rates. And this demonstrates how many new ideas are being developed in the field. This is a great thing, right? One concrete way of looking at this is in the field of computer vision, where we have seen top-1 ImageNet accuracy start in the 50% range with the AlexNet architecture, which, by the way, revolutionized the field of image classification.
And every year we have been getting better and better, up until 2017. Now, these improvements haven't come just because we have been training bigger models, right? These improvements have also come from the fact that we have lots of great ideas: things like batch normalization, residual or skip connections, and various regularization techniques. Now, each of these points, like Jeff mentioned earlier, is the result of years of research effort, and we build on each other's ideas. But one of the challenging things is, how do we keep up with so many ideas being produced? And I want to zoom in a little bit on the complexity of some of these models. So, we're going to zoom in on Inception-v4 and look at the ideas embedded in there. These are modules within the architecture. Every one of the arrows and operations was designed by a human. Somebody wrote some code in order to specify all of these little details. Now, there are high-level reasons why this kind of architecture might make sense. But our theory doesn't really explain with much certainty why every detail seems to matter. And as a field, I think we're definitely working on trying to improve the theory behind this. But many of us are happy to use this kind of complexity out of the box if we can, because it really helps to solve problems. Now, this isn't too surprising. We know that because machine learning has had such an impact on real products, we're going to be willing to use anything we possibly can, even if we don't understand all the minor details, as long as it solves our problems well and is hopefully understandable. So, given all these ideas, how can we harness this explosion of ideas much more efficiently? Let's step back and ask a few questions that we might have heard when trying to train machine learning models. Simple, but hard, questions. What learning rate should I use for my optimization? If I'm training a deep neural network, what dropout rate should I apply? How do we answer these questions today? I think we combine a few different approaches. One of them is leveraging research intuition and engineering intuition. What this means is that we start with code, or we ask our colleagues, hey, what are good settings for these values? If there were one setting that worked for everybody, we wouldn't be tuning these parameters. But it does matter. So, then, we move on to some trial-and-error process. We try a certain setting, see how well it works on our problem, and continue to iterate. And I think the other ingredient, which is becoming more common, hopefully, is increasing access to compute and data with which we can evaluate these ideas. So, this combination is really ripe for automation, right? And not surprisingly, this exists today. It's called hyperparameter optimization. In this kind of setup, we have a tuner handing out hyperparameter settings. We have a trainer that trains our model on our dataset and then gives some kind of signal about how good those settings were -- it might report a validation accuracy, for example. And the tuner can then learn from this feedback to find better points in the search space. And, you know, this is a big existing field, and there are existing systems, like those shown at the very bottom, that can help you do this. But now let's ask a few more complicated or detailed questions that I think people often ask as well.
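A purely illustrative sketch of that tuner/trainer loop, using plain random search; train_and_evaluate here is a made-up stand-in for whatever actually trains your model and reports validation accuracy.

    import random

    def train_and_evaluate(learning_rate, dropout_rate):
        # Hypothetical stand-in: returns a synthetic score so the sketch runs;
        # in practice this would train the model and return validation accuracy.
        return 1.0 - abs(learning_rate - 0.01) - abs(dropout_rate - 0.3) + random.gauss(0, 0.01)

    best = None
    for trial in range(50):
        # The "tuner": sample a candidate setting from the search space.
        settings = {
            "learning_rate": 10 ** random.uniform(-4, -1),
            "dropout_rate": random.uniform(0.0, 0.6),
        }
        # The "trainer": evaluate the candidate and feed the score back.
        accuracy = train_and_evaluate(**settings)
        if best is None or accuracy > best[0]:
            best = (accuracy, settings)

    print("best validation accuracy:", best[0], "with settings:", best[1])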
Why do you use batch norm before a ReLU? I switched the order and it seems to work better. If you're trying to train a completely new model, should you use one type of sub-architecture or another? Now, if you think about it, these questions aren't really that different from hyperparameter settings. So, if we think of hyperparameter optimization as searching over a specific domain of ideas, then it seems possible that we can treat the decisions made in this type of model as another form of searching over a domain of ideas. And we can therefore think about de-emphasizing any specific decision that we make about our architectures, and instead think about the space of ideas that we might have. So, let's take a concrete example of a search-space design from my colleague Barret, who designed a search space for a convolutional cell. I'll walk you through how you might design such a search space. The first question is, you have to get your inputs. You might say you have access to the previous input, and if you want support for skip connections, you might also have the previous, previous input. So, the first job in the search space is to decide which inputs to select. Then, once you have those inputs selected, you want to figure out what operation to apply to each of those inputs before summing them together. So, I might select something like a three by three convolution or three by three max pooling and combine those together. We can then recursively turn that crank and apply it several more times, using different operations for different inputs, and we can even use the intermediate outputs of previous decisions in our search. Then, finally, you take all of your outputs that are unused and you concatenate them together. And that is your convolutional cell. If you want to build your full model, like a ResNet, you stack these cells together. This is one point from the search space of ideas. There are on the order of a billion possible ways to construct a cell like this in the search space, by changing the operation list and the way the connections can be made. Now that we've designed our search space, we go back to the hyperparameter tuning system. We have a program generator on the left that generates samples from this search space. We then train and evaluate them on the task at hand -- oftentimes a proxy task -- and iterate to quickly find the best programs from our search space. And the system on the left, the program generator, can optionally learn from feedback. It might use something like reinforcement learning, evolutionary algorithms, or even random search, which can work well in certain situations. So, we applied this type of approach. We took this convolutional cell search space and trained on a proxy task to make quick progress in evaluating ideas. Then we took the best candidate cells found in that search, enlarged them in terms of the number of filters and the number of times we stacked them, and applied them to the ImageNet dataset. These are two cells found from the search. Looking at the results, you can see that we were able to do better than the existing state-of-the-art models in terms of top-1 accuracy. This effort was an example where we took a problem where the decisions were pretty complex and, honestly, we found another complex model that was better. But next I'll show you an example where we can use this general technique to find more interpretable outputs. So, let's look at optimization update rules.
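Before moving to update rules, here is a purely illustrative sketch of sampling one point from the kind of cell search space just described; the op list and the number of combination steps are made-up placeholders, not the actual search space used in the work.

    import random

    OPS = ["3x3_conv", "3x3_maxpool", "5x5_conv", "identity"]

    def sample_cell(num_steps=5):
        # Start from the two allowed inputs: the previous and previous-previous cells.
        available = ["prev", "prev_prev"]
        cell = []
        for step in range(num_steps):
            # Pick two inputs (possibly intermediate outputs of earlier steps),
            # pick an op for each, and sum the results into a new hidden state.
            a, b = random.choice(available), random.choice(available)
            op_a, op_b = random.choice(OPS), random.choice(OPS)
            new_state = "h%d" % step
            cell.append((new_state, (op_a, a), (op_b, b)))
            available.append(new_state)
        return cell  # unused states would be concatenated to form the cell output

    print(sample_cell())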
Most of you are probably familiar with stochastic gradient descent, shown on the left: the weights are updated by the gradient scaled by the learning rate. And then we have rules like Adam, which can be expressed fairly concisely in terms of quantities like the moving average of the gradient and so forth. But we really only have a handful of these optimization update rules that we typically apply for deep learning. What if we, instead, treat these update equations as part of a larger search space? You can take these expressions and turn them into a data-flow graph that computes the optimization update rule. We can express the known rules using this simple tree, but also a lot of other ideas. And so, you can then turn the crank on this new search space and try to find a better optimization update rule. So, my colleagues ran that experiment. They took a fixed convolutional model and searched over the update rules. They found optimizers that did better than the rules I have shown you on this particular task. One nice feature of this search space is that the results are more interpretable. Take the fourth update rule here: it takes the gradient and multiplies it by an expression. If the gradient and the moving average of the gradient agree in direction, you take a bigger step in that direction; if they disagree, you take a smaller step. This is actually a form of momentum. And so, one thing we can take from this is that maybe we should be designing search spaces with more such notions encoded in them -- we may be able to find even better results. So far I have focused on techniques and search spaces where we care about accuracy. But what's great about searching over many ideas is that we have the potential to search over more than just accuracy. For example, a lot of us care about inference speed. We want to take a model and deploy it on real hardware -- a real mobile platform -- and we spend a lot of time trying to figure out how to take one idea and make it fast enough. But what if we could, as part of the search over ideas, find ones that balance both speed and accuracy? So, we ran an experiment where we included the runtime on a real mobile device as part of the inner loop of the evaluation, optimizing for both accuracy and inference speed. And as this process goes on over time, the program generator is able to find faster models while also figuring out how to make those models more accurate. One interesting side effect is that when you run searches over ideas, the output is not just one model -- it's a collection of models that implicitly encodes this tradeoff. This plot shows points along a frontier that trade off inference speed on a mobile platform against accuracy on the dataset we're trying to solve. Rather than manually engineering the one point I want to get working, I can get a set of results that can be deployed on various types of platforms. So, I'll emphasize this in a slightly different way: we can define a search space of ideas in TensorFlow, and through this automated machine learning process we can get models that meet a guaranteed runtime performance target on a target platform or device. And one of the nice things about having an integrated ecosystem like TensorFlow is that you can just use the libraries that convert from program to program, so you can get this end-to-end pipeline working well together. There's nothing required to specifically tune a model.
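To make the update-rule search space concrete, here are the familiar rules in equation form, together with a sign-agreement rule of the kind described above; the last line paraphrases the published PowerSign-style rule rather than quoting the slide.

    \begin{align*}
      \text{SGD:}      \quad & \theta_{t+1} = \theta_t - \eta\, g_t \\
      \text{Momentum:} \quad & m_t = \beta m_{t-1} + g_t, \qquad \theta_{t+1} = \theta_t - \eta\, m_t \\
      \text{Sign-agreement:} \quad & \theta_{t+1} = \theta_t - \eta\, e^{\operatorname{sign}(g_t)\,\operatorname{sign}(m_t)}\, g_t
    \end{align*}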
Let me conclude by returning to this process of evaluating ideas, in this world where we're trying to explore different ideas. The first point is that we design search spaces to test out a large set of possible ideas. Note that designing the search space requires human intuition; there's still a need for human ingenuity in this process. Designing the search space properly takes a lot of effort, but it lets you evaluate many more ideas much more quickly. When it comes to trial and error, we had to think about how software should change to permit this type of search process. For example, many of us have probably written scripts where you take things like learning rate and dropout rate as command line flags. What if you wanted to test out deeper ideas in your programs? How do you design a program that's tunable at all levels? I think this is a big question for us to tackle. And lastly, we think these ideas will become increasingly relevant as many of you get access to more and more computation capability, such as TPU pods. Imagine a world where all you have to do is take your idea, submit it to an idea bank, and have a pod of TPUs crunching overnight to figure out which solutions or ideas are the best, and then you wake up in the morning and it tells you: these were the good ideas, these were the bad ideas, and so forth. Part of the reason this excites me is that automated machine learning can keep these machines much busier than we can. We have to sleep, but machines can keep on churning 24/7. So, with that, thanks for listening. [ Applause ] And next up is Ian, who will be talking to you about fusion plasmas. >> So, I want to talk to you about something that's very important to me, and that's: how will civilization power itself for the next hundred years? In 2100, the projected world population is 11.2 billion. If all 11.2 billion people want to enjoy the same power usage that we do now in the United States, that's going to require burning around 0.2 yottajoules of energy over the next hundred years. That's a whole lot. To put that in perspective, if we wanted to do that with oil alone, we would have to ramp up oil production by a factor of 10 for the next hundred years. There's no way that's going to happen. Besides being infeasible, it would contribute to catastrophic climate change. If we want to keep climate change to a not ideal, but reasonable, say, 2-degree temperature increase, only 1.2% of that energy can come from coal or oil. Where does the rest come from? One possible source would be nuclear fusion. Fusion involves pushing together two smaller nuclei, and what you get out is a whole lot of energy and no greenhouse gas. Right now the sun runs on nuclear fusion. And the reaction is so energy-dense that 0.2 yottajoules would require only a trivial amount of boron. So far it sounds like a miracle fuel. What's the catch? Well, the difficulty is that people have been trying this for 70 years, and so far no one has gotten more energy out than they put in. To understand this, you have to know that the reaction takes place inside a plasma, and the plasma is a million-plus-degree swarm of charged particles. These particles don't want to stay in place. The sun uses gravitational force to keep everything in place. We can't do that. So, instead, we use magnets. Now, you try to squeeze a plasma with magnets and it can pop out the ends.
And you can get little turbulent ripples. What happens is the plasma breaks up, it gets unstable, it gets cooler, and then the reaction stops. And that's what's been happening for 70 years. So, this is the kind of problem that I like. It combines physics, probability, computation and mathematics. And so, I said, I want to work on this. How can we accelerate progress? Well, Google is not building a fusion reactor. What we have done is partner with TAE Technologies, the world's largest private fusion energy company, and we have been working with them since 2015. Pictured here is their fifth-generation plasma generation device. This thing is huge -- it would fill up a large part of this room. And in the center is where the plasma is kept. It is an elongated toroid, and the goal is to keep it in its place and prevent turbulence. If it gets out of place, the reaction stops. So, there are magnets and neutral beams and a host of other technologies to keep it in place. Now, what's Google's job specifically? Well, our goal is to take the measurements that come from this experimental reactor, and every time the physicists run an experiment, within five minutes we want to tell them the plasma density, temperature and magnetic field on a three-dimensional grid. So, how hard is that? Well, first of all, the plasma is very, very hot. You can't just poke it with a thermometer like a turkey. The thermometer would melt, and you would disrupt the plasma and ruin the experiment. What you do have are measurements along the boundary. But there are only so many measurements you can take, because you can't cut that many holes in the side of this device. So, let's look closely at one. Let's look at measuring electron density. That's done with a device known as an interferometer, and each measurement is proportional to the average density along a ray. So, we have 14 lasers shining through the center of the plasma, and we know the average density along 14 lines. From that, we want to know the density everywhere. Clearly there's no one unique solution to this problem; instead, we'll have a distribution over possible solutions. So, we do this in a Bayesian sense, and the final output is a probability density function for the density of the electrons given the measurements. We can visualize that with a graph where you have a mean and some error bars. How does TensorFlow help with this? Well, the first place is translating measurement physics into code. Let's consider the distribution for the camera measurement. The cameras measure photons, and say we have some photons being emitted from the plasma. The mean number of photons reaching the camera is given by a sparse-tensor dense matmul. But we don't observe the mean. Instead, what we observe is a noisy version of the mean. There's noise due to the finite number of photons, and we have discretization noise, since we discretize space. The TensorFlow distributions library gives you access to this, so this noisy flux is represented by a Normal distribution object. It has a mean, you can draw samples from it, you can compute the PDF and so on. That's not all, though. We also have an analog-to-digital conversion process that we model as passing this normal distribution through a non-linear response curve and digitizing it to 8 bits. So, at the end, this digitized charge is another distribution object that has the ability to take samples; you can compute the probability mass function, because it's discrete; and so on.
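A minimal sketch of that measurement model: a sparse projection gives the mean photon flux, and a Normal distribution represents the noisy reading. The geometry matrix, emission values and noise scale are made-up placeholders, not TAE data.

    import tensorflow as tf
    import tensorflow_probability as tfp

    # Sparse geometry matrix: which plasma cells each camera pixel "sees".
    geometry = tf.sparse.SparseTensor(
        indices=[[0, 0], [0, 2], [1, 1]], values=[0.5, 0.5, 1.0], dense_shape=[2, 3])
    emission = tf.constant([[10.0], [20.0], [30.0]])  # photons emitted per cell

    mean_flux = tf.sparse.sparse_dense_matmul(geometry, emission)  # mean photons per pixel

    # The observed flux is the mean corrupted by photon and discretization noise.
    noisy_flux = tfp.distributions.Normal(loc=mean_flux, scale=2.0)
    sample = noisy_flux.sample()            # draw a simulated measurement
    log_prob = noisy_flux.log_prob(sample)  # likelihood term for Bayesian inference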
And since we want to be Bayesian, we assemble a number of these distributions, giving us a likelihood and a prior and so on, with the goal of producing a posterior. And then we do Bayesian inference. We do inference in two different ways. The first way is variational inference, which amounts to minimizing a loss function measuring how far the approximation is from the true posterior. This is done like any other TensorFlow minimization; for example, we use the Adam optimizer. The second way is Hamiltonian Monte Carlo. The TensorFlow Probability library gives you a number of Monte Carlo samplers, and the Hamiltonian one lets you take samples faster. Notice that in both cases it's auto-differentiation doing the work, whether we're taking gradients of the loss or gradients for the Hamiltonian Monte Carlo sampling. Popping up a level, you'll notice we're not doing deep learning. As I said, we're doing an inverse problem: turning measurements given to us by physicists into a reconstruction of some physical state. So, there are a few differences I want to highlight. First of all, there are no labels given to us. The natural label here would be a three-dimensional image of the actual plasma. But we're the ones telling people what the plasma looks like, so we're the ones actually producing the labels. Given that there are no labels, you might be tempted to say this is an unsupervised learning technique, like word clustering. But here there really is a right answer. There really was a plasma out there, and if the plasma doesn't fall within our error bars, we have made a mistake. You'll also notice that our graph here models physics rather than generic functions. So, it's a bit more constrained than these deep neural networks, but that's what allows us to get the right answer with no labels. At the end of the day, despite this not being deep learning, TensorFlow adds value with the TensorFlow distributions and probability libraries. We have auto-differentiation to do inference. And in order to provide answers for many measurements at once, GPUs and distributed computing are very important. So, thank you very much. [ Applause ] And next up we have Cory, talking about machine learning and genomics. >> Hello, everyone. My name is Cory McLean, and I'm an engineer on the genomics team in Google Brain. Today I'm excited to tell you about Nucleus, which is a library we've released today to make it easy to bring genomics data to TensorFlow. So, genomics is the study of the structure and function of genomes. In every cell in your body, you have two copies of the genome, one from each parent. This is a string of DNA, which has a four-letter alphabet, and there are about 3 billion letters in the genome. So, here is a snapshot of a region on chromosome 1 -- about 150,000 letters. What we can see is that there are a number of known things about this region. One, there are functional elements, like the genes depicted in the second row. Biological measurements allow us to analyze the different things that are active in cells: on the third row, the amount of gene expression across different tissue types is quantified. And at the bottom, through sequencing many people, we can identify places where there's variation across individuals. There are many different computational and algorithmic challenges in developing that picture. They range across the experimental data generation side: can we better take the output of these physical measurements to get accurate DNA readings?
Can we reduce noise in the experiments that quantify this expression? Can we take the DNA sequence and interpret where the functional elements like these genes are, or predict how active they are in different tissue types? Can we identify places where individuals vary compared to a reference? How is that different for small variants versus, say, in cancer? And how do those changes influence human traits? So, one thing that is really exciting for us is that there are many opportunities for deep learning in genomics. A lot of that is driven by the increase in the amount of data available. This graph shows the dramatic reduction in the cost to sequence a million bases of DNA over the past decade. But also, there's a lot of structure in these datasets that is often complex and difficult to represent with relatively simple models. Much of this data displays convolutional structure, so we can use techniques from image classification, as well as sequence models. And there have been a number of proven successes in applying deep learning to problems in genomics, such as DeepVariant, which is a tool our group developed to identify small variants using convolutional neural networks. So, our goals in genomics are multi-faceted. One is to make it easy to apply TensorFlow to problems in genomics, and we do this by creating libraries that make it easy to work with genomics data. We're also interested in developing tools and pushing the boundaries on some of these scientific questions using the things that we've built. And then we want to make all of that publicly available as tools that can be used by the community. So, today, I'll focus on the first part: making it easy to bring genomics data to TensorFlow. So, what is the major problem? One major difficulty is that there are many different types of data generated for genomics research. You can see here on the right a subset of the different file types used. These file formats have varying amounts of support and, in general, no uniform APIs. We also have some concerns about efficiency and language support: we would like to be able to express some manipulations in Python, but we need effective ways to go through this data efficiently, in a way that native Python wouldn't make possible. So, to address these challenges, we developed Nucleus, which is a C++ and Python library for reading and writing genomic data, to make it easy to bring that data to TensorFlow models and then feed it through the tf.data API that Derek talked about earlier today for training models for your particular task of interest. We support reading many of the most common data formats in genomics and provide a unified API across the different data types. We're able to iterate through the records of these different types and to query specific regions of the genome to access the data there. The way we developed this uses protocol buffers under the hood, so we can implement all the general parsing in C++ and then make it available to other languages like Python. For those of you familiar with genomics, we end up using htslib, which is the canonical parser for the high-throughput sequencing data formats and for variants, wrap that to generate the protocol buffers, and then use CLIF on top to make the data available in Python. And finally, we use some of the TensorFlow core libraries so we can write out this data as TFRecords, so it can be ingested by the tf.data API.
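A minimal, Nucleus-independent sketch of that TFRecord handoff: a serialized tf.train.Example is written to a TFRecord file and read back with tf.data; the feature name and file path are made up for illustration.

    import tensorflow as tf

    example = tf.train.Example(features=tf.train.Features(feature={
        "reference_bases": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[b"ACGT"])),
    }))

    with tf.io.TFRecordWriter("/tmp/variants.tfrecord") as writer:
        writer.write(example.SerializeToString())

    dataset = tf.data.TFRecordDataset("/tmp/variants.tfrecord")
    for raw_record in dataset:  # eager mode
        print(tf.train.Example.FromString(raw_record.numpy()))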
The data types we currently support are the following, ranging from general genome annotations to reference genomes and sequencer reads -- whether they're straight off the sequencer or mapped -- as well as genetic variants. To give an example, the reading API is quite straightforward. This is a toy example, but it is essentially similar to what is used for DeepVariant, where we want to train a model to identify actual genome variation based on mapped sequence reads and a reference genome. So, there are three different data types that we need. We import the different reader types and then, for a region of the genome that we're interested in, we issue queries to each of the different readers and get back iterables that we can manipulate and turn into TensorFlow examples. On the writing side, it's similarly straightforward. If we have a list of variants in, say, the common VCF format, we'll have an associated header which provides metadata; we open a writer with that header and then just loop through the variants and write them. Note that we support writing to the blocked gzip format, which allows subsequent indexing by other tools. We can also write directly to TFRecords, and we provide methods to write out sharded data, which we have found helps avoid certain hot spots in the genome, using a very similar API. Finally, we have been working with the Google Cloud team, which has some tools for analyzing variant data. They have developed a tool called Variant Transforms, which allows you to load variant files into BigQuery using Apache Beam, and you can then run queries over that data. We're working to have Nucleus under the hood there, providing the parsing of the variants; to learn more about that tool you can go to the link below. So, to summarize: we have developed Nucleus, which is a C++ and Python library to make it easy to bring genomics data to TensorFlow to train your models of interest for genomics problems. It can interoperate with Cloud Genomics and is being integrated into Variant Transforms at the moment. And it is the foundation of our CNN-based variant caller, which is available at the link below. So, with that, I would like to thank you all for your attention today. [ Applause ] Next up we'll have Edd to talk about open source collaborations. >> Thank you. Hi, everyone. Now, I was going to talk to you about my plans to reanimate dinosaurs with TensorFlow, but I don't want to steal those guys' thunder. Actually, I'm here to talk about open source collaboration in the TensorFlow project. That's my job at Google: to work on growing the participation and collaboration in the project and the whole community. So, all of you here, and everybody watching on the livestream, are a huge part of this already. If you saw a slide like this at the beginning of the day in the keynote, you can see the numbers have ticked up. In the five days I have been monitoring this slide, I am increasing the numbers. The amount of participation is staggering. As an open source project, it blew my mind when I came to work on it. And so much of that is due to the participation of everybody here in the community. There are parts of TensorFlow, many of them, that wouldn't exist without that collaboration -- for instance, whether it's Spark connectors, whether it's support for particular architectures and accelerators, or maybe certain language bindings.
We not only benefit from a huge amount of adoption by being open source, but as the adoption grows, this is also how we sustain and grow the project. You saw this map earlier as well. This is just some of the GitHub stars whose owners gave locations that we could map. It goes as far north as Norway and as far south as the Antarctic. And what's obvious right away is that, although there's a large team at Google developing TensorFlow, there are far more people in far more places using it. And in open source projects, more adoption means more demand. There's so much we can do together to grow TensorFlow. Now, you remember that thing where you turn up to a party and everyone is having a good time and they all seem to know what they're doing and why it's such fun, but who are you going to talk to, and what are they working on? Sometimes a large open source project can be a bit like that. You want to get involved and contribute to TensorFlow, but where do you start? You think, this module, this feature is something you want to work on. How do you find the right person to talk to? How do you learn what we're thinking about the direction for it? We have heard some of those things, and we recognize that we want to improve our openness, our transparency and our participation. We're working to make it easier to get involved in TensorFlow. We have already, for instance, refreshed our roadmap, which you can find on the TensorFlow website, covering the general direction of a lot of the code, and we'll do that a lot more regularly. But I want to talk about four initiatives we have going that will enable us to work together more effectively and faster. The first of these is simple: a central community mailing list for everyone who is working on and contributing to TensorFlow. GitHub has so much energy going on; there's so much great debate in all the issues. Look in there, in the pull requests -- really thoughtful conversations and contributions. Thank you for being part of that. But what we haven't had is a central place where you can collaborate. We now have a mailing list, developers@tensorflow.org, where we can work together as a community to get feedback and coordinate. Many of the projects have mailing lists that you can find at tensorflow.org/community, whether it's TF Lite or TensorFlow.js. So, that's collaboration. Now, we talked about the fact that there are many use cases outside of Google that the core team doesn't see -- many more. Much more happens outside than inside the core team. So, we want to make it possible for people with shared interests in projects to work together. This is where the beauty of open source comes in. How do we do that? We're setting up a structure for groups to work together: special interest groups. We have been piloting the first of these for a few months now. It's called SIG Build, and it's about building, packaging and distributing TensorFlow. If you're familiar with TensorFlow, you know we build it in a certain way. Guess what? Not every architecture or application finds that the best way for them. For instance, Linux distributions want to build against the shared libraries in the distribution. That's not something we do. So, we brought together a bunch of stakeholders across Linux distributions and companies like Red Hat, IBM, NVIDIA and Intel to collaborate in a group, to look at the build and make it work effectively for more people in the future. That's just one group -- the pilot. But we want to pave the cow paths.
Where there is energy and people collaborating on a particular thing, that's a great candidate to bring a special interest group together. We're also bringing online a group for TensorBoard, where key stakeholders of the TensorBoard ecosystem can collaborate, and one for language bindings, which are built entirely by the community. Each will have a different way of working, a different community, but the common thing is that we're going to provide forums so that, if you have a shared interest in a particular area, we can focus on it. Now, I'd like to talk about the design of TensorFlow. One of the most amazing things and benefits of TensorFlow is that the code we release is the code that Google uses on a daily basis. That's kind of remarkable for an open source project. And so, we're really careful about changes. We're really careful about design. We have a commitment, obviously, to the API through the 1.x series of releases, and we have design reviews internally. So, when things change, we have proposals and we get feedback. But by now you're thinking: well, you just said that so many use cases and so many users are outside of Google, yet you're having these design reviews inside. So, what we're going to do is open up a public feedback phase in our design process, so we can engage much more broadly with every contributor and user about how a change might affect their needs and what their opinions are. Keep an eye on the developers@tensorflow.org mailing list; that's where we'll announce it coming online in the next couple of months. My hope is that this process will be a way that everybody, whether you're in the core team at Google or in the broader community, can discuss the future direction of TensorFlow. So, contributing to TensorFlow isn't just about issues or pull requests. In fact, I reckon there's even more energy going into blogging, running meetups, doing presentations, teaching and running courses at so many universities around the world. And we want to amplify and support the content that educates and highlights TensorFlow. We're really excited that so many of you do such amazing jobs, and we would like to be able to point everybody in the TensorFlow community to the work that you're doing. So, there are a couple of things we launched to support this. The first you've probably already heard: we now have a blog for TensorFlow, at blog.tensorflow.org. One of the things I'm most excited about with this blog is that, as well as important announcements and education, we're setting it up from the beginning to involve content from around the web and from the community. That's one of the reasons we're using the Medium platform: it makes it easy to integrate content from around the web and give you the credit for the work you have done. So, we would really like to hear from you. If you have a blog post you'd like to get into the TensorFlow publication, get in touch. Secondly, and if you're watching on the livestream you've kind of found out about this already, we have a YouTube channel that launched today. One of the things I'm most excited about in this is a show called TensorFlow Meets, where we're able to get out into the world of contributors and users and highlight their use cases and their work. This is a chance for you, too. We would love to meet you, chat about what you're up to and have you featured on the YouTube channel. Again, reach out to us. We would love you to be a part of it.
There is one URL to get involved in all of these things I mentioned: tensorflow.org/community. So, if anyone has mentioned a mailing list or a group to you today, please go to that URL and you will find the resources there. It's my hope that TensorFlow is going to continue to be a party, but maybe one where you find yourself a part of it a lot sooner and have more fun. Please feel free to reach out to me. There's my email address, ewj@google.com. Talk to me about your experiences collaborating around open source and TensorFlow; I would love to hear about it. Thank you so much. [ Applause ] Now, our next speaker is Chris Lattner. He's going to talk about a first-principles approach to machine learning. >> All right. Thank you, Edd. Hi there, everyone. I'm excited to introduce a new project we have been working on that takes a new approach to improving the usability of TensorFlow. We care so much about usability that we're going all the way back to first principles of the computation that we're performing. But first, why usability? I hope everyone here agrees that productivity in machine learning is critical, because it leads to a faster pace of innovation and progress in our field. And, of course, we just want to build beautiful things for TensorFlow users; that's a big piece of it as well. If you look at machine learning frameworks, there are two major approaches. The most familiar are the graph-building approaches, where you explicitly define a graph and then execute it to run a computation. That's great for performance, but not always for usability. The define-by-run approach, eager execution, doesn't always give the best performance, but it's easier to use. Both approaches are really about allowing Python to understand the difference between the tensor computation in your code and all the other non-tensor stuff, like command-line processing and visualization and whatever else you do. I think it's interesting to look at how these actually work. In the case of eager execution, you write the model and Python parses it, then feeds it one statement at a time to the interpreter. If a statement is a tensor operation, it hands it to TensorFlow; otherwise Python runs it. The key thing about eager execution is that it's designed within the constraints of a Python library. With a compiler and the language involved, there's a whole other set of approaches that can be applied to solving this problem, and that's what we're doing. The cool thing about a compiler is that, after it parses your code, it can see the entire program and all the tensor ops in it. We're adding a new stage to the compiler that pulls the tensor operations out, and because the result is a standard TensorFlow graph, you get access to all the things TensorFlow can do, including the devices. You get the power and flexibility of TensorFlow, but you get the usability of eager execution as well. But there's a catch. There's always a catch, right? The catch here is that we can't do this with Python, at least not with the reliability we expect, because Python doesn't support the kind of compiler analysis we need. What do we mean by that? Well, the compiler has to be able to reason about values. It has to reason about control flow and function calls, and about variable aliasing. And we have come to love a lot of things about Python, including all the standard Python APIs. I know what you're thinking. Does this mean we're talking about building a new language?
Well, that's definitely one approach to solving the technical requirements we have. With a new language, we could build in all the nice things we want. But this comes at a cost. It turns out we would be forgoing the benefits of an existing community. That includes tools and libraries, but also things like books, which some people still use. And even more significantly, this would take years of time to get right, and machine learning just moves too fast. No, we think it's better to use an existing language. But here we have to be careful, because to do this right, we have to make significant improvements to the compiler and the language, and do it in a reasonable amount of time. And so, of course, this brings us to the Swift programming language. Now, I assume most of you are not very familiar with Swift, so I'll give you a quick introduction. Swift is designed with a lightweight syntax; it's geared towards being easy to use and learn. Swift draws together best practices from lots of different places, including things like functional programming and generics. Swift builds on LLVM, and it has an interpreter and scripting capabilities as well. Swift is great in notebook environments, which are really awesome when you're interactively developing in real time. Swift is also open source, it runs on lots of platforms, and it has a big community of people. But the number one thing that's most important to us is that it has a fully open design process called Swift Evolution, which allows us to propose machine learning and compiler features directly for integration into Swift. When you bring all of this together, I'm happy to introduce Swift for TensorFlow. Swift for TensorFlow gives you the full performance of graphs. You can use native language control flow. It has built-in support for automatic differentiation. You can detect errors without running your program. And it has full interoperability with Python APIs. I would like to welcome Richard Wei to tell you about it now. >> Thank you, Chris. [ Applause ] I'm thrilled to show you Swift for TensorFlow. Swift is a high-performance, modern programming language, and today, for the very first time, Swift has full-powered TensorFlow built right in. I'm going to walk through three major styles of programming: scripting, interpreting, and notebooks. So, first let me show you the Swift interpreter. When I type some code, Swift evaluates it and prints a result, just like Python. Now, let's import TensorFlow. I can create a tensor from some scalars. Now I can run any TensorFlow operation directly and see the result, just like I would with eager execution. For example, A plus A, or A's matrix product with itself. Of course, loops just work, and I can print the result. Now, the interpreter is a lot of fun to work with, but I like using TensorFlow in a more interactive environment, just like a Jupyter notebook. So, let's see how that works. This is a Swift notebook. It shows all the results on the right. Here's some more interesting code: fun with functions. I have a sigmoid function inside a loop. As I click on this button, it shows a trace of all the values produced by this function over time. Now, as a machine learning developer, I often want to differentiate functions. Since we were able to improve the programming language itself, we built first-class automatic differentiation right into Swift. When I type in the gradient of the function, Swift computes the gradient automatically and gives me the result.
So, here is the gradient of the sigmoid. Now, let's think about Python. As a machine learning developer, I have been using Python a lot, and I know there are many great Python libraries. Just today my colleague, Dan, sent me a dataset in pickle format. Well, I can directly use Python APIs to load it. All I have to do is type "import Python", and Swift uses a Python API, pickle to be specific, to load the data. And here you can see the data right in the Swift notebook. Now, some people like to run training scripts directly from the command line, so let me show you how to train a simple model that way. Here is a simple model. I'm using the TensorFlow Dataset API to load the data, and I have the forward pass and the backward pass defined in the training loop. Now, I usually like to work on the go, so this code has been running on the CPU on my laptop. But when I want more performance, what do I do? Well, why don't I just enable Cloud TPU? All I have to do is add one line to enable TPU execution. When I save this file and open the terminal to run this training script, it initializes the TPU, and the Swift compiler automatically partitions this program into a host program and a TensorFlow graph. TensorFlow sends this graph to the XLA compiler for TPU execution. Now it's running, and we're waiting for the TPU to give the result. Look! The loss is going down. All right. So, why don't we simply open TensorBoard and see the training curve? Now I can see the entire training history in TensorBoard. This is looking great! So, this is Swift for TensorFlow: an interactive programming experience with supercomputing performance at your fingertips. Back to you, Chris. >> Thanks, Richard. [ Applause ] All right. To recap quickly, Richard showed you that Swift has an interpreter, and it works just like you would expect. Now, I know it's super-frustrating to be two hours into a training run and get a shape error or a type mismatch. Swift catches these early; we built catching mistakes right into Swift for TensorFlow. And you can use APIs from other languages in Swift, which gives you full access to any of the Python APIs you love to use. Swift generates standard TensorFlow graphs, including control flow, which gives you the full performance of the session API. Of course, graphs are also awesome because they give you access to everything TensorFlow can do, including devices spanning the range from the tiniest Raspberry Pi all the way up to a TPU supercomputer. You may wonder, what does this mean for you? This is an early-stage project, but we're looking forward to our open source release next month. And we're not only releasing the code, but also technical white papers and documents that explain how it works, and we're moving our design discussions out into the public on a Google group so everyone can participate. We're not done yet. We have basic support for automatic differentiation built right into the compiler and the language, but we want to handle exotic cases like recursion and data structures. Compatibility issues are super-frustrating, especially if you use an op or dtype not supported by your device. Swift has great support for detecting issues like this, and we are looking forward to wiring it into Swift for TensorFlow. We are also interested in high-level APIs.
We have some prototypes now, but we would like to design multiple approaches, experiment, and settle on the best one based on real-world experience. This has been a super-quick tour of Swift for TensorFlow. Swift for TensorFlow combines the power and flexibility of TensorFlow with a whole new standard of usability. We think it's going to take usability through the roof. It's an early-stage project, and we would like you to get involved and help us build this future. Thank you. [ Applause ] >> Hello, everyone. Welcome back. I'm Jeremiah, and this is Andrew. We are here from the TensorFlow Hub team, based in Zurich, Switzerland, and we're excited to share TensorFlow Hub with you today. So, this first slide is actually one that I stole. I took it from a colleague, Noah Fiedel, who leads TensorFlow Serving, and Noah uses this slide to tell a personal story. It shows the growth of the tools we use to do software engineering and how they mature over time. He connects this to a similar thing happening with the tools we use to do machine learning, and he draws these connections: we're rediscovering things as we grow our machine learning tools, things like the machine learning equivalent of source control, or of continuous integration. And Noah makes the observation that machine learning tooling is lagging behind the software engineering side by 15 to 20 years. So, this creates a really interesting opportunity, right? We can look at software engineering, look at some of the things that have happened there, and think about what kind of impact they might have on machine learning. Looking at software engineering, there's something so fundamental it's almost easy to skip over: the idea of sharing code in shared repositories. On the surface, this makes us immediately more productive: we can search for code, download it, use it. But it has really powerful second-order effects. It changes the way we write code. We refactor our code, put it in libraries, and share those libraries, and that makes people even more productive. That's the dynamic we want to create for machine learning with TensorFlow Hub. TensorFlow Hub lets you build, share and use pieces of machine learning. So, why is this important? Well, anyone who has done machine learning from scratch knows you need a lot to do it well. You need an algorithm. You need data. You need compute power and expertise. And if you're missing any of these, you're out of luck. TensorFlow Hub lets you distill all of these things down into a reusable package called a module, which can be easily reused. You'll notice I'm saying "module" instead of "model." It turns out that a model is a little too big to encourage sharing. If you have a model, you can only use it if you have the exact inputs it wants and you expect the exact outputs it provides. If there are any little differences, you're kind of out of luck. Modules are a smaller piece: if you think of a model like a binary, think of a module like a library. On the inside, a module is actually a SavedModel, which lets us package up the algorithm in the form of a graph along with the weights, so you can do things like initialization and use assets. Our libraries make it very easy to instantiate these in your TensorFlow code, and you can compose them in interesting ways. This makes things very reusable: you can produce one of these and share it. These modules are also retrainable.
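To make that concrete, here is a minimal sketch of instantiating a module and adding a classifier on top, assuming the tensorflow_hub library; the module URL and the number of classes are illustrative placeholders rather than a specific recommendation.

```python
import tensorflow as tf
import tensorflow_hub as hub

NUM_CLASSES = 10  # hypothetical number of categories for your own task

# Instantiate a pre-trained image feature-vector module by URL (the URL is
# an illustrative example of the tfhub.dev scheme discussed later). Setting
# trainable=True, with a low learning rate, would fine-tune the module.
module = hub.Module(
    "https://tfhub.dev/google/imagenet/nasnet_large/feature_vector/1",
    trainable=False)

height, width = hub.get_expected_image_size(module)
images = tf.placeholder(tf.float32, [None, height, width, 3])

features = module(images)                        # use the module like a function
logits = tf.layers.dense(features, NUM_CLASSES)  # your own classifier on top
```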
Once you patch it into your bigger program, you can back-propagate through it just like normal. And this is really powerful, because if you do happen to have enough data, you can customize the TensorFlow Hub module for your own application. To tell us a little more about some of those applications, I'll hand it over to Andrew. >> Thanks, Jeremiah. Let's look at a specific example of using a TensorFlow Hub module for image retraining. Say we're going to make an app to classify rabbit breeds from photos. We have a couple hundred examples, not enough to train an entire image classification model from scratch. But we could start from an existing general-purpose classification model. Most of the high-performing ones are trained on millions of examples, and they can easily classify thousands of categories. So, we want to reuse the architecture and the trained weights of that model without the classification layers. That way, we can add our own rabbit classifier on top, train it on our own rabbit examples, and keep the reused weights fixed. Since we're using TensorFlow Hub, our first stop is tensorflow.org/hub, where we can find a list of newly released, state-of-the-art, and well-known image modules. Some of them include the classification layers, and some remove them, just providing a feature vector as output. We'll choose one of the feature-vector ones for this case. Let's use NASNet, a state-of-the-art image module created by neural architecture search. You paste the URL of the module, and TensorFlow Hub downloads the graph and all of its weights and imports it into your model. With that one line, you're ready to use the module like any function: here we just provide a batch of inputs and get back our feature vectors, then add a classification layer on top and output our predictions. But in that one line, you get a huge amount of value. In this particular case, more than 62,000 hours of GPU time went into finding the best architecture for NASNet and training the result. All the expertise, the testing, the research that the authors put into that is built into the module. Plus, the module can be fine-tuned along with your model: if you have enough examples, you can potentially get better performance if you use a low learning rate, set the trainable parameter to true, and use the training version of the graph. NASNet is available as a large module as well as a mobile-sized one, and there's also the new Progressive NASNet, plus a number of new MobileNet modules for doing on-device image classification, as well as industry-standard ones like Inception and ResNet. The full list is at tensorflow.org/hub. All those modules are pre-trained using the TF-Slim checkpoints and ready to be used for classification or as feature-vector inputs to your own model. Okay. Let's look at another example, this time a little bit of text classification. We'd like to know whether a restaurant review has positive or negative sentiment. As Jeremiah mentioned, one of the great things about TensorFlow Hub is that a module packages the graph together with things like normalizing and tokenizing operations. So, we can use a pre-trained sentence embedding module to map a full sentence to an embedding vector. If we want to classify some restaurant reviews, we just take one of those sentence embedding modules and add our own classification layer on top, as in the sketch below.
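A minimal sketch of that pattern, assuming the tensorflow_hub and tf.estimator APIs; the nnlm-en-dim128 module URL and the "review_text" feature key are illustrative choices, not the only option.

```python
import tensorflow as tf
import tensorflow_hub as hub

# A sentence-embedding module wrapped as a feature column; raw review text
# feeds straight in and the estimator adds the classification layer on top.
review_column = hub.text_embedding_column(
    key="review_text",
    module_spec="https://tfhub.dev/google/nnlm-en-dim128/1",
    trainable=False)  # set True only if you have enough data to fine-tune

classifier = tf.estimator.DNNClassifier(
    hidden_units=[64],
    feature_columns=[review_column],
    n_classes=2)  # positive vs. negative sentiment

# classifier.train(input_fn=...)  # input_fn yields {"review_text": ...}, labels
```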
And then we train with our reviews, keeping the sentence module's weights fixed. Just like for the image modules, tensorflow.org/hub lists a number of different text modules. We have neural network language models trained for English, Japanese and Spanish, we have word2vec trained on Wikipedia, and ELMo, which looks at how words are used across contexts. And something really new: today you may have seen a new paper this morning from the team, the universal sentence encoder. It's a sentence-level embedding module that enables a variety of tasks, in other words, universal. Some of the things it's good for: semantic similarity, custom text classification, clustering and semantic search. But the best thing is how little training is required to adapt it to your problem. That sounds great for our particular case, so let's try it on the restaurant review task. We just paste the URL from the paper, and like before, TensorFlow Hub downloads the module and inserts it into your graph. But this time we're using the text embedding column to feed into a classifier. And this module can be fine-tuned with your model by setting trainable to true. Of course, you have to lower the learning rate so you don't ruin the existing weights, but it's something worth exploring if you have enough data. Now, let's take a closer look at that URL. As Jeremiah mentioned, a module is a program, so make sure what you're executing comes from a location you trust. In this case, the module is from tfhub.dev. That's our new home for Google-provided modules like NASNet and the universal sentence encoder, and we would like to make it a place where you can publish the modules that you create. In this case, Google is the publisher, universal-sentence-encoder is the name of the module, and finally, the version number is 1. TensorFlow Hub considers modules to be immutable, so you don't have to worry about the weights changing between training sessions. That's why the module URL, and all of the module URLs on tfhub.dev, include a version number. You can take that URL, paste it into your browser, and see the complete documentation for any module hosted on tfhub.dev; here's the page for the universal sentence encoder. We also have modules for other domains besides text classification and image retraining, like a generative image module that contains a progressive GAN trained on CelebA, and another module based on the deep local features network that can identify landmark images. Both have great Colab notebooks on tensorflow.org/hub; the images here were created from them. And we're adding more modules over time, for tasks like audio and video, over the next few months. But most importantly, we're really excited to see what you build with TensorFlow Hub. Use the hashtag #tfhub, and visit tensorflow.org/hub for example tutorials, interactive notebooks, code labs and our new discussion mailing list. From everyone on our team in Zurich, thank you so much. [ Applause ] Okay. Next up, Clemens and Raz will tell you about TensorFlow Extended. >> Thank you. Hello, everyone. First, thanks, everyone, for coming to the TensorFlow Dev Summit, and second, thanks for staying around this long. I know it's been a long day and there's a lot of information we have been throwing at you, but we have much, much more and many more announcements, so stick with me. My name is Clemens, and this is Raz, and we are going to talk about TensorFlow Extended today.
I'm going to do a quick survey. How many of you do machine learning in a research or academic setting? Okay, quite a big number. And how many of you do machine learning in a production setting? Okay, that looks about half and half, with a lot of overlap, obviously. For those who do machine learning in a production setting, how many of you agree with this statement? Yeah? Some? Okay, I see a lot of hands coming up. So, everyone I speak with who is doing machine learning in production agrees with this statement: doing machine learning in production is hard. And it's too hard, because after all, we want to democratize machine learning and allow people to deploy it in their products. One of the reasons it's still hard is that in addition to the actual machine learning, the small orange box where you use TensorFlow, maybe with Keras to put together the layers of the model, you need to worry about so much more. There are all of these other things you need to build to actually deploy machine learning in a production setting and serve it in your product. This is exactly what TensorFlow Extended is about. It's Google's machine learning platform, and it allows users to go from data to a serving, production machine learning model as fast as possible. Before we introduced TFX, we saw that going through this process of writing some of these components, some of which didn't exist before, gluing them together, and actually getting to a launch took anywhere between six and nine months, sometimes even a year. Once we deployed TFX and allowed developers to use it, most could get up and running in a day and get to a deployable model in production in a matter of weeks, or just a month. Now, TFX is a very large system and platform that consists of a lot of components and services, so I can't talk about all of it in the next 25 minutes; we can only cover a small part. But we'll talk about the things we have open sourced and made available to you. First, we're going to talk about TensorFlow Transform and how to apply transformations on your data consistently between training and serving. Next, Raz is going to introduce a new product that we are open sourcing called TensorFlow Model Analysis. We're going to give a demo of how all of this works together end to end, and then we'll make a broader announcement about our plans for TensorFlow Extended and sharing it with the community. So, let's jump into TensorFlow Transform first. In a typical ML pipeline that you may see in the wild, during training you have a distributed data pipeline that applies transformations to your data, because usually you train on a large amount of data and this processing needs to be distributed. You run that pipeline and sometimes materialize the output before you actually feed it into your trainer. Then, at serving time, you need to somehow replay those exact transformations online as each new request comes in and is sent to your model. Now, there are a couple of challenges with this. The first is that usually those are two very different code paths. The data distribution systems you would use for batch processing are very different from the libraries and tools you would use to transform data in real time for a request to your model. So now you have two different code paths. Second, in many cases it's very hard to keep those two in sync, and I'm sure a lot of you have seen this.
You change your batch processing pipeline, add a new feature or change how it behaves, and you need to make sure the code in the production system is changed at the same time and kept in sync. And the third problem is that sometimes you actually want to deploy your TensorFlow model in many different environments: on a mobile device, on a server, maybe in a car. Now suddenly you have three different environments, maybe in different languages, where you want to apply these transformations, and it's very hard to keep them all in sync. This introduces something we call training-serving skew, where the transformations at training time are different from those at serving time, which leads to bad quality of your served model. TensorFlow Transform addresses this by helping you write your data processing logic at training time, helping you create those data pipelines, and at the same time emitting a TensorFlow graph that can be inlined into your training and serving model. What this does is effectively hermetically seal the model: your model takes a raw data request as input, and all of the transformations happen within the TensorFlow graph. This has a lot of advantages. One is that you no longer have any code in your serving environment that does these transformations, because they're all done in the TensorFlow graph. Another is that wherever you deploy this TensorFlow model, all of those transformations are applied in a consistent way, no matter where the graph is being evaluated. Let's see how that looks. This is a code snippet of a preprocessing function that you would write with tf.Transform. I'm going to walk you through what happens here and what we need to do for it. The first thing we do is normalize a feature. As all of you know, in order to normalize a feature, we need to compute its mean and standard deviation, and to apply the transformation we divide by the standard deviation. So, for the input feature x, we first have to compute those statistics. That's a trivial task if the data fits on a single machine, but it's non-trivial if you have a gigantic training dataset and you have to compute these metrics efficiently. Once we have the statistics, we can apply the transformation to the feature. This also shows that the output of this transformation can then be multiplied with another tensor, which is a regular TensorFlow operation. And in order to bucketize a feature, you again need to compute the bucket boundaries to apply that transformation. Again, this is a distributed data job to compute those metrics, here over the result of an already-transformed feature, which is another benefit, and then we actually apply the transformation. The next examples show that in the same function you can also apply any other tensor-in, tensor-out function. And there are also what we call mappers in tf.Transform, for which we don't have to run a data pipeline to compute anything. What happens is that the orange boxes are what we call analyzers. We realize those as actual data pipelines that compute over your data. They're implemented using Apache Beam, and we can talk about this more later, but this allows us to run these distributed data pipelines in different environments, since there are different runners for Apache Beam.
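As a rough illustration of the kind of function being described, here is a minimal preprocessing_fn sketch, assuming the tensorflow_transform package (imported as tft) and dense float features; exact analyzer names can vary between releases.

```python
import tensorflow as tf
import tensorflow_transform as tft

def preprocessing_fn(inputs):
    """Sketch of a tf.Transform preprocessing function over dense features."""
    outputs = {}
    # Analyzer-backed transform: mean and std deviation are computed over the
    # whole dataset by a Beam pipeline, then applied inside the TF graph.
    outputs['x_normalized'] = tft.scale_to_z_score(inputs['x'])
    # The result of one transform can feed an ordinary TensorFlow op.
    outputs['xy'] = outputs['x_normalized'] * inputs['y']
    # Another analyzer: quantile bucket boundaries over the full dataset.
    outputs['x_bucket'] = tft.bucketize(inputs['x'], num_buckets=10)
    # A pure "mapper": no analysis pass is needed, just TensorFlow ops.
    outputs['y_squared'] = tf.square(inputs['y'])
    return outputs
```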
All of the transforms are just single instance-to-instance transformations written in pure TensorFlow code. And what happens when you run TensorFlow Transform is that we actually run these analyze phases, compute their results, and inject the results as constants into the TensorFlow graph, shown on the right. That graph is a hermetic TensorFlow graph that applies all of the transformations, and it can be inlined into your serving graph. Now your serving graph has the transform graph as part of it and can replay these transforms wherever you deploy this TensorFlow model. So, what can be done with TensorFlow Transform? At training time, for the batch processing, really anything you can do with a distributed data pipeline. There's a lot of flexibility in what types of statistics you can compute; we provide a lot of utility functions, but you can also write custom data pipelines. At serving time, because we generate a TensorFlow graph that applies these transformations, we are limited to what you can do in a TensorFlow graph. But as all of you working with TensorFlow know, there's a lot of flexibility there as well, so anything you can do in a TensorFlow graph you can do with tf.Transform. Some of the common use cases: the ones on the left I just spoke about. You can scale a continuous value to its z-score or to a value between zero and one. You can bucketize a continuous value. If you have text, you can apply bag-of-words or n-grams. Or, for feature crosses, you can cross strings and generate vocabularies for the result of those crosses. And as mentioned before, tf.Transform is extremely powerful in being able to chain these transforms: you can apply a transform to the result of another transform. Another particularly interesting transform is applying another TensorFlow model. You have heard about the SavedModel before; if you have a SavedModel that you can apply as a transformation, you can use it with tf.Transform. Say you want to apply an Inception model to an image and combine the result with another feature, or use it as an input feature to your model: you can use any TensorFlow model that can be inlined into your training and serving graph. So, all of this is available today, and you can check it out on GitHub at tensorflow/transform. Now I'm going to hand it over to Raz, who is going to talk about TensorFlow Model Analysis. >> All right, thanks, Clemens. Hi, everyone. I'm really excited to talk about TensorFlow Model Analysis today. We're going to talk a little bit about metrics. Let's see. Next slide. All right. So, we can already get metrics today, right? We use TensorBoard. TensorBoard's awesome; you saw an earlier presentation about it today. It's a great tool. While you're training, you can watch your metrics, and if your training isn't going well, you can save yourself a couple of hours of your life: terminate the training, fix some things. But let's say you have a trained model already. Are we done with metrics? Is that it? Is there any more to be said about metrics once training has gone well? Of course there is. We want to know how well our trained model actually does for our target population. And I would argue that we want to do this in a distributed fashion over the entire dataset. Now, why wouldn't we just sample? Why wouldn't we save a few more hours of our lives, right? Just sample, make things fast and easy? Well, you start with a large dataset, and then you're going to slice that dataset.
I'm going to look at people at noon, right, that's a feature. From Chicago, my hometown. Running on this particular device. Well, each of these slices reduces the size of your evaluation dataset by some factor; it's an exponential decline. By the time you're looking at the experience for a particular set of users, you're not left with very much data, and the error bars on your performance measures are huge. How do you know that the noise doesn't exceed your signal at that point? So, really, you want to start with a larger dataset before you start slicing. All right. Let's talk about a particular metric. Who here has heard of the ROC curve? It's a pretty well-known thing in machine learning these days. Okay. So, we have our ROC curve, and I'm going to talk about a concept that you may or may not be familiar with, which is ML fairness. What is fairness? Fairness is a complicated topic, but at its core it's about how well our machine learning model does for different segments of our population. You don't have one ROC curve; you have an ROC curve for every segment and group of users. Who here would run their business based only on their top-line metrics? No one, right? That's crazy. You have to slice your metrics. You have to dive in and find out how things are going. That lucky user, the black curve on the top, gets a great experience. The unlucky user, the blue curve, not such a great experience. So, when can our models be unfair to various users? Well, one instance is if you simply don't have a lot of data from which to draw your inferences. We use stochastic optimizers, and if we retrain the model, it does something slightly different every time, so you get high variance for some users because you don't have a lot of data there. We may be incorporating data from a lot of data sources, and some data sources are more biased than others. Some users get the short end of the deal, other users get the ideal experience. Our labels could be wrong. All of these things can happen. And here's TensorFlow Model Analysis. You're looking at the UI hosted within a Jupyter notebook. On the X axis we have our loss, and you can see there's some natural variance in the metrics; we're not always going to get exactly the same precision and recall for every segment of the population. But sometimes you'll see, you know, what about those guys at the top there, experiencing the highest amount of loss? Do they have something in common? We want to know this. The users who get the poorest experience are sometimes our most vocal users, right? We all know this. I'd like to invite you to visit ml-fairness.com. There's a deep literature about the mathematical side of ML fairness. So, once you have figured out how to measure fairness, how does TensorFlow Model Analysis give you the sliced metrics? How do you go about getting them? Today, you export a saved model for serving; that's a familiar thing. TensorFlow Model Analysis is similar: you export a saved model for evaluation. Why are these models different? Why export two? Well, the eval graph that we serialize as a saved model has some additional annotations that allow our evaluation batch job to find the features, the prediction and the label, and we don't want those things mixed in with our serving graph. So, we export a second one.
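To make the export-and-slice workflow concrete, here is a hedged sketch. The entry points below (export_eval_savedmodel, run_model_analysis, render_slicing_metrics) follow my recollection of the initial TensorFlow Model Analysis release and may differ in later versions; `estimator`, `eval_input_receiver_fn` and the `trip_start_hour` slicing column are placeholders from a hypothetical pipeline.

```python
import tensorflow_model_analysis as tfma

# Export a second saved model annotated for evaluation, alongside the serving
# one. `estimator` and `eval_input_receiver_fn` come from your training code.
eval_model_dir = tfma.export.export_eval_savedmodel(
    estimator=estimator,
    export_dir_base='eval_model_dir',
    eval_input_receiver_fn=eval_input_receiver_fn)

# Run the evaluation over the full dataset, overall and sliced by a feature.
slices = [tfma.SingleSliceSpec(),
          tfma.SingleSliceSpec(columns=['trip_start_hour'])]
result = tfma.run_model_analysis(
    model_location=eval_model_dir,
    data_location='eval_data.tfrecord',
    slice_spec=slices)

# Render the interactive sliced-metrics UI inside a Jupyter notebook.
tfma.view.render_slicing_metrics(result, slicing_spec=slices[1])
```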
So, this is the GitHub repo. We just opened it, I think, last night at 4:30 p.m. Check it out. We have been using it internally for quite some time, and now it's available externally as well. The GitHub repo has an example that puts it all together, so that you can try all these components we're talking about from your local machine. You don't have to get an account anywhere; you just clone it, run the scripts, and work through the code lab. This is the Chicago taxi example. We're using publicly available data to determine which riders will tip their driver and which riders, shall we say, don't have enough money to tip today. What does fairness mean in this context? Our model is going to make some predictions, and we may want to slice those predictions by time of day. During rush hour we're going to have a lot of data, so hopefully our model is going to be fair there, as long as that data is not biased; at the very least, it's not going to have a lot of variance. But how is it going to do at 4 a.m.? Maybe not so well. How is it going to do when the bars close? An interesting question. I don't know yet, but I challenge you to find out. All right. This is what you can run using the local scripts. We start with our raw data and run tf.Transform, which emits a transform function and our transformed examples. We train our model, and the model, again, emits two saved models, one for serving and one for eval. You can try this locally: run the scripts and play with this stuff. Clemens talked a little bit about Transform. We want to take our dense features and scale them to a z-score, and we don't want to do that batch by batch, because the mean for each batch is going to differ and there may be fluctuations; we want to normalize these things across the entire dataset. We build a vocabulary, we bucketize for the wide part of our model, we emit our transform function, and into the trainer we go. You heard earlier today about TF estimators, and here is a wide-and-deep estimator that takes our transformed features and emits the two saved models. And now we're in TensorFlow Model Analysis, which reads in the eval saved model and runs it against all of the raw data. We call render_slicing_metrics from the Jupyter notebook and you see the UI. The thing to notice here is that this UI is immersive; it's not just a static picture that you look at, go "huh," and walk away from. It lets you see your errors broken down by bucket or by feature, and it lets you drill in, ask questions, and be curious about how your model is actually treating various subsets of your population. Those subsets may be the lucrative subsets, right? You really want to drill in. And then you want to serve your model. Our example has a one-liner here that you can run to serve your model and make a client request. The thing to notice is that we're making a gRPC request to that server: we're taking our feature tensors, sending them to the server, and back comes a probability. But that's not quite enough. We have heard a little bit of feedback about this server, and the thing we have heard is that gRPC is cool, but REST is really cool. This is one of the top feature requests on GitHub for model serving. You can now pack your tensors into a JSON object, send that JSON object to the server, and get a response back via HTTP. Much more convenient. And I'm very excited to say that it will be released very soon.
Very soon. I see the excitement out there. Back to the end-to-end picture. So, yeah, you can try all of these pieces, end to end, on your local machine, because they use Apache Beam's direct runner, and the direct runner lets you take distributed jobs and run them all locally. If you swap in Apache Beam's Dataflow runner, you can run against the entire dataset in the cloud; the example also shows you how to run the big job in the cloud. We're currently working with the community to develop a runner for Apache Flink and a runner for Spark. Stay tuned to the TensorFlow blog and to our GitHub. You can find the example at tensorflow/model-analysis. And back to Clemens. >> Thank you, Raz. [ Applause ] All right. We have heard about Transform, how to use Model Analysis, and how to serve your models. Is that enough? You say you want more? All right, you want more, and I can think of why. Maybe you read the paper we published and presented last year about TensorFlow Extended, where we laid out a broad vision of how this platform works within Google and the impact we have had using it. Figure one of that paper shows all of these boxes and describes what TensorFlow Extended is. It's overly simplified, but it's still much more than what we have released so far. We spoke today about these four components of TensorFlow Extended. This is not yet an end-to-end machine learning platform; it's a small piece. These are the libraries we have open sourced for you to use, but we haven't yet released the entire platform. We are working hard on this. We have seen the profound impact it had internally, where people could start using this platform and deploy machine learning in production with TFX, and we are working hard to make more of those components available to you. Next, we are looking at the data components, which let you analyze your data, visualize the distributions and detect anomalies. That's an important part of any machine learning pipeline: detecting changes, shifts and anomalies in your data. After this, we're looking at the horizontal pieces that tie all of these components together. If they're all single libraries, you have to use them individually; they have well-defined interfaces, but you have to combine them by yourself. Internally, we have a shared configuration framework that allows you to configure the entire pipeline, and a nice front end that allows you to monitor the status of these pipelines, see progress and inspect the different artifacts produced by all of the components. This is something we're also looking to release later this year. And I think you get the idea: eventually we want to make all of this available to the community, because internally hundreds of teams use it to improve our products, and we really believe it will be as transformative for the community as it has been at Google. We're working very hard to release these technologies and the platform so you can see what you can do with them in your own products and companies. Keep watching the TensorFlow blog for our future plans. As mentioned, you can use some of this today: Transform is released, Model Analysis was released yesterday, Serving is released, and the end-to-end example is available under this link; you can find it in the model-analysis repo. Thank you from myself and Raz.
I'm going to ask you to join me in welcoming a special external guest, Patrick Brandt, joining us from Coca-Cola, who is going to talk about applied AI at Coca-Cola. Thank you. [ Applause ] >> Hey, great job. Thanks, Clemens. All right. So, yes, I'm Patrick. I'm a solutions strategist for Coca-Cola. I'm going to share with you how we're using TensorFlow to support some of our largest and most popular digital marketing programs in North America. We're going to go off on a marketing tangent before we come back to AI. As background, what is proof of purchase and what is its relationship to marketing? As an example, back in the day, folks could clip the bar codes off their cereal boxes and mail them back to the cereal company to receive a reward, some kind of coupon or prize back through the mail. This is basic loyalty marketing: brands, in this case the cereal company, rewarding consumers who purchase, and at the same time opening up a line of communication between the brand and the consumer. Over the last fifteen-some-odd years of marketing digitization, this concept has evolved into digital engagement marketing: engaging consumers in the moment, in real time, through web and mobile channels. But proof of purchase is still an important component of the experience. We have an active digital marketing program at Coca-Cola where consumers can earn a magazine subscription or a chance to win prizes. We print these 14-character product pincodes, and these are what our consumers enter into our promotions. You can enter them by hand, but on your mobile device you can also scan them. This had been the holy grail of marketing IT at Coke for a long time. We looked at commercial and open source optical character recognition software, OCR, but it could never read these codes very well, and the problem has to do with the codes themselves. These are 4x7 dot-matrix-printed characters. The print head is an inch from the cap, and the caps are flying under the printer at a rapid rate. That creates visual artifacts that normal OCR can't handle well. We knew that if we wanted to unlock this experience, we were going to have to build something from scratch. When I look at these codes, a couple of characteristics jump out at me. We're using a small alphabet, let's say ten characters, and there's a decent amount of variability in the presentation of those characters. This reminds me of MNIST, the online database of 60,000 handwritten digit images, and convolutional neural networks are extremely good at extracting text like this. I'm probably going to tell you something you already know: they work by breaking an image down into smaller pieces and looking for edges, textures and colors. These very granular feature activations are pooled up into a more general feature layer, that's filtered, those activations are pooled up, and so on, until the output of the network is run through a function which creates a probability distribution over the likelihood that a set of objects exists within the image. Convolutional networks also have a nice property: they handle the spatial nature of images well. From our perspective, they can handle the tilt and twist of a bottle cap held in someone's hand. It's perfect. So, this is what we're going to use, and we're going to move forward. Now we need to build our platform, and that begins with training, the beating heart of any applied AI solution. We knew we needed high-quality images with accurate labels of the codes, and we likely needed a lot of them.
We started with a synthetic dataset of random code strings superimposed over blank backgrounds. This was a base for transfer learning once we created our real-world dataset. We did a production run of caps and fridge packs and distributed those to multiple third-party labelers, along with custom tools to scan a cap and label it with its pincode. An important component here was an existing pincode validation service we have had in production for a long time to support our programs: any time a labeler labeled an image, we would send that label through the validation service, and if it was a valid pincode, we knew we had an accurate label. That gets the model trained, and now we need to release it into the wild. We had some aggressive performance requirements: we wanted a one-second average processing time and 95% accuracy at launch, and we wanted to host the model remotely for the web and embed it natively on mobile devices to support our mobile apps. That means our model has to be small, small enough to support over-the-air updates as the model improves over time. And to help us improve the model over time, we created an active learning UI that allows our consumers to train the model once it's in production. This is what that looks like. If I'm a consumer and I scan a cap and the model cannot infer a valid pincode, it returns a per-character confidence for every character at every position, and the app can render a screen like the one you see here. As a user, I'm only asked to address the particularly low-confidence characters: I see the characters, tap the red ones, bring up the keyboard, fix them, and I'm entered into the promotion. It's a good user experience for me: I scan a code and I'm only a few taps away from being entered. But we also get extremely valuable data for training, because we have the image that produced the invalid inference as well as the label the user corrected to get into the promotion. We can throw that into the hopper for future rounds of training to improve the model. All right. When you put it all together, this is what it looks like. The user takes a picture of a cap, the image is normalized and then sent into our convolutional model, the output of which is a character probability matrix: the per-character confidence of every character at every position. That is further analyzed to create a top-ten list of predictions, and each of those is fed into our pincode validation service. The first one that's valid, often the first one on the list, is entered into the promotion, and if none are valid, the user sees the active learning experience. Our model development effort went through three big iterations. In an effort to keep the model small up front, the data team used binarized images, and that didn't preserve enough information to get the model where it needed to be. They switched approaches, and then the model size was too large to support over-the-air updates. So they started over and completely rearchitected the network using SqueezeNet, which is designed to reduce model size by reducing the number of learnable parameters. After making this move, we had a problem: we started to experience internal covariate shift, a result of reducing the number of learnable parameters. Very small changes to upstream parameter values cascaded into huge gyrations in downstream parameter values. This slowed our training process; we had to grind through this covariate shift to get the model to converge, if it would converge at all.
We introduced batch normalization, which sped up training and got the model to converge, and now we're exactly where we want to be: we have a model with a 25-fold decrease in size from where we started, with accuracy greater than 95%. And the results are impressive. These are some screen grabs from a test site that I built, and you can see across the top row how the model handles different types of occlusion. It handles translation, tilting the cap, rotation, twisting the cap, and camera focus issues. You can try this out for yourself. I'm going to pitch the newly launched Coca-Cola USA app, which hit the Android and iPhone app stores a couple of days ago. It does many things, and you can scan a code. You can also use the mobile browser and take a picture of a cap code to be entered into a promotion. Quick shoutouts, because I can't not mention these folks: Quantiphi built our model. Ellen Duncan spearheaded this from the marketing side. And my people in IT, my colleague Andy Donaldson, shepherded this into production. Thank you. It's been a privilege to speak with you. I covered a lot of ground in ten short minutes and there's a lot I didn't talk about, so please feel free to reach out on Twitter, @patrickbrandt, and at wpb.is/linkedin. You can read an article I published last year about this solution on the Google Developers blog; you can get there at wpb.is/tensorflow. Thank you. Next up is Alex, and Alex is going to talk to us about applied ML with robotics. [ Applause ] >> All right. Hi, everybody. I'm Alex from the Brain robotics team, and in this presentation I'll be talking about how we use simulation and domain adaptation in some of our real-world robot learning problems. First, let me start by introducing robot learning. The goal of robot learning is to use machine learning to learn robotic skills that work in general environments. What we have seen so far is that if you control your environment a lot, you can get robots to do pretty impressive things, and where the techniques break down is when you try to apply them to more general environments. The thinking is that machine learning can help the robot learn from its environment and address these generalization issues. As a step in this direction, we have been looking at the problem of robotic grasping. This is a project we have been working on in collaboration with people at X. To explain the setup a bit, we have a real robot arm learning to pick up objects out of a bin. There's a camera looking down over the shoulder of the arm into the bin, and from this RGB image we're going to train a neural network to learn what commands it should send to the robot to successfully pick up objects. Now, we want to try to solve this task using as few assumptions as possible. Importantly, we're not going to give any information about the geometry of the objects we're going to pick up, and no information about the depth of the scene. So, in order to solve the task, the model needs to learn hand-eye coordination: it has to see where the arm is in the camera image, figure out where it is in the scene, and combine the two to figure out how to move. In order to train this model, we're going to need a lot of data, because it's a pretty large-scale image model. Our solution at the time was simply to use more robots. This is what we called the arm farm: six robots collecting data in parallel.
And if you have six robots, you can collect data a lot faster than if you only have one. Using these robots, we were able to collect over a million attempted grasps across thousands of hours of robot time, and we were able to successfully train models to pick up objects. Now, this works, but it still took a long time to collect this dataset, and that motivated looking into ways to reduce the amount of real-world data needed to learn these behaviors. One approach is simulation. In the left video here you can see the images that go into our model in the real-world setup, and on the right you can see our simulated recreation of that setup. The advantage of moving things into simulation is that simulated robots are much easier to scale. We have been able to spin up thousands of simulated robots grasping various objects, and we were able to collect millions of grasps in just over eight hours, instead of the weeks that were required for the original dataset. Now, this is good for getting a lot of data, but unfortunately, models trained in simulation tend not to transfer to the actual real-world robot. There are a lot of systematic differences between the two. One big one is the visual appearance of things, and another is the physical differences between real-world physics and simulated physics. What we found was that we could train a model in simulation to around 90% grasp success, but when we deployed it to the real robot, it succeeded just over 20% of the time, which is a very big performance drop. So, to get good performance, you need to do something a bit more clever. This motivated looking into sim-to-real transfer: using simulated data to improve your real-world sample efficiency. There are a few different ways you can do this. One is adding randomization to the simulator: you can change the textures applied to objects, change their colors, change how the lighting interacts with the scene, and play around with the geometry of the objects you're trying to pick up. Another is domain adaptation, a set of techniques for learning when you have two domains of data that share structure but are different; here, the two domains are the simulated and real robot data. There are feature-level and pixel-level ways of doing this. In this work we tried all of these approaches, and I'm going to focus primarily on the domain adaptation side of things. In feature-level domain adaptation, we take our simulated data and our real data and train the same model on both datasets, but at an intermediate feature layer we add a similarity loss that encourages the distribution of features to be the same across both domains. One approach for doing this is domain-adversarial networks. This is implemented as a small neural net that tries to predict the domain from the intermediate features, while the rest of the model tries to confuse the domain classifier as much as possible. Pixel-level methods look at it from a different point of view: instead of features, we transform the data at the pixel level to look more realistic. We take a generative adversarial network, feed it an image from our simulator, and it outputs an image that looks more realistic; then we use the generator's output to train whatever task model we want to train.
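To make the feature-level idea concrete, here is a minimal, illustrative sketch (not the team's actual model) of a domain-adversarial loss built on a gradient-reversal layer:

```python
import tensorflow as tf

def gradient_reversal(x, weight=1.0):
    """Identity on the forward pass; flips and scales gradients backward."""
    @tf.custom_gradient
    def _flip(x):
        def grad(dy):
            return -weight * dy
        return tf.identity(x), grad
    return _flip(x)

def domain_adversarial_loss(features, is_real_domain):
    """Small domain classifier over intermediate features.

    `is_real_domain` is a [batch_size, 1] float tensor of 0/1 labels. Because
    of the gradient reversal, minimizing this loss trains the classifier to
    tell sim from real while pushing the feature extractor to make the two
    domains indistinguishable.
    """
    reversed_features = gradient_reversal(features)
    hidden = tf.layers.dense(reversed_features, 100, activation=tf.nn.relu)
    domain_logits = tf.layers.dense(hidden, 1)
    return tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            labels=is_real_domain, logits=domain_logits))

# total_loss = task_loss + dann_weight * domain_adversarial_loss(feats, domain)
```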
Now, we're going to train both the generator and the task model at the same time. We found that in practice this was useful, because it helps ground the generator's output so it stays useful for training your downstream task. All right. So, taking a step back: feature-level methods can learn domain-invariant features when you have data from related domains that aren't identical. Meanwhile, pixel-level methods can transform your data to look more like your real-world data, but in practice they don't work perfectly, and there are small artifacts and inaccuracies in the generator output. So you can use both methods: even if the pixel-level transformation doesn't get you all the way there, you can attach a feature-level method on top to close the rest of the reality gap. We combined them in what we call GraspGAN, a combination of pixel-level and feature-level adaptation. In the left half of the video is a simulated grasp; in the right half, the output of the generator. You can see it's learning cool things in terms of drawing what the tray should look like, drawing more realistic textures on the arm, and drawing shadows. It's learned how to draw shadows as the arm moves around the scene. It's not perfect, there are still odd splotches of color around, but it's definitely learning something about what it means for an image to look realistic. Now, this is good for getting a lot of pretty images, but what matters for our problem is whether these images are actually useful for reducing the amount of real-world data required. And we find that they are. So, to explain this chart a bit: on the X axis is the number of real-world samples used, and we compared the performance of different methods as we vary the amount of real-world data given to the model. The blue bar is the performance with only simulated data. The red bar is the performance when we use only real data, and the orange bar is the performance when we use both simulated and real data plus the domain adaptation methods I have been talking about. When we use just 2% of the original real-world dataset, we're able to get the same level of performance. This reduces the number of real-world samples we needed by up to 50 times, which is really exciting in terms of not needing to run robots for a long time to learn these grasping behaviors. Additionally, we found that even when we give all of the real-world data to the model, adding the simulated data as well still improves performance, which implies we haven't hit the data capacity limits for this grasping problem. And finally, there's a way to train this setup without having real-world labels, and when we train the model in that setting, we were still able to get pretty good performance on the real-world robot. Now, this was the work of a large team across both Brain and X, and I would like to thank all of my collaborators. Here is a link to the original paper, and I believe there is also a blog post if people are interested in more details. Thanks. [ Applause ]

All right. And lastly, we have Sherol, who is going to talk to you about art and music with machine learning.

>> Amazing. Just amazing. I'm really just in awe of what machine learning is capable of and how we can extend human capabilities. And we want to think about more than just discovering new approaches and new ways of using the technology; we want to see how it's being used and how it impacts the human creative process. So, imagine you need to find or compose a drum pattern. You have some idea of a drum beat that you would like to compose.
And all you need to do now is go to a website where there's a pre-trained model of drum patterns sitting online. You just need a web browser. You give it some human input and you can generate a space of expressive variations. You can tune and control the type of outputs that you're getting from this generative model, and if you don't like it, you can keep going, exploring this generative space. So, this is the type of work that Project Magenta focuses on. To give you a bird's-eye view of what Project Magenta is about: it's basically a group of researchers, developers and creative technologists who engage in generative models research. So, you'll see this work published at machine learning conferences, with a lot of research contributions from Magenta. And you'll see the code, after it's been published, put into an open-source repository on GitHub in the Magenta repo. From there, we see ways of thinking about and designing creative tools that can enhance and extend the human creative process, eventually ending up in the hands of artists and musicians, inventing new ways we can create, and inventing new types of artists. So, I'm going to give three brief overviews of some highlights of our recent work.

So, this is Performance RNN. How many have seen this? This is one of the demos from earlier today, and a lot of people have seen and heard of this kind of work. This is what people think of as a generative model: how can we build a computer that has the kind of intuition to know the qualities of melody, harmony and expressive dynamics? It's even more interesting to explore this in the browser, enabled by TensorFlow.js. This is a demo we have running online, and we have the ability to tune and control some of the output that we're getting. So, in a second, I'm going to show you a video of what that looks like. You may have seen it out on the demo floor, but we'll show it to you, and to all of you watching online. We were able to bring it even more alive by connecting a baby grand piano that is also a MIDI controller, so we have the ability to perform alongside the generative model, which reads in the inputs from the human playing the piano. So, let's take a look. This is trained on classical music data from actual live performers; the dataset comes from a piano competition. [piano playing] I don't know if you noticed, but this is Nikhil from earlier today. He's a talented young man, and he helped build out the browser version. [piano playing] And so, we're thinking of ways that we take bodies of work, train a model off of the data, and then create open source tools that enable new forms of interaction, of creativity and of expression. And all of these points of engagement are enabled by TensorFlow.

The next tool I want to talk about is based on variational autoencoders. How many people are familiar with latent space interpolation? Quite a few of you. If you're not, it's quite simple. You take human inputs and run them through a neural network, compressing them down to an embedding space: you compress the input down to some lower dimensionality and then you reconstruct it. You compare the reconstruction with the original and train the network to build a space around that. What that does is create the ability to interpolate from one point to another, touching on the intermediate points where a human may never have given input.
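To make that concrete, here is a minimal sketch of latent-space interpolation using a plain autoencoder rather than the variational autoencoder Magenta actually uses; the 64-dimensional toy input, the layer sizes, and the 8-dimensional latent space are assumptions for illustration only.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Toy dimensions chosen for illustration; Magenta's real models work on
# melody and audio representations, not this 64-dimensional vector.
INPUT_DIM, LATENT_DIM = 64, 8

# Encoder: compress a human-provided input down to a small embedding.
encoder = tf.keras.Sequential([
    layers.Input(shape=(INPUT_DIM,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(LATENT_DIM),
])

# Decoder: reconstruct the input from that embedding.
decoder = tf.keras.Sequential([
    layers.Input(shape=(LATENT_DIM,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(INPUT_DIM),
])

# Train the pair to reproduce its own inputs (reconstruction loss).
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10)

def interpolate(a, b, steps=8):
    """Walk a straight line in latent space between two inputs of shape
    (INPUT_DIM,) and decode every intermediate point, including points
    no human ever provided."""
    za, zb = encoder(a[None]), encoder(b[None])
    alphas = tf.reshape(tf.linspace(0.0, 1.0, steps), (-1, 1))
    return decoder((1.0 - alphas) * za + alphas * zb)
```

A variational autoencoder adds a prior and a sampling step on top of this, which tends to make the latent space smoother to interpolate through, but the mechanics of decoding points between two embeddings are the same.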
So, the machine learning model can generate examples it has never seen; it's building an intuition off of these examples. You can imagine, if you're an animator, there are so many ways of going from a cat to a pig. How would you animate that? There's an intuition the artist has in creating that sort of morphing from one to the other, and we're now able to have the machine learning model do this too. We can also do this with sound, right? This technology carries over to multiple domains. So, this is NSynth, which we released sometime last year. What it does is take that same idea of moving from one input to another. Let's take a look so you get a sense of it: piccolo to electric guitar. [sound moving back and forth] So, rather than simply cross-fading from one sound to the other, what we're actually able to do is find these intermediary, recomposed sound samples and produce those. There are a lot of components to that, including a WaveNet decoder, but really it's the same underlying encoder-decoder technology. Now, when we think about the types of tools that musicians use, we think less about training machine learning models. We think about guitar pedals. They are used to refine sound, to cultivate the art and flavor the musician is looking for. We don't think about parameter flags or writing lines of Python code to create that sort of art, in general. So, we're not just interested in finding and discovering new things; we're also interested in how those things get used, by practitioners and by specialists. And so we've created a piece of hardware: we've taken the machine learning model and put it into a box where a musician can plug in and explore this latent space in performance. Let's look at what musicians feel and think in this process. [synth music]

>> It feels like we're turning a new corner of new possibilities. It could generate a sound that might inspire us.

>> The fun part is, you think you know what you're doing, and then there's a weird interaction happening that could give you something totally unexpected.

>> I mean, it's great research, and it's really fun and it's amazing to discover new things, but it's even more amazing to see how it gets used and how people create alongside it. And what's even better is that it has just been released, in collaboration with the Creative Lab in London: NSynth Super. It's open source; all the specs are on GitHub, from the touch interface to the code and the hardware it runs on. This is all available to everyone today. You can go online and check it out yourself. Now, music is more than just sound, right? It's a sequence of things over time. So, when we think about what it means to have a generative music space, we also think about melodies. Just like we went from cat to pig, what is it like to go from one melody to the next? And moreover, once we have that technology, what does it look like to create with it? You have this expressive space of variations; how do we design an expressive tool that takes advantage of it, and what will we get out of it? This is another tool, developed by another team at Google, to make use of melodies in a latent space: how interpolation works, and then building a song or a composition with it. Take a listen. Say you have two melodies.
[twinkle, twinkle little star] And the middle. [melody is morphing] And extended. [melody is becoming more complex] And we really are just scratching the surface of what's possible. How do we continue to have the machine learn and build a better intuition for what melodies are about? So, again, to bring it back full circle: using different compositions and musical works, we're able to train a variational autoencoder to create an embedding space, and from that we build tools that enable open-source communities to design creative tools for artists and to look at new ways of pushing the expressive boundaries we currently have. This is, again, just released. It's on our blog, all the code is open source and made available to you, and it's also enabled by TensorFlow, along with all these other things, including Nikhil's browser demo earlier, enabled by that same kind of work, creativity and expressivity. And so, in wrapping up, I want to take us back to the demo that we saw. Now, the most interesting and maybe the coolest thing about this demo was that we didn't even know it was being built until it was tweeted by Tero, a developer from Finland. And the fact of the matter is, we're barely scratching the surface. There's so much to do, so much to engage in and so much to discover. And we want to see so much more of this. We want to see more developers, more people sharing things and more people getting engaged. Not just developers, but artists and creatives as well. We want to explore and invent and imagine what we can do with machine learning together as an expressive tool. So, go to our website, g.co/magenta, where you'll find our publications and these demos. You can experience it yourself and more, and you can also join our discussion group. So, here's g.co/magenta. Join our discussion group, become part of the community, and share the things that you're building so we can do this together. Thank you so much. [ Applause ]

So, that's it for the talks today. We've had an amazing, amazing show, an amazing spread of speakers and topics. Now let's take a look at a highlight reel of the day. [Music playing]

>> Earlier this year, we hit the milestone of 11 million downloads. We are really excited to see how many users are using this and how much impact it's had in the world.

>> We're very excited today to announce that we are joining the TensorFlow family, deeplearn.js.

>> TensorFlow is an early-stage project, and we would like you to help build this future.

>> I told you at the beginning, the mission for tf.data was to make a library that was fast, flexible and easy to use.

>> So, I'm very excited to say that we have been working with other teams in Google to bring TensorFlow Lite to Google apps.

>> In general, the Google Brain team's mission is to make machines intelligent and use that ability to improve people's lives. I think those are good examples of where there's real opportunity for this. [ Applause ]

>> So, hold on just a minute.