
  • Thank you so much for coming to our session this morning.

  • I'm Sarah Sirajuddin.

  • I'm on the TensorFlow Lite team, and we work on bringing machine learning to mobile and small devices.

  • And later on I will introduce my colleague Andrew Selle, who will be doing the second half of this talk.

  • So the last couple of days have been really fun for me.

  • I've gotten to meet and speak with many of you, and it's been really nice to see the excitement around TensorFlow Lite.

  • And today I'm happy to be here and talk to you about all the work that our team is doing to make machine learning on small devices possible and easy.

  • So in today's talk, we'll cover three areas.

  • First, we'll talk about why machine learning directly on device is important and how it's different than what you may do on the server.

  • Second, we'll walk you through what we have built with TensorFlow Lite, and lastly we'll show you how you can use TensorFlow Lite in your own apps.

  • So first, let's talk about devices for a bit.

  • What do we mean when we say a device? Well, usually a mobile device, basically our phones. So our phones are with us all the time.

  • We interact with them so many times during the day, and more than that, phones come with a large number of sensors on them, which give us really rich data about the physical world around us.

  • Another category of devices is what we call edge devices, and this industry has seen a huge explosion in the last few years.

  • Some examples are smart speakers, smartwatches, and smart cameras.

  • And as this market has grown, we see that technology, which only used to be available on more expensive devices, is now available on far cheaper ones.

  • So now we're seeing that there is this massive growth in devices, and they're becoming increasingly capable, both mobile and edge.

  • And this is opening up many opportunities for novel applications for machine learning.

  • So I expect that many of you are already familiar with the basic idea of machine learning.

  • But for those that aren't, I'm going to really quickly cover the core concepts.

  • So let's start with an example of something that we may want to do.

  • Let's say classification of images.

  • So how do we do this?

  • So in the past, what we would have done was to write a lot of rules that were hard coded, very specific, about some specific characteristics that we expected to see in parts of the image.

  • This was time consuming, hard to do and frankly didn't work all that well.

  • And this is where machine learning comes in. With machine learning, we learn based on examples.

  • So a simple way to think about machine learning is that we use algorithms to learn from data, and then we make predictions about similar data that has not been seen before.

  • So it's a two-step process: first the model learns, and then we use it to make predictions.

  • The process of the model learning is what we typically call training, and when the model is making predictions about data, that is what we call inference.

  • This is a high-level view of what's happening during training.

  • The model is passed labeled data, that is, input data along with the associated prediction. Since in this case we know what the right answer is, we're able to calculate the error, that is, how often the model is getting it wrong and by how much. We use these errors to improve the model, and this process is repeated many, many times until we reach the point where we think that the model is good enough, or that this is the best that we can do.
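
To make the train-then-predict loop concrete, here is a minimal, hypothetical Keras sketch. The data, model architecture, and shapes are illustrative assumptions, not part of the talk:

```python
import numpy as np
import tensorflow as tf

# Hypothetical labeled data: 28x28 grayscale images, each with one of 10 class labels.
x_train = np.random.rand(1000, 28, 28).astype("float32")
y_train = np.random.randint(0, 10, size=1000)

# A small image classifier; the architecture is illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training: the framework repeatedly computes the error on labeled data
# and updates the model to reduce it.
model.fit(x_train, y_train, epochs=5)

# Inference: the trained model makes predictions about data it has not seen before.
predictions = model.predict(x_train[:1])
```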

  • This involves a lot of steps and coordination, and that is why we need a framework to make this easier.

  • And this is where TensorFlow comes in.

  • It's Google's framework for machine learning.

  • It makes it easy to train and build neural networks, and it is cross platform.

  • It works on CPUs, GPUs, and TPUs, as well as mobile and embedded platforms, and the mobile and embedded piece of TensorFlow, which we call TensorFlow Lite, is what we're going to be focusing on in our talk today.

  • So now we want to talk about why would you consider doing machine learning directly on device?

  • And there's several reasons that you may consider.

  • But probably the most important one is latency.

  • If the processing is happening on the device, then you're not sending data back and forth to the server.

  • So if your use case involves real-time processing of data, such as audio or video, then it's quite likely that you would consider doing this.

  • Other reasons are that your processing can happen even when your device is not connected to the Internet, and that the data stays on device.

  • This is really useful if you're working with sensitive user data, which you don't want to put on servers.

  • It's more power efficient because your device is not spending power transmitting data back and forth.

  • And lastly, we're in a position to take advantage of all the sensor data that's already available and accessible on the device.

  • So this is all great.

  • But there's a catch, like there always is, and the catch is that doing on-device ML is hard.

  • Many of these devices have some pretty tight constraints.

  • They have small batteries, tight memory, and very little computation power. TensorFlow was built for processing on the server, and it wasn't a great fit for these use cases.

  • And that is the reason that we built TensorFlow Lite.

  • It's a lightweight machine learning library for mobile and embedded platforms, so this is a high-level overview of the system.

  • It consists of a converter, where we convert models from TensorFlow format to TensorFlow Lite format, and for efficiency reasons we use a format which is different.

  • It consists of an interpreter, which runs on device.

  • There is a library of ops and kernels, and then we have APIs which allow us to take advantage of hardware acceleration whenever it is available.

  • TensorFlow Lite is cross-platform, so it works on Android, iOS, and Linux, and a high-level developer workflow here would be to take a trained TensorFlow model, convert it to TensorFlow Lite format, and then update your apps to use the TensorFlow Lite interpreter using the appropriate API.

  • On iOS, developers also have the option of using Core ML instead.

  • And what they would do here is take their trained TensorFlow model, convert it to Core ML using the TensorFlow to Core ML converter, and then use the converted model with the Core ML runtime.
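
As a rough sketch of the conversion step on the TensorFlow Lite path, assuming the hypothetical trained Keras `model` from the earlier sketch (the API names are taken from current TensorFlow releases and may differ from what was available at the time of this talk):

```python
import tensorflow as tf

# Assumes `model` is a trained tf.keras model, e.g. the small classifier above.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Save the converted flat-buffer model so an app can bundle and load it on device.
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```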

  • So the two common questions that we get when we talk to developers about TensorFlow Lite are: is it small?

  • And is it fast?

  • So let's talk about the first question.

  • One of the fundamental design goals of TensorFlow Lite was to keep the memory and binary size small, and I'm happy to say that the size of our core interpreter is only 75 kilobytes.

  • And when you include all the supported ops, the size is 400 kilobytes.

  • So how did we do this?

  • So, first of all, we've been really careful about which dependencies we include.

  • Secondly, TensorFlow Lite uses FlatBuffers, which are far more memory efficient than protocol buffers.

  • One other feature that I want to call out here in TensorFlow Lite is what we call selective registration, which allows developers to only include the ops that their model needs, and that way they can keep the footprint small.

  • Now moving on to the second question, which is of speed.

  • So we made several design choices throughout the system to enable fast startup, low latency, and high throughput.

  • So let's start with the model file format.

  • TensorFlow Lite uses FlatBuffers, like I said, and FlatBuffers is a cross-platform, efficient serialization library.

  • It was originally created at Google for game development and is now being used for other performance sensitive applications.

  • The advantage of using FlatBuffers is that we can directly access the data without doing parsing or unpacking of the large files which contain the weights.

  • Another thing that we do at the time of conversion is that we pre-fuse the activations and biases, and this leads to faster execution later.

  • At runtime, the TensorFlow Lite interpreter uses static memory allocation and a static execution plan.

  • This leads to faster load times. Many of the kernels that TensorFlow Lite comes with have been specially optimized to run fast on ARM CPUs.
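
To give a sense of what using the interpreter looks like, here is a minimal sketch in Python (the same interpreter is also exposed through Java, Swift, and C++ APIs). It assumes the hypothetical `model.tflite` file produced in the conversion sketch above, and the API names follow current TensorFlow releases:

```python
import numpy as np
import tensorflow as tf

# Load the converted flat-buffer model.
interpreter = tf.lite.Interpreter(model_path="model.tflite")

# allocate_tensors() sets up the static memory and execution plan up front.
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed one input tensor of the expected shape and dtype, then run inference.
input_data = np.random.rand(*input_details[0]["shape"]).astype(input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()

predictions = interpreter.get_tensor(output_details[0]["index"])
```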

  • Now let's talk about hardware acceleration.

  • As machine learning has grown in prominence, it has spurred quite a bit of innovation at the silicon layer.

  • And many hardware companies are investing in building custom chips, which can accelerate neural network processing.

  • GPUs and DSPs, which have been around for some time, are also now being increasingly used to do machine learning tasks.

  • TensorFlow Lite was designed to take advantage of hardware acceleration, whether it is through GPUs, DSPs, or custom AI chips.

  • On Android, the recently released Android Neural Networks API is an abstraction layer which makes it easy for TensorFlow Lite to take advantage of the underlying acceleration.

  • The way this works is that hardware vendors write specialized drivers or custom acceleration code for their hardware platforms and integrate with the Android NNAPI.

  • TensorFlow Lite, in turn, integrates with the Android NNAPI via its internal delegation API.

  • A point to note here is that developers only need to integrate their apps with TensorFlow Lite.

  • TensorFlow Lite will take care of abstracting away the details of hardware acceleration from them.

  • In addition to the Android NNAPI, we are also working on building direct GPU acceleration in TensorFlow Lite.

  • GPUs are widely available and in use.

  • And, like I said before, they're now being increasingly used for doing machine learning tasks.

  • Similar to the NNAPI, developers only need to integrate with TensorFlow Lite if they want to take advantage of GPU acceleration.

  • So the last bit on performance that I want to talk about is quantization.

  • And this is a good example of an optimization which cuts across several components in our system.

  • First of all, what is quantization?

  • A simple way to think about it is that it refers to techniques to store numbers, and to perform calculations on numbers, in formats that are more compact than 32-bit floating point representations. And why is this important?

  • Well, for two reasons.

  • First, model size is a concern for small devices, so the smaller the model, the better it is.

  • Secondly, there are many processors which have specialized SIMD instruction sets, which process fixed-point numbers much faster than they process floating point numbers.

  • So the next question here is how much accuracy do we lose if we're using 8 bits or 16 bits instead of the 32 bits which are used for representing floating point numbers?

  • Well, the answer, obviously, depends on which model we're using.

  • But in general, the learning process is robust to noise, and quantization can be thought of as a form of noise.

  • So what we find is that the accuracies tend to usually be within acceptable thresholds.

  • A simple way of doing quantization is to shrink the weights and biases after training, and we are shortly going to be releasing a tool which developers can use to shrink the size of their models.
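
As a rough sketch of what post-training weight quantization looks like with the TFLiteConverter in current TensorFlow releases (the exact tool mentioned in the talk may expose this differently), again assuming the hypothetical trained Keras `model` from the earlier sketches:

```python
import tensorflow as tf

# Assumes `model` is a trained tf.keras model, as in the earlier sketches.
converter = tf.lite.TFLiteConverter.from_keras_model(model)

# Ask the converter to shrink the weights to a more compact (e.g. 8-bit)
# representation after training, trading a little accuracy for a smaller model.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(quantized_tflite_model)
```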

  • In addition to that, we have been actively working on doing quantization at training time, and this is an active area of ongoing research, and what we find here is that we are able to get accuracies which are comparable to the floating point models for architectures like MobileNet as well as Inception.

  • And we recently released a tool which allows developers to use this, and we're working on adding support for more models.

  • Okay, so I talked about a bunch of performance optimizations.

  • Now let's talk about what that translates to in terms of numbers.

  • So we benchmarked two models, MobileNet and Inception V3, on the Pixel 2.

  • And as you can see here, we're getting speedups of more than three times when we compare quantized models running on TensorFlow Lite versus floating point models running on TensorFlow.

  • I'll point out here that these numbers do not include any hardware acceleration.

  • We've done some initial benchmarking with hardware acceleration, and we see additional speedups of 3 to 4 times with that, which is really promising and exciting.

  • So stay tuned in the next few months to hear more on that.

  • Now that I've talked about the design of TensorFlow Lite and performance, I want to show you what TensorFlow Lite can do in practice.

  • Let's please roll the video. So this is a simple demo application which is running the MobileNet classification model, which we trained on common office objects.

  • And as you can see, it's doing a good job detecting them.

  • Even this TensorFlow logo that we trained this model on.

  • Like I said, it's cross platform.

  • So it's running on iOS as well as Android, and we are also running it here on Android Things.

  • This was a simple demo.

  • We have more exciting demos for you later on in the talk.

  • Now let's talk about production use cases.

  • I'm happy to say that we've been working with partner teams inside Google to bring TensorFlow Lite to Google apps.

  • So portrait mode on the Android camera, Hey Google in the Google Assistant, and Smart Reply are some features which are going to be powered by TensorFlow Lite in the next few months.

  • Additionally, TensorFlow Lite is the machine learning engine which is powering the custom model functionality in the newly announced ML Kit.

  • And for those of you that may have missed the announcement, ML Kit is a machine learning SDK.

  • It exposes both on-device and cloud-powered APIs for machine learning, as well as the ability to bring your own custom models and use them.

  • These are some examples of apps that are already using TensorFlow Lite via ML Kit: PicsArt, which is a really popular photo editing and collage making app, and VSCO, which is a really cool photography app.

  • So back to TensorFlow Lite and what is currently supported.

  • So we have support for 50 commonly used operations which developers can use in their own models.

  • I will point out here that if you need an op which is not currently supported, you do have the option of writing what we call a custom op and using that, and later on in this talk Andrew will show you how you can do that.

  • Op support is currently limited to inference.

  • We will be working on adding training support in the future.

  • We support several popular open source models, as well as the quantized counterparts for some of them.

  • And with this, I'm going to invite my colleague Andrew to talk to you about how you can use TensorFlow Lite in your own apps.

  • Thanks, Sarah.

  • So now that you know what TensorFlow Lite is, what it can do, and where it can run, I'm sure you want to know how to use it.

  • So we can break that up into four important steps.

  • The first one, and probably the most important, is to get a model; you need to decide what you want to do.

  • It could be image classification.