[MUSIC PLAYING]
NICHOLAS GILLIAN: OK.
We've just heard from the TensorFlow Lite Team
how it's getting even easier to place machine
learning directly on devices.
And I'm sure this got many of you thinking,
what's possible here?
Now, I'm going to tell you about how
we're using Jacquard to do exactly this,
embed machine learning directly into everyday objects.
Before I jump into the details, let
me tell you a little bit about the Jacquard platform.
So Jacquard is a machine learning
powered ambient computing platform
that extends everyday objects with extraordinary powers.
At the core of the Jacquard platform is the Jacquard tag.
This is a tiny embedded computer that
can be seamlessly integrated into everyday objects,
like your favorite jacket, backpack, or pair of shoes.
The tag features a small embedded ARM processor
that allows us to run ML models directly on the tag with only
sparse gesture or motion predictions
being emitted via BLE to your phone when detected.
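As a rough sketch of that pattern, an on-tag inference
loop might look something like the following; the names
here (fill_imu_window, gesture_model_run,
ble_notify_event) are hypothetical placeholders,
not the actual Jacquard firmware API.

    /* Illustrative sketch, not the real Jacquard firmware:
     * run inference continuously on-device, but only wake
     * the BLE radio when a prediction is confident. */
    #include <stdint.h>

    #define GESTURE_NONE 0

    typedef struct {
        uint8_t gesture_id;   /* e.g. double tap, cover, swipe */
        uint8_t confidence;   /* quantized to 0-255 */
    } gesture_event_t;

    /* Assumed to be provided elsewhere in the firmware. */
    extern void fill_imu_window(float *window, int len);
    extern int  gesture_model_run(const float *window, float *confidence);
    extern void ble_notify_event(const gesture_event_t *evt);

    void inference_loop(void)
    {
        static float window[50 * 6];   /* ~1 s of accel + gyro at 50 Hz */
        float confidence;

        for (;;) {
            fill_imu_window(window, 50 * 6);
            int gesture = gesture_model_run(window, &confidence);

            /* Only emit an event when something was detected,
             * so the BLE link stays quiet most of the time. */
            if (gesture != GESTURE_NONE && confidence > 0.85f) {
                gesture_event_t evt = {
                    .gesture_id = (uint8_t)gesture,
                    .confidence = (uint8_t)(confidence * 255.0f),
                };
                ble_notify_event(&evt);
            }
        }
    }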
What's interesting is that the tag has a modular design, where
the ML models can either run directly
on the tag as a standalone unit or via additional low-power
compute modules that can be attached along
with other sensors, custom LEDs, or haptic motors.
A great example of this is the Levi's Trucker Jacket
that I'm wearing.
Let me show you how this works.
So if we can switch over to the overhead camera,
here you can see I can take the Jacquard tag and add it
to a specially designed sensor module which
is integrated into the jacket.
Let me check that again.
What happens now is that this talks to an M0 processor that's
running on the jacket itself, which
is talking to some integrated sensor lines in the jacket.
The M0 processor not only reads data from the sensor lines,
but it also allows us to run ML directly on the tag.
This allows us to do gestures, for example, on the jacket
to control some music.
So for example, I can do a double tap gesture,
and this can start to play some music.
Or I can use a cover gesture to silence it.
Users can also use swipe-in and swipe-out gestures
to control their music, drop pins on maps,
or whatever they'd like, depending
on the capabilities of the app.
What's important here is that all of the gesture recognition
is actually running on the M0 processor.
This means that we can run these models at super low power,
sending only the events to the user's phone via the Jacquard
app.
So I'm sure many of you are wondering how we're actually
getting our ML models to be deployed
in this case in a jacket.
And by the way, this is a real product
that you can go to your Levi's store and buy today.
So as most of you know, there are three big on-device ML
challenges that need to be addressed to enable platforms
like Jacquard.
So first is how can we train high-quality ML models that
can fit on memory-constrained devices?
Second, let's assume we've solved problem one and have
a TensorFlow model that's small enough to fit within our memory
constraints.
How can we actually get it running
on low compute embedded devices for real-time inference?
And third, even if we solve problems one and two,
it's not going to be a great user experience
if the user has to keep charging their jackets
or backpacks every few hours.
So how can we ensure the ML model's always
ready to respond to a user's actions
when needed while still providing multi-day experiences
on a single charge?
Specifically for Jacquard, these challenges have mapped
to deploying models as small as 20 kilobytes,
in the case of the Levi's jacket,
and running ML models on low-compute microprocessors,
like the Cortex-M0+, which is what's embedded here
in the cuff of the jacket.
To show you how we've addressed these challenges for Jacquard,
I'm going to walk you through a specific case study for one
of our most recent products, so recent, in fact, that it
actually launched yesterday.
First, I'll describe the product at a high level,
and then we can review how we've trained and deployed
ML models that in this case fit in your shoe.
So the latest Jacquard-enabled product is called GMR.
This is an exciting new product that's
being built in collaboration between Google, Adidas,
and the EA Sports FIFA Mobile Team.
With GMR, you can take the same tag that goes
in your jacket, insert it into an Adidas insole,
and go out in the world and play soccer.
So you can see here where the tag inserts at the back.
The ML models in the tag will be able to detect
your kicks, your motion, your sprints, how far you've run,
your top speed.
We can even estimate the speed of the ball as you kick it.
Then after you play, the stream of predicted soccer events
will be synced with your virtual team in the EA FIFA Mobile
Game, where you'll be rewarded with points by completing
various weekly challenges.
This is all powered by our ML algorithms that run directly
in your shoe as you play.
So GMR is a great example of where
running ML inference on device really pays off,
as players will typically leave their phone in the locker room
and go out and play for up to 90 minutes
with just the tag in their shoes.
Here, you really need to have the ML models run directly
on-device and be smart enough to know when to turn off
when the user is clearly not playing soccer
to help save power.
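As a purely illustrative sketch (not the actual GMR
firmware), one simple way to get that kind of behavior
is to gate the soccer models behind a cheap
motion-energy check; accel_window_energy,
run_soccer_models, and the thresholds below are assumed
names and values.

    /* Illustrative idle gate: skip the expensive soccer models
     * when recent motion energy is low, and only run them again
     * once the player is clearly moving. */
    #define IDLE_ENERGY_THRESH    0.02f  /* assumed units of g^2 */
    #define IDLE_WINDOWS_TO_SLEEP 300    /* e.g. a few minutes of quiet */

    extern float accel_window_energy(void);  /* cheap, no neural net */
    extern void  run_soccer_models(void);    /* kicks, sprints, speed */

    void process_window(void)
    {
        static int quiet_windows = 0;

        if (accel_window_energy() < IDLE_ENERGY_THRESH) {
            if (quiet_windows < IDLE_WINDOWS_TO_SLEEP)
                quiet_windows++;
        } else {
            quiet_windows = 0;
        }

        /* While idle, the heavier models never run, so the tag can
         * sit in a low-power state between periods of play. */
        if (quiet_windows < IDLE_WINDOWS_TO_SLEEP)
            run_soccer_models();
    }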
So this figure gives you an idea of just how interesting
a machine learning problem this is.
Unlike, say, normal running, where
you would expect to see a nice, smooth, periodic signal
over time, soccer motions are a lot more dynamic.
For example, in just eight seconds of data here,
you can see that the player moves
from a stationary position on the left, starts to run,
breaks into a sprint, kicks the ball,
and then slows down again to a jog
all within an 8 second window.
For GMR, we needed our ML models to be responsive enough
to capture these complex motions and work across a diverse range
of players.
Furthermore, this all needs to fit within the constraints
of the Jacquard tag.
For GMR, we have the following on-device memory constraints.
We have around 80 kilobytes of ROM,
which needs to be used not just for the model weights,
but also the required ops, the model graphs, and, of course,
the supporting code required for plumbing everything
together so this can be plugged into the Jacquard OS.
We also have around 16 kilobytes of RAM,
which is needed to buffer the raw sensor data and is
also used as scratch buffers for the actual ML
inference in real time.
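To make those budgets concrete, a firmware layout along
these lines is sketched below; the 80-kilobyte and
16-kilobyte totals come from the talk, but the symbol
names and the exact split between sensor buffering and
inference scratch space are assumptions for illustration.

    /* Hypothetical memory layout matching the budgets above. */
    #include <stdint.h>

    /* ROM (~80 KB): model weights and graphs are compiled in as
     * const data, sharing the flash budget with the ops library
     * and the glue code into the Jacquard OS. */
    extern const uint8_t g_model_weights[];   /* const => lives in flash */

    /* RAM (~16 KB): one pool buffers raw sensor samples, another is
     * reused as scratch for intermediate activations at inference
     * time (the split below is illustrative, not the real firmware's). */
    #define SENSOR_RING_BYTES   (8 * 1024)
    #define SCRATCH_ARENA_BYTES (6 * 1024)  /* rest left for stack, BLE, OS */

    static uint8_t g_sensor_ring[SENSOR_RING_BYTES];
    static uint8_t g_scratch_arena[SCRATCH_ARENA_BYTES];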
So how do we train models that can detect kicks, a player's
speed and distance, and even estimate the ball
speed within these constraints?
Well, the first step is we don't--
well, at least initially.
We train much larger models in the cloud
to see just how far we can push the model's performance.
In fact, this is using TFX, which
is one of the systems that was shown earlier today.
This helps inform the design of the problem space
and guide what additional data needs to be collected
to boost the model's quality.
After we start to achieve good model performance
in the cloud, without the on-device constraints,
we then use these learnings to design much smaller models that
start to approach the constraints of the firmware.
This is also when we start to think about not
just how the models can fit within the low compute
and low memory constraints, but how they can run at low power
to support multi-day use cases.
For GMR, this led us to design an architecture that
consists of not one but four neural networks that all
work together coherently.
This design is based on the insight
that even during an active soccer match,
a player only kicks the ball during a small fraction
of gameplay.
We therefore use much smaller models
that are tuned for high recall to first predict
if a potential kick or active motion is detected.
If not, there's no need to trigger the larger, more
precise models in the pipeline.
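A minimal sketch of that cascade idea is shown below;
the function names are hypothetical, not the real GMR
graphs, but the control flow is the point: the tiny,
high-recall trigger network runs on every window, and
the larger, more precise models only run when it fires.

    /* Illustrative cascade with hypothetical model entry points. */
    #include <stdbool.h>

    extern bool  trigger_net_maybe_kick(const float *window); /* tiny, high recall */
    extern bool  kick_net_is_kick(const float *window);       /* larger, precise */
    extern float ball_speed_net(const float *window);         /* only after a kick */
    extern void  report_kick(float ball_speed);

    void classify_window(const float *window)
    {
        /* Most windows contain no kick, so most of the time this
         * cheap check is all the work that happens. */
        if (!trigger_net_maybe_kick(window))
            return;

        /* Only a small fraction of windows ever reaches the
         * expensive models. */
        if (kick_net_is_kick(window))
            report_kick(ball_speed_net(window));
    }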
So how do we actually get our multiple neural networks
to run on the tag?
To do this, we have built a custom C model exporter.
For this, the model exporter uses a Python tool
that selects the required C ops from a lookup table.
This then generates custom C code: a lightweight ops
library that can be shared across multiple graphs,
and the actual .h and .c code that you get for each model.
This allows us to have zero dependency overheads
for our models and make every byte count.
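To give a feel for what that output might look like,
here is a hypothetical example of a generated model
header, with the weights compiled in as const arrays in
the matching .c file and a single entry point that calls
into the shared ops library; the real exporter's naming
and interface may differ, so treat this purely as a sketch.

    /* kick_model.h -- hypothetical generated header for one model.
     * The matching kick_model.c would hold the const weight arrays
     * and the unrolled graph, built from calls into the shared
     * lightweight ops library. */
    #ifndef KICK_MODEL_H
    #define KICK_MODEL_H

    #define KICK_MODEL_INPUT_LEN  300   /* illustrative sizes */
    #define KICK_MODEL_OUTPUT_LEN 2

    /* Runs the whole graph; scratch memory is caller-provided RAM,
     * so the model owns no buffers and pulls in no dependencies. */
    void kick_model_invoke(const float *input,
                           float *output,
                           unsigned char *scratch,
                           unsigned int scratch_len);

    #endif /* KICK_MODEL_H */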
Here, for example, you can see one of the C ops
that would be called by the library.
So this is for a rank three transpose operation,
which supports multiple IO types,
such as int8s or 32-bit floats.
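As a rough illustration of what such an op can look
like, here is a generic rank-3 transpose in that spirit;
it is a sketch rather than the actual Jacquard ops
library code, and it handles both int8 and 32-bit float
tensors by taking the element size as a parameter.

    /* Generic rank-3 transpose over row-major tensors. Using memcpy
     * with an element size lets one implementation cover both int8
     * and 32-bit float I/O types. */
    #include <stddef.h>
    #include <string.h>

    void transpose_rank3(const void *input, void *output,
                         const int in_dims[3],  /* input shape             */
                         const int perm[3],     /* e.g. {2, 0, 1}          */
                         size_t elem_size)      /* 1 for int8, 4 for float */
    {
        const unsigned char *src = (const unsigned char *)input;
        unsigned char *dst = (unsigned char *)output;

        /* Output shape and input strides, in elements. */
        const int out_dims[3] =
            { in_dims[perm[0]], in_dims[perm[1]], in_dims[perm[2]] };
        const int in_strides[3] =
            { in_dims[1] * in_dims[2], in_dims[2], 1 };

        for (int i = 0; i < out_dims[0]; ++i)
            for (int j = 0; j < out_dims[1]; ++j)
                for (int k = 0; k < out_dims[2]; ++k) {
                    /* Map each output coordinate back to its input element. */
                    size_t src_off = (size_t)i * in_strides[perm[0]]
                                   + (size_t)j * in_strides[perm[1]]
                                   + (size_t)k * in_strides[perm[2]];
                    size_t dst_off = ((size_t)i * out_dims[1] + j)
                                     * out_dims[2] + k;
                    memcpy(dst + dst_off * elem_size,
                           src + src_off * elem_size,
                           elem_size);
                }
    }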
So with this, you can see how we're
taking our neural networks and actually
getting them to run on the Jacquard tag live.
I hope that you're inspired by projects like Jacquard,
and this makes you think about things that you could possibly
do with tools like TF Lite Micro to actually build
your own embedded ML applications.
[MUSIC PLAYING]