(bell dinging)
- Hello, and welcome to another Beginner's Guide
to Machine Learning with ml5.js in JavaScript.
So I'm here.
It's been a while since I added a video to this playlist,
and a bunch of things
about the ml5 library itself have changed.
There's a new release, 0.3.1.
There is a brand new website,
which you can find right here at ml5js.org.
So to some extent, this video is really an update
about the library, but I'm also going to look
at one particular feature,
a new feature of the library, sound classification.
The machine learning model that I'm gonna use
in this video is the Speech Command Recognizer,
and this is a model available from Google
as part of TensorFlow.js models.
Now, this is a really important distinction.
I am not here to train a sound classifier.
I might do that in a future video
and show you about how to apply transfer learning,
which is something I did with images, also to sounds.
I'm just gonna make use of a freely available,
pre-trained machine learning model.
Anytime you use one of those things,
even in just a playful and experimental way,
which is what I'm doing,
it's good to do a little bit of research
and take a look at, well, how was this trained,
what was the data, what are the considerations
around how the data was collected?
And so I encourage you to read through the README
here on GitHub and, in particular,
to click over and read the original paper
about this speech commands model,
and there you'll see, if you look,
it talks about some of the datasets
like Mozilla's Common Voice dataset,
500 hours from 20,000 different people,
this LibriSpeech, 1,000 hours of read English speech.
I don't know how to say this, TIDIGITS,
25,000 digit sequences,
which apparently was probably neat to record, right?
It's just like hours and hours of me reading
this random number book over and over again.
But so I encourage you to check out this paper,
and you can also find code for how to use this model
at TensorFlow.js, in the tfjs-models GitHub repo itself.
I also want to interrupt this video for a second
to talk about how the sound classifier actually works.
This is kind of a surprising little tidbit,
and I'll come back to this more
if at some point I create a video
about training your own sound classifier.
Now, there are different ways you could do this.
This isn't the only way to make a sound classifier,
but this is the way that this particular model works.
It's actually shockingly,
amazingly doing image classification.
So imagine we have this thing
that's called a convolutional neural network.
This is the underlying architecture,
the structure of that machine learning model
that does the classification.
Typically this kind of model is something
that we would put images in.
Like we might have images of cats.
We might have an image of a turtle.
That's not really a turtle, but whatever.
So the idea is that we're sending these images in
and getting back a label
and maybe a confidence score.
So it's the same idea.
The only thing is now we wanna send in audio
and get back a label like up
or one and a confidence score.
So how would we convert sound into an image?
Now, again, there are other neural network architectures
which could receive sound data
in maybe a more direct fashion,
but if you have ever looked at a graphic equalizer
or some type of sound visualization system,
I've made examples like this in p5,
you can draw something that's often referred
to as the spectrogram,
which is basically a graph of all the various amplitudes
of frequencies, the wave patterns of the sound itself.
So if we took a one second spectrogram
and made that into an image,
we could then send that image
into a convolutional neural network
saying that's the image that is produced
from the spectrogram of somebody saying the word, up.
So underneath the hood, this machine learning system,
even though it's designed to work with audio data,
it first takes that audio data,
converts it into an image
and then sends it through a very similar type
of neural network architecture
to standard image classification models.
And you can read more about that in that paper itself.
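Just to make the spectrogram idea concrete, here's a minimal p5 sketch, assuming the p5.sound library is loaded, that draws a rolling spectrogram from the microphone. This is just an illustration, not the model's exact preprocessing:

```javascript
// A rough sketch of drawing a rolling spectrogram in p5, assuming the
// p5.sound library is loaded alongside p5.js. Note that some browsers
// require a user gesture (like a click) before audio input will start.
let mic, fft;

function setup() {
  createCanvas(512, 256);
  mic = new p5.AudioIn();
  mic.start();
  fft = new p5.FFT();
  fft.setInput(mic);
  background(0);
}

function draw() {
  // Shift the existing pixels one column left so the image scrolls.
  copy(1, 0, width - 1, height, 0, 0, width - 1, height);
  // analyze() returns an amplitude (0-255) for each frequency bin.
  let spectrum = fft.analyze();
  for (let i = 0; i < spectrum.length; i++) {
    // Low frequencies at the bottom, brightness = amplitude.
    let y = map(i, 0, spectrum.length, height, 0);
    stroke(spectrum[i]);
    point(width - 1, y);
  }
}
```
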
However, I'm gonna show you how to access this model
in a quick way with the ml5 library.
And this is new as of today, which is, I dunno.
What's today's date?
June 13th, 2019 (laughing).
I'm gonna show you how to use this with the ml5 library
as it stands today.
So I'm gonna click here under reference.
One thing you should see, there are a lot of new features
that have been added to the ml5 library.
I'm gonna come back and do videos about more of those,
but the one I wanna highlight is sound classifier.
So I'm gonna click on this,
and for all of the different functions available in ml5,
you'll find a documentation page
with some narrative documentation,
a little bit of a code snippet
and then some written documentation
about what the function names are
and the various parameters and things like that.
And by the way, I'm noticing now (laughing).
This will hopefully be fixed by the time you see this.
This is like a mistake (laughing).
This is documentation that's actually
for either Body-Pix or maybe the U-Net model,
which does something called image segmentation.
So we gotta get that fixed.
I'm sure many GitHub issues and fixes
will be out and done by the time you see this.
So in case you've forgotten how to use the ml5 library,
I'm just gonna show you as it's documented
on the ml5 webpage.
So first of all, you can go here to this Quickstart.
You can actually just click on this
open p5 web editor sketch with ml5js added.
You know what, I'm gonna do that.
That's the way I'm gonna do it.
But you also could just put a script tag in your HTML page
referencing the current version of the library,
which, as I said, is 0.3.1 as of today,
but probably while you're watching it,
it will be a higher number.
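For reference, that script-tag setup might look something like this; the CDN URLs and version numbers here are assumptions, so check ml5js.org for the current ones:

```html
<!-- index.html: pull in p5.js and ml5.js before your sketch.
     These URLs and version numbers are illustrative; see ml5js.org
     for the current release. -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/0.8.0/p5.min.js"></script>
<script src="https://unpkg.com/ml5@0.3.1/dist/ml5.min.js"></script>
<script src="sketch.js"></script>
```
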
So lemme go and just open up this link here,
and now I'm in the p5 web editor.
You can see the name of the sketch is ml5js boilerplate.
Thank you, Joey Lee, who's a contributor to ml5.
He's done a ton of work on the website
and all of the different features.
And oh, this should actually be 0.3.1.
I'm gonna fix that, uh-huh.
I'm gonna hit save, and then I'm gonna rename it
to sound classifier.
And I am going to then go over here
and go to sketch.js,
and then I'm gonna run this,
and we should see.
There we go.
So now we know it's working
because there's a little console log
to log ml5.version.
If I hadn't imported the ml5 library,
I wouldn't see that, and we see that here.
So, what are we gonna do?
Let's load the sound classifier.
Now, I haven't been using this
in my previous videos,
but most of the models in ml5 are now actually available to you
in preload, meaning you don't need a callback function.
You can just load the model in preload,
and it'll be ready by the time you get to setup.
So I'm gonna make a variable called soundClassifier.
In preload, I'm gonna say soundClassifier
equals ml5.soundClassifier.
Now, I need to tell it
what model I want to load.
So I need to, in here, put the name
of the model I wanna load,
and in theory, in the future,
there might be a bunch of different options,
different kinds of sound classifiers
or maybe a sound classifier you've trained yourself
that you wanna put in there,
and I'll come back eventually
and show you videos about how to do that.
But for right now, I'm just gonna say
SpeechCommands,
and then I already forgot what it was called.
So I'm gonna go back to the ml5 website, which is here.
I'm gonna go to reference.
I'm gonna go to soundClassifier,
and I'm looking for it here.
So it's SpeechCommands18w.
This is a particular model
that's been trained on 18 specific words,
and you can see what those are.
The 10 digits from zero to nine,
up, down, left, right, go, stop, yes, no, that's 18.
10 digits, eight different words.
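So, as a quick sketch, the loading step with that model name filled in looks like this (soundClassifier is the variable I made earlier):

```javascript
// Load the pre-trained 18-word speech commands model in preload(),
// so it's ready by the time setup() runs -- no callback needed.
let soundClassifier;

function preload() {
  soundClassifier = ml5.soundClassifier('SpeechCommands18w');
}
```
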
All right, so now I'm gonna go,
so it was 18w,
and then, once that model is loaded,
I need a callback.
So I could just say soundClassifier.classify
with a callback, and I might just call it gotResults.
So in other words, I'm,
oh, it's not defined, right?
So I'm telling the sound classifier to classify.
Now, by default, it's just going to listen
to the microphone's audio.
Maybe in the future, part of ml5 will offer
hooks to connect it
to a different audio source,
but it's basically just gonna work
with the microphone's audio.
Then I can write a function called gotResults,
and I'm gonna get rid of the draw loop
'cause I don't need that right now.
Lemme just turn off auto refresh
so that it doesn't keep refreshing.
And then now, if you remember,
ml5 employs error first callbacks,
meaning the callback function requires two arguments,
an error argument in case something went wrong,
and a data or results or some other argument
where the actual stuff is.
So I'm gonna say error,
and then I'm gonna say results.
And then I could do a little basic error handling.
I'm just gonna say console.log,
"Something went wrong,"
and then I can also actually log the error, all right.
And then, so now,
and then I'm gonna say console.log(results).
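So the code so far, continuing from the loading snippet above and using the names from the video, looks roughly like this:

```javascript
// Start classifying the microphone audio and handle results with
// ml5's error-first callback, as described above.
function setup() {
  createCanvas(400, 400);
  soundClassifier.classify(gotResults);
}

function gotResults(error, results) {
  if (error) {
    console.log('Something went wrong!');
    console.error(error);
    return;
  }
  console.log(results);
}
```
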
So let's see if we get anything.
Oh, I have to run it again.
And you could ignore this error.
Oh, (gasping) something came in!
Ready?
Up.
I just wanna stop and mention
that if you're following this along,
hopefully your browser is asking for permission
to use the microphone.
The reason why that didn't happen here in this video
is because I've already set my browser
to allow use of the microphone on the p5 Web Editor pages,
but for security, you can't just access anybody's microphone
from a webpage without the user giving permission.
So hopefully you saw that happen,
and if you didn't,
that might be why you're running into an error
if you haven't given that permission.
This is getting a little hard to debug
just because so much stuff is happening here
in the console, with these huge arrays,
but there's actually something that I missed
that I could add here, which is an options variable.
So there are a lot of things I can set as properties
or parameters for how the sound classifier should work,
but there's a very simple one,
which I'm gonna just look up in the documentation
'cause I don't remember.
It's called the probabilityThreshold.
I'm actually just gonna copy-paste this here.
What this means is basically
the sound classifier is going to trigger an event.
Right now I'm console logging all of this information
about what it thinks it heard
based on a confidence level for how sure it is
it heard one of those keywords.
And right now, a lot of those events are triggering
because I don't know
what the default probability threshold is.
Maybe it was .7.
Maybe it's .5, but I'm gonna make that really high.
I'm gonna say .95.
So the machine learning model has to calculate
a 95% confidence score before it
gives the event back to me in ml5.
Once I've created that options variable with .95,
I need to pass it into the constructor
as the second argument.
So now we pass it in there.
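In code, that looks something like this, following the same pattern as the documentation snippet I copy-pasted:

```javascript
// Only trigger the callback when the model is at least 95% confident.
let options = { probabilityThreshold: 0.95 };

function preload() {
  // The options object goes in as the second argument.
  soundClassifier = ml5.soundClassifier('SpeechCommands18w', options);
}
```
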
I'm gonna run the sketch.
I'm gonna say the keyword up,
and then I'm gonna try to look into the console
to see if that's what came in.
Up.
And there we go.
Look at that!
Now other stuff is coming in, but you saw it there!
So rather than kind of debug with the console,
let me actually put what I said
onto the webpage itself.
Also, to make this easier to see,
let me actually console.log results[0].label
and results[0], I believe it's called, confidence.
So rather than have this big array logging in the console,
let me do this.
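In other words, something like:

```javascript
// Log just the top result's label and its confidence score,
// instead of the whole results array.
console.log(results[0].label, results[0].confidence);
```
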
All right, we need to have a 95% confidence,
and I'm gonna run this.
Up.
Three, four, five,
six, seven, eight.
I'm quickly adding a white background color
to the HTML body,
because what I wanna do, to finish this off,
is just add a DOM element using the p5 DOM library.
I'm gonna just say resultP, for results paragraph.
I'm gonna say resultP equals createP,
"waiting," and then I'm gonna say
resultP.html.
Then I could turn these results into a string
by using a template literal.
So, back tick, and then put a dollar sign and curly brackets,
put a colon here, another dollar sign and curly brackets,
and a closing back tick, okay.
And let me also say resultP.style,
is it font size, font-size,
just 32 point so we'll be able to see it.
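Sketched out, the full sketch as built in the video, with the resultP and gotResults names from above, looks roughly like this:

```javascript
// Load the speech commands model, classify the microphone audio, and
// show the top label and confidence in a paragraph element.
let soundClassifier;
let resultP;
let options = { probabilityThreshold: 0.95 };

function preload() {
  soundClassifier = ml5.soundClassifier('SpeechCommands18w', options);
}

function setup() {
  createCanvas(400, 400);
  soundClassifier.classify(gotResults);
  resultP = createP('waiting');
  resultP.style('font-size', '32pt');
}

function gotResults(error, results) {
  if (error) {
    console.log('Something went wrong!');
    console.error(error);
    return;
  }
  // Template literal turns the top result into a readable string.
  resultP.html(`${results[0].label}: ${results[0].confidence}`);
}
```
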
All right, here we go.
Ready for this?
One, two, five,
up, down, left, right.
Okay, so (clapping),
you could imagine now what you could do with this.
For example, you could control a game with your voice.
And in fact, I'm gonna do that
in one of my coding challenge videos.
So take a look in this video's description.
I'm gonna do a coding challenge where I program
the Google Dinosaur game,
and then I'm gonna add this sound classifier
to have the dinosaur jump,
except it won't be a dinosaur,
it'll be a unicorn,
to have the unicorn jump when I say the keyword, up.
All right, thanks for watching
this additional ml5 tutorial video
about sound classification in the browser.
(bell dinging)
(energetic dance music)
(bell dinging)