(bell dinging)
-
- Hello, and welcome to another Beginner's Guide
-
to Machine Learning with ml5.js in JavaScript.
-
So I'm here.
-
It's been a while since I added a video to this playlist,
-
and a bunch of things
-
about the ml5 library itself have changed.
-
There's a new release, 0.3.1.
-
There is a brand new website,
-
which you can find right here at ml5js.org.
-
So to some extent, this video is really an update
-
about the library, but I'm also going to look
-
at one particular feature,
-
a new feature of the library, sound classification.
-
The machine learning model that I'm gonna use
-
in this video is the Speech Command Recognizer,
-
and this is a model available from Google
-
as part of TensorFlow.js models.
-
Now, so this is a really important distinction.
-
I am not here to train a sound classifier.
-
I might do that in a future video
-
and show you how to apply transfer learning,
-
which is something I did with images, also to sounds.
-
I'm just gonna make use of a freely available,
-
pre-trained machine learning model.
-
Anytime you use one of those things,
-
even in just a playful and experimental way,
-
which is what I'm doing,
-
it's good to do a little bit of research
-
and take a look at like well, how was this trained,
-
what was the data, what are the considerations
-
around how the data was collected?
-
And so I encourage you to read through the README
-
here on GitHub and in particular,
-
to click over and read the original paper
-
about this speech commands model,
-
and there you'll see, if you look,
-
it talks about some of the datasets
-
like Mozilla's Common Voice dataset,
-
500 hours from 20,000 different people,
-
this LibriSpeech, 1,000 hours of read English speech.
-
I don't know how to say this, TIDY DIGITS,
-
TIDIGITS, T DIGITS, 25,000 digit sequences,
-
which apparently was probably neat, right?
-
It's just like hours and hours of me reading
-
this random number book over and over again.
-
But so I encourage you to check out this paper,
-
and you can also find code for how to use this model
-
at TensorFlow.js, in the tfjs-models GitHub repo itself.
-
I also want to interrupt this video for a second
-
to talk about how the sound classifier actually works.
-
This is kind of a surprising little tidbit,
-
and I'll come back to this more
-
if at some point I create a video
-
about training your own sound classifier.
-
Now, there are different ways you could do this.
-
This isn't the way to make a sound classifier,
-
but this is the way that this particular model works.
-
It's actually shockingly,
-
amazingly doing image classification.
-
So if you imagine, we have this thing
-
that's called a convolutional neural network.
-
This is the underlying architecture,
-
the structure of that machine learning model
-
that does the classification.
-
Typically this kind of model is something
-
that we would put images in.
-
Like we might have images of cats.
-
We might have an image of a turtle.
-
That's not really a turtle, but whatever.
-
So the idea is that we're sending these images in
-
and getting back a label
-
and maybe a confidence score.
-
So it's the same idea.
-
The only thing is now we wanna send in audio
-
and get back a label like up
-
or one and a confidence score.
-
So how would we convert sound into an image?
-
Now, again, there are other neural network architectures
-
which could receive sound data
-
in maybe a more direct fashion,
-
but if you have ever looked at a graphic equalizer
-
or some type of sound visualization system,
-
I've made examples like this in p5,
-
you can draw something that's often referred
-
to as the spectrogram,
-
which is basically a graph of all the various amplitudes
-
of frequencies, the wave patterns of the sound itself.
-
So if we took a one second spectrogram
-
and made that into an image,
-
we could then send that image
-
into a convolutional neural network
-
saying that's the image that is produced
-
from the spectrogram of somebody saying the word, up.
-
So underneath the hood, this machine learning system,
-
even though it's designed to work with audio data,
-
it first takes that audio data,
-
converts it into an image
-
and then sends it through a very similar type
-
of neural network architecture
-
to standard image classification models.
-
And you can read more about that in that paper itself.
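To make the sound-to-image idea concrete, here is a minimal sketch in plain JavaScript. It is not how the Speech Commands model computes its input; it is just an illustration under some assumptions: a naive DFT instead of an FFT, a made-up 8000 Hz sample rate, and hypothetical helper names (`magnitudeSpectrum`, `spectrogram`). Each row is one short slice of time, each column is a frequency, and the numbers are amplitudes, which is exactly the kind of 2D grid you could treat as an image.

```javascript
// Magnitudes of the first n/2 frequency bins of one frame (naive DFT, for clarity only).
function magnitudeSpectrum(frame) {
  const n = frame.length;
  const bins = Math.floor(n / 2);
  const mags = new Array(bins);
  for (let k = 0; k < bins; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(angle);
      im -= frame[t] * Math.sin(angle);
    }
    mags[k] = Math.sqrt(re * re + im * im) / n;
  }
  return mags;
}

// Slide a window over the signal; each window becomes one row of the "image".
function spectrogram(signal, frameSize, hopSize) {
  const rows = [];
  for (let start = 0; start + frameSize <= signal.length; start += hopSize) {
    rows.push(magnitudeSpectrum(signal.slice(start, start + frameSize)));
  }
  return rows;
}

// One second of a 440 Hz tone at an assumed 8000 Hz sample rate.
const sampleRate = 8000;
const tone = Array.from({ length: sampleRate }, (_, t) =>
  Math.sin((2 * Math.PI * 440 * t) / sampleRate)
);

// 2D array: rows are time frames, columns are frequency bins.
const image = spectrogram(tone, 256, 128);
console.log(image.length, 'frames x', image[0].length, 'bins');
```

For a pure tone, every row has one bright column near 440 Hz. A spoken word like "up" would instead produce a pattern that changes from row to row, and that pattern is what the convolutional network learns to recognize.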
-
However, I'm gonna show you how to access this model
-
in a quick way with the ml5 library.
-
And this is new as of today, which is, I dunno.
-
What's today's date?
-
June 13th, 2019 (laughing).
-
I'm gonna show you how to use this with the ml5 library
-
as it stands today.
-
So I'm gonna click here under reference.
-
One thing you should see, a lot of new features
-
have been added to the ml5 library.
-
I'm gonna come back and do videos about more of those,
-
but the one I wanna highlight is sound classifier.
-
So I'm gonna click on this,
-
and for all of the different functions available in ml5,
-
you'll find a documentation page
-
with some narrative documentation,
-
a little bit of a code snippet
-
and then some written documentation
-
about what the function names are
-
and the various parameters and things like that.
-
And by the way, I'm noticing now (laughing).
-
This will hopefully not read.
-
This is like a mistake (laughing).
-
This is documentation that's actually
-
for either Body-Pix or maybe the U-Net model,
-
which does something called image segmentation.
-
So we gotta get that fixed.
-
I'm sure many GitHub issues and fixes
-
will be out and done by the time you see this.
-
So in case you've forgotten how to use the ml5 library,
-
I'm just gonna show you as it's documented
-
on the ml5 webpage.
-
So first of all, you can go here to this Quickstart.
-
You can actually just click on this
-
open p5 web editor sketch with ml5js added.
-
You know what, I'm gonna do that.
-
That's the way I'm gonna do it.
-
But you also could just put a script tag in your HTML page
-
referencing the current version of the library,
-
which, as I said, is 0.3.1 as of today,
-
but probably while you're watching it,
-
it will be a higher number.
-
So lemme go and just open up this link here,
-
and now I'm in the p5 web editor.
-
You could see the name of the sketch is ml5js boilerplate.
-
Thank you, Joey Lee who's a contributor to ml5.
-
He's done a ton of work on the website
-
and all of the different features.
-
And oh, this should actually be 0.3.1.
-
I'm gonna fix that, uh-huh.
-
I'm gonna hit save, and then I'm gonna rename it
-
to sound classifier.
-
And I am going to then go over here
-
and go to sketch.js,
-
and I'm then I'm gonna run this,
-
and we should see.
-
There we go.
-
So now we know it's working
-
because there's a little console log
-
to log ml5.version.
-
If I hadn't imported the ml5 library,
-
I wouldn't see that, and we see that here.
-
So, what are we gonna do?
-
Let's load the sound classifier.
-
Now, most of the models, I haven't been using this
-
in my previous videos,
-
most of the models in ml5 are now actually available to you
-
in preload, meaning you don't need a callback function.
-
You can just load the model in preload,
-
and it'll be ready by the time you get to setup.
-
So I'm gonna make a variable called soundClassifier.
-
In preload, I'm gonna say soundClassifier
-
equals ml5.soundClassifier.
-
Now, I need to tell it
-
what model I want to load.
-
So I need to, in here, put the name
-
of the model I wanna load,
-
and in theory, in the future,
-
there might be a bunch of different options,
-
different kinds of sound classifiers
-
or maybe a sound classifier you've trained yourself
-
that you wanna put in there,
-
and I'll come back eventually,
-
show you videos about how to do that.
-
But for right now, I'm just gonna say
-
SpeechCommands,
-
and then I already forgot what it was called.
-
So I'm gonna go back to the ml5 website, which is here.
-
I'm gonna go to reference.
-
I'm gonna go to soundClassifier,
-
and I'm looking for it here.
-
So it's SpeechCommands18w.
-
This is a particular model
-
that's been trained on 18 specific words,
-
and you can see what those are.
-
The 10 digits from zero to nine,
-
up, down, left, right, go, stop, yes, no, that's 18.
-
10 digits, eight different words.
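Just to have the 18 words written down in one place, here is a small sketch. The label strings are spelled the way they are listed in the video; whether the model emits exactly these strings is an assumption, and `labelToDigit` is a hypothetical helper, not part of ml5.

```javascript
// The ten digits and eight command words, as listed in the video.
const DIGITS = ['zero', 'one', 'two', 'three', 'four',
                'five', 'six', 'seven', 'eight', 'nine'];
const COMMANDS = ['up', 'down', 'left', 'right', 'go', 'stop', 'yes', 'no'];
const LABELS = [...DIGITS, ...COMMANDS];

// Hypothetical helper: turn a digit label into its number, or null for a command word.
function labelToDigit(label) {
  const index = DIGITS.indexOf(label);
  return index === -1 ? null : index;
}

console.log(LABELS.length); // 10 digits + 8 words
```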
-
All right, so now I'm gonna go,
-
so it was 18w,
-
and then, once that model is loaded,
-
I need a callback.
-
So I could just say soundClassifier.classify,
-
and I might just call it gotResults.
-
So in other words, I'm.
-
Oh, it's not defined, right?
-
So I'm telling the sound classifier to classify.
-
Now, by default, it's just going to listen
-
to the microphone's audio.
-
Maybe in the future, part of ml5 will be able to offer
-
hooks to how you can, to connect it
-
to a different audio source,
-
but it's basically just gonna work
-
with the microphone's audio.
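Put together, the sketch so far might look roughly like this. It assumes the ml5 0.3.1 script tag is already on the page (as in the boilerplate sketch) and that the code runs as a p5.js `sketch.js`. The `gotResults` here is only a placeholder that logs whatever comes back.

```javascript
// sketch.js for the p5 Web Editor; assumes ml5 0.3.1 is loaded via a script tag.
let soundClassifier;

function preload() {
  // Loading in preload means the model is ready by the time setup() runs,
  // so no "model loaded" callback is needed.
  soundClassifier = ml5.soundClassifier('SpeechCommands18w');
}

function setup() {
  // Start classifying; by default this listens to the microphone.
  soundClassifier.classify(gotResults);
}

// Placeholder callback: just log what the classifier hands back.
function gotResults(error, results) {
  console.log(results);
}
```

Nothing here asks for the microphone explicitly. The browser's permission prompt is triggered by the library when classification starts.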
-
Then I can write a function called gotResults,
-
and I'm gonna get rid of the draw loop
-
'cause I don't need that right now.
-
Lemme just turn off auto refresh
-
so that it doesn't keep refreshing.
-
And then now, if you remember,
-
ml5 employs error first callbacks,
-
meaning the callback function requires two arguments,
-
an error argument in case something went wrong,
-
and a data or results or some other argument
-
where the actual stuff is.
-
So I'm gonna say error,
-
and then I'm gonna say results.
-
And then I could do a little like basic error handling.
-
I'm just gonna say console.log
-
something went wrong,
-
and then I can also actually log the error, all right.
-
And then, so now,
-
and then I'm gonna say console.log(results).
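Here is the error-first callback written out as a minimal sketch. The shape of `results` (an array of objects with `label` and `confidence`, best guess first) is an assumption about what the classifier hands back, and the two calls at the bottom use made-up data purely to exercise both paths without a microphone. `lastLabel` is a hypothetical variable for holding the most recent word.

```javascript
// Most recent label heard; hypothetical state a sketch could use in draw().
let lastLabel = null;

// Error-first callback: first argument is the error (if any), second is the data.
function gotResults(error, results) {
  if (error) {
    console.log('something went wrong');
    console.log(error);
    return;
  }
  // Assumption: results is an array of { label, confidence }, best guess first.
  const top = results[0];
  lastLabel = top.label;
  console.log(top.label, top.confidence);
}

// Made-up data, only to show both paths.
gotResults(null, [
  { label: 'up', confidence: 0.97 },
  { label: 'go', confidence: 0.02 },
]);
gotResults(new Error('microphone unavailable'));
```

Returning early on an error keeps the rest of the function from touching `results`, which would be undefined in that case.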
-
So let's see if we get anything.
-
Oh, I have to run it again.
-
And you could ignore this error.
-
Oh, (gasping) something came in!
-
Ready?
-
Up.
-
I just wanna stop and mention
-
that if you're following this along,
-
hopefully your browser is asking for permission
-
to use the microphone.
-
The reason why that didn't happen here in this video
-
is because I've already set my browser
-
to allow use of the microphone on the p5 Web Editor pages,
-
but for security, you can't just access anybody's microphone
-
from a webpage without the user giving permission.
-
So hopefully you saw that happen,