
  • [DING]

  • Hello, and welcome to another Beginner's Guide to Machine

  • Learning video tutorial.

  • In this video, I am going to cover

  • the pre-trained model, PoseNet.

  • And I'm going to look at what PoseNet is,

  • how to use it with the ml5.js library and the p5.js library,

  • and track your body in the browser in real time.

  • The model, as I mentioned, that I'm looking at,

  • is called PoseNet.

  • [MUSIC PLAYING]

  • With any machine learning model that you

  • use, the first question you probably want to ask is,

  • what are the inputs?

  • [MUSIC PLAYING]

  • And what are the outputs?

  • [MUSIC PLAYING]

  • And in this case, the PoseNet model

  • is expecting an image as input.

  • [MUSIC PLAYING]

  • And then as output, it is going to give you

  • an array of coordinates.

  • [MUSIC PLAYING]

  • In addition to each of these xy coordinates,

  • it's going to give you a confidence score.

  • [MUSIC PLAYING]

  • And what do all these xy coordinates correspond to?

  • They correspond to the keypoints on a PoseNet skeleton.

  • [MUSIC PLAYING]

  • Now, the PoseNet skeleton isn't necessarily

  • an anatomically correct skeleton.

  • It's just a set of 17 keypoints, which you can see right over here,

  • from the nose all the way down to the right ankle.

  • The model is trying to estimate where those positions are

  • on the human body, and give you xy coordinates,

  • as well as how confident it is about each of those points.
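As a rough sketch of that output, a single detected pose looks something like this in ml5.js (the numbers here are invented for illustration):

```javascript
// Hypothetical sample of one detected pose (all values made up):
const samplePose = {
  score: 0.92, // confidence for the whole pose
  keypoints: [
    { part: "nose", position: { x: 320, y: 180 }, score: 0.99 },
    { part: "leftEye", position: { x: 335, y: 165 }, score: 0.97 },
    // ...15 more keypoints, all the way down to...
    { part: "rightAnkle", position: { x: 290, y: 460 }, score: 0.41 },
  ],
};

console.log(samplePose.keypoints[0].part); // "nose"
```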

  • One other important question you should ask yourself and do

  • some research about whenever you find yourself using

  • a pre-trained model out of the box, something

  • that somebody else trained, is who trained that model?

  • Why did they train that model?

  • What data was used to train that model?

  • And how is that data collected?

  • PoseNet is a bit of an odd case, because the model itself,

  • the trained model is open source.

  • You can use it.

  • You can download it.

  • There are examples for it in TensorFlow and TensorFlow.js

  • and ml5.js.

  • But the actual code for training the model,

  • from what I understand or what I've been able to find,

  • is closed source.

  • So there aren't a lot of details.

  • A data set that's used often in training models

  • around images is COCO, or Common Objects In Context.

  • And it has a lot of labeled images

  • of people striking poses with their keypoints marked.

  • So I don't know for a fact whether COCO

  • was used exclusively for training PoseNet,

  • whether it was used partially or not at all.

  • But your best bet for a starting point

  • for finding out as much as you can about the PoseNet model

  • is to go directly to the source:

  • the GitHub repository for PoseNet.

  • In fact, there's a PoseNet 2.0 coming out.

  • I would also highly suggest you read the blog post "Real-time

  • Human Pose Estimation in the Browser with TensorFlow.js"

  • by Dan Oved, with editing and illustrations by Irene

  • Alvarado and Alexis Gallo.

  • So there's a lot of excellent background information

  • about how the model was trained and other relevant details.

  • If you want to learn more about the COCO image data set,

  • I also would point you towards the Humans of AI project

  • by Philipp Schmitt, which is an artwork, an online exhibition

  • that takes a critical look at the data in that data

  • set itself.

  • If you found your way to this video, most likely,

  • you're here because you're making interactive media

  • projects.

  • And PoseNet is a tool that you could

  • use to do real time body tracking very quickly

  • and easily.

  • It's frankly, pretty amazing that you could do

  • this with just a webcam image.

  • So one way to get started, which in my view

  • is one of the easiest, is with the p5 Web Editor

  • and the p5.js library.

  • I have a sketch here which connects to the camera

  • and just draws the image in a canvas.

  • Also want to make sure you have the ml5.js library imported,

  • and that would be through a script tag in index.html.

  • Once you've got all that set up, we're ready to start coding.

  • So I'm going to create a variable called poseNet.

  • I'm going to say poseNet equals ml5.poseNet.

  • All the ml5 functions are initialized the same way,

  • by referencing the ml5 library dot the name of the function,

  • in this case, PoseNet.

  • Now typically, there's some arguments that go here.

  • And we can look up what those arguments are,

  • by going to the documentation page.

  • Here we can see there are a few different ways

  • to call the PoseNet function.

  • I want to do it the simplest way possible.

  • I'm just going to give it the video element and a callback

  • for when the model is loaded, which I don't even

  • know that I need.
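Assuming the p5.js and ml5.js script tags are already in index.html, the setup described above might look like this. The callback name modelLoaded is my own choice here, not something ml5 requires:

```javascript
let video;
let poseNet;

function setup() {
  createCanvas(640, 480);
  video = createCapture(VIDEO);
  video.hide(); // hide the raw <video> element; we draw it to the canvas ourselves

  // ml5.poseNet() takes the video element and an optional
  // callback for when the model has finished loading
  poseNet = ml5.poseNet(video, modelLoaded);
}

function modelLoaded() {
  console.log("PoseNet is ready");
}

function draw() {
  image(video, 0, 0);
}
```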

  • [MUSIC PLAYING]

  • I'll make sure there are no errors and run this again.

  • And we can see PoseNet is ready.

  • So I know I've got my syntax right.

  • I've called the PoseNet function,

  • I've loaded the model.

  • The way PoseNet works is actually

  • a bit different than everything else in the ml5 library.

  • And it works based on event handlers.

  • So I want to set up a pose event by calling the method on():

  • on 'pose', I want this function to execute.

  • Whenever the PoseNet model detects a pose,

  • then call this function and give me the results of that pose.

  • I can add that right here in setup.

  • poseNet.on('pose').

  • And then I'm going to give it a callback called gotPoses.
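A minimal sketch of that event handler, with gotPoses as the callback name:

```javascript
let poses = [];

// Called by ml5 every time PoseNet detects one or more poses
function gotPoses(results) {
  poses = results;
  console.log(results);
}

// Registered in setup(), after poseNet = ml5.poseNet(...):
//   poseNet.on('pose', gotPoses);
```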

  • [MUSIC PLAYING]

  • And now presumably, every single time it detects a pose,

  • it sees me, it sees my skeleton, it

  • will log that to the console right here.

  • Now that it's working, I can see a bunch

  • of objects being logged.

  • Let's take a look at what's inside those objects.

  • The p5 console is very useful for your basic debugging.

  • In this case, I really want to dive deep into this object

  • that I'm logging here, the poses object.

  • So in this case, I'm going to open up the actual developer

  • console of the browser.

  • I could see a lot of stuff being logged here very, very quickly.

  • I'm going to pick any one of these and unfold it.

  • So I can see that I have an array.

  • And the first element of the array is a pose.

  • There can be multiple poses that the model is

  • detecting if there's more than one person.

  • In this case, there's just one.

  • And I can look at this object.

  • It's got two properties, a pose property and a skeleton

  • property.

  • Definitely want to come back to the skeleton property.

  • But let's start with the pose property.

  • I can unfold that, and we could see, oh my goodness,

  • look at all this stuff in here.

  • So first of all, there's a score.

  • I mentioned that with each one of these xy

  • positions of every keypoint, there is a confidence score.

  • There is also a confidence score for the entire pose itself.

  • And because the camera's seeing very little of me,

  • it's quite low, just at 30%.

  • Then I can actually access any one of those keypoints

  • by its name.

  • Nose, left eye, right eye, all these, all the way

  • down once again to right ankle.

  • So let's actually draw something based

  • on any of those keypoints.

  • We'll use my nose.

  • I'm going to make the assumption that there's always only going

  • to be a single person.

  • If there were multiple people, I'd

  • want to do this differently.

  • And I'm going to hit stop,

  • and make a variable called pose.

  • Then I'm going to say, if it's found a pose,

  • and I can check that by just checking

  • the length of the array.

  • If the length of the array is greater than zero,

  • then pose equals poses index zero.

  • I'm going to take the first pose from the array

  • and store it into the global variable.

  • But actually, if you remember, the object in the array

  • has two properties, pose and skeleton.

  • So it seems there's a lot of redundant lingo here,

  • but I'm going to say, poses[0].pose.

  • [MUSIC PLAYING]

  • This could be a good place to use the confidence score.

  • Like, only if it's like of a high confidence actually

  • use it.

  • But I'm just going to take any pose that it gives me.
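So the callback, rewritten to follow the steps above, might look like this:

```javascript
let pose; // the most recent single pose; undefined until one is found

function gotPoses(poses) {
  // Only grab a pose if the model actually found at least one
  if (poses.length > 0) {
    pose = poses[0].pose;
  }
}
```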

  • Then in the draw function, I can draw something

  • based on that pose.

  • So for example, let me give myself a red nose.

  • [MUSIC PLAYING]

  • So now if I run the sketch, ah, so I got an error.

  • So why did I get that error?

  • The reason why I got that error is it

  • hasn't found a pose yet, so there

  • is no nose for it to draw.

  • So I should always check to make sure there is a valid pose

  • first.

  • [MUSIC PLAYING]

  • Then draw that circle.

  • And there we go.

  • I now have a red dot always following my nose.
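The draw function for that red nose might look like this, with the guard against a pose that hasn't been found yet:

```javascript
let video; // from createCapture(VIDEO) in setup()
let pose;  // set in gotPoses() once the model finds someone

function draw() {
  image(video, 0, 0);
  // pose stays undefined until PoseNet has detected a person,
  // so skip drawing until then
  if (pose) {
    fill(255, 0, 0);
    ellipse(pose.nose.x, pose.nose.y, 32, 32);
  }
}
```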

  • If you're following along, pause the video

  • and try to add two more points where your hands are.

  • Now there isn't actually a hand keypoint.

  • It's a wrist keypoint.

  • But that'll probably work for our purposes,

  • I'll let you try that.

  • [TICKING]

  • [DING]

  • How did that go?

  • OK, I'm going to add it for you now.

  • [MUSIC PLAYING]
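What I added looks something like this, with the wrist keypoints standing in for the hands:

```javascript
let video; // from createCapture(VIDEO) in setup()
let pose;  // set in gotPoses() once the model finds someone

function draw() {
  image(video, 0, 0);
  if (pose) {
    fill(255, 0, 0);
    ellipse(pose.nose.x, pose.nose.y, 32, 32);
    // There is no "hand" keypoint, so the wrists stand in for the hands
    ellipse(pose.rightWrist.x, pose.rightWrist.y, 32, 32);
    ellipse(pose.leftWrist.x, pose.leftWrist.y, 32, 32);
  }
}
```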

  • Let's see if this works.

  • Whoo.

  • This is working terribly.

  • I'm almost kind of getting it right.

  • And there we go.

  • But why is it working so poorly?

  • Well, first of all, I'm barely showing,

  • I'm only showing it from my waist up.

  • And most likely, the model was trained on full body images.

  • [MUSIC PLAYING]

  • Now I turned the camera to point at me over here,

  • and I'm further away.

  • And you can see how much more accurate

  • this is, because it sees so much more of my body.

  • I'm able to control where the wrists are

  • and get pretty accurate tracking as I'm standing

  • further away from the camera.

  • There are also some other interesting tricks

  • we could try.

  • For example, I could estimate distance from the camera

  • by looking at how far apart the eyes are.

  • [MUSIC PLAYING]

  • So for example here, I'm storing the right eye and left eye

  • location in separate variables, and then

  • calling the p5 distance function to look

  • at how far apart they are.

  • And then, I could just take that distance

  • and assign it to the size of the nose.

  • So as I get closer, the nose gets bigger.

  • And you almost can't tell, because it's sizing relative

  • to my face.

  • But it gives it more of a realistic appearance

  • of an actual clown nose that's attached,

  • by changing its size according to the proportions of what

  • it's detecting in the face.
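That trick, as a sketch: store the two eye positions, measure the gap between them with p5's dist() function, and use that distance as the nose's diameter:

```javascript
let video; // from createCapture(VIDEO) in setup()
let pose;  // set in gotPoses() once the model finds someone

function draw() {
  image(video, 0, 0);
  if (pose) {
    const eyeR = pose.rightEye;
    const eyeL = pose.leftEye;
    // The distance between the eyes is a rough proxy for
    // how close the face is to the camera
    const d = dist(eyeR.x, eyeR.y, eyeL.x, eyeL.y);
    fill(255, 0, 0);
    ellipse(pose.nose.x, pose.nose.y, d, d);
  }
}
```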

  • You might be asking yourself, well,

  • what if I want to draw all the points,

  • all the points that it's tracking?

  • So for convenience, I was referencing each point by name.

  • Right eye, left eye, nose, right wrist.

  • But there's actually a keypoints array

  • that has all 17 points in it.

  • So I can use that to just loop through everything

  • if that's what I want to do.

  • [MUSIC PLAYING]

  • So I can loop
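Completing that thought, a loop over the keypoints array might look like this. Each entry holds a part name, a position, and a score:

```javascript
let video; // from createCapture(VIDEO) in setup()
let pose;  // set in gotPoses() once the model finds someone

function draw() {
  image(video, 0, 0);
  if (pose) {
    fill(0, 255, 0);
    // pose.keypoints holds all 17 points, nose through right ankle
    for (let i = 0; i < pose.keypoints.length; i++) {
      const kp = pose.keypoints[i];
      ellipse(kp.position.x, kp.position.y, 12, 12);
    }
  }
}
```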