  • COLTON OGDEN: All right.

  • Hello world.

  • This is CS50 on Twitch, and today we're joined

  • by CS50's Nick Wong, who's been a teaching fellow here

  • for a couple of years.

  • And today he's going to teach-- oh, what are you going to teach us today?

  • What are we going to talk about?

  • NICK WONG: So we're going to be talking about binary classifiers

  • using TensorFlow and Keras.

  • And so TensorFlow was actually a project that was developed and built by Google.

  • And Google has their hands in everything,

  • but TensorFlow is really cool because it basically took

  • the whole linear algebra-ness of machine learning and made it very simple.

  • And then Keras is a wrapper on top of TensorFlow

  • that actually made even that simplification a little bit simpler.

  • COLTON OGDEN: Let's transition to your awesome little screensaver back here.

  • Yeah, so like an introduction to machine learning using some nice open source

  • libraries, some Google base-- is Keras a Google open source library as well?

  • NICK WONG: I believe so, yeah.

  • COLTON OGDEN: OK.

  • TensorFlow-- I know everyone's heard of TensorFlow, or at least a lot of us

  • have.

  • I know almost nothing about machine learning,

  • so this will be a learning experience for me as well,

  • but I'm super excited to talk about binary classifiers and what they are.

  • NICK WONG: Awesome.

  • So yeah, I'll talk a little bit about the theory first.

  • I won't talk too much on theory.

  • I know people aren't usually huge fans.

  • I like the theory, I think it's really cool,

  • but we're going to just talk about it a little bit

  • to motivate and contextualize what I'm actually going to start

  • saying and coding in front of you.

  • Before I do that, I just wanted to point out this screen saver

  • that I've got going on here.

  • I think it's super cool.

  • I'm a huge fan.

  • It's technically CMatrix.

  • That's the actual program that's running.

  • But I put that through another program called

  • lolcat, which is one of my favorite "messing with people"

  • or troll programs.

  • If you're ever trying to mess with someone,

  • you can display lolcat and stuff through a screen.

  • COLTON OGDEN: We'll have a separate stream on how to troll people,

  • how to hack into a--

  • NICK WONG: One of my favorite things.

  • COLTON OGDEN: And this screensaver is in a shell, right?

  • NICK WONG: Right.

  • So this is in a shell.

  • If I press any key, then I'm back in my shell,

  • and I can go back to coding and doing all sorts of cool things.

  • But I like having it there.

  • It's entertaining.

  • It just drains power, so you want to be plugged in.

  • COLTON OGDEN: We have a few comments in the chat there.

  • MKLOPPENBURG says, "Hey everybody."

  • BELLA_KIRS says, "Hello."

  • WHIPSTREAK, BELLA, and I can't read that last one.

  • NICK WONG: Yeah, it's hard to read.

  • COLTON OGDEN: It looks like ILYAS.

  • They all say, Hi, Nick.

  • NICK WONG: Oh, awesome.

  • Hello guys.

  • Appreciate it.

  • Very happy to be here.

  • COLTON OGDEN: We have a lively chat, so we'll definitely read off the messages

  • as we get them in the chat.

  • NICK WONG: Sweet.

  • All right.

  • So machine learning.

  • It's one of my favorite buzzwords that everyone throws around,

  • along with like blockchain, AI--

  • what are some other ones?

  • There are some really good ones out there.

  • Bitcoin people that are out there.

  • COLTON OGDEN: Yeah, blockchain.

  • They've been massively-- yeah, those last couple years.

  • NICK WONG: Everyone keeps throwing them out there, and they're all great.

  • And so I think one of the things that I heard this last summer where

  • I was working, someone said that machine learning is just fancy statistics

  • wrapped up in a computer.

  • And I was like, yeah, not really.

  • Like in concept, yes, but not necessarily in what

  • they actually intended.

  • What they were meaning is, like, you could just

  • use statistics to do the same thing.

  • And when I was walking through our science center today,

  • I was looking at the old punch card style computers.

  • And I was like, yeah, technically I can run statistics on that too,

  • but it's different.

  • The game has changed a little bit given our computing power, our memory,

  • and things like that.

  • COLTON OGDEN: Little bit apples and oranges type deal.

  • NICK WONG: Yeah.

  • It's just like, yeah, you're not wrong really, but you're pretty off.

  • And so machine learning is just a very broad field.

  • It really means just any way in which you

  • get a computer to figure things out.

  • And a lot of it's modeled after how humans learn.

  • So we-- well, in concept.

  • It's modeled after the way our brains are structured.

  • So a lot of things are called neurons.

  • So you hear a lot of just jargon that gets thrown out.

  • You'll hear, like, tensors, neurons, layers, models, things like that.

  • COLTON OGDEN: I've heard, like, neural nets, for example.

  • NICK WONG: Right, neural network.

  • And those sorts of things can be very intimidating,

  • but they generally have very well grounded meanings.

  • So for example, a neural net, that just means that, like our brain,

  • they took a bunch of individual nodes and they linked them all together,

  • and that's all that it really ends up meaning.

  • Now, there's some nuance to that, there's some complexity,

  • but in concept, it's actually pretty straightforward.

  • So what we're going to do is we are technically

  • going to build a neural network today.

  • COLTON OGDEN: That's cool.

  • NICK WONG: It is what is required for-- or not required,

  • but it's what we're going to use for the binary classification.

  • And that brings us to what binary classifiers are,

  • which is if you have a bunch of data, can I

  • tell you if it's in one of two groups?

  • So it's either going to be group A or group B.

  • And that can mean all sorts of things for us.

  • COLTON OGDEN: So if we're looking at pictures of apples versus pictures

  • of oranges, what's an apple--

  • NICK WONG: Is it an apple or an orange?

  • COLTON OGDEN: --what's an orange?

  • By the way, DARKSLAYERX says, "How often do you stream?"

  • So we typically stream, as of the last couple of weeks,

  • this is a fairly new thing, but usually four days a week,

  • so Monday, Tuesday, Wednesday, Friday.

  • Thursday is a little bit of a busy day right now for CS50.

  • We're shooting CS50 seminars.

  • You're doing a seminar, right?

  • NICK WONG: Yep.

  • I will be also doing a seminar.

  • COLTON OGDEN: Do you know what it is on yet?

  • NICK WONG: It'll be on web development with Python and Django.

  • COLTON OGDEN: OK, awesome.

  • NICK WONG: It's a completely different--

  • COLTON OGDEN: Completely unrelated.

  • NICK WONG: They're not even related.

  • COLTON OGDEN: But yeah, pretty frequently.

  • Our goal is to do this roughly three to four times per week.

  • It looks like WHIPSTREAK also tossed in that it's on the CS50 Facebook page.

  • Yes, it is on the CS50 Facebook page most definitely.

  • NICK WONG: The human equivalent of "check the man page."

  • COLTON OGDEN: Yeah, yeah.

  • Yeah, basically.

  • NICK WONG: One of my favorite responses.

  • That's awesome.

  • COLTON OGDEN: MKLOPPENBURG-- Django rocks.

  • NICK WONG: Yes, I agree.

  • And we use I guess Flask and-- what is it?

  • Flask, Jinja2, and MySQL in CS50, and I'm a huge fan of Django

  • which kind of wraps that all together.

  • But today, we'll be just using Python and we're

  • going to talk about [INAUDIBLE] stuff.

  • COLTON OGDEN: Separate stream.

  • We'll do a separate Django stream for it.

  • NICK WONG: Yeah, completely different.

  • Yeah, exactly.

  • COLTON OGDEN: By the way, you will be doing another stream--

  • NICK WONG: Yes.

  • COLTON OGDEN: --coming up on Linux commands.

  • So Nick will do-- if anybody is curious on how

  • he got his awesome shell operating this way,

  • we'll dive into the basics of how that works with the basic Linux commands.

  • NICK WONG: Yeah.

  • That's next Friday, right?

  • COLTON OGDEN: Yeah, next Friday.

  • NICK WONG: Sweet.

  • Yeah.

  • So feel free to tune into that if you think this is cool

  • or if you want to learn more about Linux.

  • All right.

  • So we have binary classifiers.

  • We're just separating things into two categories.

  • And that sounds like an intuitive thing.

  • If I show you a picture of a square and a picture of a triangle,

  • you'll know that one is not like the other.

  • But for machines, that can be really complex.

  • And actually, the theory underneath that seems very difficult.

  • And so machines, early on, that was one of the easiest problems

  • that they could try and tackle solving.

  • And it's still hard.

  • It's not an easy problem, but it is one of the easiest.

  • COLTON OGDEN: Identifying basically simple shapes, those sort of things?

  • NICK WONG: Right.

  • Can I just say, is this this or not?

  • So is it a shape or not?

  • Is it dark or not?

  • And so what we're going to actually start with--

  • and for you guys, I actually started a GitHub repository

  • that you guys are welcome to kind of pull code from as we go--

  • COLTON OGDEN: Oh, do you want to--

  • NICK WONG: --which we might want to bring that up.

  • COLTON OGDEN: --let me know what that is and I'll type it into chat?

  • NICK WONG: Yeah.

  • So it's at GitHub.com/powerhouseofthecell--

  • COLTON OGDEN: Let me get the--

  • Sorry, I can't read it too well.

  • Powerhouse.

  • Oh, you're right.

  • Yeah.

  • Literally--

  • NICK WONG: Literally powerhouse.

  • COLTON OGDEN: You changed your Twitch handle a little bit from last time.

  • NICK WONG: Yeah, I had to change that, unfortunately.

  • COLTON OGDEN: Is_it_real.

  • NICK WONG: Yeah, exactly.

  • So our eventual goal for today, what we will get to by the end,

  • is given pictures of cartoon people versus real pictures, like a portrait,

  • can we distinguish which one is real?

  • COLTON OGDEN: That's an interesting--

  • NICK WONG: Can I tell us-- is something cartoon or is it not?

  • And the reason I thought that was really cool

  • is there is no necessarily apparent programmatic way

  • to distinguish the two.

  • You can't tell based on color.

  • You can't necessarily tell based on orientation or type of shape.

  • Maybe for really bad approximations in cartoon structure,

  • you could maybe guess at that, but even then, that might not be true.

  • COLTON OGDEN: I was going to say, if I were to take an initial stab at this

  • completely naively, I would probably build the cartoon classifier and then

  • work my way up to real humans because I feel like that's--

  • it's almost a simplification or an abstraction of that.

  • NICK WONG: Right, exactly.

  • And so that's what my point was, is that even given very limited tools--

  • we have basically no theory that we've talked about.

  • We don't have to talk about likelihood maximization or any sort of summation

  • notation, none of that.

  • We can still do that.

  • We can get really high accuracies with pretty minimal tools.

  • And they're all open source.

  • They're all available anywhere you want to go.

  • So I think that's really cool.

  • And I think when I was beginning CS, that was super inspiring to me, so I

  • like sharing that with people.

  • COLTON OGDEN: Cool.

  • NICK WONG: We're also going to use a third library I forgot to mention.

  • It's called OpenCV or Open Computer Vision.

  • I think it was UC Irvine's project.

  • And I actually used that my freshman year,

  • but I've used it ever since because it is

  • one of the primary libraries for reading images,

  • for just pulling them in and doing something like that.

  • But the problem with OpenCV is they update their documentation possibly

  • less consistently than college kids go to sleep.

  • It's so all over the place.

  • It does say MIT license.

  • Sorry.

  • To respond with WHIPSTREAK's question, which is, is it an MIT license?

  • I'm in my GitHub.

  • Why?

  • Isn't this Harvard?

  • MIT license is actually kind of a broad license.

  • It's generally used for personal projects or projects

  • that you want to be able to share it with the world

  • but you do want them to acknowledge you or just reference the fact that it's

  • not necessarily theirs.

  • It's one of my favorite licenses to use on GitHub.

  • Another common one that you might see is the GNU license as well as

  • the Apache license.

  • Those two or those three are pretty well-known.

  • And then there's a bunch of other license options

  • that I don't understand super well.

  • COLTON OGDEN: And then FERNANDO also says, "Hi."

  • Hi, Fernando.

  • NICK WONG: Hey, Fernando.

  • COLTON OGDEN: Thanks for joining us today.

  • NICK WONG: Awesome.

  • So we're going to get going a little bit on our environment and what we're doing

  • and how we actually go.

  • So I'm in Bash.

  • This is my command prompt.

  • I just like the color blue a lot.

  • Ironically, I'm wearing red, but that's OK.

  • And so this is my Bash prompt.

  • If you're not super familiar with Bash or Linux shell, don't worry.

  • I might refer to it with a couple of words like terminal, console, Bash, shell.

  • They all mean the same thing.

  • I just mean here.

  • I can type stuff ls, which has all sorts of stuff on there.

  • I'm a mess.

  • I can do ps to see what's actually running-- very little.

  • There's a bunch of other commands, and we'll go into them next Friday.

  • But what I'm going to do is I actually have a technology called

  • Virtualenvwrapper, and what that means is

  • I can basically create these separate little Python modules,

  • Python environments for myself.

  • And so that means that if I have some project that uses Python 2.5 and then

  • my current project uses Python 3.7 because I'm not in the 20th century--

  • sorry, I like to jab at the old Pythons--

  • then I can separate everything out and not

  • worry about colliding dependencies or colliding version numbers.

  • I can keep everything very clean.

  • COLTON OGDEN: Or even two versions of the same library.

  • NICK WONG: Exactly, which would be-- that is the worst.

  • I have done that many times, and it's awful.

  • So we're going to call that, this version.

  • Is_it_real.

  • It'll match with my GitHub, and I'll try to be very consistent.

  • And you'll see that when I run this mkvirtualenv,

  • but I'm not really necessarily going to go into it.

  • There are a lot of tutorials online on how to install Virtualenvwrapper.

  • That is what I'm using, though.

  • And it takes a little bit to go, but it creates all these ways for me

  • to actually interact with Python in an isolated place.

  • And you'll notice that, over here, it puts a little--

  • it prepends a part to my prompt that says

  • that I'm in that virtual environment at any one time.

  • So I can leave it by just saying, deactivate,

  • just for those of you that are gone, and then I

  • can work on it using this command.

  • Cool.

  • So I'm also going to create a folder for this.

  • I normally would have done some make directory or mkdir,

  • but I'm actually going to just go into the one that I--

  • oops.

  • It's called is_it_real.

  • My apologies.

  • I'm going to go into one that I already created just because I already cloned

  • my GitHub repository.

  • If you cloned it, then you'll actually end up being in a place like this,

  • and you'll have these two files in there--

  • just the readme and the license.

  • We are literally creating this from scratch.

  • COLTON OGDEN: That's the best way to do it.

  • NICK WONG: So I apologize if we have some syntax errors.

  • You'll get to see some really live debugging.

  • COLTON OGDEN: We've had plenty of that on the ones that I've done, so--

  • NICK WONG: That's good to hear, because there's going to be plenty on my end.

  • All right, cool.

  • So we're in a Python virtual environment,

  • and we need to pick up a bunch of packages.

  • So we're going to do that live too, which never goes well.

  • I've never had this code correct on the first go.

  • So we're going to pick up some kind of maybe not conventional ones,

  • or they aren't the names of the packages that I said at the very beginning.

  • We're picking NumPy and SciPy.

  • And those are very common.

  • Machine learning, any sort of numbers or data

  • analysis-- you're going to usually see those.

  • NumPy or NumPy makes things very fast generally speaking,

  • and it's going to give you a bunch of operations with numbers.

  • I'm going to actually let this run while I talk.

  • Oh good, they're cached.

  • Thank god.

  • They're kind of large, so they're kind of a pain.

  • And then SciPy is a bunch of science-like tools.

  • So if you're analyzing data, it's really useful.

  • COLTON OGDEN: It's literally in the name--

  • SciPy.

  • NICK WONG: Yeah.

  • It's always like Python is super intuitive.

  • It's meant to be user friendly.

  • The next thing that we're going to grab--

  • and I always forget what it's called.

  • I think it's opencv-python.

  • Their package name is not super intuitive because the package itself

  • when you import it-- thank god--

  • is called cv2.

  • However, we're using cv3.

  • And it's still called cv2, so there's that.

  • And you download it as opencv-python, which is absurd.

  • So I don't really like their naming scheme, but that's OK.

  • COLTON OGDEN: WHIPSTREAK says, "Nick and Colton are acting like best friends.

  • Are they?"

  • We're definitely good friends.

  • NICK WONG: We are very good friends.

  • COLTON OGDEN: We taught a games course together actually,

  • so we do have a little bit of history here.

  • NICK WONG: Yeah.

  • Colton's awesome.

  • I actually got to watch him full powerhouse last year

  • teaching his course that he went and did the whole nine yards on.

  • I was just there for the ride.

  • It was very cool.

  • COLTON OGDEN: We flipped the situation around a little bit,

  • so now you are teaching and I am observing.

  • NICK WONG: I appreciate it.

  • COLTON OGDEN: It's fun.

  • It all comes around.

  • TWITCHHELLOWORLD-- "I don't see the Livestream video.

  • Is everyone seeing a Livestream video?"

  • The video should definitely be up.

  • I see it on my Twitch.

  • Definitely refresh the page and see if that fixes it.

  • NICK WONG: Very possible.

  • COLTON OGDEN: Astly, who's NUWANA333, says, "Hi everyone.

  • Yes TWITCHHELLOWORLD."

  • Good to see you, Astly.

  • But yeah, if you have any issues, definitely refresh the page.

  • I'm seeing it live, and it looks like everyone else

  • is seeing it live as well.

  • NICK WONG: Yeah, good luck.

  • Debugging browser stuff is one of my least favorite tasks.

  • It's why I work in Python.

  • All right.

  • So let's see.

  • We've picked up NumPy, SciPy, opencv.

  • We also need Pillow, which is not an intuitive package that you would want.

  • It's capitalized too, so just in case you encounter some weird errors there.

  • And Pillow, it's the-- what is it?

  • Python image library.

  • So that translates to PIL.

  • They call it Pillow and it's capitalized, but when you import it,

  • you don't.

  • You don't import Pillow anywhere really.

  • Some of these packages are kind of absurd.

  • So then, we're also going to install the stars

  • of our show, which are TensorFlow--

  • so shout out to Google for making that accessible to everybody--

  • and Keras, which is a wrapper on top of TensorFlow.

  • It makes it a little bit higher level.

  • It's a little bit easier to interact with.

  • And that way, you're not dealing with as much terminology,

  • so you're not sitting there being like, what's a tensor?

  • What are the dimensions?

  • And all these things.

  • Keras takes care of that.

  • COLTON OGDEN: Shout out to also how easy it is to install all these packages.

  • That's probably why Python is so successful.

  • NICK WONG: Gotta love Python.

  • I am a huge fan of Python.

  • I think Colton is too.

  • Python is just very well done.

  • You're seeing maybe a different output than you might see on your screen

  • because I have used these packages pretty extensively,

  • so they're all cached on my machine.

  • It just means that they've been saved before.

  • And you'll see some bars that go across, and that should be correct.

  • Cool.

  • So we've downloaded all of our packages.

  • If we wanted to test that out, we could go into the Python REPL

  • by just typing Python and say, like, "import tensorflow,"

  • and that should not fail.

  • That would suck if it did it.

  • And maybe "from keras.models import--" that might-- yeah,

  • I think that was probably right. "import Model"

  • I'm also going to assume a lot of Python syntax,

  • so if you have any questions, feel free to post it in the chat

  • and we'll be happy to answer it.

  • And so yeah, we have both Keras--

  • sorry-- TensorFlow and Keras.

  • And then the last one that we want to double check that we have is cv2.
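
The same check can be scripted instead of typed into the REPL one import at a time. The following is a minimal sketch, assuming the six packages named in the stream; the `PACKAGES` table and the `missing_packages` helper are hypothetical names introduced here for illustration. It also records the mismatch mentioned above between the name you install (`opencv-python`, `Pillow`) and the name you import (`cv2`, `PIL`).

```python
import importlib.util

# Name used with "pip install" -> name used in the import statement.
PACKAGES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "opencv-python": "cv2",
    "Pillow": "PIL",
    "tensorflow": "tensorflow",
    "keras": "keras",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names whose import name cannot be found."""
    return [
        pip_name
        for pip_name, import_name in packages.items()
        # find_spec only locates the module; it does not run the slow import.
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "nothing")
```

Because `find_spec` only looks the module up, this runs quickly even when TensorFlow is installed; it confirms the package is present, not that it imports without error.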

  • COLTON OGDEN: It looks like Keras even has

  • an implicit dependency on TensorFlow--

  • NICK WONG: Yes.

  • COLTON OGDEN: --it looks like with that line there.

  • NICK WONG: Yeah, exactly.

  • So actually, Keras has this thing.

  • I think most people use TensorFlow now underneath it.

  • It's kind of the back end for it.

  • But it can also use Theano or Theano, which is a different machine learning

  • background or back end.

  • It's very cool, also very powerful.

  • I just happen to use TensorFlow.

  • So you can use either one.

  • We're going to just talk about TensorFlow though.

  • Cool.

  • So we can exit out of that with Control-D

  • or by typing in the word "exit" and parentheses.

  • And now we're going to actually start creating some code

  • or writing some code.

  • So what I'm going to do is, generally speaking,

  • I like to separate things out and be very modular and have good practices.

  • But when I first go through stuff, I just want to get it working.

  • I just want things to actually function correctly.

  • So we're going to write a monolithic style for this,

  • meaning that, basically, we're going to--

  • and actually, I'm going to just touch this.

  • Oops.

  • I can't spell.

  • I can never spell live.

  • And we're going to just touch run.py.

  • I like calling it run.py.

  • It indicates what I'm going to do with it.

  • I'm going to run it.

  • And so that makes things a little bit easier on ourselves,

  • and it'll be a little bit easier to follow so

  • that we're not flipping between files all over the place.

  • However, I also don't exactly love typing in just Bash

  • and using nano for everything.

  • So we're going to use Visual Studio Code.

  • It's one of my favorite IDEs.

  • A completely different team was behind it.

  • I think mostly different team was behind it at Microsoft

  • than the team behind Visual Studio.

  • So if you have any kind of gripe with Visual Studio, one,

  • they've revamped it a lot, but two, it's very different from VSC.

  • And VSC is super customizable.

  • It has a lot of nice plugins.

  • COLTON OGDEN: Big fan of it myself.

  • NICK WONG: Yeah, which is nice.

  • COLTON OGDEN: We have a couple comments there.

  • Looks like WHIPSTREAK says, "What is the machine learning program's objective?"

  • NICK WONG: Ah.

  • OK, right.

  • I didn't actually say it.

  • I mentioned it to Colton right beforehand and then didn't actually,

  • I think, say it.

  • So we're going to be going over--

  • basically, the end goal is to see if we can distinguish, given an image,

  • if it's a cartoon image or a real-life one.

  • And we're going to focus on images of people

  • because that makes it a little bit easier for us.

  • But that's going to be the end goal.

  • We're going to start with a little bit easier

  • of a task, which is distinguishing whether or not

  • an image is dark or light, because I think

  • that that conceptually is a little bit easier for us to understand.

  • There is a way for us to figure that out.

  • I think you could, maybe upon thinking about it for a little bit,

  • come up with a program that does it deterministically.

  • It takes the average--

  • images are just data.

  • And it maybe takes the average overall of the image's intensities and says,

  • if they're over some threshold, it's light,

  • and if they're under that threshold, it's dark.

  • And I think that that would be a very nice programmatic way to do it.
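
The deterministic version described here fits in a few lines. This is a rough sketch, assuming a grayscale image represented as rows of 0 to 255 integers; the function names and the midpoint threshold of 127.5 are illustrative choices, not something fixed in the stream.

```python
def mean_intensity(image):
    """Average pixel value of a grayscale image given as rows of 0-255 ints."""
    total = sum(sum(row) for row in image)
    count = sum(len(row) for row in image)
    return total / count


def is_light(image, threshold=127.5):
    """Call the image light if its average intensity is above the threshold."""
    return mean_intensity(image) > threshold


# Two tiny 2x2 "images" as stand-ins for real pictures.
dark = [[0, 10], [20, 30]]
light = [[200, 255], [220, 240]]
print(is_light(dark), is_light(light))  # prints: False True
```

With an image loaded through cv2, the pixels arrive as a NumPy array, so the average would come from the array's `mean()` method rather than from nested sums.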

  • You need to adjust a mic?

  • DAN COFFEY: [INAUDIBLE]

  • NICK WONG: Ah, yeah.

  • My apologies.

  • DAN COFFEY: Sorry about that.

  • NICK WONG: Thank you.

  • No, no worries.

  • COLTON OGDEN: Dan Coffey, everybody.

  • Shout out to Dan Coffey.

  • NICK WONG: Dan Coffey makes all of this magic much more possible.

  • COLTON OGDEN: Oh, we got some--

  • NICK WONG: Awesome.

  • COLTON OGDEN: --other comments there too.

  • NICK WONG: Oh, yeah.

  • COLTON OGDEN: CHITSUTOTE-- "Hi there!

  • What's the topic today?

  • Machine learning?"

  • Yeah, machine learning.

  • Going to be talking about TensorFlow and cartoons, light and dark.

  • Oh, DAVIDJMALAN is in the chat.

  • NICK WONG: Wow.

  • That's awesome.

  • COLTON OGDEN: Everybody shout out to DAVIDJMALAN giving the VS Code link there.

  • NICK WONG: Yes.

  • COLTON OGDEN: He's coming in the clutch.

  • NICK WONG: Very helpful.

  • COLTON OGDEN: Lots of hellos for David there.

  • MAGGUS503.

  • NICK WONG: Nice.

  • Yeah, I love it.

  • Is that a unicorn?

  • COLTON OGDEN: I think so.

  • NICK WONG: Or maybe a pig.

  • COLTON OGDEN: I think it's a Brony maybe.

  • NICK WONG: Oh, a Brony.

  • Nice.

  • COLTON OGDEN: I can't tell.

  • It's hard to tell.

  • NICK WONG: It's something, and it's cool.

  • COLTON OGDEN: "If I login on a desktop," says TWITCHHELLOWORLD,

  • "then is it maybe an option to see the livestream video there?"

  • I would use Chrome.

  • Use Google Chrome because that's the only web browser that I really

  • tested this extensively on.

  • It should work in Firefox, should work in Safari,

  • should work on most major web browsers.

  • But the latest version of Chrome on Twitch.tv/CS50TV should work just fine,

  • so give that a try.

  • NICK WONG: And best of luck.

  • COLTON OGDEN: I'm not too familiar with any Twitch for desktop apps,

  • but presumably those would work as well.

  • But Chrome is the only one that I'm personally familiar with.

  • NICK WONG: Makes sense, yeah.

  • Yeah, actually the same.

  • I usually don't watch too many Twitch streams, actually.

  • COLTON OGDEN: Just star in them, really.

  • NICK WONG: Yeah.

  • I like to just host them.

  • All right, cool.

  • So we have this open.

  • If you're not super familiar with an IDE, don't worry.

  • On the left, you just have my file structure,

  • all the files that I have access to in this directory.

  • At the bottom, I have a console, so this is just Bash again.

  • You can just list things out and do normal Bash things.

  • And then, in the main bulk of the screen is

  • where we'll actually be writing code.

  • Hopefully, that's not too small for everyone.

  • It should be big enough that everyone can read it.

  • COLTON OGDEN: It's the size I usually program on.

  • NICK WONG: OK, perfect.

  • COLTON OGDEN: If anybody thinks it's too small,

  • definitely let us know in the chat.

  • NICK WONG: Yes.

  • And actually, I just moved my files off the screen--

  • give us a little bit more real estate.

  • So what we're going to do is we're going to set up all of our imports

  • and get this file built so that we can actually run it and execute

  • things that are going on.

  • I know just ahead of time I'm going to need os.

  • I'm going to use it for path joining and checking directories and things

  • because we're going to have pictures in a bunch of directories,

  • so I'm going to use os for that.
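
As a sketch of the kind of path joining and directory checking meant here, assume the pictures sit in one subdirectory per class under a common root. That layout, the `list_images` name, and the extension list are assumptions made for illustration only.

```python
import os


def list_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return (path, label) pairs, using each subdirectory name as the label."""
    pairs = []
    for label in sorted(os.listdir(root)):
        class_dir = os.path.join(root, label)
        # Skip stray files sitting next to the class directories.
        if not os.path.isdir(class_dir):
            continue
        for name in sorted(os.listdir(class_dir)):
            if name.lower().endswith(extensions):
                pairs.append((os.path.join(class_dir, name), label))
    return pairs
```

Using `os.path.join` rather than gluing strings together with "/" keeps the paths valid across operating systems.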

  • I also know that I'm probably going to want argparse for later design stuff.

  • argparse is just a really convenient library

  • for doing command line arguments so I don't have

  • to do, like, if the length of argc--

  • or if argc is greater than 2, then this, and then otherwise--

  • I just particularly like argparse.

  • It's built into Python.

  • And those are our two system libraries or the built-ins for Python.

  • We're also going to definitely need Keras.

  • But with Keras, you don't usually directly import Keras.

  • We don't actually need Keras on its own.

  • So we're actually going to say, "from keras.models import

  • Sequential," which is not exactly model.

  • It's actually, I think, a subclass of model.

  • And Sequential has a bunch of things preset for us

  • because we know that we're going to be building some form of model where

  • we just add layers to it one at a time.

  • And actually, we may not have known that.

  • I know that, and I'm going to tell you that,

  • that that is how it's going to work.

  • And so, yeah, Sequential model means that we're going

  • to just stack everything together.

  • It looks-- if you were to view it in real life,

  • it would literally be a stack of layers.

  • And each of those layers represents all sorts of different things,

  • and we'll talk about those when we get there.
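
To make the "stack of layers" picture concrete, here is a toy stand-in written in plain Python. It is not Keras and not how Keras is implemented: `TinySequential` and the three layer functions are hypothetical, and real layers carry learned weights rather than fixed rules. It only shows data passing through layers in the order they were added.

```python
class TinySequential:
    """Toy stand-in for a sequential model: layers run in the order added."""

    def __init__(self):
        self.layers = []

    def add(self, layer):
        # Each new layer goes on top of the stack.
        self.layers.append(layer)

    def predict(self, values):
        # The output of one layer is the input of the next.
        for layer in self.layers:
            values = layer(values)
        return values


def scale(values):
    """Map 0-255 pixel values into the range 0-1."""
    return [v / 255 for v in values]


def average(values):
    """Collapse all values into a single number."""
    return [sum(values) / len(values)]


def threshold(values):
    """Turn each number into a 0-or-1 decision."""
    return [1 if v > 0.5 else 0 for v in values]


model = TinySequential()
model.add(scale)
model.add(average)
model.add(threshold)
print(model.predict([200, 220, 255]))  # prints: [1]
```

Keras's own `Sequential` class exposes an `add` method with the same add-one-layer-at-a-time feel, which is why the toy borrows that name.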

  • COLTON OGDEN: That's like refinements possibly?

  • NICK WONG: Yeah, exactly.

  • They're ways of tinkering with the data as it comes through.

  • I like to think of it as, like, if you imagine them as a bunch of filters

  • and you're just pouring sand through it, and at the end

  • you get something meaningful.

  • COLTON OGDEN: A ruby or something, right?

  • NICK WONG: Hopefully sand you like.

  • Yeah, or a gem.

  • Cool.

  • So we're going to import Keras.

  • We're also going to import-- oops--

  • cv2.

  • And that is going to just let us open images and do things, basically all

  • the image manipulation through that.

  • And I don't think we necessarily need any other ones.

  • Oh, we actually do also want NumPy.

  • We're going to use it to reshape arrays and images and things like that.

  • And so NumPy actually usually gets imported as something usually.

  • The convention seems to be "import numpy as np."

  • And so a lot of tutorials, you'll see just "np."

  • blabbity-blah, and that assumes that we'll use np.

  • Cool.

  • And then, I like to follow the C paradigm of "if name--" oops.

  • Yeah, you're going to see me mistype constantly today.

  • Then we're going to run stuff.

  • And I'm going to put "pass" there for now,

  • but we will generally put something there.

  • And I like to use this space up here to write out some functions

  • and help with debugging.

  • And then down here, we're going to just call those functions.

  • So very C script mashup is what we're seeing here.

  • However, I also like to use Control-C to exit out of functionality as we go,

  • so I'm going to wrap this in a try and except KeyboardInterrupt,

  • and that'll let me just say, like, oh, the user deliberately aborted.

  • I like to preface that.

  • And that'll just make it a little bit easier

  • for me to distinguish between me Control-Cing and actual errors that

  • happened while we were running.
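
The script layout described above might look roughly like the sketch below. The names `main` and `run` and the exact message are assumptions for illustration; the stream only describes the `if __name__ == "__main__"` guard, the `pass` placeholder, and catching `KeyboardInterrupt`.

```python
def main():
    # Placeholder body, as in the stream; real work is added later.
    pass


def run(entry_point):
    """Call entry_point; return False if the user pressed Control-C."""
    try:
        entry_point()
    except KeyboardInterrupt:
        # Makes a deliberate abort easy to tell apart from a real error.
        print("User deliberately aborted.")
        return False
    return True


if __name__ == "__main__":
    run(main)
```

Helper functions would sit above `main`, and only the guarded block at the bottom calls them.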

  • So this is one of my favorite ways of setting up programs.

  • You're welcome to do your own.

  • Excuse me.

  • All right.

  • So before we actually even get to the "from keras.models import Sequential,"

  • I'm going to let us have access to command line arguments in case

  • we get to that later.

  • So I'm going to say, ap just stands for argparse, is argparse.

  • I was doing this without an editor earlier when I was testing things out,

  • and man, it's much harder.

  • This is great because I just have autocomplete,

  • and I don't have to remember the actual names of things.

  • COLTON OGDEN: IntelliSense.

  • NICK WONG: Right.

  • IntelliSense is-- what a beautiful, beautifully built thing.

  • And then I'm going to just say that this is ap.parse_args.

  • Sorry, I didn't really narrate any of that.

  • In between this line 4 and line 6, we can put it like ap.add_argument,

  • and those will be available to us on the command line.

  • So that's just a really convenient way of doing this.

  • And actually, even right now, there are certain commands

  • that are now available to us on the command line.

  • So we're going to look at that.

  • If I do run.py, it'll tell us that we're using TensorFlow

  • and nothing will happen.

  • That's what we expect.

  • But then, from here, I can actually do -h and we'll get a usage message,

  • and that's argparse doing its job.

  • And the reason that I do that before TensorFlow or before importing Keras

  • is because Keras is actually a very slow import,

  • so we're completely able to ignore that if someone

  • didn't pass a necessary command line argument

  • or they wanted to just get help.

  • It speeds things up just a little bit.
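
A minimal sketch of parsing arguments before the slow import, assuming a hypothetical `--img-dir` option (the stream has not added any arguments yet). The helper names `parse_cli` and `main` are also illustrative.

```python
import argparse


def parse_cli(argv=None):
    """Parse command-line arguments; argv=None means use sys.argv[1:]."""
    ap = argparse.ArgumentParser(description="Binary image classifier demo")
    # Hypothetical option, not from the stream: where the images live.
    ap.add_argument("--img-dir", default="img", help="root image directory")
    return ap.parse_args(argv)


def main(argv=None):
    args = parse_cli(argv)  # -h or a bad argument exits here
    # A slow import such as `import keras` would go here, after parsing,
    # so that asking for help never pays the import cost.
    return args.img_dir
```

Running the script with `-h` prints the usage message generated by `argparse` and exits before anything heavy is loaded.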

  • So maybe stylistically it looks a little strange,

  • but it's definitely not too problematic.

  • COLTON OGDEN: I think old WHIPSTREAK has a question for you.

  • NICK WONG: Oh, yes.

  • "In line 12 and line 13 necessary or-- sorry, 16 and 17."

  • I believe you're talking about this.

  • You can also use the actual code that's in those lines

  • because lines will change and we're a little bit delayed,

  • or you guys I guess are a little bit delayed from what we perceive.

  • So also telling us which lines of code you're talking about by name

  • COLTON OGDEN: I think he's asking or he or she is asking whether the

  • "if name is main" is necessary to run the script.

  • NICK WONG: Ah, OK.

  • I see.

  • No.

  • So if you're asking about the "if name equals main, then we do stuff,"

  • it's totally not necessary.

  • Python is a fully capable scripting language.

  • You can just run Python scripts as you go.

  • I just like to follow a C style paradigm where we have a main

  • and it actually gets called and things go.

  • This also means what also ends up happening

  • for us is that if I import this whole file as a module,

  • then that stuff is not going to get automatically run.

  • So I would have--

  • excuse me.

  • I would have access to the functions that I might build earlier on or later

  • in this file, but I'm not necessarily going

  • to have them all just run without me telling them to.

  • So it serves a couple of purposes.

  • I think it's very convenient.

  • Cool.

  • So now we're going to start building some stuff.

  • So you might say, OK, well, the first intuitive thing

  • is that we're going to want to be able to load data.

  • And that makes a lot of sense, except I don't

  • know what my data looks like, I don't know where we got it from,

  • there's no data that exists at the moment,

  • and I have no file structure for it.

  • So loading data is un-implementable at the moment.

  • And I like to walk through these and motivate different parts of the problem

  • as we go, and we've now motivated that we don't have any data.

  • To run a machine learning model without data is, I think, a nonsense idea.

  • So we're going to get some data.

  • You're going to get a nice view of just how messy everything is in my file

  • structure.

  • I'm going to use Finder to just create some files.

  • If you will trust me on faith, you can hopefully

  • believe that this is the same directory that we're looking at here

  • and it's the same directory we're in over here.

  • They're all the same.

  • And if you don't believe me, then I apologize.

  • Nothing I can do on that.

  • I'm going to call this img because our data type is images.

  • You can call it whatever you'd like.

  • And then, within img, I'm going to actually have three folders.

  • So the first two are train and test, or validate is another word

  • that you might use for that.

  • And this is pretty standard in any sort of machine

  • learning setup, at least for supervised learning

  • where we're trying to feed it answers and get it to learn some pattern,

  • is we're going to say, take my data and split it into two data sets.

  • I have a training data set, and I have a testing data

  • set that validates whether or not that data's good.

  • And the reason that you would split something

  • is in machine learning models, they can sometimes over-learn things.

  • For example, humans do this all the time too,

  • where I might tell a child that after 1 comes 2, and after 2 comes 3,

  • and the child might then go and say, OK, well then

  • after 3 comes 4, and then after 4 comes 5.

  • And that's very reasonable.

  • That's something that's totally what we might expect,

  • except I might have just fed you the first couple of numbers

  • of a Fibonacci sequence, in which case after 3 does not come 4 but rather 5,

  • and then 8 and then 13.

  • And so it's really important to us to be able to give the machine learning model

  • data that it doesn't incorporate into what it actually does,

  • and data that it can then just see how well it actually is doing.

  • Basically, it's called over-fitting, where a machine learning model learns

  • all of its training data by memorizing.

  • And so it just says, this is what I know.

  • This is everything that I'm going to do.

  • An analogy for that is a child saying, I know four colors

  • and I know them very well.

  • And then you hand them a new color, and they have no clue what it is.

  • And that's something that real humans could, in concept, do.

  • We tend to compensate for that, but it's definitely very possible,

  • and machines do it all the time.

  • So we split this into training data to give it something to actually learn on,

  • and then we see how well it's actually doing

  • kind of in the real world on data it's never seen before in the testing data.

  • And that data does not get incorporated back

  • into how the machine actually learns.

  • It's just used, literally, to test it.
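
The counting analogy above can be turned into a toy sketch of why held-out data matters. The "model" here is just the hand-written rule `plus_one`, an assumption standing in for whatever a learner would infer from the first two pairs. It is not a real machine learning model.

```python
def accuracy(rule, pairs):
    """Fraction of (current, next) pairs that the rule predicts correctly."""
    hits = sum(1 for current, nxt in pairs if rule(current) == nxt)
    return hits / len(pairs)


def plus_one(n):
    # The pattern "learned" from the training pairs below.
    return n + 1


train_pairs = [(1, 2), (2, 3)]          # what the learner saw
test_pairs = [(3, 5), (5, 8), (8, 13)]  # held-out Fibonacci steps

print(accuracy(plus_one, train_pairs))  # 1.0
print(accuracy(plus_one, test_pairs))   # 0.0
```

A perfect score on the training pairs says nothing here; only the held-out pairs reveal that the rule is wrong.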

  • And then, the third directory that we're going to put in is a predict directory.

  • And I just like to structure things really cleanly.

  • And so this directory is just going to contain a bunch of images

  • that I want the machine to predict on.

  • I want the machine to tell me, what do those images actually have?

  • Are they light?

  • Are they dark?

  • Are they cartoon?

  • Are they real?

  • Things like that.

  • COLTON OGDEN: Cool.

  • NICK WONG: Cool.

  • So in train and test, we're going to want--

  • the way that Keras pulls images from a directory

  • is it uses the directory name as the label for that data.

  • So I'm going to create a new folder, and we're going to call that light.

  • And I'm going to create a new folder here and call that dark.

  • And so we'll want to replicate these over to test.
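
The folder layout being built by hand in Finder could also be created from Python. This is a sketch under the assumption that the root folder is called `img`, as it is later in the stream. It runs inside a temporary directory so it does not touch real files.

```python
import os
import tempfile

SPLITS = ("train", "test")
LABELS = ("light", "dark")


def make_layout(root):
    """Create root/{train,test}/{light,dark} plus root/predict."""
    created = []
    for split in SPLITS:
        for label in LABELS:
            path = os.path.join(root, split, label)
            os.makedirs(path, exist_ok=True)
            created.append(path)
    predict = os.path.join(root, "predict")
    os.makedirs(predict, exist_ok=True)
    created.append(predict)
    return created


with tempfile.TemporaryDirectory() as tmp:
    layout = make_layout(os.path.join(tmp, "img"))
    print(len(layout))  # 5
```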

  • And so these two folders, basically what they do is inside of dark,

  • I'm going to put a bunch of dark images--

  • not dark in the metaphysical sense, but literally just darker images.

  • They're just blacker colored.

  • COLTON OGDEN: R-rated stream.

  • NICK WONG: Yeah.

  • We're going to keep everything very nice and PG I hope.

  • And so everything here will just be-- is it

  • literally just more black pixels or more light pixels?

  • And a lot of times, people are like, oh, I have to collect a bunch of data.

  • And I've actually been very much an advocate

  • of collecting just smarter data, not necessarily more.

  • So what I mean by that is, are you covering all the cases, for example?

  • And in our case, we don't really have that many.

  • There's just, like-- actually, I guess there's

  • a very large number of ways you can make a dark image versus a light image.

  • But what I mean by that is we can collect some things,

  • like maybe there's a just pure black screen.

  • Well, that makes sense.

  • And that's dark, for sure.

  • But what about if you had a black screen with some stars in it?

  • Well, then that might actually be slightly different.

  • Or for example, this picture.

  • I'm literally going to just drag and drop a bunch of pictures from Google.

  • This picture, I would classify it as dark.

  • It's pretty black.

  • There's a little candle in there.

  • And so a machine might not deterministically

  • be able to say, oh, well that's dark, but machine learning

  • can probably figure that out.

  • And so I'm going to copy a bunch of these images.

  • You'll notice a lot of them are not exactly all black.

  • Most of them have some sort of image in them.

  • And this one is pure black.

  • Love that.

  • This one is also quite dark, but not black

  • really, not as black as the other ones.

  • COLTON OGDEN: An interesting test case on that one, yeah.

  • NICK WONG: Right.

  • So I think that that's something that would be really worth putting

  • in front of our computer.

  • And I'm deliberately ignoring an image that I know is on the right

  • because it's a really good test example.

  • So I'm going to grab 10 of these.

  • Oh sorry, it's actually down here now.

  • So there's this image, and this image I would classify as dark,

  • but it has a very bright center to it.

  • So if we can get our machine learning model

  • to figure out whether or not that's light or dark,

  • that'd be really interesting for us.

  • So we're going to put that in our predict folder.

  • COLTON OGDEN: ZODI4KX says, "Hey guys, I'm Wesly."

  • ZODI4CKX.

  • NICK WONG: Oh.

  • COLTON OGDEN: Good to have you, Wesly.

  • Thanks for joining us.

  • And then BHAVIK_KNIGHT, who's a regular, is in the chat.

  • NICK WONG: Welcome back.

  • COLTON OGDEN: Says, "Hey guys.

  • Seems like I'm very late today."

  • No worries, BHAVIK, we're just getting started.

  • And the code is on GitHub.

  • I'll try to post that URL again.

  • NICK WONG: And at basically every major point, I will push all of the code

  • that we have so far to GitHub.

  • If you were around last week, they actually did a whole stream on GitHub

  • and how to use this very powerful tool--

  • COLTON OGDEN: Oh yeah, with Kareem.

  • Yeah, on Twitch and on YouTube we do have a Git and GitHub stream

  • with Kareem Zidane, Kareem the Dream.

  • NICK WONG: Kareem the Dream.

  • All right.

  • And then, I'm going to just switch over to our testing folder

  • and put some other dark images in there.

  • This will let us get a good gauge of just

  • whether or not we're accurately measuring things.

  • I think in the previous folder for training

  • I put about 10 images. For testing data--

  • Oh, Black Mirror.

  • What a fitting, very appropriate thing to test on.

  • I put about 5.

  • There's not really a hard number on how many of anything you should have,

  • but I generally try to go for nice numbers that I think

  • are really cool, like 5, 10, and 42.

  • I think the more data you get, in general

  • you can do better, as long as your data brings up new patterns.

  • So then we're going to just bring in some more images.

  • Sorry, this part is not particularly interesting.

  • I'm not writing any code.

  • COLTON OGDEN: Hey, it's from scratch.

  • The whole thing is from scratch.

  • NICK WONG: It is from scratch.

  • So you could literally follow along click by click,

  • and you will technically do this.

  • COLTON OGDEN: And choose their own images even.

  • NICK WONG: Yeah, you could even pick your own images.

  • This is meant to be as generalizable as I hope possible.

  • And so we're just picking up very light colored--

  • oh, sorry.

  • I picked up a few too many.

  • I like to keep my data sets balanced.

  • So if I have 10 in dark, I'm going to try and keep 10 in light.
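
One way to check that balance is to count the files in each class folder. The helpers `class_counts` and `is_balanced` are hypothetical names for this sketch, not part of the stream's code.

```python
import os
import tempfile


def class_counts(split_dir):
    """Map each class folder name to the number of regular files in it."""
    counts = {}
    for label in sorted(os.listdir(split_dir)):
        label_dir = os.path.join(split_dir, label)
        if os.path.isdir(label_dir):
            counts[label] = sum(
                1 for name in os.listdir(label_dir)
                if os.path.isfile(os.path.join(label_dir, name))
            )
    return counts


def is_balanced(counts):
    """True when every class has the same number of files."""
    return len(set(counts.values())) <= 1
```

For the folders built in the stream, `class_counts("img/train")` would be expected to report the same number for `dark` and `light`.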

  • I am a little--

  • weirdly enough, you wouldn't believe this by looking at my desktop,

  • I do like to keep things organized when I code.

  • I think when I organize my life, not so much, but that's OK.

  • COLTON OGDEN: People have different definitions of organized.

  • It's OK.

  • NICK WONG: Yeah.

  • It's a flexible definition.

  • So yeah, I'm very organized when I write code.

  • I'm not necessarily so organized when I do the rest of my life.

  • Cool.

  • So then I'm going to rename these.

  • I just think it's a little bit cleaner to have this.

  • And then we're going to get some light testing data.

  • Oh.

  • Actually, I think this is just such a pretty picture.

  • COLTON OGDEN: Oh yeah, that's a nice picture.

  • NICK WONG: It's very cool.

  • Oh, I thought it was a unicorn.

  • I was super excited for a second.

  • I thought it was a unicorn.

  • COLTON OGDEN: Can you explain again the difference between the test and the--

  • what was the other category?

  • NICK WONG: Yeah.

  • So we have test and training data, and there's also a predict.

  • COLTON OGDEN: Predict, that's--

  • test and predict, what's the difference between those two?

  • NICK WONG: All right.

  • So testing data is going to be said-- at every step

  • that we run our machine learning model, we'll see how it did on testing data.

  • That won't be incorporated into the model, but it will tell us,

  • like, how are we actually doing?

  • Because on its training data, it'll generally go up

  • as it starts to just memorize things.

  • So we want to see, on data it's never seen before,

  • is it actually getting better?

  • Are we actually seeing some improvement, or are we kind of at a loss here?

  • And so we'll see a little bit clearer what exactly I'm

  • talking about when we actually start to run some models.

  • But basically, the testing data shows us, how are we actually doing?

  • The training data we generally should improve at.

  • If you're staying at just this flat 50/50,

  • you're probably doing something wrong in your model.

  • In fact, there is an error I discovered earlier

  • that I think would be really good for me to show everyone

  • that I like to mess up.

  • Cool.

  • And then we have two images-- oh, and predict.

  • Sorry, I didn't answer that part.

  • The predicting data is, after we've trained our model

  • and we have it thinking and it thinks it knows what it's talking about,

  • then we're going to test it on data that it doesn't really

  • get thrown into any sort of framework.

  • This is really the use case of our model.

  • So if I wanted to use it for actively predicting things,

  • then we're going to say, hey, let's take a look at how we might do that.

  • So it might not be in the directory predict.

  • We might even pass it as a command line argument,

  • and it just goes and finds it.

  • But I like to keep it organized there for now

  • while we're going and developing.

  • COLTON OGDEN: Makes sense.

  • NICK WONG: Cool.

  • So we have a bunch of images.

  • I guess for the sake of copyright, they're not mine.

  • I literally pulled them off of Google.

  • You watched me do that.

  • They are in no way my images, and I certainly do not take credit for them.

  • I only take credit for knowing how to Google--

  • or barely.

  • And so we just went and found some random images.

  • And if you don't believe that those are the same thing,

  • here are all of my same directories that we just built.

  • I just use Finder because I think it's a slightly easier interface.

  • Cool.

  • So we have all these images now.

  • Let's see if we can go find them.

  • So when we say load data, that means that somehow, I

  • have to get the images that I just put everywhere

  • and return them to me as test and training data.

  • So we're going to go and get some of that data.

  • I'm probably going to be given an image directory or img_dir,

  • and I probably want to go and maybe enumerate

  • what's in that image directory.

  • I happen to know that the structure of this directory

  • is in a test and train split.

  • So what we might do is I might say, OK, then my test--

  • or we'll start with train.

  • "train_images."

  • And we'll say "paths."

  • And so I will probably switch these to abbreviated versions

  • because I think those are conventional.

  • So this is our training data, and these are the image paths that we get.

  • I'm going to use a list comprehension because it's convenient,

  • but if you are not comfortable with those,

  • then I think it's somewhat easy to flip this into a for loop--

  • hopefully not a problem.

  • If you have any questions on syntax and things, then feel free to let me know.

  • If you spot any bugs, maybe just chuckle to yourself

  • and you'll see me find it later when I try to run it.

  • Cool.

  • So what we want to do is we're going to do "os.path.join,"

  • which is a cool command.

  • It lets you just actually join paths together

  • without really having to worry about like slashes and things

  • and double slashes.

  • And I know that this is going to be in my training directory.

  • And then I'm going to just do "img_path."

  • And this is "for img_path in os."--

  • OK, this is possibly very poorly styled code.

  • It's going to be a very long line.

  • We could've split this up.

  • "in os.listdir(os.path.join)."

  • Oh man, that maybe was not the best planning.

  • That's OK.

  • COLTON OGDEN: The beauty of list comprehensions right there.

  • NICK WONG: Gotta love list comprehensions.

  • COLTON OGDEN: At least you can enter in the middle of it, though, to break up,

  • which is nice.

  • NICK WONG: That is true.

  • Well actually, that's a very good point.

  • We might do that.

  • So technically what we are doing is we're

  • taking the variable train_imgs_paths, we're

  • setting it list equal to this list.

  • I've realized I'm starting to skip words.

  • That's kind of cool.

  • And so what we're doing is we're saying join the paths of the image directory we

  • were handed, the training directory which we know will exist--

  • we're hardcoding that for now--

  • and then the image path.

  • And the image path is given to us by iterating through and assigning

  • as we go through image path to each of the things or items in this list.

  • And this list is a list of the directory which is given by, again,

  • joining our image directory and train.

  • So what that basically ends up meaning is that we're going to go and collect

  • all the paths of things that we went and put into our image training directory.

  • And then-- you really should never copy and paste code. #CS51.

  • We're going to copy and paste code, and we're going to change some variables.

  • Gotta love that.

  • And we're going to say, OK, here's my test img_path, [INAUDIBLE] path,

  • and this is also in test.
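
The long list comprehension being typed here can be sketched as follows, together with the equivalent for loop for anyone less comfortable with comprehensions. The function names are assumptions; the stream only names the variables `img_dir` and `img_path`.

```python
import os
import tempfile


def image_paths(img_dir, split):
    """Full paths of everything inside img_dir/split, via a list comprehension."""
    return [os.path.join(img_dir, split, img_path)
            for img_path in os.listdir(os.path.join(img_dir, split))]


def image_paths_loop(img_dir, split):
    """The same result written as a plain for loop."""
    paths = []
    for img_path in os.listdir(os.path.join(img_dir, split)):
        paths.append(os.path.join(img_dir, split, img_path))
    return paths
```

Calling `image_paths("img", "train")` and `image_paths("img", "test")` covers both copies of the line, which avoids the copy and paste.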

  • Yeah, live coding will really test your ability to be confident when you do it.

  • All right.

  • So I generally like to iteratively debug things.

  • I don't like doing huge monolithic swaths of what's going on.

  • So we're going to just run this in our script.

  • So we're going to have "load_data," and we're

  • going to say img is the data directory.

  • We might pass that in as a command line argument later,

  • and that might trigger some bells.

  • But now we should be able to run this, and it

  • should print out to me both of the things that contain our image data.

  • So then, the thing that we have in here is you'll notice that there's this .DS_Store.

  • Actually, that causes some bugs sometimes.

  • It won't cause bugs in this case because Keras ignores it when

  • it's flowing things from directory.

  • But if you try to load individual images, that does mess with you.

  • So just keep that in the back, very far back of your mind.

  • It is something to keep in mind.
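
If individual images are ever loaded by hand, one simple guard is to skip hidden entries such as `.DS_Store`. This is a sketch with a hypothetical helper name, and it assumes that skipping every name starting with a dot is acceptable.

```python
import os
import tempfile


def visible_entries(directory):
    """Names in directory, sorted, excluding dotfiles such as .DS_Store."""
    return sorted(name for name in os.listdir(directory)
                  if not name.startswith("."))
```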

  • All right.

  • So now we're going to get into--

  • oops-- the meat of Keras.

  • And unfortunately, I don't remember the exact syntax off the top of my head,

  • but I'll tell you what we're going to go do.

  • And then we're going to do what every CS person should be very good at.

  • We're going to look at some documentation

  • and we're going to remember what that name is.

  • So I have these paths now.

  • I know exactly where all of the images are named.

  • I actually don't necessarily need each of their paths.

  • I really only need this os.path.join of here.

  • That's the actual directory.

  • We're going to use that later.

  • But I like to have the image paths for displaying which images we're actually

  • looking at later.

  • So it's just a future thought focused thing.

  • But we're going to do what Keras has beautifully built,

  • which is called a flow from directory.

  • And the reason that we're going to do that

  • is because we don't necessarily have a lot of data.

  • And our data is certainly incomplete.

  • There's plenty of ways to make another dark image.

  • But the reason that I think a flow from directory is brilliant

  • is because it allows us to take our small data set

  • and make it into a very large one.

  • Because an image that looks like this, if I

  • were to flip it around, to the computer, that's a different image.

  • But to us, for our purposes, it's identical.

  • It's still dark.

  • It didn't change anything about it.

  • So what's beautiful about Keras's flow from directory

  • is it randomly does that.

  • It'll start flipping images.

  • It might cut them in half.

  • It can flip them around in all sorts of directions.

  • It can shrink them.

  • It can resize them however we like.

  • And it allows us to create data from just our existing data set.

  • We don't have to go and find anymore.
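
The point about flipping can be sketched with NumPy alone. The tiny array below is a made-up image, and `np.fliplr` stands in for the horizontal flip that Keras would apply. The pixel layout changes, but the overall brightness, which is what matters for a light versus dark label, does not.

```python
import numpy as np

# A hypothetical 2x3 image: dark on the left, bright on the right.
image = np.array([[0, 0, 255],
                  [0, 0, 255]], dtype=np.uint8)

flipped = np.fliplr(image)  # mirror left to right

print(np.array_equal(image, flipped))  # False
print(image.mean() == flipped.mean())  # True
```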

  • COLTON OGDEN: That's a cool [INAUDIBLE].

  • NICK WONG: So it's a very cool feature.

  • COLTON OGDEN: BHAVIK_KNIGHT says, "We don't have a main, have we?"

  • NICK WONG: We kind of do.

  • So what Python actually ends up doing, in a nutshell,

  • is if the Python script is run as a script from the command line,

  • then it is going to have main.

  • But if it's run as a module, it's imported as a module,

  • then its name, this variable that we never really instantiated,

  • is not going to be main.

  • I actually don't know what it is if you run it as a module.

  • But yeah, no worries there.
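
For the open question above: when a file is imported rather than run, `__name__` holds the module's own name. The sketch below checks this by writing a throwaway module to a temporary directory and importing it. The module name `run_demo` is an arbitrary choice for the example.

```python
import importlib.util
import pathlib
import tempfile

# The throwaway module simply records what __name__ was while it executed.
SOURCE = "seen_name = __name__\n"


def name_when_imported(module_name="run_demo"):
    """Import a temporary module and return the __name__ it observed."""
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp) / f"{module_name}.py"
        path.write_text(SOURCE)
        spec = importlib.util.spec_from_file_location(module_name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module.seen_name


print(name_when_imported())  # run_demo
```

Only when the file is executed directly is `__name__` set to `"__main__"`, which is why the guard skips the script body on import.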

  • Let's see.

  • "Am I supposed to log in?"

  • TWITCHHELLOWORLD says, "Am I supposed to log in to GitHub

  • to see the code at the URL?"

  • You don't need to log in.

  • It's a public repository.

  • There is no code there currently.

  • After we load all of our data in, I'll push that to GitHub

  • so that everyone can see what's going on there.

  • Cool.

  • So we have Keras.

  • The documentation got updated, I'd say, recently.

  • I mean within the past five months or so,

  • which I'm really happy about because for a little bit there

  • it was a little confusing.

  • And I know that down at the bottom they do this whole thing talking

  • about image pre-processing.

  • And that's just a bunch of stuff that I happen to know by reading through this.

  • I know that I'm going to want to do some sort of actual generator.

  • But this is not the exact one I want.

  • This one is.

  • So this little bit of code that is in the Keras documentation--

  • I take no credit for writing it--

  • is basically what we're looking for.

  • This gives us the ability to do our data generators, which

  • are that whole applying random operations to our data set.

  • And then we do these actual generators, which are

  • going to be the flowing from directory.

  • It's going to pull all our data in to some sort of directory--

  • or from the directories that we listed, and it's going to do stuff with them.

  • We're going to do stuff with it.

  • So I'm going to literally copy and paste this code.

  • Not necessarily the best model, I think, for doing things, but--

  • COLTON OGDEN: This is true "from scratch" programming today.

  • NICK WONG: This really is "from scratch" programming.

  • I did actually walk through this with myself a little bit last night,

  • but I deleted all of that code so that you

  • would get a legitimate version of this and you

  • would see that there is a lot of effort and thought

  • and messiness that goes into this.

  • So we'll notice that in VS Code, ImageDataGenerator is

  • highlighted because it doesn't exist.

  • We'll deal with that in a second.

  • It's just an import that we need.

  • I'm not a huge fan of their style.

  • I prefer this one.

  • But that's OK.

  • So in this train_datagen, we know that rescale basically--

  • sorry.

  • I keep saying "we know."

  • I apologize.

  • I mean just rescale is going to allow us to reshape the image's just size.

  • It's literally taking the image and just making it--

  • trimming it this way or that way.

  • Shear range is not something we're going to actually deal with,

  • though it would be totally valid.

  • And that takes an image and just--

  • you can think of it as--

  • just cuts off parts of it.

  • And that's definitely a useful operation for us, but not necessarily

  • in this case.

  • I don't necessarily need zoom range.

  • I think there's one called vertical flip.

  • And so vertical and horizontal flips do exactly what they sound like.

  • They flip the data.

  • And we're going to use this for now.

  • So it allows you to rescale stuff to grayscale

  • for this, allows you to do all sorts of rescaling here as well as rescaling

  • the image size.

  • I should actually double check on what rescale does.

  • That's OK.

  • We'll look at it in a second.

  • But then, horizontal flipping and vertical flipping

  • allows us to create a bunch of new data-- a little bit of new data,

  • where you can do a horizontal flip, you can do a vertical flip,

  • or you can do both.

  • And those will create new images for the computer.

  • And then this test_datagen is going to just be our testing data.

  • So it's the same kind of thing, but for the test

  • data that's going to be used to validate our model.

  • And this one has only the rescaling factor.
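
On what `rescale` does: the Keras documentation describes it as a factor that the pixel data is multiplied by, rather than a change to the image's width or height (that is what `target_size` is for). The NumPy sketch below mimics that multiplication without calling Keras. The factor `1/255` is an assumption, chosen because it maps 8-bit pixel values into the range 0 to 1.

```python
import numpy as np

# A hypothetical row of 8-bit pixel values.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)

rescale = 1.0 / 255        # assumed factor; maps 0..255 to roughly 0..1
scaled = pixels * rescale  # same shape, smaller values

print(scaled.shape)  # (1, 3)
```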

  • Cool.

  • So then we get into the generator.

  • And so this is where you see this flow from directory.

  • It's one of my--

  • I think my favorite command that Keras has because it allows

  • you to do this extra data creation.

  • We are actually-- this happens to be for a binary classifier.

  • Cool.

  • So it's still going to be a binary classifier.

  • And you'll notice that there's this thing called target_size,

  • and then there's this data directory.

  • So we are going to change those two things.

  • So I'm going to just copy a little bit of what I had above.

  • And that's my training directory, is just

  • os.path.join this, which is our image directory that we

  • supplied via the argument, and the testing

  • directory, which is just called test.

  • And we now have two directories which hopefully have two folders within them

  • that have a bunch of images in them.

  • That's what Keras expects.

  • And then, this target size is the size of the image.

  • It assumes that they're of any dimensionality.

  • We rescaled them so that they are only two-dimensional images.

  • Color images are actually in three dimensions.

  • We have our width and our height, and then we have our colors.

  • We have R, G, and B. If you've been doing CS50 stuff,

  • you actually know exactly how that works probably very well.

  • And so I'm going to change our target size a little bit to be 500 by 500.

  • They don't have to be squares.

  • I just think it's kind of cool.

  • And it won't really matter for us.

  • A dark image that has been reshaped and distorted

  • but is still all the same pixels is identical to us for these purposes

  • as the image that's in its original size.

  • And then, our batch size--

  • and this actually has a few too few parameters.

  • I'm going to add a few.

  • Our batch size let's say is just a nice, clean, round number.

  • And then we're-- so batch size, meaning how many images does it pull for any

  • given step within the process?

  • And then, I don't think we need anything else from this except knowing

  • where exactly that came from.

  • And if you look on their documentation, they

  • do list all of the possible ways in which you could

  • manipulate the data before it comes in.

  • There are some really cool ones.

  • Fill mode I think is actually really nifty.

  • And that basically just inputs some data for you

  • if you need it to fill out images that aren't all the same size.

  • It allows you to--

  • instead of shrinking them or expanding them,

  • it lets you just fill in the extra space to the maximum image size.

  • This is the actual split between the two,

  • so it reserves some fraction of things for validation.

  • So if you just had training data, then you could split some of them

  • to be only validation data.

  • And there's some other stuff that goes into this.

  • But we have this ImageDataGenerator class.

  • And you'll notice that it seems to come from here.

  • That's the full module path.

  • So what we're going to do is I'm going to copy that,

  • and we're going to paste that here.

  • "from keras.preprocessing.image import ImageDataGenerator."

  • And that's how I would convert those full paths to, hopefully,

  • what actually does its job.

  • We'll see if that's true.

  • That's the intuition, at least.

  • And you'll notice this actually went away because IntelliSense thinks

  • that I've at least done it correctly.

  • Hopefully, it's right.

  • And then the rest of this we're going to leave on its own.

  • So we have our validation-- or our train generator and our validation generator.

  • And those are going to be what our model actually ends up getting fitted to

  • and trained on.

  • Trained on and fitted to are actually pretty much synonyms in this sense.

  • So we have our train generator, and I'm also

  • going to return the validation generator.
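
Putting the pieces together, `load_data` might look roughly like the sketch below. It is an approximation of the adapted documentation snippet, not the exact code from the stream. The `1/255` rescale factor and the batch size of 10 are assumptions, `split_dirs` is a hypothetical helper, and the import path matches the Keras version used in the stream and may differ in newer releases.

```python
import os


def split_dirs(img_dir):
    """Return the (train, test) directories under img_dir."""
    return os.path.join(img_dir, "train"), os.path.join(img_dir, "test")


def load_data(img_dir, target_size=(500, 500), batch_size=10):
    """Build training and validation generators from img_dir."""
    # Imported here so that merely loading this file stays fast.
    from keras.preprocessing.image import ImageDataGenerator

    train_dir, test_dir = split_dirs(img_dir)

    # Training images are randomly flipped to stretch a small data set.
    train_datagen = ImageDataGenerator(
        rescale=1.0 / 255, horizontal_flip=True, vertical_flip=True)
    # Validation images are only rescaled, never augmented.
    test_datagen = ImageDataGenerator(rescale=1.0 / 255)

    train_generator = train_datagen.flow_from_directory(
        train_dir, target_size=target_size,
        batch_size=batch_size, class_mode="binary")
    validation_generator = test_datagen.flow_from_directory(
        test_dir, target_size=target_size,
        batch_size=batch_size, class_mode="binary")
    return train_generator, validation_generator
```

With `class_mode="binary"`, each subfolder name (`light`, `dark`) becomes one of the two labels.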

  • Cool.

  • So we've loaded our data, and hopefully we end up actually getting data.

  • I'm also going to now accept those arguments.

  • I'm going to call it train_data and test_data to be consistent with myself

  • from the way that I labeled my directories.

  • You're welcome to change those variable names.

  • I think that would count as some sort of consistency points.

  • And we're going to run this and make sure that nothing happens.

  • That's literally what we want.

  • We want it to tell us the images it found and do nothing else.

  • And that's exactly what it did.

  • And so I'm very glad that that's what it did,

  • and also that is exactly what we were expecting, where it found

  • 20 images belonging to two classes.

  • So you might intuit--

  • oh, that was our training data, which is good

  • because that was the one we ran first, so I'm glad it came out first.

  • And then it found 10 images also belonging to two classes.

  • We're still doing binary.

  • We've got two classes and 10 images.

  • We know that that was our testing data.

  • So everything seems to check out.
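
The "found N images belonging to K classes" message reflects a directory layout with one subdirectory per class. A minimal standard-library sketch of that counting idea follows. It is not how Keras implements it; the class names `class_a` and `class_b`, the extension list, and the helper names are assumptions made for illustration.

```python
import os
import tempfile

# Assumed set of file extensions to treat as images.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif"}


def count_images(root):
    """Return {class_name: image_count} for each subdirectory of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if not os.path.isdir(class_dir):
            continue
        counts[name] = sum(
            1
            for f in os.listdir(class_dir)
            if os.path.splitext(f)[1].lower() in IMAGE_EXTENSIONS
        )
    return counts


def describe(root):
    """Summarize a directory tree in the style of the message seen on stream."""
    counts = count_images(root)
    return "Found %d images belonging to %d classes." % (
        sum(counts.values()),
        len(counts),
    )


def make_demo_tree(per_class, class_names=("class_a", "class_b")):
    """Create a throwaway tree of empty placeholder files (hypothetical data)."""
    root = tempfile.mkdtemp()
    for name in class_names:
        os.makedirs(os.path.join(root, name))
        for i in range(per_class):
            open(os.path.join(root, name, "%d.jpg" % i), "w").close()
    return root


print(describe(make_demo_tree(10)))  # 2 classes x 10 files each
```

Two class folders with ten files each give the 20-image count, and five files each give the 10-image count.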

  • And hey, we're still using TensorFlow.

  • So I'm going to git add.

  • I am in the habit of saying git add dot.

  • Not a great habit.

  • I'll git add img and I'll git add run.py.

  • COLTON OGDEN: I'm in the same bad habit as well.

  • NICK WONG: It's a terrible habit.

  • Someone in our CS61 class actually committed the entire subversion tree

  • of--

  • it was some Unix command.

  • They accidentally pulled an entire Unix subversion tree into it.

  • COLTON OGDEN: Probably not what they intended to do.

  • NICK WONG: Was certainly not what they intended to do.

  • They actually crashed our grading server--

  • COLTON OGDEN: Oh, nice.

  • NICK WONG: --which was really kind of cute.

  • Everyone was freaking out.

  • They were all like, oh no, I can't get my grades in.

  • I'm going to fail.

  • The professors are like, we'll just extend the deadline.

  • It's fine.

  • All right.

  • Cool.

  • So that has now been pushed to GitHub,

  • so if you would like to grab it, you're welcome to pull that up.

  • Can anyone-- sorry.

  • Is that NEOKRES--

  • COLTON OGDEN: NEOKRES-- it's hard to pronounce that name.

  • I'm not sure.

  • NICK WONG: I do not know how to pronounce your name.

  • I'm so sorry.

  • But I'm going to call you NEO for now, and if that offends you,

  • please let me know.

  • So NEO asked, "Can anyone tell me, is there a link for C++ and JavaScript

  • Harvard video lectures/sections if there are any?"

  • I don't necessarily know from my other classes.

  • I know CS50 has some great videos on JavaScript.

  • I don't believe we have any on C++.

  • COLTON OGDEN: Yeah.

  • CS50 doesn't do any-- hasn't done any C++.

  • We have done some JavaScript videos in the past,

  • and Brian Yu's web course, I believe, goes into some JavaScript

  • if you want to go to that URL for some JavaScript stuff.

  • But yeah, no C++ yet.

  • We'll keep it in mind for maybe some future stream videos.

  • NICK WONG: Yeah.

  • I think that'd be very fun to go through.

  • C++ is a great language.

  • It does all sorts of cool things.

  • JavaScript I have a lot of beef with, but it is technically

  • a very useful language.

  • I think it does a lot of very strange things.

  • And actually, speaking of JavaScript, TensorFlow,

  • which is the back end of what we're doing here,

  • was recent-- well, recently implemented in JavaScript by Google.

  • And so they released that, and they have all sorts of really cool demonstrations

  • on using machine learning on the web browser.

  • So that's actually what my winter break will probably consist of,

  • and I'm super excited for.

  • COLTON OGDEN: That and the web assembly.

  • NICK WONG: Yes.

  • COLTON OGDEN: That'd be huge.

  • NICK WONG: Web assembly I think is a huge just advancement of browser

  • technology.

  • I'm super excited for that too.

  • COLTON OGDEN: It feels like those two will go hand in hand.

  • NICK WONG: So many go.

  • I think so.

  • I think that that will actually vastly improve

  • the abilities of what we see browsers doing, which given

  • Spectre and Meltdown makes me a little bit nervous.

  • I don't know if I want my browser to be able to do a lot more than it already

  • can.

  • But I think it'll be cool as we work out kinks.

  • All right.

  • So we have built our loading data, and I'm

  • reasonably confident that we have loaded it correctly.

  • You'll notice that when I load the data here,

  • I actually don't use either of these two.

  • I'm realizing that in my head I had thought a little bit too far.

  • So these two are not necessarily useful.

  • I'll comment them out for us, and we might revisit them later.

  • Hopefully, we will revisit them later.

  • All right.

  • That's actually one of my favorite features of Visual Studio Code,

  • is being able to just collapse the code of a function.

  • And there are a lot of IDEs that do that.

  • I think the CS50 IDE does not, though, and it's really annoying.

  • Cool.

  • So we're going to-- oops.

  • I was going to say we're going to fit the model, but we don't have a model.

  • My next thing would be, oh, OK, let's train our model on things,

  • but we don't actually have a model to train.

  • So let's build a model.

  • And this is, I think, one of the coolest parts of what we get to do.

  • So building a model.

  • I'm going to create a kind of "global" variable.

  • It is global, actually.

  • I don't know why I put quotes around that.

  • And I'm going to call that Sequential.

  • Or sorry, I'm going to call it m, and it is a Sequential model.

  • And that's going to be global because all of our functions

  • are going to access it.

  • And you might be thinking to yourself, oh, this

  • might fit really well in a class, and you'd be absolutely right.

  • This is actually really well implemented as a class that has certain methods.

  • But we're building it as a script, and here we are.

  • So we're going to use build_model to actually construct our model

  • and see what layers we can add to it and things like that.

  • So the first thing that I might want to do is m.add.

  • And that's going to be the way that we add layers to our model.

  • They're going to be sequential so that if we add a layer,

  • that's the last layer.

  • The last added layer is the last layer that we are going to go through.

  • All of our data we'll go through that layer at the end.
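
The ordering rule (first layer added sees the data first, last layer added sees it last) can be sketched with a toy class. `TinySequential` is a hypothetical stand-in written for this illustration, not the Keras `Sequential` class, and its "layers" are plain functions.

```python
class TinySequential:
    """Toy stand-in for a sequential model: layers run in the order added."""

    def __init__(self):
        self.layers = []

    def add(self, layer):
        # Each new layer goes on the end, so it becomes the last step.
        self.layers.append(layer)

    def run(self, value):
        # Data passes through every layer, first added to last added.
        for layer in self.layers:
            value = layer(value)
        return value


m = TinySequential()
m.add(lambda x: x + 1)   # first layer added, first to see the data
m.add(lambda x: x * 10)  # last layer added, last to see the data
print(m.run(2))          # (2 + 1) * 10 -> 30
```

Swapping the two `add` calls would give `2 * 10 + 1`, which is why the order of the calls matters.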

  • So the first layer that we're going through

  • is the first one that we add, and we're going to have that be-- oops, "Desne".

  • And Dense is just your standard, good old-fashioned neural network layer.

  • It basically has a bunch of nodes in it, and those nodes

  • are going to look like that's the number of nodes,

  • roughly, that it has inside of it.

  • And that's pretty much all you necessarily

  • need to know about what it does as far as technical specs go.

  • But we're going to add a bunch of parameters

  • to it because this is our first model.

  • Or sorry, our first layer.

  • And in our first layer, we need to tell it

  • the input size that it should expect.

  • So this sort of thing is always very annoying.

  • Dense requires a bunch of parameters.

  • I know input size is one of them, so we're going to give it the input size.

  • And the input size is going to be something like--

  • sorry, we specified 500, 500--

  • 3?

  • Question mark?

  • That 3 goes either there or on the other side.

  • I can't always remember.

  • It's close to that.

  • We're going to give it an activation.

  • And with binary classifiers, they use a sigmoid activation.

  • And you might be wondering, what is an activation?

  • Great question.

  • COLTON OGDEN: What's a sigmoid?

  • NICK WONG: What's a sigmoid, also a great question.

  • And so basically, what that means is activations

  • are the initializations of our layers.

  • So our layers, in reality, they're just a bunch of numbers.

  • It's really just a weighting of numbers.

  • And then those numbers are going to be modified as we go,

  • and then that weighting will change to hopefully give us some answer.

  • And so those numbers, they need an initial value.

  • You might default to 0, but that actually, based on the math behind it,

  • doesn't necessarily give us a whole lot to work with.

  • Sometimes that can actually cause you to get stuck in one particular

  • version of your model.

  • And so this activation is an initialization to it.

  • It's an initial set of values.

  • And so sigmoid, you can think of it as, like, if you were to plot it out,

  • it has that classic curve structure to it.

  • And that's going to be what you actually would imagine

  • if you were to plot out all of the activation

  • or all of the individual nodes' individual values

  • from an arbitrary left to right.

  • It's a little bit hard to describe without the theory underneath,

  • but that's roughly what's going on.

  • We're setting a bunch of numbers equal to a pattern, roughly speaking.

  • And all that means for us is that a sigmoid response is

  • really good for binary classifiers because binary classifiers have

  • one of two answers.

  • It's either up or down.

  • It's either 1 or 0.

  • And so a sigmoid response, what that might look like,

  • if you know anything about signals processing or filtering

  • of images or data, or even music, then a sigmoid response

  • makes it really, really hard to land in the middle.

  • It's very difficult for me to be halfway between.

  • It's really easy for me to be 1 or 0.

  • And so that is basically what we're looking for.

  • That's the behavior we're looking for in this sort of model.

  • So we're going to stick with sigmoid activations for most things.

  • We will see a bug that I introduced to myself last night, where

  • if you choose the wrong activation, we can actually

  • get totally random behavior from our model, and that's kind of cool.

  • But we're going to stick with sigmoids for now.
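
In Keras, the `activation` argument names a function that is applied to each node's output. The sigmoid function squashes any input into the range between 0 and 1, and it spends very little of its input range near the middle. A small sketch using only the standard library; the sample inputs are arbitrary.

```python
import math


def sigmoid(x):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Large negative inputs land near 0, large positive inputs near 1,
# and only inputs close to 0 land near the middle.
for x in (-6, -2, 0, 2, 6):
    print(x, round(sigmoid(x), 4))
```

That squashing is what makes a single sigmoid output convenient for a two-class answer: values near 1 can be read as one class and values near 0 as the other.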

  • COLTON OGDEN: METAL_EAGLE has a question for you

  • if you want to pick that one up.

  • NICK WONG: Oh, awesome.

  • So METAL_EAGLE asked, "What is the problem of 'git add .'?

  • Should you have a habit to run git status beforehand

  • to make sure that running git add will not break

  • anything you do not want to commit?"

  • I have broken all sorts of things that I did not want to commit.

  • I have committed API keys.

  • I've committed my own personal keys to things,

  • depending on what you're building.

  • So yes, you should definitely make sure.

  • Running git status is another really good habit

  • that I do not have to make sure what you're actually committing.

  • "git add ."

  • is problematic, I think, because let's say

  • that you have something in your directory

  • that is not part of your project.

  • It just happened to be in there because you were reusing some old code

  • or you wanted to look at something for an example.

  • And I've certainly done that.

  • You probably don't want to add that to your git project.

  • You don't want to put it in the git repository history.

  • Because of the way git works-- it's a feature, not a bug--

  • you have access to all the previous things.

  • So-- excuse me.

  • If you were to add some sort of API key, for example, then using "git add ."

  • means that that is now permanently part of your git history.

  • GitHub has a really cool tool.

  • It's GitGuardian, actually.

  • They monitor perpetually for API keys, and they'll email you

  • and they'll block your repository for a little bit

  • until you get that sorted out.

  • And that's really cool.

  • COLTON OGDEN: That's a cool feature.

  • NICK WONG: They've emailed me a couple times.

  • I occasionally am just really trying to go, and I'm a little careless.

  • COLTON OGDEN: Do as Nick says, not as he does.

  • NICK WONG: Right, exactly.

  • I've picked up some bad habits along the way.

  • TWITCHHELLOWORLD says, "I couldn't see the Livestream in the app

  • so I missed the start.

  • Are you in CS50 sandbox or somewhere in TensorFlow?"

  • NICK WONG: Yes.

  • So we're not-- sorry, we're not in the CS50 sandbox.

  • We are using TensorFlow.

  • We are in Visual Studio Code on my personal-- on my local environment.

  • So I'm just using this on my laptop, and I will be pushing code in chunks

  • as we go to GitHub.

  • So basically, whenever I get a chunk of code working,

  • then we'll actually push that to you guys,

  • and you're welcome to copy it down and see how it works on your own machine.

  • You will need TensorFlow.

  • You will need Keras.

  • You also need Pillow.

  • You will also need opencv.

  • Actually, I can make that really easy because Python has the freeze feature.

  • So I can freeze, which basically means take all of the current Python packages

  • you have and output them.

  • So if I just run "pip freeze," then it'll

  • output all the packages that I've downloaded and I use.

  • And there's a few more than you might think.

  • I only downloaded Keras, but Keras, for example, downloads three other--

  • or like three packages.

  • So this is actually a really good way to transfer what

  • Python things you're using.

  • And in this command, I just piped all of that output into a requirements.txt.

  • It's a very common feature on Python packages and modules and things.

  • And if you wanted to build from that, you

  • could do pip install -r requirements.txt,

  • and that'll pull all those requirements in the correct versions,

  • or at least the version I'm using, to you.
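
For readers who want to see what a "freeze" amounts to, here is a rough Python approximation using the standard library's `importlib.metadata`. It is a sketch of the idea, not pip's implementation: real `pip freeze` output can include other line formats (for example editable installs), and the helper name `freeze` is an assumption.

```python
from importlib.metadata import distributions


def freeze():
    """Return sorted 'name==version' strings for installed distributions."""
    pins = set()
    for dist in distributions():
        meta = dist.metadata
        name = meta.get("Name") if meta is not None else None
        version = meta.get("Version") if meta is not None else None
        if name and version:  # skip entries with missing metadata
            pins.add("%s==%s" % (name, version))
    return sorted(pins, key=str.lower)


# Show the first few pins; the full list is what would go into requirements.txt.
for line in freeze()[:5]:
    print(line)
```

Each line pins one package to the exact version installed, which is why installing from the file reproduces the same environment.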

  • So I'll also push that to-- oops--

  • to GitHub.

  • So we'll git add-- oops.

  • And git commit.

  • COLTON OGDEN: Thanks also to MKLOPPENBURG

  • for pitching in the React course that Jordan Hayashi taught,

  • which did cover JavaScript, and for mentioning

  • that we are using VS Code.

  • NICK WONG: Awesome.

  • Thank you.

  • All right.

  • So you should have a requirements.txt file in the repository now,

  • and that should help if you're trying to get everything set up.

  • Cool.

  • So we now want to continue adding layers to our model,

  • and we can keep things actually very simple.

  • So we add this dense layer.

  • I'm positive I'm missing a parameter.

  • I think it goes in the beginning there.

  • And we're going to figure that out when it doesn't compile.

  • But this dense parameter is just your input.

  • That's the first thing that's available to us

  • whenever our data goes through our model.

  • The next thing that we want to be able to add is--

  • our models are images.

  • Images in general are displayed as arrays of some sort of data.

  • And in our code, they're also arrays.

  • They're NumPy arrays, but they are still arrays.

  • And we're going to need to modify that a little bit.

  • We need to tinker around with what exactly we're seeing.

  • So actually, we're going to change this beginning

  • layer to be a convolutional layer.

  • Ah, that's what I was forgetting.

  • OK.

  • We're going to use a convolutional layer at the top.

  • It still uses basically all the same parameters,

  • except it has an additional parameter which is its filter size.

  • And we're going to use three by three.

  • That might be a named parameter.

  • I'm not entirely sure.

  • And so convolution is an operation that you can do on signals data

  • from anything, from visual signals to auditory signals

  • to electronic ones in general.

  • And a convolution, mathematically speaking, is you take a signal

  • and you take a response, and you loop the response

  • and pass it over the signal.

  • And if that sounds complex and difficult to understand, it is.

  • I don't necessarily fully understand it either.

  • COLTON OGDEN: Is it like feedback almost?

  • NICK WONG: Yeah, exactly.

  • It's like your response to what's going on in the signal or the way

  • that two signals might interact.

  • I would think of it maybe like that.

  • COLTON OGDEN: That makes sense.

  • NICK WONG: Yeah.

  • COLTON OGDEN: Almost like the messing with the image data that we had too.

  • Feels similar.

  • NICK WONG: Right.

  • Yeah, exactly.

  • It's just a way for us to manipulate the image data.

  • What convolution will do for us is it--

  • I believe in this case, if I'm not losing my marbles,

  • it is convolving the image with this three by three array.

  • And what that's going to do for us is make our data a little bit more--

  • not centralized, because that's actually what our next layer is going to do--

  • but it makes our data more readable and friendly to the machine.

  • So it gives it some features that make our machine a little bit happier.

  • COLTON OGDEN: Massaging the data.

  • NICK WONG: Exactly.

  • That's exactly what it does.
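
Convolving an image with a three by three array means sliding that small array (the kernel) across the image and, at each position, summing the element-wise products. A minimal pure-Python sketch of that sliding-window step follows; the image and kernel values are made up for illustration, and like most neural network layers it does not flip the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding) and sum element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        out.append(row)
    return out


# Hypothetical 4x4 grayscale image: bright on the left, dark on the right.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
# Hypothetical 3x3 kernel that responds to left-to-right drops in brightness.
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]
print(conv2d_valid(image, kernel))  # [[3, 3], [3, 3]]
```

A 4 by 4 input with a 3 by 3 kernel yields a 2 by 2 output, because the kernel only fits in two positions along each axis.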

  • All right.

  • COLTON OGDEN: BHAVIK_KNIGHT says, "Why not pip3?"

  • Why are you using pip instead of pip3?

  • NICK WONG: Ah.

  • So actually, if you look at which pip or pip --version, this actually is pip3.

  • I happen to use pip because it's aliased to the same command.

  • So pip3 would also work.

  • There's not necessarily a difference between the two.

  • However, there is a difference if you're on a Mac

  • and you use the installed version of pip versus the system installed

  • version of pip.

  • And that is one of the most annoying bugs you will ever

  • encounter, almost guaranteed.

  • COLTON OGDEN: CYLVTWITCH says, "Will the videos be on YouTube

  • so I can watch them later?"

  • Yes, they will be.

  • If you just go to YouTube and type CS50 on Twitch,

  • you should be able to see our playlist.

  • And also see CS50's YouTube channel where David posts his lectures

  • and a bunch of other course videos.

  • NICK WONG: Awesome.

  • COLTON OGDEN: GHASSEN says, "Hello, are there any interns open

  • in Silicon Valley?"

  • I would Google and see.

  • I'm sure there are internships available,

  • but none that necessarily CS50 is directly affiliated with

  • or knows of at this moment.

  • But definitely do some Googling and see, and I'm

  • sure there are links on those websites that you could reach out

  • to get some more information on them.

  • NICK WONG: And BHAVIK_KNIGHT actually pointed out

  • that convolution is output going into the input again.

  • That is actually-- yeah, exactly.

  • That is what this convolution does.

  • So it's not the general definition of convolution,

  • which is what I was I guess stumbling through,

  • but it is the convolution that happens here.

  • It convolves the image with itself.

  • Thank you.

  • That was something that was very important

  • that I could not for the life of me remember how to talk about.

  • Cool.

  • So now we're also--

  • COLTON OGDEN: You could say it's a little convoluted.

  • NICK WONG: It's a little convoluted.

  • That's awesome.

  • COLTON OGDEN: Zing.

  • NICK WONG: Man, I love puns.

  • Puns are the light of my life.

  • All right.

  • So we can also add this MaxPooling2D layer, which sounds very--

  • sorry.

  • I was reading the comments, and it's kind of funny.

  • Sometimes you guys have just these entertaining comments. OK.

  • So we're adding a MaxPooling2D layer, and that sounds kind of ridiculous,

  • but it's not.

  • Max pooling just means that we're looking at the maximum of the values

  • in a given range or frame.

  • We're pooling all those together, and that's

  • going to be representative of that actual area.

  • So you could imagine it as taking a 5 by 5 array, and the maximum of that

  • is now representative of all of that five by five array.

  • So it could be thought of as like feature--

  • man, the word is escaping me.

  • Combining features or feature combination.

  • And it basically just means that some of the data in our image will get ignored,

  • but we're assuming that the maximum part of our data is the important part.

  • There's also min pooling and there's average pooling.

  • There's a bunch of other ways that you can pool data.

  • There's all sorts of things you can do with it.

  • And then, 2D means two dimensional.

  • You're using two-dimensional data.
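
Max pooling as described here can be sketched in a few lines of plain Python: split the grid into non-overlapping windows and keep only the largest value from each. The grid values and the window size of 2 are assumptions chosen to keep the example small.

```python
def max_pool2d(grid, size):
    """Keep the maximum of each non-overlapping size x size window."""
    out = []
    for r in range(0, len(grid) - size + 1, size):
        row = []
        for c in range(0, len(grid[0]) - size + 1, size):
            window = [grid[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(window))
        out.append(row)
    return out


# Hypothetical 4x4 grid of values.
grid = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
print(max_pool2d(grid, 2))  # [[4, 2], [2, 8]]
```

Sixteen values shrink to four, so three quarters of the data is dropped and each surviving number stands in for its whole window.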

  • So the parameters to MaxPooling2D are something like filter_size.

  • And this is just that, like how much of a--

  • like what's the size of the pool we're using.

  • Activation is definitely one of those parameters, I think.

  • OK.

  • I say "definitely," and then I said, "I think."

  • That's self-contradicting.

  • I don't know if it takes anything else.

  • We'll find out.

  • This is generally something where I don't necessarily

  • memorize it because I can just go look at documentation.

  • And then, we're still working with an image which is not super clean.

  • Or sorry, it's not super easy for our machine to then put out a 1 or a 0.

  • So what we might do is something like this--

  • model.add of a Flatten layer.

  • And that pretty much does exactly as its name.

  • It just flattens the image into a one-dimensional stream.

  • And so if you did CS50's resize or recover and you played around

  • with images a lot, you could think of them as all having been flattened.

  • So they're no longer represented in two-dimensional arrays.

  • They're represented like a stream of data.

  • And there is some breakpoint that's logical, but to a computer,

  • those breakpoints don't matter.

  • They're kind of irrelevant.

  • So we can actually then flatten our images.
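
Flattening can be illustrated with nested Python lists: the row and column structure is discarded and the values are laid out end to end. The tiny 2 by 2 RGB image below is made up for the example.

```python
def flatten(nested):
    """Recursively unroll nested lists into one flat list, preserving order."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


# Hypothetical image: 2 rows x 2 columns x 3 channels (R, G, B).
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]
flat = flatten(image)
print(len(flat))  # 2 * 2 * 3 = 12
print(flat)
```

The values are all still there; only the information about where a row ends is gone, which is fine for the dense layers that follow.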

  • And now that we've done this max pooling and convolution of the image,

  • we can flatten everything and then just feed that into standard dense neural

  • network models.

  • And that should be pretty reasonable for us.

  • And we're actually going to do exactly that.

  • We're going to add a dense layer to do some kind of minor computation.

  • I like to use powers of two.

  • That is certainly not a--

  • it's not a requirement, but I like to use them.

  • And that's just the number of nodes within our layer, roughly speaking.

  • It's not exactly correct, but it's close enough.

  • And then we're going to add our final layer.

  • And this one has a specific set of parameters.

  • It has to have the number of nodes within it be actually one.

  • And the reason for that is we're using a binary classifier,

  • so we want it to output only using one node or one number.

  • We don't want it to output a bunch of nodes because then each of those nodes

  • might be weighted to something else.

  • They might represent non-existent or imaginary classes.

  • So we don't want that.

  • Cool.

  • And then we're going to give that one a sigmoid activation as well.

  • So now, running through this in concept would work,

  • but you'll notice that in IntelliSense, it actually

  • highlights all of these things or all of the layers

  • that we just added because they haven't been imported yet.

  • So what we're going to do is import them.

  • And I believe it's "from keras.layers," hopefully, "import Dense,"

  • can't remember the name, "Conv2D," and we'll say MaxPooling.

  • Oh, I love IntelliSense.

  • It pulls things that I don't even have up here yet.

  • "Flatten."

  • And I think that's it.

  • So those are all the layers that we tried to use.

  • Oops, that's not what I want.

  • And now that we have everything going--

  • oh, I killed my virtual environment.

  • That's a bummer.

  • We should be able to get this model built.

  • Now, the problem is that we haven't necessarily linked them yet.

  • But I also think I'm missing a parameter here.

  • I think I might have not named it correctly.

  • But that's OK.

  • We'll be told that very shortly by our Python run.py--

  • well, if we call build_model.

  • So build_model is a little bit strange, and it

  • deviates from the standard paradigm that I

  • like to use in that it has a side effect and doesn't actually return anything

  • to us.

  • So build_model is kind of like print, where

  • you just run it and things happen, and you don't necessarily

  • know what's going on underneath.

  • I'm not a huge fan of that.

  • I prefer a functional style model where you get a return given some input,

  • but that's OK.

  • We're using this more as a weird hybrid between scripting and modular building.

  • So it's kind of halfway.

  • So we can go ahead and run this, and it should

  • run without problems, minus the fact that I think I have-- oh OK.

  • Well, I also can't spell.

  • No surprise there.

  • Now that I spelled that correctly, this should run mostly correct.

  • Yes.

  • input_size is not a keyword.

  • We're going to go and look at some documentation.

  • You would think I would know that part, though.

  • I believe it is.

  • Well, OK.

  • I'm obviously wrong.

  • It's not input_size.

  • But I know there is something in there.

  • Let's look at-- oh, right.

  • Sorry.

  • Convolutional layers.

  • That's what I was trying to look at.

  • Conv1D.

  • Conv2D.

  • There's all these things.

  • So we know that we need a filter.

  • This is the number of filters that the convolution's going to have.

  • The kernel size.

  • So those are two positional arguments.

  • And I actually might have-- yeah, passed those correctly.

  • Cool.

  • I just forgot how to name it input or tell it what kind of input it takes.

  • Ah, input_shape.

  • input_shape.

  • There we go.

  • So that's correct.

  • So what this actually does--

  • and you might be wondering-- oh, it's probably filter shape as well.

  • Or size filter.

  • There's all sorts of variations on these.

  • So what this does is-- we didn't actually

  • have to pass an input shape to all of the other filters

  • because they can intuit them from the previous one

  • or from the previous layer that was being used.

  • And so that is a cool feature, and it's certainly one of the benefits of Keras,

  • is you don't necessarily need to keep track

  • of all of the individual things that are going on in between.

  • You can just stick to adding layers in a sequential order.
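
The reason only the first layer needs an input shape is that each later layer's input is simply the previous layer's output, and that can be worked out by arithmetic. A sketch of that bookkeeping follows, assuming no padding and a stride of 1 for the convolution and non-overlapping pooling windows. The filter count of 32 and the pool size of 2 by 2 are assumptions, since the stream does not state them.

```python
def conv2d_shape(input_shape, filters, kernel_size):
    """Output shape of an unpadded, stride-1 convolution."""
    h, w, _ = input_shape
    kh, kw = kernel_size
    return (h - kh + 1, w - kw + 1, filters)


def max_pool_shape(input_shape, pool_size):
    """Output shape of non-overlapping max pooling."""
    h, w, c = input_shape
    ph, pw = pool_size
    return (h // ph, w // pw, c)


def flatten_shape(input_shape):
    """Output shape after unrolling everything into one dimension."""
    h, w, c = input_shape
    return (h * w * c,)


shape = (500, 500, 3)  # the only shape given by hand
shape = conv2d_shape(shape, filters=32, kernel_size=(3, 3))
print(shape)  # (498, 498, 32)
shape = max_pool_shape(shape, pool_size=(2, 2))
print(shape)  # (249, 249, 32)
shape = flatten_shape(shape)
print(shape)  # (1984032,)
```

Every shape after the first line is derived, which is the bookkeeping a sequential model does on the user's behalf.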

  • All right.

  • I apparently don't know the name of that.

  • One of these days.

  • Let's see.

  • Where do they put their--

  • they might have called that Merge, or Merge is their own layer.

  • I haven't seen that one before.

  • Add.

  • Subtract.

  • Concatenate.

  • Oh, well we can also use MaxPooling.

  • And this will bring up--

  • wow.

  • Or it won't.

  • Oh wait, there we go.

  • Cool.

  • MaxPooling2D.

  • It seems to take a second.

  • Ah, OK.

  • Here.

  • We'll look at that.

  • MaxPooling2D takes not-- oh, it's pool_size.

  • That makes sense.

  • That's kind of intuitive.

  • All right.

  • Cool.

  • So we have corrected our syntax, and I believe that fixes

  • all the rest of our syntax errors.

  • And that should be able to--

  • or not.

  • Did I misspell that?

  • Yeah.

  • Doo-doo-doo.

  • Ah, it might not take an activation.

  • You're really seeing this live.

  • You quite actually--

  • COLTON OGDEN: It's part of the fun, part of the charm.

  • NICK WONG: It is part of the actual fun to this.

  • There we go.

  • It's literally just tinkering around with what exactly I remember and what I

  • don't.

  • Cool.

  • So we have now added correctly a convolutional two-dimensional layer,

  • a max pooling two-dimensional layer, a flattening layer, and then

  • two dense layers, the last of which is our output one.

  • And that's going to return to us a number that is either 0 or 1.

  • Well, kind of.

  • It will return to us a number that ranges from 0 to 1.

  • It won't necessarily be 0.
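
Putting the corrected pieces together, the five layers described above might look roughly like the sketch below. This is a reconstruction under assumptions, not the code from the stream: the filter count (32), pool size (2 by 2), and hidden layer width (64) are guesses, and the function returns the model instead of mutating a global `m` as the stream does. Keras is imported inside the function so the file can be loaded on a machine without Keras installed.

```python
INPUT_SHAPE = (500, 500, 3)  # height, width, channels, as stated on stream


def build_model(filters=32, hidden_units=64):
    """Build the conv -> pool -> flatten -> dense -> dense(1) stack."""
    # Lazy import: requires Keras (with a backend such as TensorFlow).
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

    m = Sequential()
    # Only the first layer is told the input shape.
    m.add(Conv2D(filters, (3, 3), activation="sigmoid", input_shape=INPUT_SHAPE))
    m.add(MaxPooling2D(pool_size=(2, 2)))
    m.add(Flatten())
    m.add(Dense(hidden_units, activation="sigmoid"))
    # One output node with a sigmoid: a single number between 0 and 1.
    m.add(Dense(1, activation="sigmoid"))
    return m
```

Calling `build_model()` needs Keras installed. With images this large the first dense layer holds a very large number of weights, so smaller images or more pooling may be needed on modest hardware.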

  • COLTON OGDEN: We have a couple of comments

  • here if you want to read some of these.

  • NICK WONG: Cool, yeah.

  • COLTON OGDEN: OK.

  • So GHASSEN says, "Here in Tunisia we study many, many technologies,

  • and access to Silicon Valley is restricted.

  • I want a guaranteed choice because that would be very useful."

  • I'm definitely not an expert on the programs that Harvard offers.

  • I would probably just say Google for "international transfer programs."

  • I know we definitely have Harvard students from Tunisia who have

  • come here, but I can't speak on that.

  • I'm not an authority on that, and I don't know anything

  • about anything related to Silicon Valley directly

  • because I'm just familiar with what goes on here in Cambridge

  • at Harvard University, I'm afraid.

  • But definitely do dig around a little bit,

  • see what the research or the transfer programs look like.

  • Look up Silicon Valley companies.

  • See if they have any programs related to international work.

  • I know that some companies probably do have something related to that,

  • although I'm not going to say--

  • I can't speak on behalf of international work related laws, so I'm not sure.

  • But again, I would defer to Google on that, probably.

  • So afraid I can't give any better information than that,

  • but I'm not an expert on that.

  • 42FORCE says, "Thanks for doing this."

  • NICK WONG: Oh, thank you.

  • Yeah, thank you for being here and watching.

  • We appreciate it.

  • "What structure is a Sequential?"

  • So Sequential is actually its own class.

  • It's given to us by Keras's models submodule.

  • And it does work like a set, and very nice intuition on using--

  • it has the .add method.

  • But that actually is unrelated to the facts of how a set works.

  • I believe they do use an underlying structure.

  • It's not a set, but I believe there is an underlying list

  • to keep track of which layers you added when.

  • But that add method has a bunch of other stuff

  • that goes on underneath to verify your layer

  • and translate it accordingly TensorFlow-wise.

  • So very nice intuition, but it actually is Keras specific.

  • And thank you, NUWANDA, for pointing out I did miss an s in Keras.

  • Same thing at BELLA_KIRS.

  • I did miss the s.

  • I did miss that.

  • COLTON OGDEN: Part of the fun of the live coding.

  • NICK WONG: It is one of the great parts.

  • COLTON OGDEN: I feel 100%.

  • NICK WONG: Yeah.

  • Awesome.

  • Are there any other comments that were going on here?

  • COLTON OGDEN: I think we grabbed all the questions there.

  • NICK WONG: Awesome.

  • All right.

  • So I just pushed the build_model part, so you should have

  • access to building the actual model.

  • We are going to revisit the way this model works because it probably

  • won't work super well.

  • It's not very complex, but it does work in concept.

  • So now we get to the exciting part of fitting our model to our data.

  • We have a model.

  • It's been built. We now just have to actually fit it or train.

  • So I'm actually going to use the word train because that's

  • what I called our data sub directories.

  • But we're also going to use just this--

  • it's actually basically a one-liner.

  • It doesn't really return anything to us.

  • It just takes the model and says, you know what?

  • Now here's a bunch of data.

  • Tell us what happens when we shove it through you.

  • And we give you the answers along the way.

  • So again, syntactically this is going to be kind of fun.

  • Go figure.

  • Even after years of doing CS, and years of working with the same modules too,

  • I'm still looking up syntax occasionally.

  • It happens.

  • It's OK.

  • So we're going to-- in our train model, we're

  • going to want to have both the training data and the testing data.

  • We're going to use the testing data at each step to validate what's going on.

  • The training data is going to do exactly that.

  • It's going to help us train our model.

  • So we're going to use m.fit_generator.

  • And that's the actual parameter.

  • It's very similar to m.fit, except I'm not supplying manual data,

  • I'm supplying a generator-- it's very similar in concept to Python

  • generators--

  • that just generates data as we go.

  • It'll never run out, which is really cool.
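
The "never runs out" behavior is what an ordinary Python generator with an infinite loop provides. A small sketch of a batch generator that wraps around its data; the sample values and batch size are arbitrary, and this is an illustration of the concept, not the Keras generator itself.

```python
import itertools


def batch_generator(samples, batch_size):
    """Yield batches forever, wrapping around to the start of the data."""
    start = 0
    while True:
        batch = [samples[(start + i) % len(samples)] for i in range(batch_size)]
        yield batch
        start = (start + batch_size) % len(samples)


gen = batch_generator(["a", "b", "c", "d", "e"], batch_size=2)
# The generator is infinite, so take a fixed number of batches from it.
first_four = list(itertools.islice(gen, 4))
print(first_four)  # [['a', 'b'], ['c', 'd'], ['e', 'a'], ['b', 'c']]
```

Because the generator never ends on its own, the training call has to be told how many batches make up one pass.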

  • And I remember approximately zero of the syntax for this,

  • or at least the names of them.

  • They do change from time to time.

  • This actually used to be nb_epochs.

  • And epochs or epochs is just the number of times

  • that we run through the training set of steps.

  • So it's how many times do we actually just try to train the model.

  • And so what this means is that, if we run like 10 epochs--

  • epochs?

  • Whatever it is.

  • You'll know what I mean.

  • If it sounds funny, that's OK.

  • You can make fun of me.

  • So as we go through for 10, on the first time, the model's just totally fresh

  • and not trained on anything.

  • But on the second, third, and fourth times, and every time after,

  • it's going to have picked up knowledge from all the previous times.

  • So it's basically it practicing.

  • I would think of it as just practice rounds for the model.
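
A rough sketch of epochs as practice rounds, using a one-weight toy model rather than the stream's network. The data, learning rate, and the `train` helper are assumptions made up for illustration.

```python
def train(data, epochs, learning_rate=0.05):
    """Fit y = w * x by gradient descent; return w and the mean loss per epoch."""
    w = 0.0  # "totally fresh": nothing learned yet
    history = []
    for epoch in range(epochs):
        total = 0.0
        for x, y in data:  # one full pass over the training set
            error = w * x - y
            total += error ** 2
            w -= learning_rate * 2 * error * x  # gradient of the squared error
        history.append(total / len(data))
    return w, history

toy_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
w, history = train(toy_data, epochs=10)
for epoch, loss in enumerate(history, start=1):
    print(f"epoch {epoch}: mean loss {loss:.6g}")
```

Each epoch starts from the weight the previous epoch left behind, which is why the loss in later rounds is lower than in the first.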

  • So we're going to start with just 10.

  • And then we're going to say batch_size equals 20--

  • COLTON OGDEN: Does it save these iterations

  • of data collection, data gathering?

  • NICK WONG: No, it does not.

  • So it doesn't really save them anywhere on file.

  • So when you get to the end, it actually just tosses the model,

  • unless you save it yourself.

  • So we're actually not going to save them until later when we actually have

  • some models that we are satisfied with.

  • But no, it doesn't actually save them on each epoch.

  • You can-- Keras has these beautifully built structures for things.

  • They have these callbacks which, if you're familiar with

  • JavaScript callbacks-- come from the same terminology.

  • But what that lets you do is at the end of every epoch,

  • you could actually save that version of the model.

  • And if you're running like 1,000 epochs and you have just these enormous batch

  • sizes and you really want to make sure that at every stage

  • you could tell us something about the model at that stage,

  • then you would want to do that.
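
The callback mechanism can be sketched in plain Python. Keras ships a `ModelCheckpoint` callback that writes the model to disk; the class and `fit` function below are hypothetical stand-ins that only show the shape of the idea, keeping snapshots in memory instead.

```python
import copy

class SnapshotEveryEpoch:
    """Hypothetical callback: keeps a copy of the model state after each epoch."""
    def __init__(self):
        self.snapshots = []

    def on_epoch_end(self, epoch, state):
        self.snapshots.append((epoch, copy.deepcopy(state)))

def fit(state, epochs, callbacks=()):
    """Stand-in training loop that calls each callback when an epoch finishes."""
    for epoch in range(epochs):
        state["weight"] += 1.0  # placeholder for a real epoch of training
        for callback in callbacks:
            callback.on_epoch_end(epoch, state)
    return state

saver = SnapshotEveryEpoch()
final_state = fit({"weight": 0.0}, epochs=3, callbacks=[saver])
print(saver.snapshots)
```

The training loop knows nothing about saving; it just promises to call whatever it was handed at the end of every epoch.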

  • And I don't think I'll probably cover that here.

  • It is a little bit more advanced and a little bit more--

  • I don't necessarily know it off the top of my head.

  • But super cool, and Keras has all sorts of powerful tools.

  • And as I said earlier at the very beginning,

  • they actually just updated some of the documentation,

  • and they have all sorts of cool things that I haven't even seen yet.

  • So I'm really excited about that.

  • WHIPSTREAK also just asked, "Are we only using Python?"

  • Yes.

  • I happen to particularly love Python, although you could also code this

  • in C++.

  • I believe they have libraries for it.

  • I don't know if Java does.

  • I don't believe it does.

  • But C++ would also probably work.

  • It would be a little bit faster, probably.

  • But we're going to stick to the Python implementation of Keras and TensorFlow.

  • COLTON OGDEN: It's clear that this is quite a rabbit hole to go into.

  • NICK WONG: Yeah, it really just fans out.

  • There's all sorts of things you could do.

  • There's just so many ways in which you can change what's going on here.

  • We're going to switch back to get on our fit parameters.

  • What do we actually need as far as that goes?

  • COLTON OGDEN: BLUEBOOGER asks, "Could you use R?"

  • And his name is actually blue in our chat

  • too, which is all the more appropriate.

  • NICK WONG: That's awesome.

  • Yes, I believe R has a library.

  • Actually, BHAVIK_KNIGHT just answered that.

  • I believe there is a library for it.

  • I am not super familiar with it off the top of my head.

  • COLTON OGDEN: I wonder if TensorFlow-- does TensorFlow have--

  • NICK WONG: Yeah, I believe TensorFlow for R.

  • COLTON OGDEN: Interesting.

  • NICK WONG: Oh yeah, look at that.

  • COLTON OGDEN: Oh, that's cool.

  • Yeah, TensorFlow does have an R binding, which is nice.

  • NICK WONG: Yeah, look at that.

  • COLTON OGDEN: Cool.

  • NICK WONG: Yeah, so TensorFlow is really beautifully well done.

  • It's a Google project, and I'm a little bit biased,

  • but I think they do a really good job.

  • It's very, very cool what they've been able to put together for us.

  • Let's see if I can find where fit generator is.

  • It might be under the Sequential model.

  • Oh, actually they throw it in image pre-processing.

  • I do remember that.

  • So let's go see what they actually do.

  • We're going to just copy all these parameters because we're original.

  • And I'm going to paste them all here.

  • They did a 50--

  • COLTON OGDEN: Copying and pasting is a real developer technique, though.

  • NICK WONG: It really is.

  • It is a very, very valid developer technique.

  • COLTON OGDEN: Stack Overflow. #StackOverflow.

  • NICK WONG: Exactly.

  • And so we're going to just change these to our variable names.

  • Steps per epoch.

  • 2,000 is quite a few.

  • Generally speaking, you would want to actually run many, many steps on these,

  • but our data's pretty small, and we can actually get really good results

  • with pretty low numbers of steps.

  • COLTON OGDEN: And oftentimes, big companies

  • are using massive server farms for this, aren't they?

  • NICK WONG: Yeah, they use enormous--

  • COLTON OGDEN: Not just like a MacBook--

  • NICK WONG: Just computational power.

  • COLTON OGDEN: --or MacBook Pro.

  • NICK WONG: Yeah.

  • Yeah, we're running this on my local laptop.

  • This is my just personal student laptop.

  • But yeah, a lot of big companies are running

  • these on just enormous clusters of paralleled networks and things.

  • They have all sorts of network capable computational sharing.

  • They have some just supercomputers running it.

  • They have some really just-- they have a lot of money going into this,

  • and there's a lot of just people really trying

  • to get these things to work for all sorts of crazy things.

  • I know there's a project at Lifts that does something really

  • cool with simulating a bunch of just real life data in real time,

  • and they just test random machine learning models on it.

  • Or not random, but their own personal projects on it.

  • And that's super cool.

  • I know Google has, I'm sure, just millions of products

  • that we don't know about.

  • There's all sorts of really cool things going on.

  • All right.

  • So we have done this train model, which does call the actual fit generator

  • method.

  • And what that means is I'm going to then call that in my--

  • the actual scripted part of this and call train_data and test_data.

  • And in concept, this should actually run.

  • This should get us pretty close.

  • Oh.

  • OK, I tried to git push.

  • I made the mistake of going up and pressing Enter without even looking.

  • That was great.

  • Gotta love the habit.

  • COLTON OGDEN: It's a hard habit to break.

  • NICK WONG: It really is.

  • Ah.

  • You must compile your model before using it.

  • That's entirely reasonable, and I completely forgot about it.

  • Technically, build_model is incomplete.

  • I forgot to compile the model.

  • So compiling it means you're telling TensorFlow--

  • oh man, I cannot type--

  • that you are done adding layers to the model.

  • You want it to be all put together, assembled

  • into tensors in this beautiful pipeline.

  • And so that's what we basically end up having going on there,

  • and that's all we're going to run here.

  • It also takes a couple of arguments.

  • For example, it takes an optimizer, which is, mathematically, a tool

  • that makes things a little bit better as far as how each layer

  • or how everything takes feedback.

  • So when it's told what the actual answers were,

  • it optimizes for making sure that we don't interpret that incorrectly.

  • I would think of it like that.

  • That's not exactly what it does, but it is the right idea.

  • I'm going to use SGD, which is the--

  • man, I am forgetting names left and right.

  • That is the name for something.

  • Actually, we can switch this to Adam, which is-- again,

  • I don't remember the acronym, but it is a version of an optimizer.

  • They just do these slightly-- well, some actually majorly different things.

  • And you'll notice some different results, depending on what you're doing

  • and which optimizer you use.
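
For reference, SGD stands for stochastic gradient descent and Adam for adaptive moment estimation. Below is a minimal sketch of the two update rules on a toy one-parameter loss. The loss, learning rates, and step counts are illustrative assumptions, and the gradient here is exact rather than estimated from random batches.

```python
import math

def gradient(w):
    """Gradient of the toy loss (w - 3)^2, whose minimum is at w = 3."""
    return 2.0 * (w - 3.0)

def run_sgd(w, steps, lr=0.1):
    for _ in range(steps):
        w -= lr * gradient(w)  # plain step against the gradient
    return w

def run_adam(w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = v = 0.0
    for t in range(1, steps + 1):
        g = gradient(w)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction for the zero start
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

print(run_sgd(0.0, 500), run_adam(0.0, 500))
```

Both rules walk the weight toward the minimum; Adam additionally rescales each step by a running estimate of how large the gradients have been.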

  • We're going to also add metrics.

  • And this takes a list.

  • We're going to do binary_crossentropy, which sounds really fancy,

  • and a great party term if you're at a party of nerds.

  • It's a really fun term to throw out there.

  • There's also categorical_crossentropy.

  • I think that one's even cooler.

  • It's even longer.

  • And so what that basically just means is every time we do--

  • I got distracted by a comment.

  • That's really funny.

  • COLTON OGDEN: Let me go back up to WHIPSTREAKS there.

  • He says--

  • NICK WONG: Right, sorry.

  • Ah, OK.

  • So, "I was away, only came back now.

  • Do summary."

  • Yes.

  • I don't know exactly where you left, but we are currently trying

  • to get our model learning on data.

  • And so what that means for us is that we want

  • to be able to actually run this train of the model or fit the model to the data.

  • And what we-- or what I forgot about in the interim

  • was that I actually have to compile the model, which

  • means giving it an optimizer and what metrics to use for how well it's doing.

  • And that's, I think, what will catch you up.

  • COLTON OGDEN: And the video will be on Twitch as a VOD

  • to look at later and also on YouTube tonight.

  • You'll be able to look back on it if you missed anything.

  • NICK WONG: Awesome.

  • And then-- ah, we're checking in on who BHAVIK_KNIGHT is.

  • That's kind of cool.

  • COLTON OGDEN: Part of the Handmade Hero stream.

  • I've checked that out a couple of times, actually.

  • NICK WONG: That's sweet.

  • COLTON OGDEN: It's nice.

  • It's a guy who makes a complete game from scratch 100% in C, which

  • is pretty cool.

  • NICK WONG: Wow, that's crazy.

  • COLTON OGDEN: And he's got like 1,000 videos up right now.

  • It's pretty ridiculous.

  • NICK WONG: Damn.

  • COLTON OGDEN: But I guess BHAVIK_KNIGHT is not the same person that's

  • on the Handmade Hero stream.

  • NICK WONG: Wow, that's crazy.

  • COLTON OGDEN: Just happens to be shared by-- another person

  • happens to have that handle.

  • NICK WONG: I didn't even know that was possible.

  • COLTON OGDEN: And then GHASSEN says, "I recommend

  • to have a live session of parallel programming.

  • It's just a hard thing to understand."

  • NICK WONG: Yeah, we can do that.

  • COLTON OGDEN: We could take a look into that at some point, yeah.

  • NICK WONG: Actually, it'll come up a little bit here too.

  • COLTON OGDEN: Yeah, OK.

  • NICK WONG: We do use it a little bit.

  • But yeah, that'd actually be a really fun one.

  • We could even parallel code it and then see if it runs that way,

  • motivate it that way.

  • And actually, that's--

  • I can't read that name, but--

  • COLTON OGDEN: It's BLUEBOOGER.

  • NICK WONG: Oh, BLUEBOOGER.

  • Ah, my fave.

  • All right, so BLUEBOOGER then pointed out that we are parallel programming

  • right now.

  • COLTON OGDEN: Except I'm not actually programming.

  • NICK WONG: This is true.

  • COLTON OGDEN: I'm a bystander in this right now.

  • NICK WONG: You're a waiting process.

  • COLTON OGDEN: I am.

  • NICK WONG: You are the parallel equivalent of a waiting process.

  • COLTON OGDEN: I am.

  • NICK WONG: That's awesome.

  • COLTON OGDEN: An infinitely waiting process.

  • NICK WONG: Yeah.

  • The "while true" process.

  • COLTON OGDEN: A daemon thread.

  • NICK WONG: Yeah.

  • Oh, man.

  • I think if you don't appreciate puns and jokes in life,

  • then it's just kind of sad.

  • It's just really depressing.

  • All right.

  • So we have an optimizer.

  • We have metrics.

  • There is a third thing that I am missing,

  • and I can't remember what that is.

  • So we're going to go find it in documentation.

  • COLTON OGDEN: This is part of the live coding thrill, man.

  • NICK WONG: It really is.

  • We're looking left and right at documentation.

  • COLTON OGDEN: It's a good documentation consulting tutorial.

  • NICK WONG: It really is.

  • Keras, your documentation, it is a lot better now.

  • I really appreciate what you guys did on that.

  • Great job.

  • It's making my life a lot easier.

  • It used to be, I think, a little bit of a mess,

  • but there have been a lot of cleanups on it.

  • Things have been updated.

  • It's great.

  • I'm a huge fan of up to date documentation.

  • If you can read documentation and be like, wow, things work--

  • oh right, "loss."

  • There we go.

  • COLTON OGDEN: 42FORCE says, "What you said,

  • Nick, is true about puns and such."

  • NICK WONG: I really appreciate that.

  • Yes.

  • That's one of my life philosophies.

  • Especially when you're sitting up late at night

  • and just trying not to stay up until the sun comes back up, puns keep you going.

  • All right.

  • So I actually misspoke a little bit earlier.

  • I said the binary_crossentropy was a way of metricizing what's going on.

  • Apparently, I'm losing my marbles.

  • That is actually a categorization of loss.

  • It is, in a way, a way of looking at how things are going wrong,

  • but loss is basically--

  • there's a whole lot of theory behind that,

  • but you could think of it as the way that--

  • kind of the loss in accuracy that the model had.

  • So on that run, how poorly was it doing?

  • The higher it is, the worse it goes.

  • The metrics are what we're actually using

  • to display in a human readable format.

  • Loss doesn't necessarily mean anything to me.

  • It could be, like, a 42.

  • That might be meaningful, maybe.

  • They're more useful as a relative metric.
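
To make the loss a little more concrete, here is a small sketch of binary cross-entropy written out by hand. The function name mirrors the Keras loss string, but this is a plain-Python illustration with made-up predictions, not the library's implementation.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never happens
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

labels = [1, 0, 1, 0]
confident_right = binary_crossentropy(labels, [0.9, 0.1, 0.9, 0.1])
coin_flip = binary_crossentropy(labels, [0.5, 0.5, 0.5, 0.5])
confident_wrong = binary_crossentropy(labels, [0.1, 0.9, 0.1, 0.9])
print(confident_right, coin_flip, confident_wrong)
```

The three numbers only mean something relative to each other: confident correct answers score lowest, and confident wrong answers are punished hardest.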

  • But metrics on their own, like accuracy, that's super meaningful to me.

  • It means what it says.

  • It's an accuracy, usually given as a percentage.

  • I think mine in this case will be displayed not as a percentage but as

  • a proportion of 1.

  • So you'll get like 0.55.

  • Not great.

  • 0.5 means we're literally guessing.

  • 0.45 means we're doing worse than guessing, so we're anti-guessing.

  • We got it right, but then flipped our logic.

  • And then 1 is ideal.

  • "Ideal" in concept.

  • Cool.
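
Accuracy as a proportion of 1 can be sketched the same way. The 0.5 threshold and the sample predictions are assumptions for illustration.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions that land on the correct side of the threshold."""
    hits = sum(int(p > threshold) == y for y, p in zip(y_true, y_pred))
    return hits / len(y_true)

labels = [1, 0, 1, 0]
predictions = [0.8, 0.3, 0.6, 0.7]
accuracy = binary_accuracy(labels, predictions)        # 3 of 4 correct

# "Anti-guessing": the same model with its logic flipped.
flipped = [1 - p for p in predictions]
flipped_accuracy = binary_accuracy(labels, flipped)    # 1 of 4 correct
print(accuracy, flipped_accuracy)
```

A score below 0.5 on balanced data is still information: inverting every answer would turn it into a score above 0.5.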

  • COLTON OGDEN: Astly-- NUWANDA3333, who's Astley [INAUDIBLE], says, "Agreed.

  • Hope you do more sessions, Nick."

  • NICK WONG: Awesome.

  • Thank you.

  • I really appreciate it.

  • COLTON OGDEN: Tune in next Friday--

  • NICK WONG: Yeah, we'll be here next Friday.

  • COLTON OGDEN: --for a Linux command tutorial.

  • NICK WONG: I very much enjoy that.

  • I appreciate it a lot.

  • And now we're going to see if we can get this to run.

  • That's one of my favorite--

  • I say that to myself when I'm coding things.

  • Hey look, it worked.

  • I'm always-- I'm not actually surprised it worked.

  • I'm a little bit surprised it worked.

  • Cool.

  • So I'll tell you what's going on here while it runs.

  • It's running exquisitely slowly, which is awesome.

  • It wasn't running so slowly last night, but we're livestreaming.

  • COLTON OGDEN: Exquisitely slowly.

  • NICK WONG: So slowly.

  • Every time you livestream, something that ran in maybe one second

  • will take four hours.

  • If you had a download that took a second, it's four hours now.

  • There's just no getting around that.

  • COLTON OGDEN: It feels like four hours.

  • NICK WONG: Yes, especially when you're sitting there just waiting.

  • But there's a lot for me to explain here,

  • so we're going to go through what's going on.

  • So you see this bar increasing on the left.

  • That's telling you your progress through the number of--

  • I'll see if I can find it in the code because that

  • would contextualize what's going on.

  • The number of steps.

  • So each one of the-- oh man, that was awful.

  • Each one of these little bars that goes through on the bottom

  • here is going to be one step.

  • Each line is one-- or sorry, is one epoch or epoch.

  • And that's going to be just-- we're going to run through 10 of them,

  • as per our parameters.

  • But then, our steps per epoch is how many steps do we actually

  • go through trying to train our data.

  • And that's going to be 20 in this case, and it's

  • going to count through each one that we walk through.
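
The bookkeeping behind the progress output works out as follows. The epoch and step counts come from the session; the batch size is an assumption, since it depends on how the generators were configured.

```python
epochs = 10            # one line of progress output per epoch
steps_per_epoch = 20   # batches drawn from the generator in each epoch
batch_size = 20        # assumption: set wherever the data generator was built

total_steps = epochs * steps_per_epoch
samples_seen = total_steps * batch_size
print(total_steps, samples_seen)
```

With a small data set, many of those samples are the same images drawn again, which is fine because the generator simply loops.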

  • And then it tells us an ETA.

  • Those are generally fairly accurate.

  • I've found them to be pretty reliable.

  • This is the loss that we mentioned earlier-- not particularly meaningful.

  • You generally want it to go down.

  • That's pretty much all we really know.

  • COLTON OGDEN: Also, your computer is whirring right now.

  • NICK WONG: It's really funny.

  • If you run these in class and you're supposed

  • to be paying attention to lecture, then people

  • will start to look at you as your computer just

  • starts to get closer and closer to just exploding.

  • And I think it freaks people out.

  • They're like, oh god, why is your computer panicking?

  • And then we get to this accuracy metric, which is what we mentioned before.

  • And you'll notice there's actually two of them.

  • There's accuracy here, and then there's val accuracy.

  • What that stands for is this is the accuracy on the training data.

  • So that generally will go up just regardless

  • as it starts to memorize things.

  • But the validation accuracy--

  • excuse me-- should give us some reasonable metric

  • for how well we're actually doing.

  • And what we can do is, if this validation accuracy

  • is substantially higher than the training accuracy,

  • then you're doing a pretty good job.

  • You're literally predicting data that you've never seen before better

  • than the data you have seen before.

  • And that's not something that happens very often, if ever.

  • It's something that would be pretty rare.

  • If the validation accuracy was at 100 and our training accuracy was at,

  • like, 1/2, then--

  • oh, sorry.

  • I forgot that we were on the screen.

  • Right.

  • Thank you, Colton.

  • COLTON OGDEN: There it is.

  • NICK WONG: I appreciate it.

  • You'll see it.

  • I can actually probably move it up a little bit, and that might help.

  • COLTON OGDEN: Duck down just a touch.

  • NICK WONG: There we go.

  • Oh, well that-- it's close enough.

  • So yeah, that'll show up again when this epoch finishes.

  • And so validation accuracy versus accuracy are really good to compare.

  • If validation accuracy is substantially lower than accuracy--

  • and unfortunately, that happens kind of often--

  • that means you've probably over-fitted your data.

  • Because when you're given new data, you're just like,

  • oops, I don't know what's happening, but given data that you've seen before,

  • you're pretty good at it.

  • However, what you might notice-- and it takes a little bit at the end

  • here-- you'll notice it freezes on that last step in the epoch.

  • That's because it's running all the validation steps,

  • and they take a little bit too.

  • You'll notice that our accuracy is at 0.5,

  • and it's staying uniformly there, which is not great.

  • That basically means we're randomly guessing,

  • and we're randomly guessing really well.

  • We're just really random.

  • And what I mean by that is even if I were to flip a coin,

  • I would probably hover around 50.

  • I'd have a mean of 50 and an expectation of 50%, 50% heads or 50% tails,

  • but I wouldn't get exactly 50%, at least not usually.

  • So the fact that we are sticking at 50% is a result of the fact

  • that our data's pretty well balanced, but also we're just really random.

  • We're true randomness here.

  • We're literally guessing.

  • And so that's not ideal.

  • Probably not what we were looking for.
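
The way these two accuracy numbers get read can be sketched as a small helper. The function name and every threshold in it are hypothetical; in practice this is a judgment call rather than a fixed rule.

```python
def diagnose(train_acc, val_acc, gap=0.1, chance=0.5, tol=0.02):
    """Rough reading of one epoch's accuracies; all thresholds are illustrative."""
    if abs(train_acc - chance) <= tol and abs(val_acc - chance) <= tol:
        return "guessing"      # stuck at chance level on balanced classes
    if train_acc - val_acc > gap:
        return "overfitting"   # memorized the training set, fails on new data
    return "ok"

print(diagnose(0.50, 0.50))
print(diagnose(0.95, 0.60))
print(diagnose(0.70, 1.00))
```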

  • So that means that we now have to debug the actual model.

  • And I guess we deliberately left this here, where I made some of the mistakes

  • that I made last night, but I made them again for everyone else to see.

  • And that's because they're really common mistakes that show up

  • all the time in machine learning.

  • So it's really easy, usually, if you're given some sort of Python code or C

  • code-- how do you debug it?

  • How do I go about fixing that code?

  • And the problem with machine learning style things

  • is we don't necessarily know where to go debugging-wise.

  • There's no real article on how to debug a machine learning model.

  • There's a bunch of really technical ones,

  • but there's none that are really as approachable as this code seems

  • to imply it should be.

  • So we have a bunch of things to look at.

  • How do we build our model?

  • How are we getting our data?

  • How are we manipulating our data?

  • And what kind of optimizer are we using?

  • And what kind of activations do we have within building our model?

  • So I'm going to let this keep running, and we'll stay at 50, but that's OK.

  • It's cool to watch it go.

  • We can also make this a lot faster.

  • So there are some other things going on here that

  • can make our lives a little bit better.

  • Let's go about making it faster first.

  • So this is use_multiprocessing, which is something that I think is correct.

  • And we're going to say false.

  • But I'm going to say the number of workers

  • or the number of independent threads you can spin up is 4.

  • And what this means is a lot of machine learning stuff

  • can actually be done in parallel.

  • I don't have to do it all in one thread on one processor.

  • I can spread it across as many processors as I want,

  • limited by the number I have.

  • And so what I'm going to do is I'm going to say, you know what?

  • You can actually use 4 workers or 4 threads.

  • And I turned off multi-processing.

  • If you use both at the same time, Keras will give you some warning and be like,

  • yeah, don't do that.

  • And the reasoning for that being, basically, you're

  • attempting to do two versions of the same thing.

  • I say that, and I'm sure someone somewhere just cringed.

  • They're not actually the same thing, but it would

  • accomplish a lot of the same goals.

  • So we're going to just use one instead of the other.
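
The idea of several worker threads preparing batches at once can be sketched with the standard library. This is not how Keras implements `workers` internally; `load_batch` and its fake delay are stand-ins for reading and resizing images.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def load_batch(index):
    """Stand-in for reading and resizing one batch of images from disk."""
    time.sleep(0.05)  # pretend I/O latency
    return [index * 10 + offset for offset in range(3)]

def load_all(indices, workers):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(load_batch, indices))  # results keep input order

batches = load_all(range(8), workers=4)
print(batches)
```

Threads help most when the work is waiting on disk; the arithmetic of training itself is a separate cost, which is one reason extra workers may not speed things up much.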

  • And I'm going to rerun this now.

  • It should run a little bit faster, which would be ideal for us.

  • COLTON OGDEN: Machine learning takes a long time.

  • NICK WONG: Machine learning can be very problematic as far as timing goes.

  • [INAUDIBLE]

  • COLTON OGDEN: Learning in real life takes a long time, so it makes sense.

  • NICK WONG: Yeah, exactly.

  • And we have one of the most powerful computers in the known universe,

  • actually, which is crazy.

  • So it is going very marginally faster, but not necessarily a whole lot.

  • All right.

  • So now we need to look at, well, what are we

  • doing that might be breaking our machine learning model?

  • Why is it not learning in the way we want it to?

  • So some things-- and this is where we get into a guess and check

  • sort of thing, and the programmatic part is

  • where you're looking, where you're guessing and checking.

  • So the first place I might look is in the way I built my model.

  • Is there something inherently that jumps out

  • that tells me, yeah, this is incorrect, like something here just doesn't work?

  • So I noticed that my activation here is sigmoid, which is right.

  • That's what I want for a binary classifier because of the same reasons

  • that we mentioned earlier.

  • My activation in all intermediate steps is also sigmoid,

  • but I could change that.

  • I could make it relu.

  • I don't actually know what that stands for.

  • I did at one point.

  • I don't anymore.

  • Or something like tanh.

  • And both would be reasonable.

  • There's no real reason to pick--

  • there is a real reason to pick one versus the other,

  • but we don't necessarily need to care.

  • But I'm going to leave that actually as sigmoid.

  • I'm going to look at the original activation, which is also sigmoid.

  • And that might be kind of problematic.

  • It might not necessarily be the source of our bug exactly,

  • but it might also be something to reconsider,

  • in that we said that a sigmoid activation creates it

  • so that you're in a binary state.

  • It's either down or up.

  • And we don't necessarily-- thank you.

  • MAGGUS503 just pointed out that relu is rectified linear unit.

  • Thank you.
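
The three activations mentioned here are short enough to write out. This is a plain-Python sketch for single numbers, not the tensor versions a library would use.

```python
import math

def sigmoid(x):
    """Squashes any input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    """Rectified linear unit: zero for negative inputs, unchanged otherwise."""
    return max(0.0, x)

for x in (-4.0, 0.0, 4.0):
    print(x, round(sigmoid(x), 3), relu(x), round(math.tanh(x), 3))
```

Sigmoid and tanh flatten out for large inputs, which pushes values toward the ends of their ranges, while relu passes positive values through untouched.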

  • COLTON OGDEN: And GHASSEN said, "Can I work with you guys?

  • I need a Skype interview with a company in America."

  • So CS50, to the best of my knowledge, is not currently hiring.

  • When we do hire, we accept all applications.

  • Normally, they're posted on our website, cs50.harvard.edu.

  • But like I said, I don't believe we have any job openings currently available.

  • In terms of getting a Skype interview with a company in America, that's

  • also something that we probably won't be able to assist with, but definitely

  • reach out to companies you're interested in working with.

  • Most companies have links on their website for applications.

  • Definitely build a portfolio of projects on GitHub or the like

  • so that you have more code to share with people.

  • And keep the search going, but I'm sorry that we're

  • unable to provide any more than that currently.

  • NICK WONG: But best of luck.

  • COLTON OGDEN: Yeah, definitely best of luck--

  • NICK WONG: Very good luck.

  • COLTON OGDEN: --in finding something.

  • There's tons of openings, I'm sure.

  • NICK WONG: Yeah.

  • All sorts of companies want people who know anything about coding

  • or are looking for just people to, I guess, put in effort and work.

  • I think-- we'll go on a short aside, like for four

  • seconds, that I think that don't ever lose motivation

  • on these sorts of things.

  • As someone-- I sent out I think like 104 job applications last year.

  • COLTON OGDEN: That's dedication.

  • NICK WONG: I got rejected from most of them.

  • You're really trying to get a job, and for me it's

  • like, well, I want to be doing something with my life over the summer.

  • I want to actually be contributing.

  • So just don't give up.

  • Just keep going, keep pushing through.

  • Rejection-- people very clichedly will say, like, rejection makes you grow,

  • or rejection teaches you stuff.

  • But I learned a lot from my rejections, and I think

  • that they actually do help you a lot.

  • So [INAUDIBLE].

  • COLTON OGDEN: Yeah, there's a phrase that I like that's--

  • who came up with it?

  • "Be so good they can't ignore you."

  • NICK WONG: Oh, right.

  • COLTON OGDEN: Steve Martin, I believe.

  • NICK WONG: Yeah, I think so.

  • COLTON OGDEN: That's a good phrase.

  • Keep working hard, and if you're really, really, really good at what you do,

  • you just can't help but be able to get something somewhere that you want.

  • NICK WONG: I think Lang Lang, the pianist, has a phrase or a story

  • in his book where there's someone who hates him and is a judge for his piano

  • competition, and even they can't deny how good he is.

  • I think that that's really just super cool.

  • I think that's super inspiring.

  • All right.

  • So we did change some stuff about what was going on in our model.

  • I rectified, so to speak, the activation of our input layer.

  • I changed it from sigmoid to rectified linear unit or relu.

  • Relu.

  • And that actually caused our model to start doing some interesting things.

  • It's still very slow, and we added some extra workers,

  • but it's not necessarily going that much faster.

  • But if we look down at our metrics here--

  • and I believe Colton might be in one of the metrics,

  • but we can just raise it up.

  • COLTON OGDEN: Oh, I'm sorry.

  • NICK WONG: Oh no, you're all good.

  • We can just do this.

  • COLTON OGDEN: Cut off there.

  • NICK WONG: I think it's above you now.

  • COLTON OGDEN: I'm trying to go this way.

  • NICK WONG: There we go.

  • I think you're good.

  • I think you're good.

  • COLTON OGDEN: Cool.

  • NICK WONG: All right.

  • So what we can look at here is in this first line,

  • in that first epoch, what we see is our loss is--

  • whatever.

  • It's some set of numbers.

  • But our accuracies, our accuracy and validation accuracy,

  • are not 50-- or 0.5.

  • Sorry.

  • I'll probably refer to them as percentages,

  • but they are technically fractions between 0 and 1.

  • So they're actually 70% on our training data and 100% on our testing data,

  • on our validation data.

  • And if you recall or maybe you're currently watching in the future-- huh,

  • weird--

  • then you might recall that I said it's pretty rare that you

  • have your validation accuracy go higher than your actual accuracy

  • or your accuracy on your training data because you generally

  • learn your training data a lot better than your validation data.

  • But in this case, we are using something that's fairly simple,

  • and we are technically solving a problem that isn't too terrible.

  • So we actually see that our validation accuracy,

  • in now two epochs out of three so far, has been higher

  • than our training accuracy.

  • And we do happen to have just more variance within our training data set.

  • It's bigger.

  • It's double the size.

  • So it makes sense that we might see something

  • like this for a pretty simple problem.

  • So we're going to let that keep running.

  • It's still very slow.

  • And something that-- the reason I keep pointing that out

  • is to motivate the rest of-- or the next part that I'm going to talk about.

  • But it would be really cool if I could speed that up and not lose accuracy

  • because that would be-- that's the ideal, if I can make a machine

  • learning model run really quickly, really cheaply, and not lose accuracy.

  • COLTON OGDEN: Millionaire.

  • Future millionaire.

  • NICK WONG: That'd be awesome.

  • You could make a start right there on that.

  • And that's a gross oversimplification of the problem

  • and certainly not something that we can do,

  • but there is something that we can do to fix what's going on here.

  • Because I've mentioned earlier that these images,

  • the size doesn't really matter that much as long

  • as we keep most of the important data or most of the features.

  • We're using some pretty large images.

  • They're sized 500 by 500.

  • And so if we're running through, what, 20, 40 images, and we have 500

  • by 500 for each, well, 500 squared, we're on the order of 10,000,

  • or I guess 25,000 actual pixels or data points, times 40.

  • So we're at, at the very least, what is that, 2 million or 200,000?

  • I'm not very good at math.

  • And that's OK.

  • I think we're on the order of a couple 100,000 things

  • and calculations that have to go through all the time, and that's terrible.

  • It's definitely not what we're looking for and I guess not

  • what we really need.

  • So I would go through and change all of these 500s

  • to then maybe something like 150, a very common version of that.
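
A quick check of the arithmetic, counting one value per pixel (color channels would multiply these further). The image count of 40 is the rough figure mentioned above; `pixel_count` is just a helper for this sketch.

```python
def pixel_count(height, width, images=1):
    """Number of pixel values in a stack of same-sized images."""
    return height * width * images

per_image = pixel_count(500, 500)           # 250,000 pixels in one image
large = pixel_count(500, 500, images=40)    # 10,000,000 across 40 images
small = pixel_count(150, 150, images=40)    # 900,000 across 40 images
print(per_image, large, small, round(large / small, 1))
```

Shrinking each side from 500 to 150 cuts the data per image by roughly a factor of eleven, which is where the speedup comes from.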

  • The problem is, I wrote it in a bunch of places,

  • and it's going to just change in one way.

  • I'm going to use pretty much square images forever.

  • So this is a pretty good place to switch to some variable.

  • So im_dim or image dimensionality--

  • oh actually, that's not a great variable name.

  • It means something else.

  • We'll say image-- we'll call this one width.

  • It actually doesn't matter which one we define as width

  • and which one we define as height as long as we're self-consistent.

  • So I'm going to say image height and image width are in that order,

  • and I'm going to replace all of our 500s with those,

  • which should, if we don't screw up missing one of them,

  • should help us out a little bit.

  • And then, similar to how we defined model up here,

  • we're going to also define each of those.

  • And we're going to bring that down to about 150 by 150.

  • Let's make sure that I didn't leave any spare 500s.

  • Cool.

  • And now we know that we can control the image height and width by just

  • changing a configuration variable.

  • And in my previous project or in the talk that I gave last year,

  • the code that I used for that actually has just a configuration module,

  • and you can just change config variables and it will change everything

  • about how it works.

  • And so I think that, generally, moving those out of the actual code

  • so that you can think of the code in terms of abstractly what it's doing

  • can clean up a lot of your actual debugging and processing.
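
The idea of pulling the hard-coded 500s into configuration variables might look like the sketch below. The names `IM_HEIGHT`, `IM_WIDTH`, and `IM_CHANNELS` and the two helper functions are hypothetical; the stream only says that height and width variables replace the literal numbers.

```python
# Configuration values kept in one place (hypothetical names).
IM_HEIGHT = 150
IM_WIDTH = 150
IM_CHANNELS = 3   # assumption: colour images

def input_shape():
    """Shape tuple in (height, width, channels) order for the first layer."""
    return (IM_HEIGHT, IM_WIDTH, IM_CHANNELS)

def target_size():
    """Size tuple in (height, width) order for resizing images on load."""
    return (IM_HEIGHT, IM_WIDTH)

print(input_shape(), target_size())
```

Every place that previously said 500 would read from these values instead, so changing the image size becomes a one-line edit.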

  • So we're going to stop this, the currently running training session.

  • And it says, "User aborted!"

  • And that's great.

  • We're going to run it again but with our smaller sizes,

  • and we're going to see if that changes the speed at which we go.

  • And I say that like I didn't know.

  • COLTON OGDEN: Visibly it looks like it's definitely moving a lot faster.

  • NICK WONG: Yeah, it's so much faster.

  • And you'll notice our accuracy has not gone down.

  • We're still interpreting these images pretty accurately.

  • And you can think of this in a human context: if I handed you

  • a really small image versus an enormous poster,

  • you could tell me pretty quickly whether either one is dark or light.

  • You could look at the smaller version and be like, yeah, that's dark.

  • And you have a look at the big poster board.

  • Yeah, that's light.

  • And that's pretty immediately doable for us.

  • So this is basically what we're seeing.

  • We're seeing the machine equivalent of that.

  • And right now, you could argue that we've over-fit our data.

  • That's probably true.

  • But our validation accuracy is still also true.

  • The problem is that that might lead you to think, oh, well then we

  • can now use this model and predict on our prediction data, which

  • is definitely our next step.

  • But that's not entirely the case.

  • You'll notice that loss parameter actually has

  • continued to go down from row to row.

  • And that tells us something about what we're doing to our data.

  • It's one of the only times that I'm going to point at loss as something

  • that we really want to consider.

  • As that continues going down, it means we're getting closer and closer

  • to just getting it right.

  • But that's problematic on our training data

  • because it means that we're over-fitting.

  • We're going to just hyper-fit ourselves to this data set,

  • and we're going to put on our blinders and say, this is all I know,

  • and this is all I can ever deal with.

  • That's really problematic for images, where even one pixel difference means

  • that you have a different image technically.

  • So we want to prevent that.

  • And luckily, right before we go into prediction,

  • we actually have a layer that does that.

  • So we're going to add that layer in between, sandwiched between two Denses.

  • And that layer is called Dropout.

  • And I know we are running a little bit low on time, but that's OK.

  • I think we are actually going to run perfectly into--

  • COLTON OGDEN: We can go--

  • NICK WONG: Finish this.

  • COLTON OGDEN: --a few over if you need to take a few minutes to--

  • NICK WONG: All right.

  • COLTON OGDEN: Fairly flexible format on Twitch.

  • NICK WONG: Right.

  • That is a good point.

  • That's actually one of the beautiful aspects of this.

  • All right.

  • So what we've done here is we've added Dropout, which means every once

  • in a while, we're going to just randomly drop out data.

  • We're going to just ignore it for that training session.

  • And that's really cool.

  • It means that we can actually counterbalance the over-fitting

  • that we might do.

  • I said 0.2, but I'm going to make it really aggressive.

  • The term "aggressive dropout" is something that I think is really funny.

  • It makes me think of a college kid who just drops out of college

  • and is extremely aggressive about it.

  • But yeah, we have this aggressive dropout,

  • which means that we're dropping out a pretty high number of things or data

  • points at each point.

  • And that's going to help us prevent ourselves from over-fitting.
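
In Keras this is the `Dropout` layer, which takes the fraction to drop as its rate. The sketch below is a plain-Python illustration of what such a layer does during training, assuming the common "inverted dropout" formulation: each value is zeroed with probability `rate` and the survivors are scaled up so the average stays the same. The `dropout` function and the rate of 0.5 are stand-ins, not the code used on stream.

```python
import random

def dropout(activations, rate, rng):
    """Zero each value with probability `rate`; scale the rest by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]

rng = random.Random(0)           # fixed seed so the run is repeatable
layer_output = [1.0] * 10_000    # made-up outputs of a Dense layer
dropped = dropout(layer_output, rate=0.5, rng=rng)

zeroed = sum(1 for a in dropped if a == 0.0)
print(f"zeroed {zeroed / len(dropped):.0%} of the values")
print(f"mean after dropout: {sum(dropped) / len(dropped):.2f}")
```

Because a different random subset is ignored on every training step, the network cannot lean on any single value, which is what counteracts over-fitting.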

  • I'm also going to, now that we have this whole thing working on up to training,

  • I'm going to add that to GitHub.

  • Oops.

  • "Added model training."

  • And then we can git push.

  • And so now what we have left to do is--

  • can we actually try and predict on the data

  • that we have in our prediction directory?

  • So that's going to be our last step, and we're going to say that our predict--

  • our predict on, and we'll say image as well because that's

  • where our prediction data lives.

  • So then, in here I'm going to say define predict,

  • and it's going to take an img_dir, and all

  • we have to do is let's recall what the actual structure of predict is.

  • So under Image, under Predict, it's just images.

  • And what I might say is, OK, then let's actually just take those images,

  • and we're going to save their paths.

  • I need to be able to save their actual paths.

  • And that's actually what I had intended to do over here.

  • I just conflated the two in my head.

  • And we now have this lovely line of code.

  • There is no actual need to do any of this os path joining.

  • If you'll recall, we're just using listdir

  • to tell us what images are in that.

  • And then we're going to img_dir predict of what's going on in there.

  • And this should give us access to all of the images

  • that we actually want to use.

  • Let me actually think through that before I tell you that that's the case.

  • Ah. os.listdir.

  • I actually do need the os.path.join.

  • Oop, not "jin." "join."

  • That's one of the features I don't like of VSC.

  • And "predict."

  • So now we have access to all of the images

  • that we want to have prediction on.

  • And I'm just going to call these im paths or img_paths.

  • And the reason that I'm going to keep track of those

  • is so that I can tell which image I'm actually making a prediction on later.

  • And so I have to now load in these images.

  • And this is where we actually use cv2.

  • So let's say images is equal to--

  • and I'm going to use, again, list comprehensions. cv2.imread

  • of img for img in img_paths.

  • And this will allow us to actually read in each of the images.

  • And then I'm going to again do images is cv2.reshape, I believe, of the img.

  • And we're going to reshape it to our height.

  • Oops, I called that im_height.

  • im_height, im_width-- god I love the auto-complete-- by 3.

  • And this is, again, for img in images.

  • Cool.

  • And what this basically does-- and I'm abbreviating the steps

  • and cutting down a little bit on my narration-- but cv2.imread means

  • read the image into what we're dealing with.

  • It reads it in as a NumPy array.

  • And then this cv2.reshape of the img actually puts it

  • into the right shape for us, I believe.

  • I think that's a necessary step.

  • I might be conflating it with the NumPy version.

  • And there's also-- yeah, I have a feeling I mixed these two together.

  • We're going to actually comment that one out for now.

  • I think I'm talking about the NumPy one.

  • We have numpy.reshape, and that sounds much more correct,

  • so I'm actually going to just copy the line I had above.

  • And this basically just takes the images, like dimensionality,

  • and reformats them a little bit so that we can actually feed them

  • into our machine learning model.

  • Our pre-processing step before up at the top when we load in the data

  • actually did that for us.

  • We just didn't really realize it.

  • Cool.

  • So this should be all of our actual images.

  • And then we're going to say predictions--

  • actually, we can just return from there.

  • We can return again a list comprehension of m.predict_classes.

  • I don't know if it's plural.

  • We'll find out.

  • Gotta love the just random guessing sometimes.

  • On img for img in images.

  • And this will tell us, what does it think each thing is?

  • However, that's going to return to us something that's a little weird.

  • And we can actually print it.

  • Print it out?

  • Who knows?

  • We can look at what those predictions might look like.

  • And since we're debugging this and not actually the number of epochs--

  • or sorry, not the actual model, we just want

  • to see what it would look like later--

  • we can actually reduce this to two steps per epoch and one epoch.

  • And that's going to be really fast.

  • It's going to just skim through.

  • Oops.

  • I just did git push again.

  • There we go.

  • Cool.

  • I have invalid syntax, and that makes sense.

  • I forgot the word "in" on line 81.

  • Doo-doo-doo-doo.

  • That's one of those things where if you're watching someone code,

  • you see that immediately, and you're like, no!

  • How did you forget that?

  • But then when you're coding it live, you're just like, eh.

  • All right.

  • Cool.

  • Cannot reshape an array of size 150804 into shape 150, 150, 3.

  • I'm going to actually cat a piece of code

  • I wrote a long time ago to see what that was just

  • so I can verify what that resize was.

  • Ah, cv2.resize.

  • OK, so there was actually a step there.

  • Sorry.

  • I'm peeking at some code I wrote earlier.

  • It's code from a different project, but it does do the right thing.

  • COLTON OGDEN: Code reuse is an important thing.

  • NICK WONG: Code reuse.

  • So OK, we have cv2 resizing the image to the right dimensionality for us,

  • and then we have NumPy reshaping it into the proper array for what

  • we're actually looking for.

  • And that should actually be correct.
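
The error message makes sense once the element counts are compared: a reshape only rearranges the values an array already has, so the total must match exactly, whereas a resize resamples the picture to a new pixel count. A small sketch of that arithmetic, using the numbers from the error above (the helper name is illustrative):

```python
def element_count(shape):
    """Total number of elements in an array of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total

target_shape = (150, 150, 3)
loaded_size = 150804   # size reported in the error message

print(element_count(target_shape))                  # values the model expects
print(loaded_size == element_count(target_shape))   # False, so reshape must fail
print(loaded_size // 3)                             # pixels in the loaded image, if it has 3 channels
```

The loaded image has far more values than 150 by 150 by 3 can hold, so it has to be resized to the target height and width first, and only then reshaped.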

  • The rest of this is roughly true.

  • And I'm going to stop looking at my previous code now.

  • Cool.

  • So then, let's go ahead and print these out

  • and see what that looks like for us.

  • 42FORCE points out, "Code reuse works all the time."

  • Yes.

  • I am generally a big fan.

  • All right, so we get these just random looking arrays.

  • They don't look particularly meaningful.

  • This is a NumPy data structure where you have an array

  • and it has a sub-list with a number in it.

  • It's 1.

  • The other one could be 0, but it's actually they're both 1.

  • And then our dtype is a 32-bit integer.

  • So that's not super meaningful to us, but if we actually just access

  • the 0th element of the 0th element of each of those arrays,

  • we'll just get the numbers back.

  • And then what we can do is--

  • I'm actually going to also print out which

  • image path that's associated with.

  • So I'm going to say that img_paths at i, i in enumerate of images.
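
Unwrapping the nested results and pairing them with their paths might look like this. Plain nested lists stand in for the NumPy arrays the model returns, and the paths and values are sample data matching the two files mentioned on stream; `label_predictions` is a hypothetical helper name.

```python
# Stand-ins for what the model returns: one 1x1 nested result per image.
img_paths = ["img/predict/dark_1.jpeg", "img/predict/light_1.jpeg"]
raw_predictions = [[[1]], [[1]]]   # plain lists in place of NumPy arrays

def label_predictions(paths, raw):
    """Pair each path with the bare class number inside its nested result."""
    return [(path, result[0][0]) for path, result in zip(paths, raw)]

for path, value in label_predictions(img_paths, raw_predictions):
    print(path, value)
```

`zip` does the same job as indexing `img_paths` with the counter from `enumerate`, as long as both lists are in the same order.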

  • COLTON OGDEN: MKLOPPENBURG says, "Will have to finish this later on.

  • It's been a joy.

  • Thanks Colton, Nick for doing this."

  • NICK WONG: Awesome.

  • COLTON OGDEN: It's been a good time.

  • NICK WONG: Thank you.

  • We appreciate it.

  • COLTON OGDEN: GHASSEN says, "Can I guys send you my CV so that you take a look?

  • I built a great resume."

  • Again, I don't believe CS50 is currently hiring or looking at CVs or resumes.

  • Normally, when we do, we send out--

  • we have an apply link set up which is currently not accepting.

  • But particularly if you're interested in working for a Silicon Valley company,

  • I would do research into companies in that region.

  • Check out their website.

  • Check out their job applications.

  • And then submit your CV there.

  • You'll probably have much better luck doing so.

  • But thank you for the-- thank you for offering to us.

  • And if we are hiring in the future, definitely submit it.

  • And then BHAVIK_KNIGHT says, "0, 0 in brackets would do as well for NumPy."

  • NICK WONG: Yes, that is true.

  • I think it also works for Python.

  • I'm just generally a fan of the old-fashioned way.

  • I'm not that old, but I am certainly an old-fashioned user.
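
One caveat on the comma form: it is NumPy arrays that accept `[0, 0]`. Plain nested Python lists only support the chained `[0][0]` form, as this short sketch shows.

```python
nested = [[1]]   # plain Python list of lists

print(nested[0][0])   # chained indexing works on lists

try:
    nested[0, 0]      # tuple index: accepted by NumPy arrays, not by lists
except TypeError as err:
    print("TypeError:", err)
```

The chained form therefore works in both cases, which makes it the safer habit when the type of the result is not certain.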

  • All right.

  • So I am still copying things to speed things along a little bit.

  • And then, I will tell us what's going on.

  • All right.

  • So we can actually then go from here.

  • All right.

  • So we want to be able to map between the numbers and the actual classes

  • that we're talking about, like light, dark, things like that.

  • So what we're going to say--

  • nice.

  • All right.

  • Colton got called out for his great people skills.

  • That's true.

  • He does have fantastic people skills.

  • I wanted to make sure that he didn't end up reading his own compliment.

  • COLTON OGDEN: Thanks, Astley.

  • Appreciate that.

  • NICK WONG: He does a great job with people.

  • All right.

  • So we're going to define invert mapping.

  • It's something that I think is just useful.

  • Given a dictionary, which I'm going to call d,

  • we're going to have inverted, which is the inverted version

  • of that dictionary.

  • And I can say, "for key in d."

  • And actually, you could say "key, value in d.items."

  • This might make it a little bit easier on us.

  • Then we can say inverted at the value is equal to the key.

  • And this assumes a lot about your dictionary and how it works,

  • but that's OK.

  • We can actually prove that those assumptions are valid

  • because of how Keras defines the class indices dictionary.

  • But that basically allows us to switch it from--

  • it gives you the index name or the label name with the number,

  • and I want the number with the label name mapped to it.

  • So I just invert the dictionary for us.

  • There might actually even be a Python function that already exists for that.

  • I just don't happen to know it.

  • COLTON OGDEN: BHAVIK_KNIGHT has a dictionary comprehension just there

  • in the chat.

  • NICK WONG: Ah.

  • Yeah, that's perfect.

  • If you look in the chat, BHAVIK_KNIGHT's version of the dictionary inversion

  • is exactly what I just wrote out, but in one line and certainly very readable.

  • So very nice.
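
Both versions of the inversion can be sketched as below. The sample `class_indices` dictionary is an assumption about what Keras produces for the `dark` and `light` folders; the inversion is only safe when every value is unique, which holds when each label gets its own index.

```python
def invert_mapping(d):
    """Return a new dict mapping each value of d back to its key."""
    inverted = {}
    for key, value in d.items():
        inverted[value] = key
    return inverted

class_indices = {"dark": 0, "light": 1}   # assumed label-to-index mapping
mapping = invert_mapping(class_indices)

# One-line dictionary comprehension that builds the same result.
mapping_short = {value: key for key, value in class_indices.items()}

print(mapping, mapping == mapping_short)
```

If two labels ever shared a value, the later key would silently overwrite the earlier one, which is the assumption about the dictionary mentioned above.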

  • Cool.

  • So now that we have this kind of mapping, we can then just go through.

  • And for our prediction--

  • or actually, we're going to say-- I think value is what I called it--

  • value and im_name in predictions.

  • We can actually now print out what each of those things was.

  • So "We think that im_name is val."

  • Oops.

  • Sorry, not val but rather mapping at val.
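
The printing loop could be sketched like this, with the inverted mapping turning each number back into a label. The `mapping` contents and the sample `predictions` list are assumptions for illustration, and `describe` is a hypothetical helper.

```python
mapping = {0: "dark", 1: "light"}   # inverted class indices (assumed)
predictions = [(1, "dark_1.jpeg"), (1, "light_1.jpeg")]   # sample (value, name) pairs

def describe(value, im_name, mapping):
    """Build the human-readable sentence for one prediction."""
    return f"We think that {im_name} is {mapping[value]}"

for value, im_name in predictions:
    print(describe(value, im_name, mapping))
```

Printing `mapping[value]` rather than `value` is the fix made on stream: the raw number is only meaningful once it is looked up in the mapping.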

  • Cool.

  • And if we run this through-- oh, that's not what I wanted.

  • That's what I wanted.

  • If we run this through really quickly, then this will not actually

  • build our model correctly.

  • It's certainly not the right version of the model.

  • It's not going to get things right.

  • But what we are going to have is at the very end,

  • it'll print predictions on any image that's in our predict directory.

  • And that's what we see at the very bottom here, is that it says,

  • we think that the dark_1.jpeg is light, and we think that light_1.jpeg is also

  • light.

  • So it got one of them right and not the other one,

  • which is not super great for us.

  • COLTON OGDEN: 50%

  • NICK WONG: 50%.

  • But we know that if we train things a little bit longer,

  • we can actually do a lot better.

  • COLTON OGDEN: FORSUNLIGHT says, "Nick, old is gold."

  • NICK WONG: I'm a huge fan.

  • I also love that the old way of labeling global variables

  • used to be by prepending a g to them.

  • COLTON OGDEN: Oh yeah.

  • NICK WONG: So that's actually--

  • I don't know if you intended that, but it is certainly

  • the greatest pun on that.

  • All right.

  • So what I promised at the beginning--

  • I can, I guess-- hopefully you'll take on faith

  • that if we increase the number of times that we train this

  • and allow it to run its full course, we could actually get this to a point

  • where it is correctly predicting on light and dark images.

  • However, I promised that we could do cartoon versus real images,

  • and so let's actually set up that data.

  • And we'll find that it's actually really easy.

  • The way that we set up this structure, all we have to do

  • is replace the image directory and then re-run our script, and that's it.

  • And so what we're going to do is I'm going to call this img.old.

  • And then I'm going to just copy it and create a new one.

  • Oh, I can't paste it into its own location.

  • Bummer.

  • That's OK.

  • And we're going to now create this new image which

  • still has predict, test, and train.

  • It's going to have two different jpegs in there.

  • And in test, instead of having dark and light,

  • it actually has cartoon and real.
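
The directory layout described here can be sketched in code. This assumes `train` has the same class subfolders as `test`, and it builds the tree inside a temporary directory so that nothing real is touched; `make_layout` is a hypothetical helper.

```python
import os
import tempfile

def make_layout(root, classes):
    """Create root/{train,test}/<class>/ and root/predict/ directories."""
    for split in ("train", "test"):
        for cls in classes:
            os.makedirs(os.path.join(root, split, cls), exist_ok=True)
    os.makedirs(os.path.join(root, "predict"), exist_ok=True)

with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "img")
    make_layout(root, ["cartoon", "real"])
    created = sorted(
        os.path.relpath(os.path.join(dirpath, d), root)
        for dirpath, dirnames, _ in os.walk(root)
        for d in dirnames
    )
    print(created)
```

Since the folder names double as the class labels, swapping `dark` and `light` for `cartoon` and `real` is all it takes to retarget the same script.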

  • COLTON OGDEN: GHASSEN says, "Like swift 3.

  • Very interesting.

  • We spoke about SQL last session.

  • I hope that we can build a common platform

  • and start talking programming."

  • Yeah, Swift is a good idea.

  • I'll maybe see around if anybody knows how to program in Swift

  • and wants to maybe make a simple application on stream.

  • If you want to maybe read TWITCH's last comment there.

  • NICK WONG: Ah.

  • All right.

  • So TWITCHHELLOWORLD says, "Thank you.

  • This is so interesting.

  • I heard that the later AI that triumphed over the initial program, which

  • won over the human at Go, had not been trained by humans,

  • rather simply given the game's rules, then left to train itself."

  • Yes.

  • That's reinforcement learning at its best.

  • "Is that a type of coding that is called AI as opposed to machine learning?

  • How is it similar or different in terms of the coding?

  • Thanks."

  • So yeah, fantastic question, and something that

  • is one of the greater confusions with how AI and machine learning

  • are related.

  • A lot of times, AI utilizes machine learning

  • practices in order to generate what looks like intelligence.

  • And the goal is actually to simulate human intelligence.

  • That is the benchmark, give or take.

  • And so what TWITCHHELLOWORLD points out there,

  • which is that the AI, or AlphaGo, that beat a human being at Go

  • was not actually trained by humans and was just handed the rules

  • and figured it out.

  • That's really crazy.

  • It's one of the most powerful versions of machine learning,

  • or artificial intelligence I think is actually much more close

  • to what you're actually going at there.

  • And it's super cool that it was able to do that.

  • That is a completely different--

  • COLTON OGDEN: Just played a ton of games against itself, basically--

  • NICK WONG: Pretty much.

  • COLTON OGDEN: --and figured out what led to a victory?

  • NICK WONG: It kind of simulated it in its head and figured it out,

  • and that's nuts.

  • That's insane.

  • Actually, there's this really cool version of that type of learning.

  • It's similar.

  • It's the reinforcement unsupervised learning

  • challenge by I think Microsoft.

  • And they use Minecraft for it.

  • So you actually build a Minecraft agent and have it compete with other agents

  • to survive.

  • And that's something that is also on my list of projects for winter break,

  • is to build my own Minecraft agent and see how I can do on that.

  • COLTON OGDEN: See if you beat out everyone else.

  • Battle royale.

  • NICK WONG: Battle royale.

  • I'm a huge fan of building agents for online video games

  • like PUBG, things like that.

  • Also, we're going to just start dragging cartoon portraits of people--

  • COLTON OGDEN: Oh, nice.

  • NICK WONG: --onto our data list.

  • Now, generally speaking, you want to be really careful

  • when you're picking up human data because there are a lot of racist AIs

  • out there.

  • And that basically just means they were trained

  • on data sets that weren't complete.

  • These data sets did not have representations of all kinds of people,

  • and that's really important.

  • It's something I actually feel very strongly about--

  • making sure that people realize that you're just doing bad science if you

  • don't have a complete data set.

  • So even in the worst case, where I can't, I guess,

  • help you from being racist, you should certainly

  • be able to recognize that it's bad science

  • to not have a complete data set.

  • And in the best case, you realize that being racist is not ideal,

  • and so you are also trying to obtain the real goal of being

  • representative and inclusive of everyone that could possibly

  • be using your device.

  • COLTON OGDEN: Well said.

  • PUPPYLIONHD says, "Halfway in this CS50 course,

  • should I be able to understand all of that?

  • Because I don't, and it's kind of frustrating."

  • NICK WONG: I would argue certainly not.

  • I think even if you were working on--

  • if you were all the way through CS50, I would

  • argue that you shouldn't necessarily be able to just understand

  • this right off the bat.

  • And actually, yeah, I think it should be a little bit frustrating.

  • It certainly was for me when I began.

  • And I know for a lot of people it is constantly very frustrating.

  • So I think what I would advise, or my general advice for that

  • is, yes, it's very frustrating at the beginning and throughout,

  • but looking back at what you can do now is a really good way

  • to alleviate that frustration.

  • So for me, when I went into CS50, that was my first time I'd ever coded.

  • That was two years ago.

  • We're I guess getting on three.

  • COLTON OGDEN: Are you a junior now?

  • NICK WONG: I am a junior, yeah.

  • And so I actually took CS50 as my first ever CS class.

  • I had never really coded before then.

  • I had built computers and things, and so there

  • were some we'll say roots for doing some computer stuff,

  • but I had never actually written a line of code.

  • And so I was very lost.

  • And I give a lot of credit to my TF at the time, actually,

  • for helping me with how CS could be a big part of my life.

  • COLTON OGDEN: Who was your TF?

  • NICK WONG: His name was Vik Jayaram.

  • COLTON OGDEN: Oh, yes.

  • NICK WONG: So if you're watching, then--

  • COLTON OGDEN: Shout out to Vik.

  • NICK WONG: Yeah, shout out to Vik.

  • COLTON OGDEN: I know Jik--

  • Vik personally.

  • I can't talk.

  • NICK WONG: He's a great guy.

  • COLTON OGDEN: Yeah, he's super cool.

  • NICK WONG: I still remember the sections that I was in with him.

  • And he also was super willing to be like, yeah it's hard,

  • but you can do it.

  • He was very encouraging in that sense.

  • And I give him a lot of credit for making

  • me feel like I could actually go through and accomplish a lot in CS.

  • I've been teaching the class that he taught for two years now.

  • So that's very cool.

  • And I was really excited about that.

  • And what I like to tell a lot of my students

  • is that I started out not that far from where you are now.

  • I really wasn't all that different.

  • And I think that that can be really helpful,

  • especially if you're sitting there and you're like,

  • wow, this seems very complex or it seems absurd,

  • it seems basically impossible for me to accomplish.

  • That is certainly not the case.

  • And it's not the case that I hope any CS people or any CS staff

  • ever propagate that opinion.

  • I think that it is very important for people

  • to realize that CS can be for anyone that wants to be in it.

  • I think that it rewards people who have natural talent.

  • That's true.

  • Almost any field does.

  • But it particularly will really reward people for putting in effort.

  • And a lot of fields are like that too, where

  • you will actually get a lot of rewards out of it for putting in real effort.

  • COLTON OGDEN: Yeah, and all these streams, they take a long time,

  • and it's just a small fraction of the amount of time

  • that it actually does take to do all of this stuff.

  • NICK WONG: That's true.

  • COLTON OGDEN: It's time consuming, but if you

  • enjoy it and you put in the work, it's very rewarding.

  • NICK WONG: Exactly.

  • COLTON OGDEN: And it's fun.

  • It can be fun.

  • NICK WONG: I could not agree more.

  • COLTON OGDEN: "Nick is not racist and loves puns.

  • Great addition to the channel."

  • Yes, I very much agree.

  • NICK WONG: Thank you.

  • I appreciate it.

  • COLTON OGDEN: "What is the difference between React and Angular?"

  • NICK WONG: Oh, that's a good question.

  • COLTON OGDEN: So unrelated to machine learning, but in a nutshell,

  • they are both front end frameworks for web development.

  • They abstract what it means to do front end development

  • and make it a bit more modular and are more high level.

  • Tomorrow, we're actually going to be talking about React with Brian Yu--

  • NICK WONG: Hey, Brian.

  • COLTON OGDEN: --where we can maybe do a deeper dive into the differences

  • and nuances between the two.

  • But React is certainly the more popular presently,

  • although Angular has been around for longer

  • and has also had a lot of fame associated with it.

  • "Is it safe to study the same sort of coding and courses

  • as machine learning to eventually execute unsupervised learning?"

  • NICK WONG: So let's see.

  • Yes and no.

  • Yes in the sense that you need the basic background and other information--

  • oh, sorry.

  • Ignore my desktop, it's a mess.

  • You do need a lot of the same background and basic information and practices

  • that you see in this supervised rigorous learning that we're doing right now.

  • But you also would be exposed to a lot of new techniques

  • that people realize they could use when they

  • didn't have to actually watch or give answers to their AI or their machine.

  • There's also a lot of different best practices in play,

  • like I'm not really necessarily collecting data.

  • I might be, as was pointed out earlier, I might be pointing out rules.

  • I might be looking for patterns that are a little bit broader

  • than the actual data that we're looking at.

  • And that can be really important, especially

  • if you're creating these agents.

  • Then you give them a set of rules, and then you see what they do.

  • Or you might give them a set of rules, see what they do for a little bit,

  • feed them some information as a response to what they did,

  • and then let them go again.

  • So there might be interactions.

  • Actually, I think a lot of the time there

  • are interactions between this unsupervised learning

  • and the supervised learning.

  • So yes, you definitely would want--

  • I think this is easier to start with.

  • It's a little bit easier for us to contextualize.

  • Excuse me.

  • But you will learn a lot more in that field as well.

  • There is a lot to that on its own.

  • I think this is a good place to start, though.

  • There are a bunch of questions.

  • COLTON OGDEN: "What is the best technology that fits

  • for certain and guaranteed hiring?"

  • Honestly, that's a hard question to answer just because every industry is

  • a little different.

  • Every niche in every industry is a little bit different.

  • I would say figure out what you want to do in computer science

  • or whatever field you're interested in.

  • Do a deep dive on it.

  • Learn a lot.

  • Spend a lot of time figuring out whether you enjoy it or not.

  • And make some projects.

  • Flesh out your GitHub a bit.

  • Reach out to people.

  • Get some smaller jobs.

  • See if their work experience is fulfilling.

  • And build upon it from there.

  • But it's hard to frame it as there being any sort of magic bullet or magic trick

  • because at the end of the day, it's going to be a lot of hard work

  • ultimately, because if it was easy and free, everybody would do it

  • and everybody would have a very well-paying job in tech.

  • NICK WONG: That's true.

  • COLTON OGDEN: So the reality is you're going to have to spend a lot of time

  • figuring that out for yourself and doing research and getting

  • good at various technologies, and ultimately probably finding

  • a specialization or a given strength in a particular field.

  • But good question.

  • NICK WONG: Very, very good question.

  • COLTON OGDEN: FORSUNLIGHT says, "This is the real power of teaching.

  • Connection fuels commitment.

  • I guess daily live streams will increase the commitment of students."

  • I think so too, and I honestly hope that this channel helps inspire other people

  • to dig into more projects of their own and we

  • get to build some really cool stuff on stream.

  • I know today's has been really cool.

  • It's been very unique, all the ones we've had so far, and very technical,

  • which is nice.

  • The deeper we can go into this stuff, I think it shows a little bit more

  • of the reality of actually getting into the nuts

  • and bolts of this kind of stuff.

  • NICK WONG: Yeah.

  • I'm super excited that Colton asked me to come and join and be

  • able to present with all you guys.

  • But also, I think Colton's idea for just having

  • a Twitch Livestream, that's fantastic.

  • I think that's super awesome for what you guys get to do.

  • And you get to have just new topics all the time.

  • It's so cool.

  • COLTON OGDEN: I get to come in and not even have to prepare anything.

  • It's great.

  • I can mess up live on stream and everyone sees

  • what it's like to be a real programmer.

  • How about that?

  • NICK WONG: Actually, that's a really important part of this.

  • And Colton told me about it last week when

  • I was thinking about what to do for this, is that being able to mess

  • up live on camera, that's great.

  • I think it shows a lot of people just the ridiculousness of the belief

  • that people just are monolithic coders, that they just come in

  • and they just go boom, boom, boom.

  • Those coders, I guess, do exist, but they aren't necessarily--

  • they're usually a product of having done it a lot.

  • They've sat down and really fleshed this out.

  • So having I guess the live version of that,

  • going online and having that in front of you, that's really cool,

  • and it's something that I wish you saw more in other fields too.

  • COLTON OGDEN: Yeah.

  • A big trend is making an app in 10 or 15 minutes.

  • But the reality is it's very unlikely that any actual app or project you make

  • is going to take 15 minutes.

  • And those videos are cool for introducing

  • at least the concepts behind a lot of those ideas and things,

  • like building the games and such.

  • GHASSEN says, "Thank you, Colton.

  • Hope to be friends on Facebook."

  • Yep.

  • Send me a friend request.

  • I'll accept it.

  • "Shout out to Colton.

  • Livestream is cool.

  • I like to see people get frustrated and break code

  • like I do in psets, practice, et cetera."

  • Yeah, that's going to happen.

  • NICK WONG: It certainly happens.

  • COLTON OGDEN: That's absolutely going to happen.

  • BHAVIK, thank you for that.

  • NICK WONG: All right.

  • So as promised, a little bit behind schedule, but definitely as promised,

  • I've edited a little bit of our code.

  • I added a command line argument so that we could modify things as we go.
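
The stream does not show which command line argument was added. One plausible sketch, using Python's standard `argparse` module and a hypothetical `--img-dir` option for choosing the data directory, would be:

```python
import argparse

def parse_args(argv=None):
    """Parse a hypothetical --img-dir option for choosing the data set."""
    parser = argparse.ArgumentParser(
        description="Train and predict on an image directory."
    )
    parser.add_argument(
        "--img-dir",
        default="img",
        help="directory containing train/, test/ and predict/",
    )
    return parser.parse_args(argv)

args = parse_args(["--img-dir", "img.old"])
print(args.img_dir)
```

With an option like this, switching between the light and dark data and the cartoon and real data would not require editing the script at all.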

  • But we have just changed the entirety of what we're doing.

  • We're no longer classifying between light and dark.

  • We are classifying between real and cartoon pictures of people.

  • And all we had to do was find new data sets and label them.

  • That was it.

  • And the code will work entirely fine on those.
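
The command line argument mentioned above is not shown in the transcript. A minimal sketch of how such an option could look with Python's `argparse` follows; the flag names `--data-dir` and `--epochs` and their defaults are hypothetical and are not taken from the stream's code.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical training options; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(description="Train a binary image classifier.")
    # Hypothetical flag: which labeled data set folder to train on.
    parser.add_argument("--data-dir", default="img", help="root folder of labeled images")
    # Hypothetical flag: how many passes to make over the training data.
    parser.add_argument("--epochs", type=int, default=10, help="number of training epochs")
    return parser.parse_args(argv)


# Passing an explicit list makes the example runnable without a real command line.
args = parse_args(["--data-dir", "img_cartoon", "--epochs", "5"])
print(args.data_dir, args.epochs)
```

With an option like this, switching from the light/dark data to the cartoon/real data would only mean pointing the script at a different labeled folder.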

  • You'll notice things are a little bit harder here,

  • where we still are validating things--

  • oh, nice.

  • David threw your Facebook in there.

  • COLTON OGDEN: Shout out to David for throwing my Facebook in the chat there.

  • "Prof is here!"

  • Welcome, David sir.

  • NICK WONG: Yep.

  • Welcome back, David.

  • And so this validation accuracy is not necessarily

  • going straight to 100 anymore.

  • It is a little bit difficult to classify, really,

  • between cartoon images and real life ones.

  • But you'll notice we're doing way better than just guessing.

  • We are not at 50/50.

  • We're at some 80% and 80%, which means we're roughly--

  • oops-- not too terrible on this.
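
To make the "better than just guessing" comparison concrete, here is a small sketch that measures accuracy against the baseline of always guessing the most common label. The labels and predictions are made up for illustration, and the encoding (1 for cartoon, 0 for real) is an assumption.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def majority_baseline(labels):
    """Accuracy obtained by always guessing the most common label."""
    most_common = max(set(labels), key=labels.count)
    return labels.count(most_common) / len(labels)


# Made-up validation set with balanced classes (assumed: 1 = cartoon, 0 = real).
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
predictions = [1, 0, 1, 1, 1, 0, 0, 0, 1, 0]

print(accuracy(predictions, labels))   # the model's accuracy on this toy data
print(majority_baseline(labels))       # what constant guessing would score
```

On balanced classes the baseline sits at 50%, which is why roughly 80% validation accuracy counts as a real signal.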

  • Ah, that's the same bug I had last time.

  • Let me get rid of that and let it run again.

  • rm img/predict/.DS.

  • There we are.

  • So annoying.

  • I told you at the very beginning of the Livestream that was going to come up,

  • and then I forgot about it.
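
One way to avoid deleting the stray file by hand each time is to skip hidden files when collecting images. This is a sketch under assumptions: the helper name `list_images` and the set of accepted extensions are hypothetical, not part of the stream's code.

```python
import os
import tempfile

# Assumed set of image extensions; adjust to match the actual data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images(directory):
    """Return sorted image paths, ignoring hidden files and non-image files."""
    paths = []
    for name in sorted(os.listdir(directory)):
        if name.startswith("."):
            continue  # hidden metadata files are not images
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
            paths.append(os.path.join(directory, name))
    return paths


# Demonstrate on a throwaway folder containing one image-like file and two others.
with tempfile.TemporaryDirectory() as folder:
    for name in ("face.png", ".hidden", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    print([os.path.basename(p) for p in list_images(folder)])
```

Filtering at load time means a hidden file reappearing in the prediction folder no longer breaks the run.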

  • COLTON OGDEN: It's like we've come full circle.

  • NICK WONG: Yeah.

  • We've come all the way back to what we had at the very beginning,

  • where there's a small bug, we fixed it, and now we

  • are doing what we promised at the beginning.

  • We are classifying between cartoon images of people

  • and real images of people.

  • And what that means is--

  • in concept, you could train this a lot better with way more comprehensive data

  • and doing all sorts of cool things.

  • But even with very limited data sets in roughly two minute training times

  • and in and hour and a half, two hours of coding,

  • we were able to get to a model that works pretty well

  • on what's going on here.

  • There's actually quite a few concepts that we demonstrate in this model.

  • There's a lot of things going on underneath the hood, which

  • is one of David's actual favorite phrases, I think.

  • What's the record?

  • 40-something?

  • COLTON OGDEN: I think someone cut a video on our production staff

  • of how many times David said "under the hood," and it was a lot.

  • NICK WONG: It was pretty funny.

  • COLTON OGDEN: It was a good number.

  • A good number of it.

  • NICK WONG: So yeah, there's all sorts of things going on

  • in this even simple project.

  • And I think that there's a lot of places for it to grow and expand.

  • So you're welcome to take the code.

  • It's all yours.

  • There's no need to credit me with it.

  • And the images are not mine except for the pictures of me.

  • There is a picture of my face.

  • I use it as the profile picture.

  • And you'll notice, even here, we didn't do a great job of predicting.

  • We guessed that they were both cartoons.

  • One of them is certainly not a cartoon.

  • I'm a real person, I swear.

  • COLTON OGDEN: It's debatable.

  • NICK WONG: Debatable.

  • I'm certainly not a cartoon version of myself, I think.

  • But yes, we basically were able to get from beginning

  • to end in about two hours.

  • And we have a machine learning model that allows you to predict on things.

  • There's plenty of room to modify this to make it a little bit more general,

  • to make it usable from the command line or as an app.

  • But this is certainly a--

  • I think if you were able to get through this even without fully

  • understanding everything, even if you were just sitting there and being like,

  • oh, this is kind of cool, that's fantastic.

  • COLTON OGDEN: Even I'll admit, I don't know every detail

  • of what we talked about, but it's been fascinating

  • and actually inspiring to go and dive a little bit deeper

  • into these technologies--

  • NICK WONG: That's awesome.

  • COLTON OGDEN: --into machine learning.

  • So I think if that's even what some people take away from this,

  • that things, different fields, different niches of CS or other fields

  • can be interesting or inspiring to you without knowing all the details, that's

  • great.

  • That's how seeds are planted for future exploration.

  • NICK WONG: Yeah.

  • I thoroughly agree with that.

  • Couldn't be better said.

  • COLTON OGDEN: "Thank you very much," says DKBUXBOM.

  • "It has been instructive and very inspiring.

  • Special thanks for the encouragement."

  • Absolutely.

  • It's been a great session.

  • NICK WONG: Yeah, of course.

  • Thank you.

  • We appreciate having you guys sitting here talking with us.

  • It's really cool.

  • COLTON OGDEN: Yeah, the back and forth is what really makes this a ton of fun.

  • 42FORCE-- "Many thanks for the effort Colton and Nick along with rest

  • of CS50 team!

  • This is something to be thankful for by being here in the 20th century.

  • Education is now a free resource."

  • Absolutely.

  • No, and it changes every day.

  • And a special thanks to our production team for setting this up for us too.

  • This is only possible because of their hard work and for David--

  • NICK WONG: Snaps for the production team.

  • COLTON OGDEN: --being in support of the vision as well.

  • So shout outs to David and the production team.

  • Maybe want to read off TWITCH--

  • NICK WONG: Sure, yeah.

  • So TWITCHHELLOWORLD.

  • Nice.

  • You went for two smaller ones, and then--

  • COLTON OGDEN: Giving you.

  • NICK WONG: I appreciate it.

  • No, it's cool.

  • So TWITCHHELLOWORLD says, "Is the distinction between, say,

  • a PhD in AI for instance, those who develop the new techniques

  • and/or develop TensorFlow itself as opposed to the MS/BS degree holders

  • and self-trained as those that execute these techniques in various settings?

  • I hear sometimes PhDs are overqualified for some openings,

  • though it seems some companies prefer advanced degrees for their AI research.

  • Thank you so much for this."

  • So of course, you're welcome.

  • We appreciate it.

  • But also, that's a great question on the difference between PhD, MS, versus BS.

  • And actually, a lot of that can vary by country.

  • I know that in certain countries it's actually really common to get a BS,

  • then an MS, then a PhD.

  • In the United States, a lot of times you jump from BS to PhD.

  • There's maybe some years of research in between,

  • but you actually just go straight from one to the other.

  • And so that's a really good question for asking, maybe,

  • what are companies looking for?

  • And a lot of companies, especially the big five, you're looking at them

  • and you might say, OK, well I might want to go work

  • for like DeepMind, Google DeepMind.

  • They tend to look for a lot of PhDs.

  • I don't know of too many people who are undergrads

  • who are going to work on DeepMind.

  • That's very cool.

  • It's super technical work.

  • And they tend to do a lot more theory before they put it into practice.

  • But I think differentiating between people who are purely theoretical

  • and people who are purely practical is kind of a false dichotomy.

  • There are people who are very theoretical who can do

  • a lot of really cool practical stuff.

  • They can build apps out of nowhere.

  • And there are people who are super practical who

  • also happen to know a lot of theory.

  • And so I think that they go hand-in-hand.

  • The more theory you know, I think it makes it a little bit easier

  • to do things practically.

  • There are some classic counterexamples to this,

  • but I think in general, that's true.

  • And the more practical things you do, it motivates a lot of the theory.

  • Why am I curious about how this actually works underneath the hood

  • until I've done it?

  • And then I see where things break.

  • Now, maybe when I understand how things work,

  • I might say, oh, that makes a lot of sense.

  • I can actually fix this now.

  • And so it influences both directions.

  • COLTON OGDEN: We can answer questions like, what's a sigmoid?

  • NICK WONG: Right.

  • Right, exactly.
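
For reference, the sigmoid asked about here is the function 1 / (1 + e^(-x)), which squashes any real number into the range between 0 and 1, so it is a common choice for the final output of a binary classifier. The sketch below is illustrative; which label sits on which side of the 0.5 threshold is an assumption.

```python
import math


def sigmoid(x):
    """Squash a real number into the range 0 to 1, stably for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, rewrite the formula so exp() never overflows.
    z = math.exp(x)
    return z / (1.0 + z)


def classify(score, threshold=0.5):
    """Turn a raw model score into one of two labels (label order assumed)."""
    return "real" if sigmoid(score) >= threshold else "cartoon"


print(sigmoid(0.0))
print(classify(2.0), classify(-2.0))
```

A score of zero lands exactly on the threshold, which is the model saying it cannot tell the two classes apart.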

  • COLTON OGDEN: "What if we live in a cartoon?"

  • says MAGGUS.

  • NICK WONG: That would be really cool.

  • Actually, there is a friend of mine who posed a really cool thought

  • experiment on that, which was-- could you ever

  • believe that we would be living in a matrix?

  • I'm going to just absolutely bungle this.

  • But he said it really eloquently, which was

  • like, if you could believe that we could ever be a matrix,

  • then it's probably pretty possible that we're

  • in one, because if you think that AI could ever be representative of a human

  • being, then the fact that that could already exist

  • would make us exactly where we are now.

  • It would be non-identifiable.

  • And I think there's a very famous discussion or set

  • of quotations between Elon Musk and Stephen

  • Hawking and talking about the probability that we live inside

  • of a simulation.

  • And they say that it's really high.

  • The probability that we don't live in a simulation is very low.

  • And based on what we know, that's true, at least

  • based on that topic and my readings of the articles on that.

  • However, a counter to that example is, let's say

  • that I have a soccer team A and a soccer team B,

  • and I say, what's the probability that soccer team

  • A wins when they play together?

  • And you would pretty reasonably, hopefully, say it's about 50%.

  • And that's true.

  • That's the current probability given what you know.

  • However, if I tell you that soccer team A is Harvard's varsity boys soccer team

  • and soccer team B is my kindergarten soccer team,

  • you now know that the probability that soccer team A wins is roughly 100.

  • I would argue actually 100.

  • I hope it's 100.

  • So that changes--

  • COLTON OGDEN: Shout out to the Harvard soccer team.

  • NICK WONG: Yeah, go Harvard.

  • So that's a crazy shift in what our probability is.

  • It went from something that is literally probabilistic to something

  • that is quite hopefully deterministic.
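
The soccer example can be put into numbers with a toy model. The sketch below uses a Bradley-Terry style formula, P(A beats B) = a / (a + b), which is one simple way to turn team strengths into a win probability; it ignores draws, and the strength values are made up purely for illustration.

```python
def win_probability(strength_a, strength_b):
    """Bradley-Terry style estimate: P(A beats B) = a / (a + b)."""
    return strength_a / (strength_a + strength_b)


# With no information about the teams, treat them as equally strong.
print(win_probability(1.0, 1.0))

# Made-up strengths once we learn one team is varsity and the other is kindergarten.
print(win_probability(1000.0, 1.0))
```

The formula is the same in both calls; only the information fed into it changes, which is the point of the example.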

  • And that, I think, points to a lot of these questions about--

  • are we a cartoon?

  • Are we in a matrix?

  • Are we in these sort of things?

  • There are people who are very smart and very

  • relevant to this sort of discussion, and they have a lot of cool opinions on it,

  • but it's good to also temper those opinions

  • with the reality of the probability that we know very much

  • about our world and our universe even.

  • Pretty minimal.

  • We don't even really know what's in the oceans.

  • So to say that we might understand something about the universe

  • is pretty limited to what we can actually

  • say given that the amount of information we know, the total breadth of knowledge

  • of humanity-- pretty small if you imagine that the breadth of knowledge

  • that actually exists is infinitely larger, or almost infinitely larger.

  • COLTON OGDEN: We're into metaphysical conversations here on--

  • NICK WONG: Yeah, very metaphysical, very philosophical.

  • COLTON OGDEN: --CS50.

  • Twitch.TV/CS50TV.

  • Let's see, make sure we didn't miss anything.

  • NICK WONG: Right.

  • COLTON OGDEN: "Colton, Nick, you're amazing.

  • Thank you for the great work you're doing."

  • NICK WONG: Thank you very much.

  • COLTON OGDEN: Thanks so much, BELLA.

  • Appreciate you coming in time and time again.

  • I think I've seen you on every stream so far, so thanks so much.

  • NICK WONG: Wow, that's awesome.

  • COLTON OGDEN: "Thanks for the live stream," says BHAVIK.

  • "I really enjoy being here."

  • Glad to have you, BHAVIK.

  • Also a regular.

  • David's heading out.

  • Thanks, David--

  • NICK WONG: Thank you, David.

  • COLTON OGDEN: --for joining us on the chat.

  • We got some thank yous for David there in the chat.

  • "Which IDE are you using?" says MARLINDO71.

  • NICK WONG: Visual Studio Code.

  • And I think BHAVIK_KNIGHT actually answered that right below.

  • COLTON OGDEN: Oh yes.

  • NICK WONG: But yeah, I'm using VSC.

  • It's one of my favorite IDEs.

  • I also sometimes switch to Atom if I'm feeling kind of fun.

  • But yeah, whatever works for you.

  • COLTON OGDEN: "How to choose the best partition

  • after preparing a Gini or Shannon tree?

  • How to identify the prediction exceptions?"

  • I can't say I know.

  • NICK WONG: OK.

  • So I don't know super well what exactly Gini and Shannon

  • trees are, though I would imagine that it's related to forest decision

  • problems in machine learning.

  • And I can explain a little bit of that.

  • It's basically you pass a set of data into a set of-- they're

  • called trees that make decisions.

  • And at first, they're random, and then they

  • start to really successfully choose based on decisions.

  • And this goes iteratively all the way through.

  • That's a very rough, leaving out a lot [INAUDIBLE].

  • COLTON OGDEN: It's like a decision tree pipeline almost?

  • NICK WONG: Yeah.

  • I would think of it like that, and there's some nuance

  • that I'm pretty much omitting.

  • And I believe if you're talking about something along those lines,

  • then it's very similar in the decision to choose a rectified linear unit

  • versus a sigmoid versus different activations

  • than what we saw here in our linear models.

  • And that's roughly what you end up seeing.

  • But because I don't actually know what a Gini or Shannon tree is or are,

  • I can't necessarily speak to how you would actually identify the prediction

  • exceptions on those.
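
For context on the question: Gini impurity and Shannon entropy are two measures commonly used to score candidate splits when building a decision tree, with the lowest size-weighted impurity marking the best partition. The sketch below shows that comparison on made-up labels; it is an illustration of the general idea, not something covered in the stream.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1 / p) over class proportions."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())


def split_impurity(left, right, measure):
    """Size-weighted impurity of a two-way partition; lower is better."""
    total = len(left) + len(right)
    return (len(left) * measure(left) + len(right) * measure(right)) / total


# Two made-up candidate partitions of the same eight labels.
clean_split = (["a", "a", "a", "a"], ["b", "b", "b", "b"])
mixed_split = (["a", "a", "b", "b"], ["a", "a", "b", "b"])

for measure in (gini, entropy):
    print(measure.__name__,
          split_impurity(*clean_split, measure),
          split_impurity(*mixed_split, measure))
```

Both measures score the clean split at zero and the mixed split at their maximum for two classes, so either one would pick the clean partition here.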

  • COLTON OGDEN: Cool.

  • Cool.

  • "Thank you," says TWITCHHELLOWORLD.

  • Astley says, "If we lived in a cartoon, we could run in the air

  • until we looked down."

  • NICK WONG: Possibly, yeah.

  • COLTON OGDEN: That's one of my biggest desires right there.

  • "VSCode.

  • If you are looking for an IDE to do Python stuff, PyCharm is very good.

  • I use PyCharm for Python."

  • I haven't used it myself--

  • NICK WONG: [INAUDIBLE].

  • COLTON OGDEN: --but I've heard the same thing, yeah.

  • People tend to really enjoy it.

  • Patrick Schmidt tends to really like that a lot.

  • NICK WONG: Oh, that's awesome.

  • COLTON OGDEN: "Love Professor Margo Seltzer's phrase, if you do good,

  • you will do well."

  • Yeah.

  • Do good, do well.

  • That's one phrase that she actually definitely says.

  • It's very true, very true.

  • "Also relevant."

  • We have a Reddit post here.

  • I'm going to pull it up on my computer.

  • NICK WONG: Yeah, I was going to say, that's a dangerous--

  • COLTON OGDEN: DRUNKBEAR-- "What is happening here?"

  • OK, the number one data--

  • maybe if you want to pull it up, it looks OK.

  • NICK WONG: Oh, all right.

  • COLTON OGDEN: Pull it up on your web browser there.

  • NICK WONG: Let's see.

  • Twitch.tv.

  • COLTON OGDEN: Oh, actually-- oh, yeah, yeah.

  • You have to go to the chat, I guess.

  • NICK WONG: CS50TV.

  • COLTON OGDEN: Yeah.

  • And then I'll paste it in there again when you're in the chat.

  • NICK WONG: Beautiful.

  • Gotta love the Wi-Fi.

  • Anytime you are actively observing the Wi-Fi, it's millions of times slower.

  • But then if I look away, it's gone.

  • Look at that.

  • Cool.

  • Oh, this is going to be pretty meta.

  • You're going to be watching the stream.

  • Oh, well you get an ad.

  • I was going to say, you're going to be watching the stream--

  • COLTON OGDEN: Watching the stream within a stream kind of.

  • NICK WONG: --while it's streaming.

  • Yeah, kind of.

  • COLTON OGDEN: Kind of.

  • I think the chat is--

  • I think you have to expand that a little bit.

  • NICK WONG: I'm going to expand this a little bit.

  • There we go.

  • COLTON OGDEN: And there.

  • You should be there.

  • So let's paste that bad boy right in there.

  • NICK WONG: Beautiful.

  • All right.

  • So we get this beautiful gem.

  • It almost looks like an XKCD comic.

  • All right.

  • So "The number one data scientist excuse for legitimately slacking off."

  • COLTON OGDEN: Yeah.

  • That's like the old "my code's compiling" joke from back when--

  • NICK WONG: That's my favorite.

  • COLTON OGDEN: --back when programs took 30 minutes to an hour.

  • NICK WONG: My build.

  • COLTON OGDEN: Some probably still do take that long.

  • NICK WONG: I believe so.

  • COLTON OGDEN: Probably much longer than that.

  • Things like-- I don't know about Chrome, but maybe

  • Microsoft products like Word and Excel.

  • NICK WONG: Yeah.

  • They're known for taking forever.

  • COLTON OGDEN: Those probably take ages, right?

  • "What is happening here?"

  • DRUNKBEAR.

  • We are just recapping the conclusion to our machine learning stream here

  • with Nick Wong, but it's going to be on YouTube.

  • The VOD should be up on Twitch.

  • We built a simple binary classifier.

  • And then I guess this is a perfect way to close it off too with having the--

  • NICK WONG: Yeah, I figure we can loop all the way back around.

  • COLTON OGDEN: --the visual of Nick's custom little shell screensaver here,

  • which is really cool.

  • I don't know if maybe you want to link to how you got that if it's simple.

  • NICK WONG: Yeah.

  • Actually, I can link to what that is.

  • COLTON OGDEN: Because that's really cool.

  • NICK WONG: I can post that here.

  • Well, let me log into Twitch.

  • COLTON OGDEN: "Yeah, you're genius guys.

  • Gini and Shannons are algorithms to identify the best fitting model

  • to adopt so you can build a classifier."

  • Yeah it sounds like what you were saying.

  • NICK WONG: OK, I can't log in.

  • I can tell you what it is.

  • It's CMatrix is the program that actually runs.

  • I piped CMatrix into lolcat.

  • So I can actually show you what that looks like.

  • COLTON OGDEN: "Atom is a bit slow to start.

  • I like Vim's key binding very much, so I use it inside the IDE," says BHAVIK.

  • Yeah, Atom is a bit slow.

  • Try VSCode out, see if that's a little bit faster.

  • I know it uses less memory than Atom does.

  • But the two are, I believe, both built on Electron,

  • so they both do use a bit more memory than they need to.

  • Well, not than they need to, but than is--

  • compared to other--

  • NICK WONG: Than standard-- yeah.

  • COLTON OGDEN: --IDEs or other text editors.

  • PUPPYLIONHD-- "When is the next stream?"

  • The next stream is tomorrow, so we'll have CS50's Brian Yu, our head teaching

  • fellow, who'll be up here.

  • NICK WONG: Yeah, Brian Yu.

  • COLTON OGDEN: Shout out to Brian.

  • He'll be teaching us how to React, the basics of React.

  • And his web course is--

  • I think I pasted it earlier, but just to do it one more time.

  • He taught that course, a web course, so if you

  • want to maybe glance over some of the course materials there.

  • I don't recall offhand if he taught React in that class.

  • I know that Jordan's class did teach React or React

  • Native, which part of that was React.

  • "Recently installed VSCode."

  • Oh sorry, did you put the thing in the--

  • NICK WONG: I did.

  • Sorry.

  • I put the command up here.

  • It's this command is technically how to run.

  • Or I guess this is the actual command.

  • COLTON OGDEN: OK.

  • So they need to install CMatrix and lolcat, and then once they have that,

  • they can alias that in their Bash profile,

  • and then they're all set to go.

  • NICK WONG: Yeah.

  • And I like to call it rest because it's what I do when I'm resting

  • and I'm not actually teaching anything actively.
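
The exact command pasted in chat is not captured in the transcript. As an illustration of what piping one program into another means, here is a sketch in Python using `subprocess`; the helper name `run_pipeline` is hypothetical, and the commented call at the end assumes both `cmatrix` and `lolcat` are installed.

```python
import subprocess


def run_pipeline(first, second, capture=False):
    """Run the equivalent of `first | second`; optionally capture the final output."""
    producer = subprocess.Popen(first, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        second,
        stdin=producer.stdout,  # the consumer reads what the producer writes
        stdout=subprocess.PIPE if capture else None,
        text=True,
    )
    # Close the parent's copy of the pipe so the producer can receive
    # SIGPIPE if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


# Assumed usage, requires both programs on the PATH and runs until interrupted:
# run_pipeline(["cmatrix"], ["lolcat"])
```

In a shell the same idea is a single pipe character between the two commands, which is what gets aliased in the Bash profile.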

  • COLTON OGDEN: Nice.

  • Love it.

  • I love it.

  • It's nice.

  • NICK WONG: Cool.

  • COLTON OGDEN: We'll keep that on there for just a second.

  • NICK WONG: All right.

  • Sounds good.

  • COLTON OGDEN: But yeah, this was Nick Wong

  • building a binary classifier, a humble introduction to machine learning.

  • I didn't absorb probably more than 50% or 60% of it, but it's very inspiring.

  • NICK WONG: We went through it very fast.

  • Thank you, I appreciate it.

  • COLTON OGDEN: It looks like a deep rabbit hole.

  • Looks like a--

  • NICK WONG: Yeah, it just goes.

  • COLTON OGDEN: --lot of very interesting stuff, a lot of fancy terminology

  • that I would love to understand the theoretical meaning of.

  • But machine learning powers so much of the modern world,

  • so it's nice to get a glimpse into how it works.

  • NICK WONG: Thank you for having me, Colton.

  • I really appreciate it.

  • It's been awesome.

  • COLTON OGDEN: Yeah.

  • No, it was a terrific--

  • NICK WONG: Looking forward to all of the future talks we get to do.

  • COLTON OGDEN: Terrific stream.

  • And join us again next week.

  • On next Friday, we'll have Nick talking to us about some of this fancy terminal

  • stuff.

  • He'll be showing us--

  • NICK WONG: Terminal tricks.

  • COLTON OGDEN: --basic Linux commands.

  • So a bit more of an entry level stream, some how do we use the command line?

  • What are some common commands?

  • Things like ls, cd, and maybe some other things like piping,

  • or however much time we have.

  • NICK WONG: Yeah.

  • I think we'll go for a little while.

  • COLTON OGDEN: Which is a great foundation

  • upon which we can build other streams and get people more fluent

  • in using the command line.

  • NICK WONG: Yeah.

  • COLTON OGDEN: "What password managers do you guys use?

  • I keep forgetting to ask that.

  • David Sir talked about it in cyber security lecture,

  • but I'd like some recommendations to try."

  • We use 1Password for CS50, so take a look at 1Password.

  • I don't know if you use--

  • NICK WONG: I use LastPass, but all very--

  • COLTON OGDEN: They're all probably more or less--

  • NICK WONG: They're pretty much the same, yeah.

  • COLTON OGDEN: --identical or feature compatible.

  • But yeah, using a password manager is a very handy tool.

  • NICK WONG: I just got a student discount for LastPass.

  • COLTON OGDEN: Oh, nice.

  • OK, a little bit of an interest there.

  • All right.

  • Well, thanks so much everybody.

  • This has been CS50 on Twitch, our seventh episode, Binary Classifier

  • with TensorFlow, Keras, and Python.

  • Thanks for everybody who's tuning in.

  • As always, if you have suggestions, definitely let us know.

  • We'll be joined again by Nick next week.

  • And this week, we'll be joined by Brian tomorrow for some React.

  • And then on Friday, we'll take a look at some Unity.

  • We'll make 3D Pong.

  • NICK WONG: Nice.

  • That's awesome.

  • COLTON OGDEN: So yeah, that'll be it.

  • Any last things you'd like to bring up or mention?

  • NICK WONG: No.

  • Thank you, guys.

  • Really appreciate it.

  • I'll see you guys next Friday.

  • COLTON OGDEN: Cool.

  • See you, everybody.

  • Thanks so much again.

  • Looking forward to the next stream.

  • NICK WONG: Awesome.

  • COLTON OGDEN: All right.


BINARY CLASSIFIER WITH TENSORFLOW - CS50 on Twitch, EP. 7

Published by 林宜悉 on January 14, 2021