
  • Alright. Hello everybody.

  • Hopefully you can hear me well.

  • Yes?

  • Yes.

  • Great!

  • So, welcome to Course 6.S094.

  • Deep Learning for Self-Driving Cars.

  • We will introduce to you the methods of deep learning,

  • of deep neural networks using the guiding case study of building self-driving cars.

  • My name is Lex Fridman.

  • You get to listen to me for a majority of these lectures

  • and I am part of an amazing team with some brilliant TAs.

  • Would you say brilliant?

  • (CHUCKLES)

  • Dan Brown.

  • You guys want to stand up?

  • They're in the front row.

  • Spencer, William Angell.

  • Spencer Dodd and all the way in the back.

  • The smartest and the tallest person I know, Benedict Jenik.

  • What you see there on the left of the slide is a visualization of one of the two projects, one of the two simulations, games, that we'll get to go through.

  • We use it as a way to teach you about deep reinforcement learning but also as a way to excite you.

  • By challenging you to compete against others,

  • if you wish to, for a special prize yet to be announced.

  • Super secret prize.

  • So you can reach me and the TA's at deepcars@MIT.edu if you have any questions about the tutorials, about the lecture, about anything at all.

  • The website cars.mit.edu has the lecture content.

  • Code tutorials, and, like today, the lecture slides for today are already up in PDF form.

  • The slides themselves, if you want to see them just e-mail me, but they are over a gigabyte in size because they're very heavy in videos, so I'm just posting the PDFs.

  • And there will be lecture videos available a few days after the lectures are given.

  • So speaking of which there is a camera in the back.

  • This is being videotaped and recorded but for the most part the camera is just on the speaker.

  • So you shouldn't have to worry.

  • If that kind of thing worries you then you could sit on the periphery of the classroom

  • or maybe I suggest sunglasses and a moustache, fake mustache, would be a good idea.

  • There is a competition for the game that you see on the left.

  • I'll describe exactly what's involved

  • in order to get credit for the course you have to

  • design a neural network that drives the car just above the speed limit sixty five miles an hour.

  • But if you want to win, we need to go a little faster than that.

  • So who is this class for?

  • You may be new to programming,

  • new to machine learning,

  • new to robotics,

  • or you're an expert in those fields but want to go back to the basics.

  • So what you will learn is an overview of deep reinforcement learning,

  • of convolutional neural networks,

  • recurrent neural networks

  • and how these methods can help improve each of the components of autonomous driving -

  • visual perception, localization, mapping, control, planning, and the detection of driver state.

  • Okay, two projects.

  • Code named "DeepTraffic" is the first one.

  • There are, in this particular formulation of it,

  • seven lanes.

  • It's a top view.

  • It looks like a game but I assure you it's very serious.

  • It is the agent in red,

  • the car in red is being controlled by a neural network and we'll explain

  • how you can control and design the various aspects, the various parameters of this neural network

  • and it learns in the browser.

  • So this, we're using ConvNet.JS

  • which is a library programmed by Andrej Karpathy in JavaScript.

  • So amazingly we live in a world where you can train in a matter of minutes

  • a neural network in your browser.

  • And we'll talk about how to do that.

  • The reason we did this

  • is so that there are very few requirements to get you up and running with neural networks.

  • So in order to complete this project for the course,

  • you don't need any requirements except to have a Chrome browser.

  • And to win the competition you don't need anything except the Chrome browser.

  • The second project code name "DeepTesla"

  • or "Tesla"

  • is using data from a Tesla vehicle

  • of the forward roadway

  • and using end-to-end learning

  • taking the image and putting it into a convolutional neural network,

  • a regressor,

  • that maps directly to a steering angle.

  • So all it takes is a single image

  • and it predicts a steering angle for the car.

  • We have data for the car itself

  • and you get to build a neural network

  • that tries to do better,

  • tries to steer better, or at least as well as the car.
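
As a rough illustration of the kind of end-to-end regressor being described (a minimal sketch only; the layer sizes, input resolution, and use of Keras are assumptions, not the actual DeepTesla code):

```python
# Minimal sketch of an image-to-steering regressor (illustrative; not the actual DeepTesla network).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(66, 200, 3)),                           # assumed forward-camera image size
    tf.keras.layers.Conv2D(24, 5, strides=2, activation='relu'),
    tf.keras.layers.Conv2D(36, 5, strides=2, activation='relu'),
    tf.keras.layers.Conv2D(48, 5, strides=2, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation='relu'),
    tf.keras.layers.Dense(1)                                      # single output: the steering angle
])
model.compile(optimizer='adam', loss='mse')
# Supervised regression: forward-roadway images in, recorded steering angles out, e.g.
# model.fit(images, steering_angles, epochs=10)
```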

  • Okay.

  • Let's get started with the question,

  • with the thing that we understand so poorly at this time

  • because it's so shrouded in mystery

  • but it fascinates many of us.

  • And that is the question of: "What is intelligence?"

  • This is from a March 1996 Time magazine.

  • And the question: "Can machines think?"

  • is answered below with, "they already do."

  • So what if anything is special about the human mind?

  • It's a good question for 1996,

  • a good question for 2016,

  • 2017 now,

  • and the future.

  • And there's two ways to ask that question.

  • One is the special purpose version.

  • Can an artificial intelligence system achieve a well defined,

  • specifically, formally defined finite set of goals?

  • And this little diagram

  • from a book that got me into artificial intelligence as a bright-eyed high school student

  • that is Artificial Intelligence: A Modern Approach.

  • This is a beautifully simple diagram of a system.

  • It exists in an environment.

  • It has a set of sensors that do the perception.

  • It takes those sensors in.

  • It does something magical.

  • There's a question mark there.

  • And with a set of effectors it acts in the world, manipulates objects in that world,

  • and so special purpose.

  • We can,

  • under this formulation,

  • as long as the environment is formally defined,

  • well defined;

  • as long as the set of goals is well defined,

  • As long as the set of actions,

  • sensors,

  • and the ways that the perception carries itself out are well defined,

  • We have good algorithms

  • which we'll talk about,

  • that can optimize for those goals.

  • The question is,

  • if we inch along this path,

  • will we get closer to the general formulation,

  • to the general purpose version of what artificial intelligence is?

  • Can it achieve a poorly defined,

  • unconstrained set of goals

  • with an unconstrained, poorly defined set of actions

  • and unconstrained, poorly defined utility functions, rewards?

  • This is what human life is about.

  • This is what we do pretty well most days.

  • We exist in an undefined world, full of uncertainty.

  • So, okay.

  • We can separate tasks into three different categories.

  • Formal tasks.

  • This is the easiest.

  • It doesn't seem so, it didn't seem so at the birth of artificial intelligence

  • but that's in fact true if you think about it.

  • The easiest is the formal tasks,

  • playing board games, theorem proving.

  • All the kind of mathematical logic problems that can be formally defined.

  • Then there is the expert tasks.

  • So this is where a lot of the exciting breakthroughs have been happening

  • where machine learning methods,

  • data driven methods,

  • can help aid or improve on

  • the performance of our human experts.

  • This means medical diagnosis, hardware design,

  • scheduling,

  • and then there is the thing that we take for granted.

  • The trivial thing.

  • The thing that we do so easily every day when we wake up in the morning.

  • The mundane tasks of everyday speech,

  • of written language,

  • of visual perception,

  • of walking, which, as we'll talk about in today's lecture,

  • is a fascinatingly difficult task,

  • and of object manipulation.

  • So the question that we're asking here,

  • before we talk about deep learning,

  • before we talk about the specific methods,

  • we really want to dig in and try to see what is it about driving,

  • how difficult is driving.

  • Is it more like chess which you see on the left there

  • where we can formally define a set of lanes,

  • a set of actions, and formulate it as a set of five actions - you can change your lane,

  • you can avoid obstacles.

  • You can formally define an obstacle.

  • You can then formally define the rules of the road.

  • Or is there something about natural language,

  • something similar to everyday conversation about driving

  • that requires a much higher degree of reasoning,

  • of communication,

  • of learning,

  • of existing in this under-actuated space.

  • Is it a lot more than just left lane,

  • right lane,

  • speed up,

  • slow down?

  • So let's look at it as a chess game.

  • Here's the chess pieces.

  • What are the sensors we get to work with on an autonomous vehicle?

  • And we'll get a lot more in-depth on this,

  • especially with the guest speakers

  • who built many of these.

  • There's radar.

  • There are the range sensors.

  • Radar, lidar.

  • They give you information about the obstacles in their environment.

  • They'll help localize the obstacles in the environment.

  • There's the visible light camera

  • and stereo vision that gives you texture information,

  • that helps you figure out not just where the obstacles are

  • but what they are,

  • helps to classify those,

  • helps to understand their subtle movements.

  • Then there is the information about the vehicle itself,

  • about the trajectory and the movement of the vehicle that comes from the GPS

  • and IMU sensors.

  • And there is the rich state of the vehicle itself.

  • What is it doing?

  • What are all the individual systems doing

  • That comes from the CAN network.

  • And there is one that is less studied

  • but fascinating to us on the research side: audio.

  • The sounds of the road

  • that provide the rich context

  • of a wet road.

  • The sound of a road when it has stopped raining

  • but it's still wet,

  • the sound that it makes.

  • The screeching tire

  • and honking.

  • These are all fascinating signals as well.

  • And the focus of the research in our group,

  • the thing that's really much

  • under-investigated

  • is the internal facing sensors.

  • The driver,

  • sensing the state of the driver,

  • Where are they looking?

  • Are they sleepy?

  • The emotional state.

  • Are they in the seat at all?

  • And the same with audio.

  • That comes from the visual information and the audio information.

  • More than that.

  • Here are the tasks.

  • If you were to break into modules the tasks

  • of what it means to build a self-driving vehicle.

  • First, you want to know where you are.

  • Where am I.

  • Localization and mapping.

  • You want to map the external environment.

  • Figure out where all the different

  • obstacles are,

  • all the entities are,

  • and use that estimate of the environment

  • to then figure out where I am,

  • where the robot is.

  • Then there is scene understanding.

  • It's understanding not just the positional aspects

  • of the external environment and the dynamics of it

  • but also what those entities are.

  • Is it a car? Is it a pedestrian?

  • Is it a bird?

  • There is movement planning.

  • Once you have kind of figured out to the best of your abilities

  • your position and the position of other entities in this world,

  • it's figuring out a trajectory through that world.

  • And finally,

  • once you've figured out how to move about safely

  • and effectively through the world

  • it's figuring out what the human that's on board is doing

  • because as I will talk about

  • the path to a self-driving vehicle

  • and that is, hence, our focus on Tesla

  • may go through semi-autonomous vehicles.

  • Where the vehicle must not only drive itself

  • but effectively hand over control

  • from the car

  • to the human

  • and back.

  • Ok, quick history.

  • Well, there's a lot of fun stuff from the eighty's and ninety's but

  • the big breakthroughs came in the second DARPA Grand Challenge

  • with Stanford's Stanley,

  • when they won the competition.

  • One of five cars that finished.

  • This was an incredible accomplishment in a desert race.

  • A fully autonomous vehicle was able to complete the race

  • in record time.

  • The DARPA Urban Challenge in 2007

  • where the task was no longer a race through the desert

  • but through an urban environment

  • and CMU's "Boss" with GM won that race

  • and a lot of that work went directly into the

  • acceptance by large, major industry players

  • taking on the challenge of building these vehicles.

  • Google, now "Waymo" self-driving car.

  • Tesla with its "Autopilot" system and now "Autopilot 2" system.

  • Uber with its testing in Pittsburgh.

  • And there's many other companies

  • including one of the speakers for this course

  • from nuTonomy

  • that are driving the wonderful streets of Boston.

  • Ok. So let's take a step back.

  • We have, if we think about the accomplishments in the DARPA Challenge,

  • and if you look at the accomplishments of the Google self-driving car

  • which essentially boils the world down into a chess game.

  • It uses incredibly accurate sensors

  • to build a three dimensional map of the world,

  • localize itself effectively in that world

  • and move about that world

  • in a very well-defined way.

  • Now, what if driving...

  • The open question is: if driving is more like a conversation,

  • like in natural language conversation,

  • how hard is it to pass the Turing Test?

  • The Turing Test,

  • as the popular current formulation is,

  • can a computer be mistaken for a human being

  • more than thirty percent of the time?

  • When a human is talking behind a veil,

  • having a conversation with either a computer or a human,

  • can they mistake the other side of that conversation

  • for being a human when it's in fact a computer.

  • And the way you would, in natural language,

  • build a system that successfully passes the Turing Test is:

  • first, the natural language processing part,

  • to enable it to communicate successfully.

  • So, generate language and interpret language;

  • then you represent knowledge, the state of the conversation,

  • carried over time.

  • And the last piece and this is the hard piece,

  • is the automated reasoning,

  • is reasoning.

  • Can we teach machine learning methods to reason?

  • That is something that will propagate through our discussion

  • because as I will talk about the various methods,

  • the various deep learning methods,

  • neural networks are good at learning from data

  • but they're not yet, there is no good mechanism for reasoning.

  • Now reasoning could be just something

  • that we tell ourselves we do to feel special.

  • To feel like we're better than machines.

  • Reasoning may be simply

  • something as simple as learning from data.

  • We just need a larger network.

  • Or there could be a totally different mechanism required

  • and we'll talk about the possibilities there.

  • Yes.

  • (Inaudible question from one of the attendees)

  • No, it's very difficult to find these kind of situations in the United States.

  • So the question was,

  • for this video, is it in the United States or not?

  • I believe it's in Tokyo.

  • So India, as are a few European countries, is much more towards the direction

  • of natural language versus chess.

  • In the United States, generally speaking, we follow rules more concretely.

  • The quality of roads is better.

  • The marking on the roads is better.

  • So there are fewer requirements there.

  • (Inaudible question from one of the attendees)

  • These cars are driving on one side?

  • I see.

  • I just- Okay, you're right.

  • It is because, yeah-

  • So, but it's certainly not the United States.

  • I spent quite a bit of time googling,

  • trying to find this in the United States, and it is difficult.

  • So let's talk about

  • the recent breakthroughs in machine learning

  • and what is at the core of those breakthroughs

  • is neural networks

  • that have been around for a long time

  • and I will talk about what has changed.

  • What are the cool new things

  • and what hasn't changed

  • and what are its possibilities.

  • But first a neuron, crudely,

  • is a computational building block of the brain.

  • I know there's a few folks here, neuroscience folks,

  • this is hardly a model.

  • It is mostly an inspiration

  • and so the human neuron

  • has inspired the artificial neuron

  • the computational building block of a neural network,

  • of an artificial neural network.

  • I have to give you some context.

  • These neurons,

  • for both artificial and human brains,

  • are interconnected.

  • And the human brain,

  • there's about, I believe 10,000 outgoing connections from every neuron

  • on average, and they're interconnected to each other.

  • The largest current artificial neural network, as far as I'm aware,

  • has 10 billion of those connections.

  • Synapses.

  • Our human brain, to the best estimate that I'm aware of,

  • has 10,000X that.

  • So one hundred to one thousand trillion synapses.

  • Now what is an artificial neuron?

  • That is the building block of a neural network.

  • It takes a set of inputs.

  • It puts a weight on each of those inputs, sums them together,

  • applies a bias value on each neuron

  • and using an activation function

  • that takes as its input

  • that sum plus the bias and squishes it

  • to produce a zero to one signal.

  • And this allows a single neuron

  • to take a few inputs and produce an output,

  • a classification, for example, a zero or a one.

  • And then we'll talk about, simply, it can

  • serve as a linear classifier

  • so it can draw a line.

  • It can learn to draw a line between, like what you'd seen here,

  • between the blue dots and the yellow dots.

  • And that's exactly what we'll do in the iPython Notebook that I'll talk about

  • but the basic algorithm is you initialize the weights

  • on the inputs and you compute the output.

  • You perform this previous operation I talked about sum up

  • and compute the output.

  • And if the output does not match the ground truth,

  • the expected output, the output it should produce,

  • the weights are punished accordingly

  • and we'll talk through a little bit of the math of that.

  • And this process is repeated until the perceptron does not make any more mistakes.
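
A minimal sketch of the perceptron procedure just described, assuming 0/1 labels and a step activation (variable names are illustrative):

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=100):
    """X: (n_samples, n_features) inputs; y: 0/1 ground-truth labels."""
    w, b = np.zeros(X.shape[1]), 0.0                     # initialize the weights
    for _ in range(epochs):
        mistakes = 0
        for xi, target in zip(X, y):
            output = 1 if np.dot(w, xi) + b > 0 else 0   # weighted sum + bias, step activation
            if output != target:                         # output does not match the ground truth
                w += lr * (target - output) * xi         # adjust ("punish") the weights
                b += lr * (target - output)
                mistakes += 1
        if mistakes == 0:                                 # repeat until no more mistakes
            break
    return w, b
```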

  • Now here's the amazing thing about neural networks.

  • There are several and I'll talk about them.

  • One on the mathematical side is the universality of neural networks

  • with just a single layer if you stack them together, a single hidden layer,

  • the inputs on the left, the outputs on the right.

  • And in the middle there is a single hidden layer,

  • it can closely approximate any function. Any function.

  • So this is an incredible property

  • that with a single hidden layer it can approximate any function you could think of.

  • And you could think of driving as a function.

  • It takes as its input

  • the world outside, and as output,

  • the control of the vehicle.

  • There exists a neural network out there that can drive perfectly.

  • It's a fascinating mathematical fact.

  • So we can think of these functions then as special purpose functions,

  • special purpose intelligence.

  • You can take, say as input,

  • the number of bedrooms, the square feet,

  • the type of neighborhood.

  • Those are the three inputs.

  • It passes that value through to the hidden layer.

  • And then one more step.

  • It produces the final price estimate for the house or for the residence.

  • And we can teach a network to do this pretty well in a supervised way.

  • This is supervised learning.

  • You provide a lot of examples

  • where you know the number of bedrooms, the square feet,

  • the type of neighborhood

  • and then you also know the final price of the house or the residence.

  • And then you can, as I'll talk about through a process of back propagation,

  • teach these networks to make this prediction pretty well.
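
A minimal sketch of that supervised setup (the data values, layer sizes, and use of Keras are purely illustrative assumptions):

```python
import numpy as np
import tensorflow as tf

# Hypothetical examples: [bedrooms, square feet, neighborhood type] -> known final price.
X = np.array([[3, 2000, 1], [2, 850, 0], [4, 3000, 2]], dtype=np.float32)
y = np.array([[500_000], [300_000], [900_000]], dtype=np.float32)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),                   # three inputs
    tf.keras.layers.Dense(8, activation='relu'),  # single hidden layer
    tf.keras.layers.Dense(1)                      # final price estimate
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=500, verbose=0)  # supervised learning from (input, ground-truth) pairs
# In practice you would normalize the inputs and use many more examples.
```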

  • Now some of the exciting breakthroughs recently

  • have been in the general purpose intelligence.

  • This is from Andrej Karpathy, who is now at OpenAI.

  • I would like to take a moment here to try to explain how amazing this is.

  • This is a game of "pong".

  • If you're not familiar with "pong", there are two paddles

  • and you're trying to bounce the ball back

  • in such a way that prevents the other player from bouncing the ball back at you.

  • The artificial intelligence agent is on the right in green

  • and up top is the score 8-1.

  • Now this takes about three days to train

  • on a regular computer, this network.

  • What is this network doing?

  • It's called the Policy Network.

  • The input is the raw pixels.

  • It's slightly processed, and you also take the difference between two frames,

  • but it's basically the raw pixel information.

  • That's the input.

  • There's a few hidden layers

  • and the output is the single probability of moving up.

  • That's it. That's the whole system and what it's doing is, it learns.

  • You don't know at any one moment,

  • you don't know what the right thing to do is.

  • Is it to move up? Is it to move down?

  • You only know what the right thing to do is

  • by the fact that eventually you win or lose the game.

  • So the amazing thing here is, there's no supervised learning.

  • There's no universal fact about any one state being good or bad,

  • or any one action being good or bad in a state,

  • but if you punish or reward every single action you took,

  • every single action you took, for an entire game

  • based on the result. So no matter what you did, if you won the game,

  • the end justifies the means.

  • If you won the game, every action you took, every action-state pair, gets rewarded.

  • If you lost the game, it gets punished.

  • And this process, with only two hundred thousand games

  • where the system just simulates the games, it can learn to beat the computer.

  • This system knows nothing about "pong", nothing about games,

  • this is general intelligence.

  • Except for the fact that it's just the game of "pong".

  • And I will talk about how this can be extended further,

  • why this is so promising

  • and why we should proceed with caution.

  • So again, there's a set of actions you take up, down, up, down,

  • based on the output of the network.

  • There's a threshold: given the probability of moving up,

  • you move up or down based on the output of the network.

  • And you have a set of states

  • and every single state action pair is rewarded if there's a win

  • and it's punished if there's a loss.
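
A minimal numpy sketch of that credit-assignment idea, assuming a tiny policy network with one hidden layer; this is an illustration of the described mechanism, not Karpathy's actual code:

```python
import numpy as np

def policy_forward(x, W1, W2):
    """Policy network: preprocessed pixels in, probability of moving up out."""
    h = np.maximum(0, W1 @ x)                        # hidden layer (ReLU)
    p_up = 1.0 / (1.0 + np.exp(-(W2 @ h)))           # probability of "up"
    return p_up, h

def update_from_game(states, actions, won, W1, W2, lr=1e-3):
    """Every state-action pair from the finished game is rewarded if we won, punished if we lost."""
    reward = 1.0 if won else -1.0                    # the end justifies the means
    for x, a in zip(states, actions):                # a = 1 for "up", 0 for "down"
        p_up, h = policy_forward(x, W1, W2)
        dlogp = a - p_up                             # gradient of log-prob of the action taken
        dh = dlogp * W2 * (h > 0)                    # backprop into the hidden layer
        W2 = W2 + lr * reward * dlogp * h            # nudge weights toward/away from that action
        W1 = W1 + lr * reward * np.outer(dh, x)
    return W1, W2
```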

  • When you go home, think about how amazing that is

  • and if you don't understand why that's amazing,

  • spend some time on it.

  • It's incredible.

  • (Inaudible question from one of the attendees)

  • Sure, sure thing.

  • The question was: "What is supervised learning?

  • What is unsupervised learning? What's the difference?"

  • So supervised learning is,

  • when people talk about machine learning they mean supervised learning most of the time.

  • Supervised learning is

  • learning from data, is learning from example.

  • When you have a set of inputs and a set of outputs that you know are correct,

  • called the ground truth.

  • So you need those examples, a large amount of them,

  • to train any of the machine learning algorithms

  • to learn to then generalize that to future examples.

  • Actually, there's a third one called Reinforcement Learning where the Ground Truth is sparse.

  • The information about when something is good or not,

  • the ground truth only happens every once in a while, at the end of the game.

  • Not every single frame.

  • And unsupervised learning is when you have no information

  • about the outputs,

  • whether they are correct or incorrect.

  • And the excitement of the deep learning community is unsupervised learning,

  • but it has achieved no major breakthroughs at this point.

  • I'll talk about what the future of deep learning is

  • and a lot of the people that are working in the field are excited by it.

  • But right now, any interesting accomplishment has to do with supervised learning.

  • (Partially inaudible question from one of the attendees)

  • And the wrong one is just has the [00:33:29] (Inaudible) solution like looking at the philosophy.

  • So basically, the reinforcement learning here is learning from somebody who has certain hopes

  • and how can that be guaranteed that it would generalize to somebody else?

  • So the question was this:

  • the green paddle learns to play this game successfully

  • against this specific one brown paddle operating under specific kinds of rules.

  • How do we know it can generalize to other games, other things? And it can't.

  • But the mechanism by which it learns generalizes.

  • So as long as you let it play,

  • as long as you let it play in whatever world you wanted it to succeed in long enough,

  • it will use the same approach to learn to succeed in that world.

  • The problem is this works for worlds you can simulate well.

  • Unfortunately, one of the big challenges of neural networks

  • is they're not currently efficient learners.

  • We need a lot of data to learn anything.

  • Human beings need one example often times

  • and they learn very efficiently from that one example.

  • And again I'll talk about that as well, it's a good question.

  • So the drawbacks of neural networks.

  • So if you think about the way a human being would approach this game,

  • this game of "pong", it would only need a simple set of instructions.

  • You're in control of a paddle and you can move it up and down.

  • And your task is to bounce the ball past the other player controlled by AI.

  • Now the human being would immediately, they may not win the game

  • but they would immediately understand the game

  • and would be able to successfully play it well enough

  • to pretty quickly learn to beat the game.

  • But they would need to have a concept of control.

  • What it means to control a paddle, need to have a concept of a paddle,

  • need to have a concept of moving up and down

  • and a ball and bouncing,

  • they have to know, they have to have at least a loose concept of real world physics

  • that they can then project that real world physics on to the two dimensional world.

  • All of these concepts are concepts that you come to the table with.

  • That's knowledge.

  • And the kind of way you transfer that knowledge from your previous experience,

  • from childhood to now when you come to this game,

  • that something is called reasoning.

  • Whatever reasoning means.

  • And the question is whether through this same kind of process,

  • you can see the entire world as a game of "pong"

  • and reasoning is simply the ability to simulate that game in your mind

  • and learn very efficiently, much more efficiently, than 200,000 iterations.

  • The other challenge of deep neural networks and machine learning broadly

  • is you need big data and efficient learners as I said.

  • And that data also needs to be supervised data.

  • You need to have Ground Truth which is very costly for annotation.

  • A human being looking at a particular image, for example,

  • and labeling that as something as a cat or dog,

  • whatever object is in the image,

  • that's very costly.

  • And particularly for neural networks there's a lot of parameters to tune.

  • There's a lot of hyper-parameters.

  • You need to figure out the network structure first.

  • How does this network look, how many layers?

  • How many hidden nodes?

  • What type of activation function for each node?

  • There's a lot of hyper-parameters there

  • and then once you've built your network,

  • there's parameters for how you teach that network.

  • There's the learning rate, the loss function, the mini-batch size,

  • the number of training iterations, gradient update momentum,

  • and selecting even the optimizer with which

  • you solve the various differential equations involved.

  • It's a topic of many research papers, certainly it's rich enough for research papers,

  • but it's also really challenging.

  • It means you can't just plop the network down

  • and have it solve the problem generally.
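
To make those knobs concrete, here is a minimal sketch of where such hyper-parameters show up when building and training a network in Keras; every value here is an illustrative assumption, not a recommendation:

```python
import tensorflow as tf

learning_rate = 1e-3      # illustrative hyper-parameter choices only
batch_size = 32
num_epochs = 20

model = tf.keras.Sequential([                            # network structure:
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation='relu'),        # how many layers,
    tf.keras.layers.Dense(64, activation='relu'),        # how many hidden nodes,
    tf.keras.layers.Dense(1, activation='sigmoid')       # which activation functions
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate),   # choice of optimizer
    loss='binary_crossentropy'                           # choice of loss function
)
# model.fit(X_train, y_train, batch_size=batch_size, epochs=num_epochs)
```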

  • And defining a good loss function,

  • or in the case of "pong" or games,

  • a good reward function is difficult.

  • So here's a game, this is a recent result from OpenAI,

  • teaching a network to play the game of Coast Runners.

  • And the goal of coast runners

  • is, you're in a boat, and the task is to go around the track

  • and successfully complete a race against other people you're racing against.

  • Now this network is an optimal one.

  • And what it's figured out is that actually, in the game,

  • it gets a lot of points for collecting certain objects along the path.

  • So you see it's figured out to go in a circle and collect those green turbo things.

  • And what it's figured out is you don't need to complete the game to earn the reward.

  • And despite being on fire and hitting the wall and going through this whole process,

  • it's actually achieved at least a local optimum

  • given the reward function of maximizing the number of points.

  • And so it's figured out a way to earn a higher reward

  • while ignoring the implied bigger picture goal of finishing the race

  • which we as humans understand much better.

  • This raises, for self-driving cars, ethical questions.

  • Besides other quick questions.

  • (CHUCKLING)

  • We could watch this for hours and it will do that for hours and that's the point:

  • It's hard to teach, it's hard to encode the formally defined utility function under which

  • an intelligent system needs to operate.

  • And that's made obvious even in a simple game.

  • And so what is - Yup, question.

  • (Inaudible question from one of the attendees)

  • So the question was: "what's an example of a local optimum that an autonomous car,

  • similar to the Coast Runners boat, what would be the example in the real world for an autonomous vehicle?"

  • And it's a touchy subject.

  • But it would certainly have to involve

  • the choices we make under near crashes and crashes.

  • The choices a car makes about what it wants to avoid.

  • For example, if there's a crash imminent

  • and there's no way you can stop

  • to prevent the crash, do you keep the driver safe

  • or do you keep the other people safe?

  • And there has to be some, even if you don't choose to acknowledge it,

  • even if it's only in the data and the learning that you do,

  • there's an implied reward function there.

  • And we need to be aware of what that reward function is

  • because it may find something.

  • Until you actually see it, we won't know it.

  • Once we see it, we realize that oh that was a bad design

  • and that's the scary thing.

  • It's hard to know ahead of time what that is.

  • So the recent breakthroughs from deep learning came from several factors.

  • First is the compute, Moore's Law.

  • CPUs are getting faster, a hundred times faster, every decade.

  • Then there are GPUs.

  • Also the ability to train neural networks on GPUs and now ASICs

  • has created a lot of capabilities in terms of energy efficiency

  • and being able to train larger networks more efficiently.

  • Well, first of all, in the 21st century there's digitized data.

  • There's larger data sets of digital data

  • and now that data is becoming more organized,

  • not just vaguely available data out there on the internet,

  • it's actual organized data sets like ImageNet.

  • Certainly for natural language there are large data sets.

  • There are the algorithmic innovations: backprop,

  • backpropagation, convolutional neural networks, LSTMs.

  • All these different architectures for dealing with specific types of domains and tasks.

  • Then there is the huge one: infrastructure.

  • It's on the software and the hardware side.

  • There's Git, the ability to share software in an open-source way.

  • There are pieces of software that make robotics and make machine learning easier.

  • ROS, TensorFlow.

  • There is Amazon Mechanical Turk

  • which allows for efficient, cheap annotation of large scale data sets.

  • There's AWS and cloud hosting, hosting the data and the compute for machine learning.

  • And then there's a financial backing of large companies - Google, Facebook, Amazon.

  • But really nothing has changed.

  • There really has not been any significant breakthroughs.

  • Convolutional networks have been around since the 90s,

  • neural networks have been around since the 60s.

  • There's been a few improvements

  • but, in terms of methodology,

  • the compute has really been the workhorse.

  • The ability to do the hundred fold improvement every decade,

  • holds promise, and the question is whether, for that reasoning thing I talked about,

  • all you need is a larger network.

  • That is the open question.

  • Some terms for deep learning.

  • First of all, deep learning is a PR term for neural networks.

  • It is a term for utilizing deep neural networks,

  • for neural networks that have many layers.

  • It is a symbolic term for the newly gained capabilities that compute has brought us.

  • That training on GPUs has brought us.

  • So deep learning is a subset of machine learning.

  • There's many other methods that are still effective.

  • The terms that will come up in this class are, first of all, Multilayer Perceptron (MLP),

  • Deep neural networks (DNN), Recurrent neural networks (RNN),

  • LSTM (Long Short-Term Memory) Networks, CNN and ConvNet (Convolutional neural networks),

  • Deep Belief Networks.

  • And the operations that will come up are convolution, pooling, activation functions and backpropagation.

  • Yes, you've got a question?

  • (Inaudible question from one of the attendees)

  • So the question was, what is the purpose of the different layers in a neural network?

  • What is the need of one configuration versus another?

  • So with a neural network having several layers,

  • the only thing you have an understanding of is the inputs and the outputs.

  • You don't have a good understanding of what each layer does.

  • They are mysterious things, neural networks.

  • So I'll talk about how, with every layer, it forms a higher level.

  • A higher order representation of the input.

  • So it's not like the first layer does localization,

  • the second layer does path planning,

  • the third layer does navigation - how you get from here to Florida -

  • or maybe it does, but we don't know.

  • So we're beginning to visualize neural networks for simple tasks

  • like for ImageNet classifying cats versus dogs.

  • We can tell what is the thing that the first layer does, the second layer, the third layer

  • and we look at that.

  • But for driving, where the input is just the images and the output is the steering,

  • it's still unclear what is learned,

  • partially because we don't have neural networks that drive successfully yet.

  • (Points to a member of the class)

  • (Inaudible question)

  • So the question was, does a neural network generate layers over time, like does it grow it?

  • That's one of the challenges, that a neural network is pre-defined.

  • The architecture, the number of nodes, the number of layers. That's all fixed.

  • Unlike the human brain where the neurons die and are born all the time.

  • A neural Network is pre-specified, that's it.

  • That's all you get and if you want to change that,

  • you have to change that and then retrain everything.

  • So it's fixed.

  • So what I encourage you is to proceed with caution

  • because there's this feeling when you first teach a network with very little effort,

  • how to do some amazing tasks like classify a face versus non-face,

  • or your face versus other faces or cats versus dogs, it's an incredible feeling.

  • And then there's definitely this feeling that I'm an expert

  • but what you realize is we don't actually understand how it works.

  • And getting it to perform well for more generalized task,

  • for larger scale data sets, for more useful applications,

  • requires a lot of hyper-parameter tuning.

  • Figuring out how to tweak little things here and there

  • and still in the end, you don't understand why it works so damn well.

  • So deep learning, these deep neural network architectures, is representation learning.

  • This is the difference from traditional machine learning methods where,

  • for example, for this task, an image is the input.

  • The input to the network here is on the bottom, the output up on top,

  • and the input is a single image of a person in this case.

  • And so the input, specifically, is all the pixels in that image.

  • RGB, the different colors of the pixels in the image.

  • And over time, what a network does is build a multi-resolutional representation of this data.

  • The first layer learns the concept of edges, for example.

  • The second layer starts to learn composition of those edges, corners, contours.

  • Then it starts to learn about object parts.

  • And finally, it actually provides a label for the entities that are in the input.

  • And this is the difference from traditional machine learning methods

  • where the concepts like edges and corners and contours

  • are manually pre-specified by human beings, human experts, for that particular domain.

  • And representation matters because figuring out a line

  • for the Cartesian coordinates of this particular data set

  • where you want to design a machine learning system

  • that tells the difference between green triangles and blue circles is difficult.

  • There is no line that separates them cleanly.

  • And if you were to ask a human being, a human expert in the field,

  • to try to draw that line, they would probably do a Ph.D. on it and still not succeed.

  • But a neural network can automatically figure out

  • to remap that input into polar coordinates

  • where the representation is such that it's an easily linearly separable data set.
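
That polar-coordinate remapping can be done by hand to see the point: a minimal sketch (with made-up data) where two ring-shaped classes are not linearly separable in Cartesian coordinates but become separable by a single threshold on the radius:

```python
import numpy as np

rng = np.random.default_rng(0)

def ring(radius, n=200):
    """Points scattered around a circle of the given radius, in Cartesian (x, y) coordinates."""
    theta = rng.uniform(0, 2 * np.pi, n)
    r = rng.normal(radius, 0.1, n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

inner, outer = ring(1.0), ring(3.0)   # two classes; no straight line separates them in (x, y)

def to_polar(points):
    x, y = points[:, 0], points[:, 1]
    return np.column_stack([np.hypot(x, y), np.arctan2(y, x)])   # (radius, angle)

# In the new representation, radius < 2 vs. radius > 2 separates the classes perfectly.
assert (to_polar(inner)[:, 0] < 2.0).all() and (to_polar(outer)[:, 0] > 2.0).all()
```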

  • And so, deep learning is a subset of representation learning,

  • is a subset of machine learning, and a key subset of artificial intelligence.

  • Now, because of this,

  • because of its ability to compute an arbitrary number of features

  • that are at the core of the representation.

  • So if you are trying to detect a cat in an image,

  • you're not specifying 215 specific features of cat ears and whiskers and so on

  • that a human expert would specify; you allow the network

  • to discover tens of thousands of such features,

  • which maybe for cats you are an expert

  • but for a lot of objects you may never be able to sufficiently provide the features

  • which successfully will be used for identifying the object.

  • And so, this kind of representation learning,

  • one is easy in the sense that all you have to provide is inputs and outputs.

  • All you need to provide is the data set you care about, without [00:53:39] features.

  • And two, because of its ability to construct arbitrarily sized representations,

  • deep neural networks are hungry for data.

  • The more data we give them,

  • the more they are able to learn about this particular data set.

  • So let's look at some applications.

  • First, some cool things that deep neural networks have been able to accomplish up to this point.

  • Let me go through them.

  • First, the basic one.

  • AlexNet is for- ImageNet is a famous data set and a competition of classification

  • and localization, where the task is, given an image,

  • identify what are the five most likely things in that image

  • and what is the most likely and you have to do so correctly.

  • So on the right, there's an image of a leopard

  • and you have to correctly classify that that is in fact a leopard.

  • So they're able to do this pretty well given a specific image.

  • Determine that it's a leopard.

  • What's shown here on the x-axis is years;

  • on the y-axis is the classification error.

  • So, starting from 2012 on the left with AlexNet, and to today,

  • the error has decreased from 16% (and around 40% before then with traditional methods)

  • to under 4%.

  • So human level performance,

  • if I were to give you these pictures of leopards,

  • for about 4% of those pictures you would not say it's a leopard.

  • That's human level performance.

  • So for the first time, in 2015, convolutional neural networks outperformed human beings.

  • That in itself is incredible. That is something that seemed impossible.

  • And now, because it's been done, it's not as impressive.

  • But I just want to get to why this is so impressive

  • because computer vision is hard.

  • Now we as human beings have evolved visual perception over millions of years,

  • hundreds of millions of years.

  • So we take it for granted but computer vision is really hard, visual perception is really hard.

  • There's illumination variability.

  • So it's the same object.

  • The only way we can tell anything is from the shading, the reflection of light from that surface.

  • It could be the same object with drastically, in terms of pixels,

  • drastically different looking shapes and we still know it's the same object.

  • There is pose variability and occlusion.

  • Probably my favorite caption for an image

  • for a figure in a academic paper is deformable and truncated cat.

  • These are pictures, you know cats are famously deformable.

  • They can take a lot of different shapes.

  • (LAUGHTER)

  • Arbitrary poses are possible, so you have to have computer vision

  • that knows it's still the same object, still the same class of objects,

  • given all the variability in the pose. And occlusion is a huge problem.

  • We still know it's an object.

  • We still know it's a cat even when parts of it are not visible.

  • And sometimes large parts of it are not visible.

  • And then there's all the inter-class variability.

  • Inter-class, all of these on the top two rows are cats.

  • Many of them look drastically different.

  • And the bottom two rows are dogs, which also look drastically different.

  • And yet some of the dogs look like cats,

  • some of the cats look like dogs.

  • And we as human beings are pretty good at telling the difference,

  • and we want computer vision to do better than that.

  • It's hard. So how is this done? This is done with convolutional neural networks.

  • The input to which is a raw image.

  • Here's an input on the left of a number three

  • and, as I'll talk about,

  • that image is processed, passed through convolutional layers

  • that maintain spatial information.

  • The output, in this case, predicts

  • what number is shown in the image.

  • 0, 1, 2 through 9.

  • And so, these networks, everybody's using the same kind of network to determine exactly that.

  • Input is an image, output is a number.

  • And in the case of probability, that is a leopard. What is that number?
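
A minimal sketch of the kind of network just described: raw image in, a probability for each class out (the layer sizes and 28x28 digit input are assumptions for illustration, not the specific network on the slide):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),                   # raw grayscale digit image
    tf.keras.layers.Conv2D(32, 3, activation='relu'),    # convolutional layers preserve
    tf.keras.layers.MaxPooling2D(),                      # spatial information
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax')      # a probability for each digit 0-9
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
```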

  • Then there is segmentation, built on top of these convolutional neural networks,

  • where you chop off the end and convolutionalize the network.

  • You chop off the end where the output is a heat map.

  • So you can have, instead of a detector for a cat, you can do a cat heat map

  • where the output heat map gets excited,

  • the neurons in that output get excited

  • spatially, in the parts of the image that contain a tabby cat.

  • And this kind of process can be used to segment the image into different objects, a horse.

  • So the original input on the left is a woman on a horse

  • and the output is a fully segmented image, knowing where the woman is, where the horse is.
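
A minimal sketch of that "convolutionalizing" idea: the fully connected head is replaced with 1x1 convolutions so the output is a spatial heat map of class scores rather than one score per image (layer sizes and the class count are illustrative assumptions):

```python
import tensorflow as tf

fully_conv = tf.keras.Sequential([
    tf.keras.Input(shape=(None, None, 3)),                             # any image size
    tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(64, 3, padding='same', activation='relu'),
    tf.keras.layers.Conv2D(21, 1, activation='softmax')                # 1x1 conv: per-pixel class scores
])
# Output shape is (height, width, num_classes): e.g. the "tabby cat" channel lights up
# in the parts of the image that contain the cat.
```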

  • And this kind of process can be used for object detection

  • which is the task of detecting an object in an image.

  • Now the traditional method with convolutional neural networks

  • and in general computer vision is the sliding window approach.

  • You have a detector, like the leopard detector, that you slide across the image

  • to find where in that image is the leopard.

  • This, the segmenting approach,

  • the R-CNN approach, is efficiently segmenting the image

  • in such a way that it can propose different parts of the image

  • that are likely to have a leopard, or in this case a cowboy,

  • and that drastically reduces the computational requirements of the object detection task.

  • And so these networks, this is currently one of the best networks for the ImageNet task of localization

  • are the deep residual networks. They're deep. So VGG-19 is one of the famous ones.

  • You started to get above twenty layers in many cases,

  • thirty-four layers in the ResNet shown there.

  • So the lesson there is, the deeper you go the more representation power you have,

  • the higher the accuracy, but you need more data.

  • Other applications, colorization of images.

  • So this again, input is a single image and output is a single image.

  • So you can take a black and white video from a film, from an old film,

  • and recolor it. And all you need to do to train that network in the supervised way

  • is provide modern films and convert them to grayscale.

  • So now you have arbitrarily sized data sets, data sets of gray scale to color.

  • And you're able, with very little effort on top of that, to successfully,

  • well, somewhat successfully, recolor images.
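
The training pairs for that setup are cheap to make, as described: take color frames and convert them to grayscale. A minimal sketch, assuming the frames are already loaded as an array:

```python
import numpy as np

def make_colorization_pairs(color_frames):
    """color_frames: assumed array of shape (n, height, width, 3) with values in [0, 1].
    Returns (grayscale inputs, color ground-truth targets) for supervised training."""
    gray = color_frames @ np.array([0.299, 0.587, 0.114])   # standard RGB -> luminance weights
    return gray[..., np.newaxis], color_frames
```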

  • Again, Google Translate does image translation in this way, image to image.

  • It first perceives, here in German I believe (anyone who knows German, correct me if I'm wrong),

  • "dark chocolate" written in German on a box.

  • So it can take this image, detect the different letters, convert them to text,

  • translate the text and then using the image to image mapping

  • map the letters, the translated letters, back onto the box and you could do this in real time on video.

  • So what we've talked about up to this point on the left are "vanilla" neural networks,

  • convolutional neural networks, that map a single input to a single output,

  • a single image to a number, a single image to another image.

  • Then there are recurrent neural networks.

  • This is the more general formulation,

  • they map a sequence of images

  • or a sequence of words

  • or a sequence of any kind to another sequence.

  • And these networks are able to do incredible things with natural language,

  • with video, and any type of series of data.

  • For example, you can convert text to handwriting, to handwritten text.

  • Here, you type in and you can do this online, type in deep learning for self-driving cars

  • and it will use an arbitrary handwriting style to generate the words "deep learning for self-driving cars".

  • This is done using recurrent neural networks.

  • We can also take Char-RNNs, as they're called, character-level recurrent neural networks,

  • that train on a data set

  • an arbitrary text data set and learn to generate text one character at a time.

  • So there is no preconceived syntactical semantic structure that's provided to the network.

  • It learns that structure.

  • So for example, you can train it on Wikipedia articles like in this case.

  • And it's able to generate successfully not only text that makes some kind of grammatical sense at least

  • but also keep perfect syntactic structure for Wikipedia, for Markdown editing,

  • for LaTeX editing and so on.

  • This text reads: "naturalism and decision for the majority of Arab countries capitalide."

  • Whatever that means, "was grounded by the Irish language by John Clare," and so on.

  • These are sentences. If you didn't know better, that might sound correct.

  • And it does so one character at a time, so these aren't words being generated.

  • This is one character, you start with the beginning three letters "nat",

  • you generate "u" completely without knowledge of the word naturalism.

  • This is incredible.
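
A minimal sketch of that character-at-a-time generation loop; `predict_next` and `chars` are assumed to come from an already-trained character-level model:

```python
import numpy as np

def sample_text(predict_next, chars, seed="nat", length=100):
    """predict_next(text) is assumed to return a probability distribution over `chars`,
    the character vocabulary, given the text generated so far."""
    text = seed
    for _ in range(length):
        probs = predict_next(text)                     # e.g. the output of a trained char-RNN
        text += np.random.choice(chars, p=probs)       # sample one character at a time
    return text
```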

  • You can do this to start a sentence and let the neural network complete that sentence.

  • So for example if you start the sentence with "life is" or "life is about" actually,

  • it will complete it with a lot of fun things. "The weather." "Life is about kids."

  • "Life is about the true love of Mr Mom", "is about the truth now."

  • And this is from [01:05:59], the last two,

  • if you start with "the meaning of life," it can complete that with

  • "the meaning of life is literary recognition" may be true for some of us here.

  • Publish or perish.

  • And "the meaning of life is the tradition of ancient human reproduction."

  • (LAUGHTER)

  • Also true for some of us here. I'm sure.

  • Okay, so what else can you do?

  • You can, and this has been very exciting recently, do image caption recognition. No, generation, I'm sorry.

  • Image caption generation is important for large data sets of images,

  • where we want to be able to determine what's going on inside those images.

  • Especially for search: if you want to find a man sitting on a couch with a dog,

  • you type it into Google and it's able to find that.

  • So here shown in black text a man sitting on a couch with a dog is generated by the system.

  • A man sitting in a chair with a dog in his lap is generated by a human observer.

  • And again these annotations are done by detecting the different obstacles,

  • the different objects in the scene.

  • So segmenting the scene detecting on the right there's a woman, a crowd, a cat,

  • a camera, holding, purple.

  • All of these words are being detected, then syntactically correct sentences are generated,

  • a lot of them, and then you rank which sentence is the most likely.

  • And in this way you can generate very accurate labeling of the images,

  • captions for the images.

  • And you can do the same kind of process for image question answering.

  • You can ask how many for quantity, how many chairs are there?

  • You can ask about location, where are the ripe bananas?

  • You can ask about the type of object.

  • What is the object in the chair? It's a pillow.

  • And these are, again, using recurrent neural networks.

  • You could do the same thing with video caption generation,

  • video description generation.

  • So looking at a sequence of images as opposed to just a single image.

  • What is the action going on in this situation?

  • This is the difficult task. There's a lot of work in it, in this area.

  • On the left are correct descriptions: a man is doing stunts on his bike,

  • or a herd of zebra are walking in the field. And on the right,

  • there's a small bus running into a building.

  • You know it's talking about relevant entities but just doing an incorrect description.

  • A man is cutting a piece of a pair of a paper.

  • So the words are correct, perhaps, so it's close, but mostly wrong.

  • One of the interesting things

  • you can do with recurrent neural networks

  • is if you think about the way we look at images, human beings look at images,

  • is that we only have a small fovea with which we focus in a scene.

  • So right now your periphery is very distorted.

  • The only thing, if you're looking at the slides, you're looking at me

  • that's the only thing that's in focus.

  • Majority of everything else is out of focus.

  • So we can use the same kind of concept to try to teach a neural network to steer around the image.

  • Both for perception and generation of those images.

  • This is important first on the general artificial intelligence point

  • of it being just fascinating that we can selectively steer our attention

  • but also it's important for things like drones.

  • They have to fly at high speeds in an environment

  • where, at three hundred plus frames a second, you have to make decisions.

  • So you can't possibly localize yourself or perceive the world around yourself successfully

  • if you have to interpret the entire scene.

  • So what you can do is steer; for example, shown here is reading a house number

  • by steering around an image.

  • You can do the same task for reading and for writing.

  • So here, in this data set on the left, it is reading numbers.

  • We can also selectively steer a network around an image to generate that image

  • starting with a blurred image first and then getting higher and higher resolution

  • as the steering goes on.

  • Work here at MIT is able to map video to audio.

  • So it takes a silent video of a drumstick hitting objects and is able to generate the sound

  • that the drumstick hitting that particular object makes.

  • So you can get texture information from that impact.

  • So here is the video of a human soccer player playing soccer

  • and a state-of-the-art machine playing soccer.

  • And, well let me give it some time,

  • to build up.

  • (LAUGHTER)

  • Okay. So soccer, we take this for granted, but walking is hard.

  • Object manipulation is hard. Soccer is harder than chess for us to do, much harder.

  • On your phone now, you can have a chess engine that beats the best players in the world.

  • And you have to internalize that because the question is,

  • this is a painful video, the question is: where does driving fall?

  • Is it closer to chess or is it closer to soccer?

  • For those incredible, brilliant engineers that worked on the most recent DARPA challenge

  • this would be a very painful video to watch, I apologize.

  • This is a video from the DARPA Challenge

  • (LAUGHTER)

  • of robots struggling

  • with basic object manipulation and walking tasks.

  • So it's mostly a fully autonomous navigation task.

  • (LAUGHTER)

  • Maybe I'll just let this play for a few moments to let you internalize how difficult this task is,

  • of balancing, of planning in an underactuated way.

  • We don't have full control of everything.

  • When there is a delta between your perception of what you think the world is and what reality is.

  • So there, a robot was trying to turn an object that wasn't there.

  • And this is an MIT entry that actually successfully, I believe, got points for this

  • because it got into that area

  • (LAUGHTER)

  • but a lot of the teams talked about the hardest part.

  • So one of the things the robot had to do is get into a car and drive it and get out of the car.

  • And there's a few other manipulation tasks like walking on unsteady ground,

  • it had to drill a hole through a wall.

  • All these tasks and what a lot of teams said is the hardest part, the hardest task of all of them,

  • is getting out of the car.

  • So it's not getting into the car, it's this very task you saw just now, the robot getting out of the car.

  • These are things we take for granted.

  • So in our evaluation of what is difficult about driving,

  • we have to remember that some of those things we may take for granted

  • in the same kind of way that we take walking for granted. This is Moravec's paradox.

  • Well, Hans Moravec from CMU, let me just quickly read that quote:

  • "Encoded in the large highly evolved sensory motor portions of the human brain

  • is billions of years of experience about the nature of the world and how to survive in it."

  • So this is data. This is big data. Billions of years and abstract thought which is reasoning.

  • The stuff we think is intelligence is perhaps

  • less than one hundred thousand years of data old.

  • We haven't yet mastered it and so,

  • I'm sorry, I'm inserting my own statements in the middle of a quote,

  • but it's been very recent that we've learned how to think.

  • And so we respect it perhaps more than the things we take for granted,

  • like walking, visual perception and so on, but those may be strictly a matter of data,

  • data and training time and network size.

  • So walking is hard.

  • The question is how hard is driving?

  • And that's an important question because the margin of error is small.

  • For one, there's 1 fatality per 100 million miles.

  • That's the rate at which people die in car crashes,

  • 1 fatality per 100 million miles.

  • That's a 0.000001% margin of error.

  • That's through all the time you spend on the road, that is the error you get.
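
To make that margin concrete, here is a quick back-of-the-envelope check in Python. It is only a sketch of the arithmetic behind the number quoted in the lecture; the 10,000-miles-per-year driver is an illustrative assumption, not something from the talk.

```python
# Back-of-the-envelope check of the fatality-rate figure quoted above.
# The only input is the 1-per-100-million-miles number from the lecture;
# the 10,000-miles-per-year driver below is a purely illustrative assumption.

fatalities_per_mile = 1 / 100_000_000      # 1 fatality per 100 million miles
as_percent = fatalities_per_mile * 100     # expressed as a percentage

print(f"fatality rate per mile: {fatalities_per_mile:.0e}")  # 1e-08
print(f"as a percentage:        {as_percent:.6f}%")          # 0.000001%

# For a hypothetical driver covering 10,000 miles a year:
miles_per_year = 10_000
print(f"expected fatalities per driver-year: {fatalities_per_mile * miles_per_year:.0e}")  # 1e-04
```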

  • We're impressed with ImageNet networks being able to classify a leopard, a cat or a dog

  • at above human-level performance, but this is the margin of error we need for driving.

  • And we have to be able to deal with snow, with heavy rain, with big open parking lots,

  • with parking garages, with pedestrians that behave irresponsibly, as rarely as that happens,

  • or just unpredictably, again especially in Boston; with reflections;

  • and especially with some things you don't think about:

  • the lighting variations that blind the cameras.

  • (Inaudible question from one of the attendees)

  • The question was whether that number changes if you look at just crashes, at the fatalities per crash.

  • So one of the big things is that cars have gotten really good at crashing and not hurting anybody.

  • So the number of crashes is much, much larger than the number of fatalities

  • which is a great thing, we've built safer cars.

  • But still, you know even one fatality is too many.

  • So this is one where the Google self-driving car team

  • is quite open about their performance since hitting public roads.

  • This is from a report that shows the number of times

  • there was a disengagement:

  • the car gives up control,

  • it asks the driver to take control back,

  • or the driver takes control back by force.

  • Meaning that they're unhappy with the decision that the car was making

  • or it was putting the car or other pedestrians or other cars in unsafe situations.

  • And so, if you look over time,

  • from 2014 to 2015,

  • there's been a total of 341 times on beautiful San Francisco roads,

  • and I say that seriously because the weather conditions are great there,

  • 341 times that the driver had to elect to take control back.

  • So it's a work in progress.

  • And let me give you something to think about here.

  • This, with neural networks is a big open question.

  • The question of robustness.

  • So this is an amazing paper, I encourage people to read it.

  • There's a couple of papers around this topic.

  • Deep neural networks are easily fooled.

  • So here are 8 images where, if given to a neural network as input,

  • a convolutional neural network as input, the network with higher than 99.6% confidence says

  • that the image, for example the top left, is a robin.

  • Next to it is a cheetah, then an armadillo, a panda, an electric guitar,

  • a baseball, a starfish, a king penguin.

  • All of these things are obviously not in the images.

  • So the networks can be fooled with noise.

  • More importantly, practically for the real world, adding just a little bit of distortion,

  • a little bit of noise distortion to the image, can force the network to produce a totally wrong prediction.

  • So here's an example. There are 3 columns:

  • the correctly classified image, the slight addition of distortion,

  • and the resulting prediction of an ostrich for all three images on the left

  • and a prediction of an ostrich for all three images on the right.
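
To make the mechanism behind that distortion concrete, here is a minimal sketch of the fast-gradient-sign idea on a toy logistic-regression classifier in NumPy. It is not the method from the papers referenced above, which attack full convolutional networks; the weights, input, and epsilon are all made up for illustration. The principle is the same: nudge every input value a tiny amount in the direction that increases the loss.

```python
import numpy as np

# Minimal sketch of a fast-gradient-sign style perturbation on a toy
# logistic-regression "classifier". All numbers here are made up for
# illustration; real attacks target full convolutional networks.

rng = np.random.default_rng(0)
w = rng.normal(size=100)          # toy weight vector (one weight per "pixel")
b = 0.0
x = rng.normal(size=100)          # toy input "image" flattened to 100 pixels
y = 1.0                           # assume the true label is class 1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x):
    return sigmoid(w @ x + b)     # probability of class 1

# Gradient of the cross-entropy loss with respect to the *input* pixels.
# For logistic regression this is (p - y) * w.
grad_x = (predict(x) - y) * w

# Fast-gradient-sign step: move every pixel a small amount (epsilon)
# in the direction that increases the loss.
epsilon = 0.25
x_adv = x + epsilon * np.sign(grad_x)

print(f"original prediction:  {predict(x):.3f}")
print(f"perturbed prediction: {predict(x_adv):.3f}")
print(f"max pixel change:     {np.abs(x_adv - x).max():.3f}")
```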

  • This ability to fool networks easily brings up an important point.

  • And that point is that there has been a lot of excitement

  • about neural networks throughout their history.

  • There's been a lot of excitement about artificial intelligence throughout its history

  • and not coupling that excitement, not grounding that excitement, in the reality of

  • the real challenges around it has resulted in crashes, in A.I. winters when funding dried up

  • and people became hopeless about the possibilities of artificial intelligence.

  • So here is the 1958 New York Times article that said the Navy revealed the embryo of an electronic computer today.

  • This is when the first perceptron that I talked about

  • was implemented in hardware by Frank Rosenblatt.

  • It took a 400-pixel image as input and it provided a single output.

  • Weights were encoded in hardware potentiometers

  • and weights were updated with electric motors.

  • The New York Times wrote: the Navy revealed the embryo of an electronic computer today

  • that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence.

  • Dr. Frank Rosenblatt, a research psychologist at the Cornell Aeronautical Laboratory in Buffalo,

  • said perceptrons might be fired to the planets as mechanical space explorers.

  • This might seem ridiculous but this is the general opinion of the time.

  • And as we know now, perceptrons cannot even separate data that is not linearly separable.

  • They're just linear classifiers.
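
A minimal sketch of that limitation, using the classic XOR example: the code below trains a single-layer perceptron with the standard perceptron update rule and shows it learns the linearly separable OR function but never manages to fit XOR. The learning rate and epoch count are illustrative choices, not anything from the lecture.

```python
import numpy as np

# A single-layer perceptron with the classic perceptron update rule.
# It learns the linearly separable OR function but can never separate XOR,
# no matter how long it trains; hyperparameters are illustrative choices.

def train_perceptron(X, y, epochs=100, lr=0.1):
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            pred = 1 if (w @ xi + b) > 0 else 0
            w += lr * (yi - pred) * xi
            b += lr * (yi - pred)
    return [(1 if (w @ xi + b) > 0 else 0) for xi in X]

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_or  = np.array([0, 1, 1, 1])   # linearly separable
y_xor = np.array([0, 1, 1, 0])   # not linearly separable

print("OR  targets:", y_or.tolist(),  "perceptron:", train_perceptron(X, y_or))
print("XOR targets:", y_xor.tolist(), "perceptron:", train_perceptron(X, y_xor))
# The OR predictions match the targets; the XOR predictions never can,
# because no single line separates {(0,1),(1,0)} from {(0,0),(1,1)}.
```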

  • And so this led to 2 major A.I. winters in the 70s, in the late 80s and early 90s.

  • The Lighthill Report, commissioned in 1973 by the UK government, said that in no part of the field

  • have the discoveries made so far produced the major impact that was promised.

  • So if the hype builds beyond the capabilities of our research,

  • reports like this will come and they have the possibility of creating another A.I. winter.

  • So I want to pair the optimism, some of the cool things we'll talk about in this class,

  • with the reality of the challenges ahead of us.

  • The focus of the research community, and these are some of the key players in deep learning:

  • what are the things that are next for deep learning, the five-year vision?

  • We want to run on smaller, cheaper mobile devices.

  • We want to explore more in the space of unsupervised learning as I mentioned

  • and reinforcement learning.

  • We want to do things that explore the space of videos more,

  • with recurrent neural networks, like being able to summarize videos or generate short videos.

  • One of the big efforts, especially in the companies that deal in large data,

  • is multi-modal learning.

  • Learning from multiple data sets with multiple sources of data.

  • And lastly, making money from these technologies.

  • There's a lot of this: despite the excitement,

  • there has been an inability, for the most part, to make serious money

  • from some of the more interesting parts of deep learning.

  • And while I got made fun of by the TAs for including this slide,

  • because it's shown in so many sort of business-type lectures,

  • it is true that we're at the peak of a hype cycle

  • and we have to make sure that, given the large amount of hype and excitement there is,

  • we proceed with caution.

  • One example of that, let me mention, is we already talked about spoofing the cameras.

  • Spoofing the cameras with a little bit of noise.

  • So if you think about it, self-driving vehicles operate with a set of sensors

  • and they rely on those sensors to accurately capture information about the world.

  • And what happens, not only when the world itself produces noisy visual information,

  • but when somebody actually tries to spoof that data?

  • One of the fascinating things that has been done recently is spoofing of LIDAR.

  • LIDAR is a range sensor that gives a 3D point cloud of the objects in the external environment.

  • And you're able to successfully do a replay attack where you have the car

  • see people and other cars around it when there's actually nothing around it,

  • in the same way that you can spoof a camera, and the neural network behind it,

  • to see things that are not there.

  • So let me run through some of the libraries that we'll work with,

  • and that are out there that you may work with if you proceed with deep learning.

  • TensorFlow, that is the most popular one these days.

  • It's heavily backed and developed by Google.

  • It's primarily a python interface and is very good at operating on multiple GPUs.

  • There's Keras and also TF Learn and TF Slim which are libraries that operate on top of TensorFlow

  • that provide slightly easier, slightly more user-friendly interfaces to get up and running.

  • Torch, if you're interested in getting in at the lower level,

  • tweaking the different parameters of neural networks,

  • creating your own architectures:

  • Torch is excellent for that, with its own Lua interface.

  • Lua is a programming language, and Torch is heavily backed by Facebook.

  • There is the old-school Theano, which is what I started on, and what a lot of people early on

  • in deep learning started on, as one of the first libraries that

  • came with GPU support.

  • It definitely encourages lower level tinkering, has a python interface.

  • And many of these, if not all, rely on Nvidia's library

  • for doing some of the low level computations involved with training these neural networks on Nvidia GPUs.

  • "mxnet" heavily supported by Amazon and they have officially recently announced

  • that they're going to be, their AWS, is going to be all in on the mxnet.

  • Neon, whose company was recently bought by Intel, started out at a manufacturer of neural network chips,

  • which is really exciting, and it performs exceptionally well.

  • I hear good things.

  • Caffe, which started in Berkeley, was also very popular at Google before TensorFlow came out.

  • It was primarily designed for computer vision with ConvNets

  • but has now expanded to other domains.

  • There is CNTK, which used to go by that name and is now called the Microsoft Cognitive Toolkit.

  • Nobody calls it that still, as far as I'm aware.

  • It has multi-GPU support, has its own BrainScript custom language

  • as well as other interfaces.

  • And what we'll get to play around with in this class is, amazingly, deep learning in the browser, right.

  • Our favorite is ConvNetJS, which is what you'll use, built by Andrej Karpathy, from Stanford, now at OpenAI.

  • It's good for explaining the basic concept of neural networks.

  • It's fun to play around with. All you need is a browser and very few requirements.

  • It can't leverage GPUs, unfortunately.

  • But for a lot of things that we're doing, you don't need GPUs.

  • You'd be able to train a network with very little, and relatively efficiently, without GPUs.

  • It has full support for CNNs, RNNs and even deep reinforcement learning.

  • Keras.js, which seems incredible, is one we tried to use for this class.

  • It has GPU support, so it runs in the browser with GPU support,

  • with OpenGL or however it works, magically,

  • but we're able to accomplish a lot of things we need without the use of GPUs.

  • It's incredible to live in a day and age when literally, as I'll show in the tutorials,

  • it takes just a few minutes to get started with building your own neural network

  • that classifies images, and a lot of these libraries are friendly in that way.
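
As a hedged illustration of how quick that getting-started path can be, here is a minimal sketch of a small image classifier in Keras, the Python library mentioned above that sits on top of TensorFlow. The architecture and training settings are illustrative choices, not the course's tutorial code.

```python
# Minimal sketch: train a small image classifier on MNIST with Keras.
# The architecture and hyperparameters are illustrative choices only.
from tensorflow import keras

# Load the MNIST digits: 28x28 grayscale images with labels 0-9.
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixels to [0, 1]

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),     # flatten each image to 784 values
    keras.layers.Dense(128, activation="relu"),     # one small hidden layer
    keras.layers.Dense(10, activation="softmax"),   # one output per digit class
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=3, batch_size=128, validation_split=0.1)
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"test accuracy: {test_acc:.3f}")
```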

  • So all the references mentioned in this presentation

  • are available at this link and the slides are available there as well.

  • So I think in the interest of time, let me wrap up.

  • Thank you so much for coming in today and tomorrow I'll explain the deep reinforcement learning game

  • and the actual competition and how you can win.

  • Thanks very much guys.
