## Training Neural Networks: Crash Course AI #4

• Hey, I'm Jabril and welcome to Crash Course AI!

• One way to make an artificial brain is by creating a neural network, which can have

• millions of neurons and billions (or trillions) of connections between them.

• Nowadays, some neural networks are fast and big enough to do some tasks even better than

• humans can, like for example playing chess or predicting the weather!

• But as we've talked about in Crash Course AI, neural networks don't just work on their

• own.

• They need to learn to solve problems by making mistakes.

• Sounds kind of like us, right?

• INTRO

• Neural networks handle mistakes

• using an algorithm called backpropagation to make sure all the neurons that contributed

• to an error get their math adjusted, and we'll unpack this a bit later.

• And neural networks have two main parts: the architecture and the weights.

• The architecture includes neurons and their connections.

• And the weights are numbers that fine-tune how the neurons do their math to get an output.

• So if a neural network makes a mistake, this often means that the weights aren't adjusted

• correctly and we need to update them so they make better predictions next time.

• The task of finding the best weights for a neural network architecture is called optimization.

• And the best way to understand some basic principles of optimization is with an example

• with the help of my pal John Green-bot.

• Say that I manage a swimming pool, and I want to predict how many people will come next

• week, so that I can schedule enough lifeguards.

• A simple way to do this is by graphing some data points, like the number of swimmers and

• the temperature in Fahrenheit for every day over the past few weeks.

• Then, we can look for a pattern in that graph to make predictions.

• A way computers do this is with an optimization strategy called linear regression.

• We start by drawing a random straight line on the graph, which kind of fits the data

• points.

• To optimize though, we need to know how incorrect this guess is.

• So we calculate the distance between the line and each of the data points, add it all up,

• and that gives us the error.

• We're quantifying how big of a mistake we made.

• The goal of linear regression is to adjust the line to make the error as small as possible.

• We want the line to fit the training data as much as it can.

• The result is called the line of best fit.
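Here's a minimal sketch of that line-fitting idea in Python. The temperature and attendance numbers are invented for illustration, and numpy's `polyfit` stands in for the optimization step:

```python
import numpy as np

# Made-up data: daily high temperature (°F) and pool attendance.
temps    = np.array([60, 65, 70, 75, 80, 85, 90])
swimmers = np.array([20, 35, 50, 70, 90, 110, 130])

# polyfit finds the slope and intercept that make the summed squared
# distance between the line and the data points as small as possible.
slope, intercept = np.polyfit(temps, swimmers, deg=1)
print(f"line of best fit: swimmers ≈ {slope:.1f} * temp + {intercept:.1f}")

# A straight line will happily extrapolate, even to temperatures
# where the answer makes no physical sense:
print("predicted swimmers at 30°F:", round(slope * 30 + intercept))
```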

• We can use this straight line to predict how many swimmers will show up for any temperature,

• but parts of it defy logic.

• For example, super cold days have a negative number of predicted swimmers, while dangerously hot days have way

• more people than the pool can handle.

• To get more accurate results, we might want to consider more than two features, like for

• example adding the humidity, which would turn our 2D graph into 3D.

• And our line of best fit would be more like a plane of best fit.

• But if we added a fourth feature, like whether it's raining or not, suddenly we can't

• visualize this anymore.

• So as we consider more features, we add more dimensions to the graph, the optimization

• problem gets trickier, and fitting the training data is tougher.
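A quick sketch of the same idea with two features, again on made-up numbers: the least-squares fit now describes a plane instead of a line, and the exact same code keeps working as we add features we can no longer draw.

```python
import numpy as np

# Each row is one made-up day: [temperature_F, humidity_percent].
X = np.array([[60, 40], [65, 55], [70, 60], [75, 50],
              [80, 65], [85, 70], [90, 80]], dtype=float)
y = np.array([20, 35, 50, 70, 90, 110, 130], dtype=float)

# Add a column of ones so the plane of best fit can have an intercept,
# then solve the least-squares problem directly.
X1 = np.hstack([X, np.ones((len(X), 1))])
coeffs, *_ = np.linalg.lstsq(X1, y, rcond=None)
temp_w, humidity_w, bias = coeffs

# Predict attendance for a hypothetical 82°F day at 60% humidity.
print(round(temp_w * 82 + humidity_w * 60 + bias))
```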

• This is where neural networks come in handy.

• Basically, by connecting together many simple neurons with weights, a neural network can

• learn to solve complicated problems, where the line of best fit becomes a weird multi-dimensional

• function.

• Let's give John Green-bot an untrained neural network.

• To stick with the same example, the input layer of this neural network takes features

• like temperature, humidity, rain, and so on.

• And the output layer predicts the number of swimmers that will come to the pool.

• We're not going to worry about designing the architecture of John Green-bot's neural

• network right now.

• Let's just focus on the weights.

• He'll start, as always, by setting the weights to random numbers, like the random line on

• the graph we drew earlier.

• Only this time, it's not just one random line.

• Because we have lots of inputs, it's lots of lines that are combined to make one big,

• messy function.

• Overall, this neural network's function resembles some weird multi-dimensional shape

• that we don't really have a name for.

• To train this neural network, we'll start by giving John Green-bot a bunch of measurements

• from the past 10 days at the swimming pool, because these are the days where we also

• know the output: the actual attendance.

• We'll start with one day, where it was 80 degrees Fahrenheit, 65% humidity, and not

• raining (which we'll represent with 0).

• The neurons will do their thing by multiplying those features by the weights, adding the

• results together, and passing information through the hidden layers until the output neuron produces a prediction.

• What do you think, John Green-bot?

• John Green-bot: 145 people were at the pool!

• Just like before, there is a difference between the neural network's output and the actual

• swimming pool attendance -- which was recorded as 100 people.

• Because we just have one output neuron, that difference of 45 people is the error.

• Pretty simple.
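As a rough illustration of that forward pass, here's a tiny linear network in Python. The weights are hand-picked so the output happens to land near 145; a real untrained network would start from random weights and would also apply activation functions.

```python
import numpy as np

# A tiny made-up network: 3 inputs -> 2 hidden neurons -> 1 output.
x = np.array([80.0, 65.0, 0.0])            # temperature, humidity, rain

W_hidden = np.array([[0.5, -0.2,  1.0],    # weights into hidden neuron 1
                     [0.3,  0.4, -0.5]])   # weights into hidden neuron 2
W_output = np.array([2.0, 1.82])           # weights into the output neuron

# Each neuron multiplies its inputs by its weights and adds the results
# (a real network would also run each sum through an activation function).
hidden = W_hidden @ x                      # [27.0, 50.0]
prediction = W_output @ hidden             # 145.0

actual = 100.0                             # attendance recorded that day
error = prediction - actual                # 45.0 people too many
print(prediction, error)
```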

• In some neural networks though, the output layer may have a lot of neurons.

• So the difference between the predicted answer and the correct answer is more than just one

• number.

• In these cases, the error is represented by what's known as a loss function.
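One common choice of loss function is mean squared error, sketched below. It's just one option among many, but it shows how several output differences collapse into a single number:

```python
import numpy as np

def mean_squared_error(predicted, actual):
    """Average of the squared differences between each output neuron's
    prediction and the corresponding true value."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.mean((predicted - actual) ** 2)

# With one output neuron this is just the squared error:
print(mean_squared_error([145], [100]))              # 2025.0

# With several output neurons, all the differences roll into one number:
print(mean_squared_error([145, 80, 60], [100, 90, 55]))
```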

• Moving forward, we need to adjust the neural network's weights so that the next time

• we give John Green-bot similar inputs, his math and final output will be more accurate.

• Basically, we need John Green-bot to learn from his mistakes, a lot like when we pushed

• a button to supervise his learning when he had the perceptron program.

• But this is trickier because of how complicated neural networks are.

• To help neural networks learn, scientists and mathematicians came up with an algorithm

• called backpropagation of the error, or just backpropagation.

• The basic goal is to look at the loss function and then assign blame to neurons back in the

• previous layers of the network.

• Some neurons' calculations may have been more to blame for the error than others, so

• their weights will be adjusted more.

• This information is fed backwards, which is where the idea of backpropagation comes from.

• So for example, the error from our output neuron would go back a layer and adjust the

• weights that get applied to our hidden layer neuron outputs.

• And the error from our hidden layer neurons would go back a layer and adjust the weights

• that get applied to our features.

• Remember: our goal is to find the best combination of weights to get the lowest error.
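Here's roughly what that blame assignment looks like on the tiny hand-made network from before. This is a bare-bones sketch: real networks include activation functions, and libraries work out these derivatives automatically.

```python
import numpy as np

# The same tiny linear network: 3 inputs -> 2 hidden -> 1 output.
x = np.array([80.0, 65.0, 0.0])
W_hidden = np.array([[0.5, -0.2,  1.0],
                     [0.3,  0.4, -0.5]])
W_output = np.array([2.0, 1.82])
actual = 100.0
learning_rate = 1e-5

# Forward pass.
hidden = W_hidden @ x
prediction = W_output @ hidden
error = prediction - actual                   # 45 people too many

# Backward pass: assign blame for the error to each weight (the chain rule).
grad_output = error * hidden                  # blame on the output-layer weights
grad_hidden = np.outer(error * W_output, x)   # blame on the hidden-layer weights

# Nudge every weight a little in the direction that shrinks the error.
W_output -= learning_rate * grad_output
W_hidden -= learning_rate * grad_hidden

print(W_output @ (W_hidden @ x))              # new prediction, closer to 100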

• To explain the logic behind optimization with a metaphor, let's send John Green-bot on

• a metaphorical journey through the Thought Bubble.

• Let's imagine that weights in our neural network are like latitude and longitude coordinates

• on a map.

• And the error of our neural network is the altitude -- lower is better.

• John Green-bot the explorer is on a quest to find the lowest point in the deepest valley.

• The latitude and longitude of that lowest point -- where the error is the smallest -- are

• the weights of the neural network's global optimal solution.

• But John Green-bot has no idea where this valley actually is.

• By randomly setting the initial weights of our neural network, we're basically dumping

• him in the middle of the jungle.

• All he knows is his current latitude, longitude, and altitude.

• Maybe we got lucky and he's on the side of the deepest valley.

• But he could also be at the top of the highest mountain far away.

• The only way to know is to explore!

• Because the jungle is so dense, it's hard to see very far.

• The best John Green-bot can do is look around and make a guess.

• He notices that he can descend down a little by moving northeast, so he takes a step down

• and updates his latitude and longitude.

• From this new position, he looks around and picks another step that decreases his altitude

• a little more.

• And then another... and another.

• With every brave step, he updates his coordinates and decreases his altitude.

• Eventually, John Green-bot looks around and finds that he can't go down anymore.

• He celebrates, because it seems like he found the lowest point in the deepest valley!

• Or... so he thinks.

• If we look at the whole map, we can see that John Green-bot only found the bottom of a

• small gorge when he ran out of “down.”

• It's way better than where he started, but it's definitely not the lowest point of

• the deepest valley.

• So he just found a local optimal solution, where the weights make the error relatively

• small, but not the smallest it could be.

• Sorry, buddy.

• Thanks, Thought Bubble.

• Backpropagation and learning always involve lots of little steps, and optimization is

• tricky with any neural network.

• If we go back to our example of optimization as exploring a metaphorical map, we're never

• quite sure if we're headed in the right direction or if we've reached the lowest

• valley with the smallest error -- again that's the global optimal solution.

• But tricks have been discovered to help us better navigate.

• For example, when we drop an explorer somewhere on the map, they could be really far from

• the lowest valley, with a giant mountain range in the way.

• So it might be a good idea to try different random starting points to be sure that the

• neural network isn't getting stuck at a locally optimal solution.

• Or instead of restarting over and over again, we could have a team of explorers that start

• from different locations and explore the jungle simultaneously.

• This strategy of exploring different solutions at the same time on the same neural network

• is especially useful when you have a giant computer with lots of processors.

• And we could even adjust the explorer's step size, so that they can step right over

• small hills as they try to find and descend into a valley.

• This step size is called the learning rate, and it's how much the neuron weights get

• adjusted every time backpropagation happens.
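To make the explorer metaphor concrete, here's a toy one-dimensional "error landscape" explored with a fixed learning rate and a few random starting points. Every number in it is an assumption chosen just for illustration:

```python
import numpy as np

# A bumpy, made-up "error landscape": one weight in, one error value out.
def error(w):
    return np.sin(3 * w) + 0.3 * w ** 2

def error_slope(w):                           # the derivative: which way is downhill
    return 3 * np.cos(3 * w) + 0.6 * w

learning_rate = 0.05                          # the explorer's step size
rng = np.random.default_rng(0)

best_w, best_error = None, float("inf")
for start in rng.uniform(-4, 4, size=5):      # drop five explorers at random spots
    w = start
    for _ in range(200):                      # each takes many small downhill steps
        w -= learning_rate * error_slope(w)
    if error(w) < best_error:                 # keep whichever valley turned out deepest
        best_w, best_error = w, error(w)

print(f"best weight found: {best_w:.2f}, error there: {best_error:.2f}")
```

Different explorers can end up in different valleys, which is exactly why starting from several random points beats trusting a single run.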

• We're always looking for more creative ways to explore solutions, try different combinations

• of weights, and minimize the loss function as we train neural networks.

• But even if we use a bunch of training data and backpropagation to find the global optimal

• solution... we're still only halfway done.

• The other half of training an AI is checking whether the system can answer new questions.

• It's easy to solve a problem we've seen before, like taking a test after studying the answer key.

• We may get an A, but we didn't actually learn much.

• To really test what we've learned, we need to solve problems we haven't seen before.

• Same goes for neural networks.
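A minimal way to sketch that check in code: hold some days back as "new questions" and only measure error on those. The data below is made up, and a plain line of best fit stands in for the neural network.

```python
import numpy as np

# Made-up pool data: temperature vs. attendance for 14 days.
temps    = np.array([60, 62, 65, 68, 70, 72, 75, 78, 80, 82, 85, 88, 90, 92])
swimmers = np.array([18, 22, 33, 41, 52, 60, 71, 80, 92, 98, 112, 121, 128, 135])

# Train on the first 10 days; hold the last 4 back as questions we haven't seen.
train_t, test_t = temps[:10], temps[10:]
train_s, test_s = swimmers[:10], swimmers[10:]

slope, intercept = np.polyfit(train_t, train_s, deg=1)

# How wrong are we on days the model never trained on?
test_predictions = slope * test_t + intercept
test_error = np.mean(np.abs(test_predictions - test_s))
print(f"average error on unseen days: {test_error:.1f} swimmers")
```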

• This whole time, John Green-bot has been training his neural network with swimming pool data.

• His neural network has dozens of features like temperature, humidity, rain, day of the

• week, and wind speed... but also grass length, number of butterflies around the pool, and

• the average GPA of the lifeguards.

• More data can be better for finding patterns and accuracy, as long as the computer can

• handle it!

• Over time, backpropagation will adjust the neuron weights, so that the neural network's

• output matches the training data.

• Remember, that's called fitting to the training data, and with this complicated neural network,

• we're looking for a multi-dimensional function.

• And sometimes, backpropagation is too good at making a neural network fit to certain

• data.

• See, there are lots of coincidental relationships in big datasets.

• Like for example, the divorce rate in Maine may be correlated with U.S. margarine consumption,

• or skiing revenue may be correlated with the number of people dying by getting trapped

• in their bedsheets.

• Neural networks are really good at finding these kinds of relationships.

• And it can be a big problem, because if we give a neural network some new data that doesn't

• adhere to these silly correlations, then it will probably make some strange errors.

• That's a danger known as overfitting.

• The easiest way to prevent overfitting is to keep the neural network simple.

• If we retrain John Green-bot's swimming pool program *without* data like grass length

• and number of butterflies, and we observe that our accuracy doesn't change, then ignoring

• those features is best.
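Here's a rough sketch of that check on invented data, where attendance really depends on temperature and the butterfly counts are pure noise. We fit with and without the junk feature and compare error on held-out days:

```python
import numpy as np

rng = np.random.default_rng(1)
days = 20

# Invented data: attendance really depends on temperature (plus noise)...
temps = rng.uniform(60, 95, size=days)
swimmers = 3.5 * temps - 180 + rng.normal(0, 8, size=days)
# ...while butterfly counts are pure coincidence with no real effect.
butterflies = rng.integers(0, 30, size=days)

def heldout_error(features):
    """Fit a linear model on the first 15 days, report error on the last 5."""
    X = np.column_stack(features + [np.ones(days)])
    coeffs, *_ = np.linalg.lstsq(X[:15], swimmers[:15], rcond=None)
    return np.mean(np.abs(X[15:] @ coeffs - swimmers[15:]))

print("error with temperature only:         ", heldout_error([temps]))
print("error with temperature + butterflies:", heldout_error([temps, butterflies]))
```

If adding the butterfly feature doesn't lower the held-out error, leaving it out keeps the model simpler and less tempted by coincidences.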

• So training a neural network isn't just a bunch of math!

• We need to consider how to best represent our various problems as features in AI systems,

• and to think carefully about what mistakes these programs might make.

• Next time, we'll jump into our very first lab of the course, where we'll apply all

• this knowledge and build a neural network together.

• Crash Course AI is produced in association with PBS Digital Studios.

• If you want to help keep Crash Course free for everyone, forever, you can join our community

• on Patreon.

• And if you want to learn more about the math of k-means clustering, check out this video

• from Crash Course Statistics.
