Jabril: John-Green-bot, are you serious?!
I made this game and you beat my high score?
John-Green-bot: Pizza!
Jabril: So John-Green-bot is pretty good at Pizza Jump, but what about this new game we made, TrashBlaster?
John-Green-bot: Hey, that's me!
Jabril: Yeah, let's see what you've got.
John-Green-bot: That's not fair, Jabril!!
Jabril: It's okay, John-Green-bot, we've got you covered.
Today we're gonna design and build an AI program to help you play this game like a pro.
INTRO
Hey, I'm Jabril and welcome to Crash Course AI!
Last time, we talked about some of the ways that AI systems learn to play games.
I've been playing video games for as long as I can remember.
They're fun, challenging, and tell interesting stories where the player gets to jump on goombas
or build cities or cross the road or flap a bird.
But games are also a great way to test AI techniques because they usually involve simpler
worlds than the one we live in.
Plus, games involve things that humans are often pretty good at like strategy, planning,
coordination, deception, reflexes, and intuition.
Recently, AIs have become good at some tough games, like Go or StarCraft II.
So our goal today is to build an AI to play a video game that our writing team and friends
at Thought Cafe designed called TrashBlaster!
The player's goal in TrashBlaster is to swim through the ocean as a little virtual
John-Green-bot, and destroy pieces of trash.
But we have to be careful, because if John-Green-bot touches a piece of trash, then he loses and
the game restarts.
Like in previous labs, we'll be writing all of our code using a language called Python
in a tool called Google Colaboratory.
And as you watch this video, you can follow along with the code in your browser from the
link we put in the description.
In these Colaboratory files, there's some regular text explaining what we're trying
to do, and pieces of code that you can run by pushing the play button.
These pieces of code build on each other, so keep in mind that we have to run them in
order from top to bottom, otherwise we might get an error.
To actually run the code and experiment with changing it, you'll have to either click
“open in playground” at the top of the page or open the File menu and click “Save
a Copy to Drive”.
And just an FYI: you'll need a Google account for this.
So to create this game-playing AI system, first, we need to build the game and set up
everything like the rules and graphics.
Second, we'll need to think about how to create a TrashBlaster AI model that can play
the game and learn to get better.
And third, we'll need to train the model and evaluate how well it works.
Without a game, we can't do anything.
So we've got to start by generating all the pieces of one.
To start, we're going to need to fill up our toolbox by importing some helpful libraries,
such as PyGame.
Steps 1.1 and 1.2 load the libraries, and step 1.3 saves the game so we can watch
it later.
This might take a second to download.
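If you're curious, the setup boils down to something like this. It's just a sketch; beyond PyGame itself, the imports and names here are our assumptions, not necessarily the lab's exact code:

```python
# A minimal setup sketch; everything beyond pygame itself is an assumption.
import pygame       # game objects, drawing, and collision detection
import numpy as np  # matrix math for the AI's brain later on

pygame.init()       # PyGame must be initialized before building any game objects
```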
The basic building blocks of any game are different objects that interact with each other.
There's usually something or someone the player controls, and enemies that you battle.
All these objects and their interactions with one another need to be defined in the
code.
So to make TrashBlaster, we need to define three objects and what they do: a blaster,
a hero, and trash to destroy.
The blaster is what actually destroys the trash, so we're going to load an image that
looks like a laser-ball and set some properties.
How far does it go, what direction does it fly, and what happens to the blast when it
hits a piece of trash?
Our hero is John-Green-bot, so now we've got to load his image, and define
properties like how fast he can swim and how a blast appears when he uses his blaster.
And we need to load an image for the trash pieces, and then code how they
move and what happens if they get hit by a blast, like, for example, total destruction
or splitting into 2 smaller pieces.
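Here's a rough sketch of what a trash object might look like, assuming PyGame's sprite API; the class and method names are just illustrations, not the lab's actual code:

```python
# Illustrative sketch of a trash object; names are placeholders, not the lab's code.
class Trash(pygame.sprite.Sprite):
    def __init__(self, x, y, size):
        super().__init__()
        self.x, self.y, self.size = x, y, size

    def on_hit(self):
        # small piece: total destruction; bigger piece: split into 2 smaller pieces
        if self.size <= 1:
            return []
        return [Trash(self.x, self.y, self.size - 1) for _ in range(2)]
```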
Finally, all these objects are floating in the ocean, so we need a piece of code to generate
the background.
The shape of this game's ocean is toroidal, which means it wraps around, and if any object
flies off the screen to the right, then it will immediately appear on the far left side.
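The wrap-around itself is just modular arithmetic on each object's position. In Python it might look something like this (SCREEN_W and SCREEN_H are assumed names for the window size):

```python
SCREEN_W, SCREEN_H = 800, 600  # assumed window dimensions

def wrap(x, y):
    # taking positions modulo the screen size sends anything that flies off
    # one edge straight back in on the opposite edge
    return x % SCREEN_W, y % SCREEN_H
```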
Every game needs some way to track how the player's doing, so we'll show the score too.
Now that we have all the pieces in place, we can actually build the game and decide
how everything interacts.
The key to how everything fits together is the run function.
It's a loop of checking whether the game is over; moving all the objects; updating
the game; checking whether our hero is okay; and making new trash.
As long as our hero hasn't bumped into any trash, the game continues.
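In rough Python, that loop might look like this; the method names are placeholders for what the lab's run function actually calls at each step:

```python
def run(game):
    while not game.over:            # check whether the game is over
        game.move_objects()         # move the hero, blasts, and trash
        game.update_screen()        # redraw everything and update the score
        if game.hero_hit_trash():   # check whether our hero is okay
            game.over = True
        game.spawn_trash()          # make new trash
```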
That's pretty much it for the game mechanics.
We've created a hero, a blaster, trash, and a scoreboard, and code that controls their
interactions.
Step 2 is modeling the AI's brain so John-Green-bot can play!
And for that, we can turn back to our old friend the neural network.
When I play games, I try to watch for the biggest threat because I don't want to lose.
So let's program John-Green-bot to use a similar strategy.
For his neural network's input layer, let's consider the 5 pieces of trash that are closest
to his avatar.
(And remember, the closest trash might actually be on the other side of the screen!)
Really, we want John-Green-bot to pay attention to where the trash is and where it's going.
So we want the X and Y positions relative to the hero, the X and Y velocities relative
to the hero, and the size of each piece of trash.
That's 5 inputs for 5 pieces of trash, so our input layer is going to have 25 nodes.
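Building that input vector could be sketched like this; toroidal_distance is an assumed helper that measures distance on the wrap-around screen:

```python
def make_inputs(hero, trash_list):
    # 5 features for each of the 5 nearest pieces of trash = 25 inputs
    nearest = sorted(trash_list, key=lambda t: toroidal_distance(hero, t))[:5]
    features = []
    for t in nearest:
        features += [t.x - hero.x, t.y - hero.y,      # relative position
                     t.vx - hero.vx, t.vy - hero.vy,  # relative velocity
                     t.size]                          # size of the trash piece
    return features
```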
For the hidden layers, let's start small and create 2 layers with 15 nodes each.
This is just a guess, so we can change it later if we want.
Because the output of this neural network is gameplay, we want the output nodes to be
connected to the movement of the hero and shooting blasts.
So there will be 5 nodes total: an X and Y for movement, an X and Y direction for aiming
the blaster, and whether or not to fire the blaster.
To start, the weights of the neural network are initialized to 0, so the first time John-Green-bot
plays he basically sits there and does nothing.
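To make the shape concrete, here's one way that network could be wired up in plain NumPy; this is our sketch, not necessarily how the lab's code implements it:

```python
import numpy as np

class Brain:
    def __init__(self):
        # all weights start at 0, which is why an untrained bot just sits there
        self.w1 = np.zeros((25, 15))  # input layer -> first hidden layer
        self.w2 = np.zeros((15, 15))  # first hidden layer -> second hidden layer
        self.w3 = np.zeros((15, 5))   # second hidden layer -> output layer

    def forward(self, x):             # x: NumPy array of the 25 input values
        h1 = np.tanh(x @ self.w1)
        h2 = np.tanh(h1 @ self.w2)
        return np.tanh(h2 @ self.w3)  # move X/Y, aim X/Y, fire or not
```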
To train his brain with regular supervised learning, we'd normally say what the best
action is at each timestep.
But because losing TrashBlaster depends on lots of collective actions and mistakes, not
just one key moment, supervised learning might not be the right approach for us.
Instead, we'll use reinforcement learning strategies to train John-Green-bot based on
all the moves he makes from the beginning to the end of a game, and we'll evolve
a better AI using a genetic algorithm, commonly referred to as a GA.
To start, we'll create some number of John-Green-bots with empty brains
(let's say 200), and we'll have them play TrashBlaster.
They're all pretty terrible, but because of luck,
some will probably be a little bit less terrible.
In biological evolution, parents pass on most of their characteristics to their offspring
when they reproduce.
But the new generation may have some small differences, or mutations.
To replicate this, we'll use code to take the 100 highest-scoring John-Green-bots and
clone each of them as our reproduction step.
Then, we'll slightly and randomly change the weights in those 100 cloned neural networks,
which is our mutation step.
Right now, we'll program a 5% chance that any given weight will be mutated, and randomly
choose how much that weight mutates (so it could be barely any change or a huge one).
And you could experiment with this if you like.
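A rough version of that mutation step might look like this, with the 5% rate and the noise scale as the knobs to play with:

```python
import numpy as np

def mutate(weights, rate=0.05):
    mask = np.random.rand(*weights.shape) < rate  # ~5% of weights get mutated
    noise = np.random.randn(*weights.shape)       # barely any change or a huge one
    return weights + mask * noise
```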
Mutation affects how much the AI changes overall, so it's a little bit like the learning rate
that we talked about in previous episodes.
We have to try and balance steadily improving each generation with making big changes that
might be really helpful (or harmful).
After we've created these 100 mutant John-Green-bots, we'll combine them with the 100 unmutated
original models (just in case the mutations were harmful) and have them all play the game.
Then we evaluate, clone, and mutate them over and over again.
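Put together, one generation of the genetic algorithm could be sketched like this; mutate_brain is an assumed helper that applies the mutation step to every weight in a bot's network:

```python
def next_generation(population, scores):
    # rank bots by score, best first (stable sort, so ties keep their order)
    ranked = [bot for _, bot in sorted(zip(scores, population),
                                       key=lambda pair: pair[0], reverse=True)]
    survivors = ranked[:100]                            # 100 highest scorers
    mutants = [mutate_brain(bot) for bot in survivors]  # cloned-and-mutated copies
    return survivors + mutants                          # 200 bots for the next round
```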
Over time, the genetic algorithm usually makes AIs that are gradually better at whatever
they're being asked to do, like playing TrashBlaster.
This is because models with better mutations will be more likely to score high and reproduce
in the future.
ALL of this stuff, from building John-Green-bot's neural network to defining mutation for our
genetic algorithm, is in this section of code.
After setting up all that, we have to write code to carefully define what doing “better”
at the game means.
Destroying a bunch of trash?
Staying alive for a long time?
Avoiding off-target blaster shots?
Together, these decisions about what “better” means define an AI model's fitness.
Programming this function is pretty much the most important part of this lab, because how
we define fitness will affect how John-Green-bot's AI will evolve.
If we don't carefully balance our fitness function, his AI could end up doing some pretty
weird things.
For example, we could just define fitness as how long the player stays alive, but then
John-Green-bot's AI might play “TrashAvoider” and dodge trash instead of playing TrashBlaster
and destroying trash.
But if we define the fitness to only be related to how many trash pieces are destroyed, we
might get a wild hero that's constantly blasting.
So, for now, I'm going to try a fitness function that keeps the player alive and blasts
trash.
We'll define the fitness as +1 for every second that John-Green-bot stays alive, and
+10 for every piece of trash that is zapped.
But it's not as fun if the AI just blasts everywhere, so let's also add a penalty
of -2 for every blast he fires.
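In code, that fitness function is just a weighted sum of those three numbers:

```python
def fitness(seconds_alive, trash_destroyed, blasts_fired):
    # +1 per second alive, +10 per piece of trash zapped, -2 per blast fired
    return seconds_alive + 10 * trash_destroyed - 2 * blasts_fired
```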
The fitness for each John-Green-bot AI will be updated continuously as he plays the game,
and it'll be shown on the scoreboard we created earlier.
You can take some time to play around with this fitness function and watch how John-Green-bot's
AI can learn and evolve differently.
Finally, we can move on to Step 3 and actually train John-Green-bot's AI to blast some trash!
So first, we need to start up our game.
And to kick off the genetic algorithm, we have to define how many randomly-wired John-Green-bot
models we want in our starting population.
Let's stick with 200 for now.
If we waited for each John-Green-bot model to start, play, and lose the game… this
training process could take DAYS.