[WHISTLE] Hello, and welcome to a coding challenge: Tic-Tac-Toe. Oh, with the Minimax algorithm. That's why I'm here, because I made this other coding challenge, Tic-Tac-Toe, where I created a kind of a-- it was kind of a mess, if I'm being perfectly honest. But I made a working version of the Tic-Tac-Toe game that just played with two players picking random spots. Since then-- you can check one of my live streams if you want to find where I did this-- I made some adjustments to it so that I could, as a human being, play the game. So right now I'm going to play against the random computer picker. I'm going to go here. And then I'm going to block X. And then I'm going to go here. And ha, ha, I win. O wins. The adjustment that I made is that I added a mousePressed function where I find where I clicked, and I put my human variable, which is the letter O, onto the board. And then I called nextTurn, where nextTurn picks a random spot on the board and makes that the AI's spot, or X's spot. So the whole point of this video is for me to implement something called Minimax, which is an algorithm, a search algorithm if you will, to find the optimal next move for the AI. I mean, it could find it for myself, and then I could implement it. But the idea is I want this player, the computer player, to beat me at the game, or at least tie-- to play a perfect game. If you want to learn more about the Minimax algorithm, I would suggest two resources that you can look at, which I actually looked at before beginning this coding challenge. I haven't programmed this before. But I watched this video by Sebastian Lague that explains the Minimax algorithm, plus something called alpha-beta pruning, which I'm not going to implement but could be a good exercise, a next step for you. And I also found this article on the geeksforgeeks.org website, which is a three-part series about the Minimax algorithm and how to apply it to Tic-Tac-Toe. So those resources are probably your best bet for learning about the theory of the Minimax algorithm and how it can be applied to a wide variety of scenarios. But I'm going to look at the specifics of Tic-Tac-Toe, which is a great starting example, because it's a simple enough game that I can actually diagram out all of the possibilities, which for most games, even if computationally possible, is very hard to diagram. And there are lots of scenarios where you couldn't even compute all of the possible outcomes; chess is an example of that. So Minimax is typically visualized as a tree. The root of the tree is the current state of the game board. [MUSIC PLAYING] So I've drawn a configuration here of the Tic-Tac-Toe board midway through the game, because I don't want to start at the beginning where there are so many possibilities that I couldn't diagram them. So right now it's X's turn, and X has three possible moves. I could express that in a tree diagram as move one, move two, and move three. So let's take a look at what I mean by that. [MUSIC PLAYING] I'm going to use a blue marker so you can see the new moves. So let's say one possible move is X going here. Let me diagram out the next two possible moves. [MUSIC PLAYING] Another possible move is X going here. [MUSIC PLAYING] And another possible move is X going here. So we need to look at each of these and determine: is the game over, or is it continuing? Only in this case is the game over. In this case, the game is over, and X has won. So I'm going to mark this green for the state of X having won the game.
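For reference, here is a minimal sketch of that starting point, assuming the p5.js setup from the earlier Tic-Tac-Toe challenge: a 3x3 `board` array of `''` / `'X'` / `'O'`, globals `human = 'O'` and `ai = 'X'`, and cell dimensions `w` and `h`. These names are assumptions for illustration, not necessarily the exact code from the video.

```javascript
// Sketch of the pre-Minimax version: the human clicks a cell,
// then the computer picks a random available spot.
function mousePressed() {
  // Convert the click position into a row/column on the board
  const i = floor(mouseX / w);
  const j = floor(mouseY / h);
  if (board[i][j] === '') {
    board[i][j] = human; // place the human's letter (O)
    nextTurn();          // then let the computer move
  }
}

function nextTurn() {
  // Collect every empty spot and pick one at random for the AI (X)
  const available = [];
  for (let i = 0; i < 3; i++) {
    for (let j = 0; j < 3; j++) {
      if (board[i][j] === '') available.push([i, j]);
    }
  }
  if (available.length > 0) {
    const [i, j] = random(available);
    board[i][j] = ai;
  }
}
```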
Now, it's O's turn. And for each of these spots, O has two possible moves. [MUSIC PLAYING] O could go here, or O could go here. O could go here, or O could go here. And look at these: in this case and this case, O has won. Remember, we're trying to solve for the optimal move for X. X is making the first turn. So this means I can mark these, where O wins, as red-- like, those are bad outcomes. Now, while these are not terminal states, there's only one move possible for X. So I could draw an arrow and put that down here, but ultimately I can just consider this as X has to go here. So you could consider these as terminal, in which case X will win. Now you can see that I've visualized all the possible outcomes of the Tic-Tac-Toe game based on this starting configuration. So how does X pick which path to go down, where we can see that the goal is for X to win the game-- to get to here, here, or there? How does it do that? And the way that it does that-- by the way, spoiler alert, this is the move it should pick; it should just win instantly. But the idea of the Minimax algorithm is that if we could score every possible outcome, those scores can ripple back up the tree and help X decide which way to go to find the highest possible score. The nice thing about Tic-Tac-Toe is that there isn't a range of scores. The endpoints are either you won, you lost, or you tied. I could say if X wins, it's a score of plus 1. If O wins, it's a score of negative 1. And if it's a tie, it's a score of 0. So in other words, we can give this a score of plus 1, this a score of plus 1, this a score of negative 1, this a score of negative 1, and this a score of plus 1. But we've got an issue. I'm trying to pick between this option, this option, and this option, and these two don't have a score. How do I pick their score? Well, I can pick their score based on which one of these would be picked. But who's picking these? This is why the algorithm is called Minimax, because we have two players, each trying to make an optimal score. X is what's known as the maximizing player. So to go from here to here, it's X's turn, and X is maximizing. But here, it's now O's turn. O wants to win. X can assume that O is going to play their optimal move. Now, you could have a situation where your opponent doesn't play optimally, and that's something you might have to consider in terms of game theory. But in this case of the algorithm, we can assume that O is going to make the best possible move. I mean, if it makes a worse move, that's only better for us anyway, so it's all going to work out. So O is what's referred to as the minimizing player-- the minimizing player. So O has the option of picking negative 1 or plus 1, negative 1 or plus 1. Which one is O going to pick as the minimizing player? The option to win. So we take the best outcome for O, the minimizing outcome for O, and pass that back up as the score here, and the same thing here. Now, X is the maximizing player. Which one of these is it going to pick: negative 1, plus 1, or negative 1? These are negative 1 because, even though it could win here, if X goes this way, O is going to win. If X goes this way, O is going to win, if O plays optimally. So this is the way to go. This scenario that I picked might not be the best demonstration of the idea. In fact, there are no ties here; there are only two turns being played.
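To restate the tree idea in code form, here is a conceptual sketch of the minimax recurrence, where `terminalScore`, `possibleMoves`, and `applyMove` are hypothetical helpers used only for illustration:

```javascript
// Conceptual sketch: the value of a board is its terminal score, or the
// max/min over the values of all next boards, depending on whose turn it is.
function value(board, xToMove) {
  const score = terminalScore(board); // +1 if X won, -1 if O won, 0 for a tie, null otherwise
  if (score !== null) return score;
  const children = possibleMoves(board).map(([i, j]) =>
    applyMove(board, i, j, xToMove ? 'X' : 'O')
  );
  const values = children.map((child) => value(child, !xToMove));
  // X (maximizing) takes the highest child value, O (minimizing) the lowest
  return xToMove ? Math.max(...values) : Math.min(...values);
}
```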
So maybe what you might want to do is pause this video and do the same exact diagram, but with X and O two positions earlier, and see if you can diagram that whole thing out yourself. So how do we implement this in code? Your first clue should be the fact that this is a tree diagram. Anytime you see a tree diagram, typically speaking, a recursive algorithm-- a function that calls itself-- is something that will apply. And this case is no different. I want to call Minimax on this board position, then loop through all the possible moves and call Minimax on each one of those board positions, then loop through all of their possible moves. If I ever get to an end point, return the score, and have that score backtrack, or ripple back up the chain. So let's see if we can write the code to do this. So this here is the place where I want to write the new code. Currently, a move is chosen by picking a random available spot, and I no longer want to do that. I'm actually going to change the name of this function to bestMove. And I don't need to worry about this list of what's available; I just want to look at all the possible spots. So basically, I just want to know: is the spot available? If the spot is available, I want to try going there. So I'm going to say board[i][j] is the AI player. And then I want to call Minimax on the board. Why do I want to call Minimax? Because I want to get the score. I want Minimax to return to me the score for this particular move. Now, it's going to have to recursively check these two spots. But we'll get there. We're getting there. Ultimately, though, I need to figure out which one was the best. So I need to keep track by saying, like, the best score is negative infinity. If the score is greater than the best score, then that's the new best score, and the best move is this particular i, j. And let me declare bestMove. So, again, I haven't implemented Minimax yet. I'm just getting the basic idea down. For the first turn, I need to start by looking at all the possible moves, get the score for those by calling this mysterious Minimax function I'm about to write, find the best one, and then apply that move. There's a big issue here, though, because I am changing the global variable, the board. I am altering the game. I am moving X to that spot and then moving X to the next spot. I could make a copy of the board, but I think an easy way to deal with this would actually just be to undo it. So right afterwards, I'm going to just quickly undo that move. The next step would be to write the Minimax function-- to truly write the algorithm itself. But I kind of want to check to see if there are any errors here. So actually, what I'm going to do is put in a placeholder function, Minimax, which receives a board, and I'm just going to return a score of 1. So Minimax is always going to return a score of 1, meaning it's not going to be able to pick; they're all going to be tied. So it still is going to pick the first one. But let's see how that goes. nextTurn is not defined, because I renamed it bestMove. And it's in one more place here, in setup. OK, where? It went there. Watch, it's going to go to the next available spot, to the next available spot, to the next available spot. So you can see this is working. It's working in the sense that the algorithm is making a choice. But no matter what, it's always just going to go in order of the board, and-- oh, X won. There's also a big issue here, which thankfully caused no major problems.
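Here is a minimal sketch of that bestMove logic, assuming the same globals as before and the final minimax signature that gets built up later in the video:

```javascript
// Sketch of bestMove: try every empty spot, score it with minimax, undo, keep the best.
function bestMove() {
  let bestScore = -Infinity;
  let move;
  for (let i = 0; i < 3; i++) {
    for (let j = 0; j < 3; j++) {
      if (board[i][j] === '') {
        board[i][j] = ai;                       // try this move
        const score = minimax(board, 0, false); // the minimizing player moves next
        board[i][j] = '';                       // undo the move
        if (score > bestScore) {
          bestScore = score;
          move = { i, j };
        }
      }
    }
  }
  if (move) board[move.i][move.j] = ai; // apply the best move found
}
```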
But I named this function bestMove, and then I created a variable inside of the function with the same name, which can always lead to trouble. So let me actually just change this to move. Sort of by definition, it is the best move-- make sure this still works. All right. X is just picking straight down. Now, the hard part. I need to write the Minimax function itself. So one thing that I've skipped here, which is typical of a Minimax algorithm, is that I would also keep track of the depth, because ultimately I might want to consider, like, how long is it taking me to get somewhere? What is the depth of the tree that I'm on? That's something that you'll see often within the algorithm implementation. So let me add that. I add an argument, depth, here and then give it a 0 at first. Oh, you know what, I need another really important argument. This is a turn-based game, and the algorithm is going to behave differently depending on whether it's the maximizing player's turn, X, or the minimizing player's turn, O, in this case. And, again, you could have more than two players, or there could be a lot of weird stuff going on. But that's ultimately what I need here. So let me also add isMaximizing-- is maximizing. And this here would be true. All right, so I want to perform Minimax on this board with a starting depth of 0, and the first move is the maximizing player. So let's add maximizing. What do I want to do? Actually, what I want to do even before this is check to see if somebody won. [MUSIC PLAYING] I'm pretty sure I already have a checkWinner function. That's something that I wrote in the previous coding challenge. Let me double-check that-- checkWinner. The winner starts as null, and by the end it returns null, a tie, or the winner. So for the score, let's make a little lookup table that's basically exactly what I wrote here. If X wins, it's a 1. If O wins, it's a negative 1. If it's a tie, it's a 0. [MUSIC PLAYING] All right, so this is my lookup table for what the score is-- and, again, it's a very simple scenario; there are only three possible scores. If the result is not equal to null, the score is the number associated with that particular result. Oh, it got rid of those-- I didn't need those quotes. Visual Studio Code just cleaned that up for me, thank you very much. The score is based on whoever won, and then I can return that. So this would be the terminal condition. If I am calling Minimax on this particular board configuration at this particular depth, and it's an end state, a terminal state, just return the score. That's what it's supposed to do. [RINGING] If it's not a terminal state, I can't just return the score. I need to check all the possible next moves. So I got this wrong, actually. This is the AI player trying a spot, their move, so the next move is actually not the maximizing player. This should be false. But I'm going to write this as if it is. So if it's the maximizing player, I can copy-paste that from above and check all the possible spots again. If it's available, try it. Now, that's the human, right? Then, oh no, this is the AI. I'm confused, because clearly, the way I'm stepping through this, in my mind it's O's turn. But I'm writing the code for both, whether it's X's turn or O's turn. So for the sake of argument, it's X's turn right now. So I'm checking all of those spots, and then I'm going to say return. I need to find the best move. I'm kind of doing what I did before again. Is there a way for me to reduce the amount of code I'm writing?
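As a sketch of that terminal condition, assuming a `checkWinner` function like the one from the earlier challenge that returns 'X', 'O', 'tie', or null:

```javascript
// Score lookup for terminal states, matching the diagram:
// X (the AI, maximizing) winning is +1, O (the human, minimizing) winning is -1, a tie is 0.
const scores = {
  X: 1,
  O: -1,
  tie: 0
};

function minimax(board, depth, isMaximizing) {
  // Terminal condition: if the game is over, return the score for this outcome.
  const result = checkWinner();
  if (result !== null) {
    return scores[result];
  }
  // Otherwise, recurse over all possible next moves (filled in by the fuller sketch below).
}
```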
I'm not going to worry about this. I want to, once again, find the best score, which in this case starts at negative infinity. And I want to say the score is Minimax of the board at depth plus 1. And now the next player's turn is false, because I'm the maximizing player. And then I'm going to undo this, just like I did before. Why do I have to write this code twice? I'll think later about whether I can refactor this to only do it once. If the score is greater than the best score, the best score is equal to the score. And then after all of this-- for loop, for loop, if it's an empty spot, boom, boom-- return the best score. So this is finding the best score over all the possible next turns by the AI player. Or if it's not the AI player, if it's the minimizing player, we can do exactly the same thing. And, again, maybe there's a way-- I'm missing a curly bracket? Yes, thank you, there's a curly bracket. So this is very important. There are a lot of ifs and for loops here. So maybe there's also a nice way to refactor this and make it more readable. But if it's the maximizing player, check all the spots, find the best possible outcome, and return it-- but call Minimax recursively on the next future move. And then here, the minimizing player wants to find the best score for it, which is the lowest score. And that's the human player. And if the score is less than the best score... and then return that best score. Well, oh, forget about this return 1. I'm going to stare at this to make sure all my curly brackets line up-- if, if, for, for, and then return the best score. Otherwise, if, if, for, for, and return the best score. I think this might actually be good. Should we just try it? [DRUM ROLL] Error on line 33, return-- hold on. Oh, this should be-- ah, look at this. So this is a huge mistake here. Return true-- why did I have that there? I don't know why I did that. That should be return score. This should be a return score. The whole point of this-- and I don't even need a separate variable here; I can just say return scores[result]. Right, so any kind of recursive function needs a terminal condition; it needs to exit, right? Otherwise this function is always going to keep calling itself and calling itself: Minimax, Minimax, Minimax, Minimax. I don't know what this extra code is in here-- in Minimax, OK. Oh, yes. Simon is pointing out something to me, which is great: I don't need this if statement to find the best score-- that's the whole point of this min. And also, there's a great chance to use the min function and the max function, because it's the Minimax algorithm. So I can actually say the score is the lower one-- the minimum between score and best score-- which is going to make it way easier to read. Thank you. And oh, no, but this one is max-- is max. And then this one-- oh, thank you for this. I don't know why I didn't think of this; I'm sure it's in every implementation: score is the minimum between score and best score. So this makes it much easier to read here, OK. Let's try it. Let's try it. Why not? Caution to the wind. [DRUM ROLL] OK, X went in the top left. That's definitely the optimal place to go. I'm going to go here. [DRUM ROLL] No, no, bad X-- bad X. You're not going in the right place. [MUSIC PLAYING] Yeah, I beat you. You're still going in order. Why are you still going in order? Oh, whoa, you're not going in order. You're making weird decisions. Oh, oh, oh. I see the error. I see the error of my ways. This has to be true, right?
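Putting the pieces together, here is a minimal sketch of the whole Minimax function as described, already incorporating the corrections that come next in the video (assigning the max/min to the best score, flipping isMaximizing each turn, and returning only after both loops finish). It assumes the same globals (`board`, `ai`, `human`) and the `checkWinner` and `scores` lookup from before.

```javascript
function minimax(board, depth, isMaximizing) {
  const result = checkWinner();
  if (result !== null) {
    return scores[result]; // terminal state: X win = 1, O win = -1, tie = 0
  }

  if (isMaximizing) {
    // AI's (X's) turn: take the highest score over all available spots
    let bestScore = -Infinity;
    for (let i = 0; i < 3; i++) {
      for (let j = 0; j < 3; j++) {
        if (board[i][j] === '') {
          board[i][j] = ai;
          const score = minimax(board, depth + 1, false);
          board[i][j] = ''; // undo the move
          bestScore = Math.max(score, bestScore);
        }
      }
    }
    return bestScore;
  } else {
    // Human's (O's) turn: assume they play optimally and take the lowest score
    let bestScore = Infinity;
    for (let i = 0; i < 3; i++) {
      for (let j = 0; j < 3; j++) {
        if (board[i][j] === '') {
          board[i][j] = human;
          const score = minimax(board, depth + 1, true);
          board[i][j] = ''; // undo the move
          bestScore = Math.min(score, bestScore);
        }
      }
    }
    return bestScore;
  }
}
```

Calling `bestMove()` from `mousePressed` after the human's move, in place of the old random `nextTurn`, would then reproduce the behavior shown in the video.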
After the maximizing player's turn, it's the minimizing player's turn. After the minimizing player's turn, it's the maximizing player's turn, OK. [DRUM ROLL] That's a good move, X. I see what you did there. [DRUM ROLL] [MUSIC PLAYING] Why can't you figure out to go there, X? Oh, oh. [BUZZ] OK, I'm finding this. The score is the new score, and the best score should be the bigger one-- the maximum between score and best score-- not score. This has to be this, because I'm returning best score. And this has to be this. OK, OK. [WHISTLING] Let's see. I really should not continue to play this drum sound. [DRUM ROLL] Come on, X-- figure out where to go. OK, I'm going to go here. Oh, shit, come on. Oh, no, no, no, no, no, no, no, no, no-- [BUZZ] That's in the wrong place. It's in the wrong part. I knew it. Look, my brackets-- bad brackets. I need to return the best score, hello, after I've checked all of the possibilities, meaning after both for loops, the i and the j, have completed. I got that right up here but not down here. Oh, OK. I really think we've got it this time. Timpani sound. [DRUM ROLL] Ah, yes. [DRUM ROLL] [MUSIC PLAYING] Ha, ha, tied. All right. All right, you. We're going to have a competition now. I'm going to let you win by going here. Ah. So if I go in a corner-- if I don't go to the middle-- it's always going to win. So X is always going to win unless I go to the middle, in which case there's always going to be a tie, unless I make a mistake-- then it's going to win. But if I go to any other spot, X is going to be able to win, because it's always going to-- [LAUGHS] OK, this works. Hey, man. So that's the Minimax algorithm. [RING] So one thing that might be interesting for me to try here really quickly, before I finish off this challenge, would be to comment out the AI going first. I know this is technically correct, because X is supposed to go first. But let's see what happens if I start with a fresh board and I go first. So I'm going to go here. I'm going to go here. I'm going to go here. Oh. I'm going to go here. Oh, that's a tie. Ooh, and something became undefined. Oh, if I go last, I guess there's no move for X, and I've got an error. So that's [INAUDIBLE]. Can I beat-- no, I lost. I don't believe there's any way for me to beat the AI player. It's a solved game, because it's always going to go to the center. So I could try going here. I could go here, go here. The best I could do is a tie. So you've watched this coding challenge. Maybe you have a creative idea about how to make your own version of this. But I just quickly whipped up, thanks to the chat in the live stream, some suggestions of things you could try. So first of all, what happens if you just make two AI players that play against each other over and over again? Well, in that case, it's going to be a tie every single time. It's a solved game. But that might lead to some interesting possibilities. One thing that I'm not doing is using the depth in the score. So, for example, if I go back to the code and I change these to, say, 10 and negative 10, I could account for the depth, meaning I could have a higher score if I win more quickly than if it takes longer to win. Right now, I'm not accounting for that. So where in the code would you subtract out the depth? That's something you could try. Certainly it would be interesting to make a larger board. So can you play Tic-Tac-Toe on a 5 by 5 or a 7 by 7? The winning conditions would change. The optimal play would change.
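One way to experiment with that depth suggestion, sketched here as an assumption rather than the video's code: fold the depth into the terminal score so that faster wins score higher, using the larger base scores (10 and negative 10) so the adjustment never flips a win into a loss.

```javascript
// Hypothetical tweak: prefer faster wins and slower losses.
const scores = { X: 10, O: -10, tie: 0 };

function minimax(board, depth, isMaximizing) {
  const result = checkWinner();
  if (result !== null) {
    const score = scores[result];
    if (score > 0) return score - depth; // a win found sooner scores higher
    if (score < 0) return score + depth; // a loss delayed longer scores less badly
    return score;                        // ties stay at 0
  }
  // ...rest of the function unchanged from the sketch above.
}
```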
That would be exciting to try. I hope somebody does that. You could try more players. Like, how would you have a Tic-Tac-Toe game with three players? That's something you could try with a larger board. You could try another game. Maybe Connect Four would work; you might be able to apply Minimax to it too. Also, think about the interface and animation. Like, right now, whenever I go, the next turn happens immediately. You could think about timing and that sort of thing. But then, with a high degree of difficulty-- and maybe worth me coming back to in a future video at some point-- would be these last two items. One is known as alpha-beta pruning. Alpha-beta pruning refers to an aspect of the Minimax algorithm where, once you've found an optimal path and you know that all the other possibilities are going to be worse, you don't need to go forth and check every possibility. So you could research how that works and add it to this algorithm. It's explained in the Sebastian Lague video. Then there's the possibility of a game where the number of moves is so vast you couldn't possibly compute the entire tree. Chess is the classic example of that. So you need some heuristic, a way of making an estimated guess of what the score could be for any given state. One way of doing this-- which I'm not saying is a good strategy, but a simple thing you could do with chess-- is to add up the total number of white pieces versus black pieces. And you could say, like, well, the score is higher if the maximizing player has more pieces than the opponent. So that's one way of approaching it. You don't actually stop at the end of the tree; you just go to some predetermined depth, then calculate an estimate and have that ripple back up the tree. So maybe give that a try with a more complex game. Certainly, other types of AI algorithms and neural networks could play a role at some point in how you make that estimate. But ultimately, that is something that would be interesting to try as well. One more idea that just came from the chat: this is quite similar to the idea of a larger board, but instead of just a larger two-dimensional board, you could create a three-dimensional board-- Tic-Tac-Toe that happens in a cube. I should just do that as a coding challenge separately myself. So hopefully you've learned something, and maybe this algorithm, which looked a little terrifying to you at one point, now seems quite accessible and doable. Maybe your creative idea is not on this list. Please make your own version of this. Share it with me at thecodingtrain.com. There'll be a link in this video's description with instructions for how to share it. If you don't know how to share it, just write a comment, file an issue. We, the Coding Train community, are here to help you. [WHISTLE] And I can't wait to see you in a future video. Goodbye. [MUSIC PLAYING]
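For anyone who wants to attempt the alpha-beta pruning exercise mentioned above, here is a hedged sketch of one way it could be bolted onto the same Minimax structure; this is an illustration, not the video's code. `alpha` and `beta` track the best guarantees found so far for the maximizer and minimizer, and a branch is skipped once it can no longer affect the final choice.

```javascript
// Minimax with alpha-beta pruning, assuming the same board, ai, human,
// checkWinner, and scores lookup as in the sketches above.
function minimax(board, depth, isMaximizing, alpha = -Infinity, beta = Infinity) {
  const result = checkWinner();
  if (result !== null) return scores[result];

  let bestScore = isMaximizing ? -Infinity : Infinity;
  outer:
  for (let i = 0; i < 3; i++) {
    for (let j = 0; j < 3; j++) {
      if (board[i][j] !== '') continue;
      board[i][j] = isMaximizing ? ai : human;
      const score = minimax(board, depth + 1, !isMaximizing, alpha, beta);
      board[i][j] = ''; // undo the move
      if (isMaximizing) {
        bestScore = Math.max(bestScore, score);
        alpha = Math.max(alpha, bestScore);
      } else {
        bestScore = Math.min(bestScore, score);
        beta = Math.min(beta, bestScore);
      }
      if (beta <= alpha) break outer; // prune: this branch can't change the decision
    }
  }
  return bestScore;
}
```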