Two Kinds of Learning in Turn-Based Game AI

AI is all the rage, and for some folks, it’s just magic. Well, it’s not magic. It’s an algorithmic system based on combining decision models with data analysis.

We’re going to look at a few aspects of AI systems in the context of turn-based games: chess, tic-tac-toe, card games, and so on. Specifically, we’re going to look at two kinds of learning that are useful in a Game AI system:

  • Facts
  • Associations

Game AI Basics

Games have rules. Strict, codifiable rules. They have clear win and loss conditions. Because games have such clear and unambiguous rules, we can always evaluate a ‘best move’, subject to certain qualifications and constraints (we’ll come back to that).

A Game AI is a decision model. We model the game rules. We model the decision process for how to choose the best move. We remember game states and outcomes so we can use past gameplay to inform the current move choice.

At the core of a Game AI system is the game’s transition function, or system dynamics equation. (All turn-based games are discrete-time dynamical systems, and many are discrete-state systems as well.) We’ll use the term transition function, since we are in computing land here. Turn-based games can be modeled as state machines: there is a set of possible states, and for each state, there is a set of possible actions. A transition function puts it all together:

(State, Action) => State

The first piece of that is the game State. A game state is a configuration of all the pieces of the game at a particular point in time. This could be a board position in chess, or the combination of chip counts, your cards, and board cards in a hand of Texas Hold ‘Em, or the scoresheet and current dice roll in Yahtzee. It is every piece of information relevant to determining the outcome of the game, and therefore to choosing which actions to take.

Game states may include unknown, but probabilistic, information. In Seven Card Stud, you don’t know what hole cards your opponent has, and you don’t know what cards are at the top of the deck that are going to show up on the board. But you do know your cards, and you know the revealed board cards, so you know what cards your opponent does not have. This information is used all the time to make good decisions. You can compute the probabilities of getting a flush, or a straight, and so on. You can deduce that your opponent can’t have 3 Aces because you have an Ace, and another player folded an Ace already.
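
To make that concrete, here is a sketch of that sort of probability computation in Python. The scenario is hypothetical: you hold four cards to a flush, you have counted how many cards you have seen in total and how many of them were spades, and all unseen cards are treated as equally likely (the standard simplification).

from math import comb

def flush_probability(seen: int, seen_spades: int, to_come: int) -> float:
    """Probability that at least one of the next `to_come` cards is a spade,
    given `seen` cards observed so far, `seen_spades` of them spades."""
    unseen = 52 - seen
    spades_left = 13 - seen_spades
    # P(no spade among the next `to_come` cards), then take the complement
    p_miss = comb(unseen - spades_left, to_come) / comb(unseen, to_come)
    return 1 - p_miss

# e.g. four spades among 9 seen cards, with two cards to come:
print(flush_probability(seen=9, seen_spades=4, to_come=2))  # ~0.38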

The second piece of that is the Action. Actions are taken by players, and cause a change to the game’s state, according to the game rules, which are encoded in the transition function. In chess this is moving a piece, or capturing, or castling, or resigning. In poker, this is checking, calling, raising, or folding – and in draw poker, choosing which cards to exchange.
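
To make these pieces concrete, here is a minimal sketch of a transition function for tic-tac-toe in Python. The state encoding (a 9-cell board plus the player to move) is just one reasonable choice, not the only one.

from typing import Optional, Tuple

Board = Tuple[Optional[str], ...]   # 9 cells, row-major: 'X', 'O', or None
State = Tuple[Board, str]           # (board, player to move)
Action = int                        # index of the cell to claim, 0..8

def transition(state: State, action: Action) -> State:
    board, player = state
    if board[action] is not None:
        raise ValueError("illegal move: cell already occupied")
    new_board = board[:action] + (player,) + board[action + 1:]
    return (new_board, "O" if player == "X" else "X")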

Transitions can be probabilistic as well. If my game includes an action to roll a six-sided die, there are six possible next states, each occurring with equal probability. If my action is to draw a card, the result can be one of several possible cards, with various probabilities.
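
One way to model this in code is for the transition to return a distribution over next states instead of a single state. Here is a sketch, using a made-up dice game whose entire state is a running total:

import random
from fractions import Fraction

def roll_transition(total: int) -> dict:
    """The 'roll the die' action: a distribution over possible next states."""
    return {total + face: Fraction(1, 6) for face in range(1, 7)}

def sample_roll(total: int) -> int:
    """Sample one concrete next state, as an actual game turn would."""
    return total + random.randint(1, 6)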

Modeling Games and Decisions

The transition function models the game’s rules. With that in hand, we can start to build on top of it to find the best move for any given state. The goal of the Game AI is to find the best move – in terms of the transition function, the best Action. Given a game state, it can search through the Action Space (the set of possible actions for that game state), call the transition function for each (State, Action) pair, and see the next game state. It can then evaluate that next state to see how it stacks up against the other possible actions. The following Python-flavored sketch illustrates the process:

best_action, best_action_value = None, float("-inf")

for action in possible_actions(current_state):
    next_state = transition(current_state, action)
    action_value = evaluate_state(next_state)

    # keep the action whose resulting next state evaluates best
    if action_value > best_action_value:
        best_action = action
        best_action_value = action_value

There are two complex things here to unpack: possible_actions() and evaluate_state(next_state).

If there are few actions, possible_actions can naively return the full list and the loop can evaluate every action. If there is a tremendous number of possible actions, there may need to be a whole sub-process dedicated to deciding in which order to iterate through them. This may end up needing to be a parameterized model of its own, and the Game AI needs to learn the best way to iterate through actions.
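
A sketch of what that might look like: rank the candidate actions with a cheap heuristic, and only hand the most promising few to the expensive evaluation. Here heuristic_score is a hypothetical, game-specific stand-in.

def ordered_actions(state, actions, heuristic_score, breadth=10):
    """Sort candidate actions by a cheap guess of their promise and keep
    only the top `breadth` for full evaluation."""
    ranked = sorted(actions, key=lambda a: heuristic_score(state, a), reverse=True)
    return ranked[:breadth]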

The evaluate_state function is where the topic of this article comes into play. First, there will be a minimax tree. This is the idea of ‘thinking ahead’ in games: instead of just looking at the game state after a single action, the AI plays out sequences of possible actions to see what happens in various combinations. In a complex game, these trees can be enormous, so it is not practical to search every path. The search needs to be controlled by parameters that determine how many moves ahead in the game to look (search depth), and how many different moves to evaluate at each state (search breadth). The search may find an end state for the game – a winning or losing state. It may not. When it doesn’t, it needs to make a best guess about the value of the state – the probability of winning or losing from that state – so that it can evaluate the proposed action at the top of the tree. And that brings us to the two kinds of learning.
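
Here is a minimal depth-limited minimax sketch. The game-specific hooks are assumed to live on a game object (possible_actions, transition, is_terminal, terminal_value, and guess_value); guess_value is the ‘best guess’ evaluation used when the search runs out of depth before reaching an end state.

def minimax(state, depth, maximizing, game):
    """Value of `state`, searching at most `depth` moves ahead."""
    if game.is_terminal(state):
        return game.terminal_value(state)   # known win/loss/draw value
    if depth == 0:
        return game.guess_value(state)      # estimated probability of winning
    values = [
        minimax(game.transition(state, a), depth - 1, not maximizing, game)
        for a in game.possible_actions(state)
    ]
    # the side to move picks the value best for itself, worst for its opponent
    return max(values) if maximizing else min(values)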

Facts

In the context of Game AI decision making, facts are remembered game states that have already been analyzed to find guaranteed win/loss outcomes. For example, in chess, there are positions with a forced checkmate in 1 move, or 2 moves, or however many moves. Once such a position is reached, the outcome is unavoidable, provided you know the correct sequence of moves to play from that point. You only need to figure out that forced checkmate once; then you can remember the position, and any time you see it in the future, play the sequence of moves that guarantees victory.

In computing terms, this is caching. At some point in the past, the Game AI has already analyzed a position and found that it leads to a guaranteed win or loss. That may have been an expensive process that took minutes, or hours, or days. We can save the result of that analysis. When the AI encounters that position in the future, it can look it up ‘from memory’ in a tiny fraction of the time it took to analyze the position – probably milliseconds or seconds, so it’s a reduction in time of several orders of magnitude.
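
In code, the fact store can be as simple as a dictionary keyed by (hashable) game states. Here analyze_to_the_end is a hypothetical stand-in for the expensive full search that proves a forced outcome.

fact_table = {}   # state -> 1.0 (proven win) or 0.0 (proven loss)

def proven_value(state, analyze_to_the_end):
    """Expensive the first time a position is seen; near-instant afterwards."""
    if state not in fact_table:
        fact_table[state] = analyze_to_the_end(state)   # minutes, hours, days...
    return fact_table[state]                            # ...then just a lookup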

Collecting ‘facts’ about positions in a game is a form of learning. As the AI encounters different game states, it analyzes them and saves them for later, so that it can deal with those game states faster in the future. But it isn’t the kind of learning people typically think about when they think about AI or machine learning. That’s where associations come in.

But before we move on, I want to make a note here about the value of facts in a Game AI. They are guaranteed. No matter who the AI is playing, no matter how good the opponent is, a fact about the outcome of a game state is always true. If you get to a state that is a known win state, you always win. There is no changing the value of that game state later; it is always a perfect value of 1 in terms of your probability of winning from that state. Facts are inherent to the structure of the game.

Associations

Associations are remembered game states for which we don’t know a guaranteed win/loss outcome. Why would we remember those game states if we don’t know for sure whether they lead to victory or not?

Many games are complex. The number of possible game states can be incomprehensibly large, to the point where it is not feasible for a modern computer to find a sure outcome for some states. It has to make some kind of best guess.

One thing we often say as humans is, “Whenever I do X, it doesn’t go well.” When we say a phrase like that, we are observing an association. We haven’t thoroughly analyzed the sequence of events between the action and the ultimate outcome. We simply remark that action X is associated, or correlated, with bad outcomes.

A Game AI can make use of the same idea. It can play a lot of games in a very short span of time, and use this to gather information. It can look at all those past games and make associations between game states and outcomes. If it observes that a certain early game state is associated with losing 95% of the time, it may choose moves that avoid that game state. We don’t know for certain that it’s a losing game state, but based on the history of games played, it sure seems like it.
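
A sketch of how those associations might be collected: after each game, credit every state visited with the game’s outcome, then use the empirical win rate as a guide. The neutral 0.5 prior for never-seen states is an arbitrary choice.

from collections import defaultdict

plays = defaultdict(int)   # state -> number of recorded games it appeared in
wins = defaultdict(int)    # state -> how many of those games were won

def record_game(states_seen, won):
    for state in states_seen:
        plays[state] += 1
        wins[state] += int(won)

def empirical_win_rate(state, default=0.5):
    n = plays.get(state, 0)
    return wins[state] / n if n else default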

There can be a tremendous number of possible game states, so the collected data on game states may be sparse – in other words, you may have a lot of game states that you have only seen once. From a statistics perspective, if I saw a game state once, and the outcome of a game was a loss, I don’t have a lot of confidence in that as a decision guide. There isn’t enough data. If I’ve seen that game state 1000 times, and the outcome of the game is always a loss, that is more convincing.

There are ways to make that data more dense, beyond just playing more games. We can use Feature Engineering to share data across different game states. Many game states share common features or patterns, and often only certain aspects of a given game state are relevant to the outcome. A pawn’s position on the opposite side of the board may have absolutely no impact on the outcome of a mating sequence. You may have 1000 distinct recorded game states that each occur only once, but if you can identify a common pattern present in all of them, you now have 1000 data points on that particular pattern. That can be used to make good decisions. These features also extrapolate well to new situations: I may not have seen this entire game state before, but I have seen some of the patterns present in it many times.
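
A sketch of the same bookkeeping keyed on features instead of whole states. extract_features is a hypothetical, game-specific function mapping a state to the patterns present in it (pawn structure, material balance, and so on).

from collections import defaultdict

feature_plays = defaultdict(int)
feature_wins = defaultdict(int)

def record_game_features(states_seen, won, extract_features):
    for state in states_seen:
        for feature in extract_features(state):
            feature_plays[feature] += 1
            feature_wins[feature] += int(won)

def feature_win_rate(state, extract_features, default=0.5):
    """Average the win rates of a state's features: one simple way to
    evaluate a full game state we have never seen before."""
    rates = []
    for feature in extract_features(state):
        n = feature_plays.get(feature, 0)
        if n:
            rates.append(feature_wins[feature] / n)
    return sum(rates) / len(rates) if rates else default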

Storing the game history and using these associations is powerful. You can build a Game AI entirely on this idea, without a minimax tree and without using a state transition function to evaluate the next state for a possible move. But this is suboptimal, because the quality of the Game AI is sensitive to its gameplay history. If the Game AI plays only weak opponents, it will learn that certain game states are always winning states, even when there is a guaranteed sequence of moves that leads to defeat. Then it goes up against a strong opponent who knows how to win in that position, and the Game AI loses. In chess, for example, a game state may have a very tricky multi-move forced checkmate that weak players don’t see. You need a process of playing out game states to find things like that. Without it, the AI is only as good as its opponents. Furthermore, it can degrade over time based on its play history: even if it is trained against strong opponents, if it then plays against weaker opponents, it will learn on that data and get weaker.

Decision Model – Evaluating a Best Move

When we put those two kinds of learning together on top of our active minimax search model, we get a very powerful decision model. An untrained model can still make good decisions in certain situations based on the game state alone. For simple games, an untrained model can play perfectly. For example, in tic-tac-toe, there are only 3^9 = 19,683 possible board configurations (and far fewer reachable legal states). For a computer, that’s a small number. A minimax tree AI can play tic-tac-toe perfectly with no training. For complex games, the model needs training on game history. The minimax tree helps prevent the AI from making total blunders, the stored history of known winning and losing states helps guide it toward or away from guaranteed outcomes, and the accumulated associative data of its game playing history helps it choose moves in positions of uncertainty.
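
Pulling the pieces together, an evaluate_state in the spirit of this article might consult proven facts first and fall back to the associative statistics, reusing the fact_table and empirical_win_rate sketched earlier:

def evaluate_state(state):
    """Estimated probability of winning from `state`."""
    if state in fact_table:            # a proven fact: trust it unconditionally
        return fact_table[state]       # always exactly 1.0 or 0.0
    return empirical_win_rate(state)   # otherwise, the best empirical guess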

Let’s circle back to what I said earlier…

we can always evaluate a ‘best move’, subject to certain qualifications and constraints

Uncertainty is the qualification and constraint. There are two kinds of uncertainty to deal with in games:

  • Combinatorial Explosion
  • Imperfect Information

Let’s talk about Combinatorial Explosion first. In chess, there is an absurdly large number of possible positions. Complex games like this have a combinatorial explosion of possible game states, which causes a computability problem: a computer can’t search all the possible game states for a given move in a reasonable amount of time. There is uncertainty about the value of a particular move, because we can’t see where it ultimately leads. IF we had the time and resources, we COULD play it all out and eliminate the uncertainty. Over time, it is possible to build up the database of facts to reduce and eventually eliminate this uncertainty. This is being done with chess: at the time of this writing, endgame tablebases have established the outcome of every position with 7 or fewer pieces. This kind of fact learning can lead to systems that play perfectly.

Imperfect Information, or probabilistic states and actions, can never be eliminated within the rules of the game. In poker, within the rules, you are never going to know what cards your opponent has, or what card is going to be turned up next on the board. Decisions in these situations are inherently probabilistic, so the ‘best’ decision is in terms of probability and mathematical expectation. The absolute best decision possible may still lose 40% of the time, but that’s just the nature of probabilistic games. If that’s the best win percentage you can get, no matter what you choose, it’s still the best decision.
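
A tiny worked example of that kind of expectation-based reasoning, with made-up numbers: calling a 20-chip bet into a 100-chip pot with a 40% chance of winning loses more often than it wins, yet it is clearly the right decision.

def expected_value(win_prob, win_amount, lose_amount):
    return win_prob * win_amount - (1 - win_prob) * lose_amount

print(expected_value(win_prob=0.40, win_amount=100, lose_amount=20))
# 28.0 chips on average, despite losing 60% of the time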

(You can eliminate the uncertainty outside the context of the rules by cheating of course. In poker, you can mark the cards, use a card mechanic, and so forth. With something like a dice roll, it is theoretically possible to compute the outcome of the roll if you knew all the relevant variables for the physics of the situation, but practically speaking, that isn’t possible in our current world. In the digital world, it is possible, depending on the kind of random number generator that is used. That may not technically be cheating, but it is absolutely dishonest and unfair if other players don’t have the same advantage.)

Conclusion

Machine learning is a tremendously useful tool, but sometimes its value is overblown. Some things are inherent in the structure of the problem; they don’t need to be learned. Learning is expensive: it takes time, and it doesn’t guarantee perfect outcomes. If the problem is modeled correctly, you get a lot of good things for free. A good AI system starts with a good problem model, and a great AI system incorporates learning specifically to fill gaps that can only be filled with collected empirical data.

(Of course, someone has to learn how to model the problem, so we’ve got a chicken and egg problem. We should probably model that problem too.)