Monday, March 18, 2013

Dots 'n' Boxes - playable, sometimes beatable

Over the weekend, I did a little bit of work on the small Dots-n-Boxes (DnB) game that I've been working on for March.

Dots and Boxes is Hard(ish)

Years and years ago, I took a game theory class taught, in part, by Elwyn Berlekamp. The man's an important figure in combinatorial game theory, but I was frustrated, because combinatorial game theory didn't seem to cover the parts of game theory that I was interested in. We had one lecture on finding optimal mixed strategies, given a payoff matrix - which I still think is more generally applicable.

Berlekamp wrote the book on DnB, and surprise, surprise, it's a good example of using combinatorial game theory to find optimal play. Well, it's one of the few examples of a game that is appropriate for CGT. When was the last time you played a good game of Nim? Yeah.

Later on, I saw John Conway speak at the Game Developers Conference, and the point of his keynote was that game developers should put more mathematical grounding in their games. I think he wanted more games that could have provably optimal play. If you want a contest to prove your mathematical insight, you're welcome to build such a thing, but those are (mostly) not the kinds of games I'm interested in.

As part of that keynote, Conway brought a person up from the audience (let's assume this guy wasn't a plant) and challenged him to a game of DnB. Conway pointed out that a 3x3 dot game of DnB has approximately the complexity of tic-tac-toe, and that no adult who has played tic-tac-toe a few times has any difficulty playing it to a draw. Conway proceeded to trounce the audience member game after game after game.

I expect there are a few tricks to 3x3 DnB, which Conway must have picked up, but it's weird to me that they're hard for a casual player to grasp.

I think one of the things that distinguishes DnB from tic-tac-toe is that DnB is a combinatorial game, where tic-tac-toe is not. If you're not familiar with CGT, that may require a little more explanation. In CGT, an important property of games is that they break up into independent smaller games. Nim is a good example of this kind of game - in Nim, players alternately take stones from a set of piles of stones, and the last player to move wins. With some analysis, you can determine that you can play optimally by XORing the sizes of the stone piles together and always leaving your opponent a game configuration that XORs to 0. (I leave the analysis to the reader. Or Google it. Or take it on faith. Or don't - it's been a while, but that sounds correct.)
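
If you'd rather see that XOR trick as code than take it on faith, here's a minimal sketch - the function names are mine, made up for illustration:

```python
def nim_sum(piles):
    """XOR of all the pile sizes; zero means the player to move is losing."""
    total = 0
    for p in piles:
        total ^= p
    return total

def optimal_nim_move(piles):
    """Return (pile_index, new_size) that leaves a zero nim-sum, or None if we're losing."""
    total = nim_sum(piles)
    if total == 0:
        return None  # every move hands the opponent a winning position
    for i, p in enumerate(piles):
        target = p ^ total  # what this pile must shrink to so the overall XOR becomes zero
        if target < p:
            return (i, target)
    return None

# Example: from piles of 3, 4, 5, take 2 from the pile of 3, leaving 1, 4, 5 (XOR = 0).
print(optimal_nim_move([3, 4, 5]))  # (0, 1)
```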

So, Nim is actually one game, made up of several smaller games, each on a single pile of stones. Each time you make a move in the game of multi-pile Nim, you're actually selecting one pile of stones, and making a move in a game of single-pile Nim. The analysis of multi-pile Nim ends up being the study of how these independent games of single-pile Nim work together.

Back to DnB, though. At the beginning of the game, maybe you play randomly, and you try to avoid drawing a third wall on any box, or else your opponent gets a free box. Good so far. At some point, all these "free" lines have been taken, and all that remains are a number of chains. So maybe you pick the smallest chain and add a line to it; your opponent takes that chain and opens up the next one for you, you alternate like that, and if you're lucky, you get enough of the long chains to win.
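
As a rough sketch of that early-game rule - don't give any box its third side - here's how the safe-move filter might look. The edge layout and the names (box_edges, safe_moves) are made up for this post, not lifted from my actual code:

```python
def box_edges(row, col):
    """The four edges of the box at (row, col), each named by its two dots."""
    return frozenset([
        ((row, col), (row, col + 1)),          # top
        ((row + 1, col), (row + 1, col + 1)),  # bottom
        ((row, col), (row + 1, col)),          # left
        ((row, col + 1), (row + 1, col + 1)),  # right
    ])

def safe_moves(drawn, box_rows, box_cols):
    """Legal edges that don't give any box its third side."""
    boxes = [box_edges(r, c) for r in range(box_rows) for c in range(box_cols)]
    all_edges = set().union(*boxes)
    legal = all_edges - drawn
    safe = [edge for edge in legal
            if not any(edge in b and len((drawn | {edge}) & b) == 3 for b in boxes)]
    return safe or sorted(legal)  # late in the game you're forced to open a chain

# On a fresh 3x3-dot board (2x2 boxes), every one of the 12 edges is still safe:
print(len(safe_moves(set(), 2, 2)))  # 12
```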

Wait, lucky? Did I mean that? Well, I shouldn't have - there's no luck in this game. Well, if you're actually playing randomly, there could be.

Anyway, you can see that these long chains have the feel of the single-pile games of Nim, and you can hear Conway and Berlekamp in the shadows, laughing at you. Should have paid attention in game theory class.

It's been a while since I read Berlekamp's book on DnB, but I recall him claiming that an observant DnB player can beat a computer player that relies on ordinary lookahead. I think that he had a research student write a mediocre computer player, and Berlekamp was able to beat it at some reasonable lookahead depth. Maybe he was being unfair, or maybe he was making the point that you can abstract board state in DnB down to a much simpler representation (the set of lengths of chains in late-game play), which is much more susceptible to efficient analysis.

Weekend Progress

This weekend, I started by making just about the simplest computer player that I could - one that listed all the legal moves, and picked one at random. I like giving my AI opponents names, so I named this one "Randy".
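
Randy is about as small as an opponent gets - roughly this, with the move and board types left hypothetical:

```python
import random

def randy_choose_move(legal_moves):
    """Pick any legal move uniformly at random; return None when there's nothing left to do."""
    legal_moves = list(legal_moves)
    if not legal_moves:
        return None  # the game is over - don't keep hunting for a "good" move
    return random.choice(legal_moves)
```

The early-out for an empty move list is exactly the kind of check I ended up needing, as you'll see below.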

This opponent isn't much to play against - you can readily beat him if you're paying any sort of attention. Well, and not trying to lose. I spent a little bit of time trying to lose to Randy, and found a few bugs where the AI kept trying to find good moves, even when the game was over.

After getting Randy to play as well as he was going to, I set Randy playing against another Randy, and it was interesting to me that games were typically won by a large margin. This may have been due to there being a few very long chains - once you started playing in a long chain, you usually cleaned up the rest of the board.

After Randy, I reused the same move-choosing framework and added a fixed-depth minimax opponent.
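
The minimax opponent is the textbook recursion, something like the sketch below. The game interface here (legal_moves, apply, is_over, score, to_move) is hypothetical shorthand for whatever your board object exposes; to_move matters in DnB because completing a box grants another turn, so plies don't strictly alternate:

```python
def minimax(game, depth, player):
    """Return (score, move) for `player`, searching `depth` plies with a fixed cutoff."""
    if depth == 0 or game.is_over():
        return game.score(player), None  # e.g. boxes owned by player minus opponent's
    best_score, best_move = None, None
    maximizing = (game.to_move() == player)
    for move in game.legal_moves():
        child_score, _ = minimax(game.apply(move), depth - 1, player)
        if (best_score is None
                or (maximizing and child_score > best_score)
                or (not maximizing and child_score < best_score)):
            best_score, best_move = child_score, move
    return best_score, best_move
```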

One thing that I find frustrating is the gulf between the academic presentation of AI (particularly minimax game tree analysis) and the practical AI that's used for computer opponents in actual games. This sort of gets to my dismissal of John Conway's plea for more mathematically grounded computer opponents. The assumptions and the goals are completely different. Academics typically assume they have a single thread of execution and can consume it, maybe up to some maximum length of time, and if they can come up with an optimal answer, that's great - if not, at least a best guess. In practical opponent design, it's more important to provide a good challenge to the human, and you typically only get a few milliseconds at a time before you have to give the CPU back to let the game render or play music or whatever. Some of this can be handled with threads, but it seems to me that being able to make incremental progress towards a good solution is important, as it gives you more room to negotiate as the other parts of the game consume more or less resources.
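
One common way to get that incremental behavior - and this is a generic sketch, not something I've actually built - is iterative deepening under a time budget: search depth 1, then 2, then 3, and keep whichever completed answer is deepest when the budget runs out. The search_fn here could be the fixed-depth minimax above; the 5ms budget is a made-up number.

```python
import time

def choose_move_with_budget(game, player, search_fn, budget_seconds=0.005, max_depth=16):
    """Iterative deepening: keep the best completed answer when time runs out."""
    deadline = time.monotonic() + budget_seconds
    best_move = None
    for depth in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break  # out of budget; hand back the CPU with the deepest answer so far
        _, move = search_fn(game, depth, player)
        if move is not None:
            best_move = move
    return best_move
```

A real version would also check the clock inside the search, not just between depths, but the shape is the same.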

So, I hacked together a simple minimax opponent, and I'm being all sorts of gross about it: I'm making full copies of a heavy board state object, and I'm not caching anything, so if I evaluate a board state in one branch of the game tree and then hit the same board state in another branch, I don't reuse the knowledge I came up with - I redo all of the work. So, the branching factor kills me.

Even with all of that, I've played a little bit of 3x3 dot DnB, and the depth-2 and depth-3 minimax players play acceptably - not perfectly, but decently. Depth 3 on 3x3 can play me to a draw and beat me now and then; I can beat it too, but the fact that it beats me at all probably means I'm a terrible player.

I tried playing 3x4 dot DnB, and it was at the edge of my patience. I also cranked it up to 4x4, and that was well outside what I wanted to wait for.

This morning, I added in a simpler representation - I took each piece of important information out of the board state and combined all of the data into a single number that uniquely represented the board. In the code, I call it a 'hash', which is not exactly correct - it's a binary packed representation. I can turn a board into a 'hash', and construct a board from a 'hash'. From this, I can start caching results, which will help prune out redundant evaluation.
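
The packing itself is nothing fancy - one bit per edge in a fixed ordering, something like the sketch below. (This only packs the drawn edges; the real key also needs to fold in the scores and whose turn it is.)

```python
def pack_board(drawn, all_edges):
    """Encode the set of drawn edges as a single integer, one bit per edge."""
    code = 0
    for i, edge in enumerate(sorted(all_edges)):  # fixed ordering = stable encoding
        if edge in drawn:
            code |= 1 << i
    return code

def unpack_board(code, all_edges):
    """Inverse of pack_board: recover the set of drawn edges from the integer."""
    return {edge for i, edge in enumerate(sorted(all_edges)) if code & (1 << i)}

# Caching evaluations then becomes a plain dictionary keyed by the packed state:
# cache[pack_board(drawn, all_edges)] = evaluate(drawn)
```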

Another thing I could do is implement alpha-beta pruning, which ought to eliminate a bunch of other unnecessary evaluation, though I suspect it won't be as useful as the caching.
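
For reference, alpha-beta is the same fixed-depth search with a window that lets whole branches get cut off once they can't change the answer - same hypothetical game interface as the minimax sketch above:

```python
def alphabeta(game, depth, player, alpha=float("-inf"), beta=float("inf")):
    """Like minimax, but prune branches that can't affect the final choice."""
    if depth == 0 or game.is_over():
        return game.score(player), None
    best_move = None
    maximizing = (game.to_move() == player)
    for move in game.legal_moves():
        child_score, _ = alphabeta(game.apply(move), depth - 1, player, alpha, beta)
        if maximizing and child_score > alpha:
            alpha, best_move = child_score, move
        elif not maximizing and child_score < beta:
            beta, best_move = child_score, move
        if alpha >= beta:
            break  # the rest of this branch can't change the outcome
    return (alpha if maximizing else beta), best_move
```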

Another thing that would help a great deal is to identify equivalent boards. The easiest set of equivalences comes from recognizing that rotating a board 90 degrees doesn't make an interesting difference when figuring out the best move. If you can rotate the board into a canonical orientation and evaluate that, then you can reverse the rotation to get the best move for all 4 rotations (on a square board), plus 4 more with reflection. That's as much as an 8x speedup, which is pretty good for shallow trees.
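
Here's a sketch of the canonicalization, assuming a square board with n dots per side and edges stored as pairs of (row, col) dots - again, illustrative names, not my actual code. One catch the sketch glosses over: after looking up the best move for the canonical orientation, you have to push that move back through the inverse transform to apply it to the real board.

```python
def transforms(n):
    """The 8 symmetries of an n x n dot grid, as functions on a (row, col) dot."""
    return [
        lambda r, c: (r, c),                   # identity
        lambda r, c: (c, n - 1 - r),           # rotate 90
        lambda r, c: (n - 1 - r, n - 1 - c),   # rotate 180
        lambda r, c: (n - 1 - c, r),           # rotate 270
        lambda r, c: (r, n - 1 - c),           # mirror...
        lambda r, c: (c, r),                   # ...and the mirror of each rotation
        lambda r, c: (n - 1 - r, c),
        lambda r, c: (n - 1 - c, n - 1 - r),
    ]

def canonical(drawn, n):
    """The lexically smallest image of the drawn edges under the 8 symmetries."""
    images = []
    for t in transforms(n):
        image = {frozenset(t(r, c) for (r, c) in edge) for edge in drawn}
        images.append(tuple(sorted(tuple(sorted(e)) for e in image)))
    return min(images)  # use this as the cache key instead of the raw position
```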

Another set of equivalences that would be useful to capture is boiling the board down to chain lengths. A chain of length three behaves the same way no matter where it is on the board, and a chain of three plus a chain of five combine the same way, even though there are a lot of ways those two chains might be positioned on the board.
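
Concretely, the abstraction is "forget the geometry, keep the multiset of chain lengths (and the score situation)" - something like this made-up cache key:

```python
def chain_key(chain_lengths, my_score, their_score):
    """Geometry-free key for a late-game position: sorted chain lengths plus the score margin."""
    return (tuple(sorted(chain_lengths)), my_score - their_score)

# A 3-chain along the left edge plus a 5-chain snaking through the middle shares
# a cache entry with any other 3-and-5 arrangement at the same score margin:
assert chain_key([5, 3], my_score=2, their_score=1) == chain_key([3, 5], my_score=4, their_score=3)
```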
