N-Grams and the Last-Good-Reply Policy Applied in General Game Playing

M J W Tak, Y Bjornsson, M H M Winands

doi:10.1109/tciaig.2012.2200252

Abstract

The aim of general game playing (GGP) is to create programs capable of playing a wide range of different games at an expert level, given only the rules of the game. The most successful GGP programs currently employ simulation-based Monte Carlo tree search (MCTS), whose performance depends heavily on the simulation strategy used. In this paper, we introduce improved simulation strategies for GGP, which we implement and test in the GGP agent CADIAPLAYER, winner of the International GGP competition in both 2007 and 2008. The improvements have two aspects. First, we show that a simple ϵ-greedy exploration strategy works better in the simulation play-outs than the softmax-based Gibbs measure currently used in CADIAPLAYER. Second, we introduce a general framework based on N-grams for learning promising move sequences. Collectively, these enhancements substantially improve the performance of CADIAPLAYER. For example, in our test suite consisting of five different two-player turn-based games, they led to an average win rate of approximately 70%. The enhancements are also shown to be effective in multiplayer and simultaneous-move games. We additionally perform experiments with the last-good-reply policy (LGRP), both on its own and combined with N-grams. The LGRP has already been shown to be successful in Go programs, and we demonstrate that it also has promise in GGP.
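
As a rough illustration of the two play-out selection schemes contrasted in the abstract, the following Python sketch picks a move from a table of estimated move values, once with ϵ-greedy selection and once by sampling from a Gibbs (softmax) distribution. The function names, the `q_values` dictionary, and the default values of `epsilon` and `tau` are assumptions made for this illustration; they are not taken from CADIAPLAYER.

```python
import math
import random


def epsilon_greedy_move(q_values, epsilon=0.1, rng=random):
    """With probability epsilon pick a uniformly random move, else the best-valued one."""
    moves = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda m: q_values[m])


def gibbs_probabilities(q_values, tau=1.0):
    """Gibbs (softmax) distribution: P(a) is proportional to exp(Q(a) / tau)."""
    top = max(q_values.values())  # subtract the maximum for numerical stability
    weights = {m: math.exp((q - top) / tau) for m, q in q_values.items()}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}


def gibbs_move(q_values, tau=1.0, rng=random):
    """Sample one move according to the Gibbs distribution over q_values."""
    probs = gibbs_probabilities(q_values, tau)
    moves = list(probs)
    return rng.choices(moves, weights=[probs[m] for m in moves], k=1)[0]


# Hypothetical move-value estimates, for illustration only.
example_q = {"a": 0.2, "b": 0.9, "c": 0.5}
greedy_choice = epsilon_greedy_move(example_q, epsilon=0.0)  # purely greedy
sampled_choice = gibbs_move(example_q, tau=0.5)
```

With ϵ-greedy, the best-valued move is played most of the time and all other moves share the remaining probability equally. The Gibbs distribution instead weights every move by its estimated value, with the temperature `tau` controlling how sharply the distribution favors high-valued moves.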

Full Text