Abstract
A promising approach to learning to play board games is to use reinforcement learning algorithms that can learn a game position evaluation function. In this paper we examine and compare three different methods for generating training games: 1) learning by self-play, 2) learning by playing against an expert program, and 3) learning from viewing experts play against each other. Although the third option generates high-quality games from the start, compared to the initial random games generated by self-play, its drawback is that the learning program is never allowed to test moves which it prefers. Since our expert program uses a similar evaluation function to the learning program, we also examine whether it is helpful to learn directly from the board evaluations given by the expert. We compared these methods using temporal difference methods with neural networks to learn the game of backgammon.
Highlights
The success of the backgammon learning program TD-Gammon of Tesauro (1992, 1995) was probably the greatest demonstration of the impressive ability of machine learning techniques to learn to play games.
Since our expert program uses a similar evaluation function to the learning program, we examine whether it is helpful to learn directly from the board evaluations given by the expert.
In this paper we study the class of reinforcement learning methods named temporal difference (TD) methods.
Summary
The success of the backgammon learning program TD-Gammon of Tesauro (1992, 1995) was probably the greatest demonstration of the impressive ability of machine learning techniques to learn to play games. TD-Gammon used reinforcement learning [1,2] techniques, in particular temporal difference (TD) learning [2,3], to learn a backgammon evaluation function from training games generated by letting the program play against itself. This has led to a large increase of interest in such machine learning methods for evolving game-playing computer programs from a randomly initialized program (i.e., initially there is no a priori knowledge of the game evaluation function, except for a human extraction of relevant input features). Samuel (1959, 1967) pioneered research on the use of machine learning approaches in his work on learning a checkers program. In that work he already proposed an early version of temporal difference learning for learning an evaluation function. Reinforcement learning has been applied to learn a variety of games, including backgammon [7,8], chess [9,10], checkers [11,12,13], and Go [14].
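To illustrate the core of temporal difference learning described above, the following is a minimal sketch of the TD(0) update rule, which moves the value of a state toward the reward plus the value of the successor state. For simplicity it uses a tabular value function on a toy three-state chain rather than the neural network and backgammon features used in the actual experiments; the function and state names are illustrative assumptions, not the paper's implementation.

```python
def td0_update(V, s, reward, s_next, alpha=0.1, gamma=1.0):
    """One TD(0) update: move V[s] toward reward + gamma * V[s_next]."""
    td_error = reward + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

# Toy episode structure: state 0 -> state 1 -> state 2 (terminal, reward 1).
# Repeatedly replaying this episode propagates the final reward backwards,
# analogous to how game outcomes train a position evaluation function.
V = [0.0, 0.0, 0.0]  # V[2] is the terminal state and stays 0
for episode in range(200):
    td0_update(V, 0, 0.0, 1)   # non-terminal transition, no reward
    td0_update(V, 1, 1.0, 2)   # transition into the terminal state pays 1
```

With a neural network, as in TD-Gammon, the same TD error would instead be used to adjust the network weights by gradient descent rather than a table entry.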