Abstract

Temporal difference (TD) learning is a well-known approach for training automated players, through autonomous play, in board games with a limited number of potential states. TD learning has become widespread because of its simplicity, but several critical difficulties must be addressed for it to be effective. Training an artificial intelligence (AI) agent against a purely random player is impractical, since it takes millions of games for the agent to learn to play intelligently. Training the agent against a fixed, deterministic player, on the other hand, is also not an option, owing to a lack of exploration. This article describes and examines several hybrid training procedures for a TD-based automated player that combine random moves with predefined plays in a predetermined ratio. We provide simulation results for the popular tic-tac-toe and Connect-4 board games, in which one of the studied training strategies significantly outperforms the others. On average, an agent trained with this approach needs fewer than 100,000 training games to become a perfect tic-tac-toe player.
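The hybrid training idea can be illustrated with a minimal sketch of a mixed training opponent: with a chosen probability it plays a uniformly random legal move, and otherwise it follows a fixed rule-based policy. The function names (hybrid_opponent_move, heuristic_move) and the example mixing ratio below are illustrative assumptions, not the paper's actual implementation or the ratio it found best.

    import random

    def hybrid_opponent_move(board, legal_moves, heuristic_move, random_ratio=0.3):
        """Training-opponent policy that mixes random and scripted play.

        With probability `random_ratio` a uniformly random legal move is chosen
        (exploration); otherwise a fixed rule-based policy picks the move.
        """
        if random.random() < random_ratio:
            return random.choice(legal_moves)        # random component
        return heuristic_move(board, legal_moves)    # deterministic, scripted component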

Highlights

  • Reinforcement learning (RL) is the study of how artificial intelligence (AI) agents may learn what to do in a particular environment without having access to labeled examples [1,2]

  • We provide simulation results for the popular tic-tac-toe and Connect-4 board games, in which one of the studied training strategies significantly outperforms the others

  • Following the learning techniques and rules described in the paper, we created two temporal difference (TD) learning agents, one for the tic-tac-toe game and another for the 4 × 4 Connect-4 game


Introduction

Reinforcement learning (RL) is the study of how artificial intelligence (AI) agents may learn what to do in a particular environment without having access to labeled examples [1,2]. The RL agent learns to play board games by obtaining feedback (a reward, or reinforcement) at the conclusion of each game [3]: it knows only that something good has happened after winning or something bad has happened after losing. The agent acts without any prior knowledge of the strategies needed to win the game. Nevertheless, by knowing whether it won or lost each game played, an RL agent can learn an evaluation function that yields reasonably accurate estimates of the likelihood of winning from any given position.
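A minimal sketch of how such an evaluation function can be learned is a tabular TD(0) update over board positions: after each move, the value of the previous position is nudged toward the value of the resulting position, plus any end-of-game reward. The dictionary representation, step size, neutral initial value of 0.5, and the reward convention used here are illustrative assumptions rather than the paper's exact settings.

    # Tabular TD(0) update for board-position values (illustrative sketch).
    ALPHA = 0.1  # step size (assumed value)

    def td_update(values, state, next_state, reward=0.0):
        """Nudge values[state] toward reward + values[next_state].

        `values` is a dict keyed by hashable board positions; unseen positions
        start at a neutral estimate of 0.5. No discounting is applied, since
        these board games are short episodic tasks.
        """
        v_s = values.get(state, 0.5)
        v_next = values.get(next_state, 0.5)
        values[state] = v_s + ALPHA * (reward + v_next - v_s)
        return values[state]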
