Abstract

Temporal difference (TD) learning is a well-known approach for training automated players, through autonomous play, in board games with a limited number of potential states. TD learning has become widespread because of its simplicity, but several critical difficulties must be addressed for it to be effective. Training an artificial intelligence (AI) agent against a purely random player is impractical, since it takes millions of games for the agent to learn to play intelligently. Training the agent against a fixed, deterministic player, on the other hand, is also not an option, owing to a lack of exploration. This article describes and examines several hybrid training procedures for a TD-based automated player that combine random moves with predefined plays in a predetermined ratio. We provide simulation results for the popular tic-tac-toe and Connect-4 board games, in which one of the studied training strategies significantly outperforms the others. On average, an agent trained with this approach needs fewer than 100,000 training games to become a perfect tic-tac-toe player.
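The hybrid training idea can be illustrated with a minimal sketch of a mixed training opponent: with a chosen probability it plays a uniformly random legal move, and otherwise it follows a fixed rule-based policy. The function names (hybrid_opponent_move, heuristic_move) and the example mixing ratio below are illustrative assumptions, not the paper's actual implementation or the ratio it found best.

    import random

    def hybrid_opponent_move(board, legal_moves, heuristic_move, random_ratio=0.3):
        """Training-opponent policy that mixes random and scripted play.

        With probability `random_ratio` a uniformly random legal move is chosen
        (exploration); otherwise a fixed rule-based policy picks the move.
        """
        if random.random() < random_ratio:
            return random.choice(legal_moves)        # random component
        return heuristic_move(board, legal_moves)    # deterministic, scripted component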

Highlights

  • Reinforcement learning (RL) is the study of how artificial intelligence (AI) agents may learn what to do in a particular environment without having access to labeled examples [1,2]

  • We provide simulation results for the popular tic-tac-toe and Connect-4 board games, in which one of the studied training strategies significantly outperforms the others

  • Following the learning techniques and rules described in the paper, we created two temporal difference (TD) learning agents, one for the tic-tac-toe game and another for the 4 × 4 Connect-4 game


Introduction

Reinforcement learning (RL) is the study of how artificial intelligence (AI) agents may learn what to do in a particular environment without having access to labeled examples [1,2]. The RL agent learns to play board games by obtaining feedback (a reward, or reinforcement) at the conclusion of each game [3]: it knows only that something good has happened after winning or something bad has happened after losing. The agent acts without any prior knowledge of the strategies needed to win the game. Nevertheless, by knowing whether it won or lost each game played, an RL agent can learn an evaluation function that yields reasonably accurate estimates of the likelihood of winning from any given position.
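A minimal sketch of how such an evaluation function can be learned is a tabular TD(0) update over board positions: after each move, the value of the previous position is nudged toward the value of the resulting position, plus any end-of-game reward. The dictionary representation, step size, neutral initial value of 0.5, and the reward convention used here are illustrative assumptions rather than the paper's exact settings.

    # Tabular TD(0) update for board-position values (illustrative sketch).
    ALPHA = 0.1  # step size (assumed value)

    def td_update(values, state, next_state, reward=0.0):
        """Nudge values[state] toward reward + values[next_state].

        `values` is a dict keyed by hashable board positions; unseen positions
        start at a neutral estimate of 0.5. No discounting is applied, since
        these board games are short episodic tasks.
        """
        v_s = values.get(state, 0.5)
        v_next = values.get(next_state, 0.5)
        values[state] = v_s + ALPHA * (reward + v_next - v_s)
        return values[state]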
