Enhanced Reinforcement Learning Method Combining One-Hot Encoding-Based Vectors for CNN-Based Alternative High-Level Decisions

Bonwoo Gu, Yunsick Sung

doi:10.3390/app11031291

Abstract

Gomoku is a two-player board game that originated in ancient China. Gomoku AI has been developed with various artificial intelligence techniques, such as genetic algorithms and tree search algorithms. Alpha-Gomoku, a Gomoku AI built with AlphaGo's algorithm, defines all possible situations on the Gomoku board using Monte-Carlo tree search (MCTS), and minimizes the probability of learning other correct answers in duplicated Gomoku board situations. However, the accuracy of tree search algorithms drops because the classification criteria are set manually. In this paper, we propose an improved reinforcement learning-based high-level decision approach using convolutional neural networks (CNN). The proposed algorithm expresses each state as One-Hot Encoding-based vectors and determines the state of the Gomoku board by combining similar states of One-Hot Encoding-based vectors. For cases where the position determined by the CNN is already occupied or a stone cannot be placed there, we suggest a method for selecting an alternative. We verify the proposed Gomoku AI method in GuPyEngine, a Python-based 3D simulation platform.
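To make the idea of expressing a board state as One-Hot Encoding-based vectors concrete, the following is a minimal sketch in Python. The cell codes (`EMPTY`, `BLACK`, `WHITE`), the three-class layout, and the helper names `one_hot_cell` and `one_hot_board` are assumptions chosen for illustration; the abstract does not specify the encoding the authors actually use.

```python
# Hypothetical cell codes; the paper's actual encoding is not given here.
EMPTY, BLACK, WHITE = 0, 1, 2


def one_hot_cell(value, num_classes=3):
    """Return a one-hot list for a single cell value."""
    vec = [0] * num_classes
    vec[value] = 1
    return vec


def one_hot_board(board):
    """Encode a 2D board as one one-hot vector per cell, in row-major order."""
    return [one_hot_cell(cell) for row in board for cell in row]


# A small 3x3 excerpt stands in for a full Gomoku board.
board = [
    [EMPTY, BLACK, EMPTY],
    [WHITE, BLACK, EMPTY],
    [EMPTY, EMPTY, WHITE],
]
encoded = one_hot_board(board)
print(encoded[1])  # cell (0, 1) holds a black stone -> [0, 1, 0]
```

Each cell becomes a vector with exactly one nonzero entry, so two boards can be compared cell by cell without implying any ordering between "empty", "black", and "white".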

Highlights

Gomoku is a two-player board game that originated in ancient China.
Because convolutional neural network (CNN)-based decision-making determines a single optimal position for each recognized Gomoku board state, cases arise where a stone cannot be placed at that position.
Cao et al. presented a Gomoku artificial intelligence (AI) model using an algorithm that combined upper confidence bounds applied to trees (UCT) [12] and adaptive dynamic programming (ADP) [13,14].
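The second highlight, where the single position chosen by the CNN may be unavailable, can be illustrated with a simple fallback. The sketch below assumes the network yields one score per board cell and picks the highest-scoring cell that is still empty. This is a generic masking approach, not the selection method proposed in the paper, and `select_move`, `scores`, and the cell codes are hypothetical names.

```python
def select_move(scores, board, empty=0):
    """Pick the highest-scoring position that is still empty.

    scores: 2D list of floats, one score per board cell (assumed CNN output).
    board:  2D list of cell codes; `empty` marks a free cell.
    Returns (row, col), or None when no empty cell remains.
    """
    candidates = [
        (scores[r][c], r, c)
        for r in range(len(board))
        for c in range(len(board[r]))
        if board[r][c] == empty
    ]
    if not candidates:
        return None
    # max() with a key returns the first maximal item, so ties go to the
    # earliest cell in row-major order.
    _, row, col = max(candidates, key=lambda item: item[0])
    return row, col


# The two best-scoring cells, (0, 0) and (1, 1), are already occupied,
# so the best remaining empty cell is chosen instead.
board = [[1, 0],
         [0, 2]]
scores = [[0.9, 0.3],
          [0.6, 0.8]]
print(select_move(scores, board))  # (1, 0)
```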

Summary

Introduction

Gomoku is a two-player board game that originated in ancient China. Two players alternate in placing a stone of their chosen color, and the player who first completes five in a row horizontally, vertically, or diagonally wins the game. Efficient Gomoku board recognition and decision-making was made possible by using the convolution layer of the deep-learning convolutional neural network (CNN) algorithm [3]. Because CNN-based decision-making determines a single optimal position for each recognized Gomoku board state, cases arise where a stone cannot be placed at that position. We propose an improved reinforcement learning-based high-level decision algorithm using CNN. We verify the performance of the proposed reinforcement learning algorithm by applying it to Gomoku in GuPyEngine, a Python-based 3D simulation platform. This paper is expected to contribute to the field of incremental algorithms of reinforcement learning and deep learning-based 3D simulation by introducing the functions and performance of GuPyEngine.
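The winning rule described above (five in a row horizontally, vertically, or diagonally) can be checked with a short scan over the board. The following is a minimal sketch, assuming a square board stored as a 2D list of integer cell codes; the function name `has_five` is hypothetical, and this version also accepts runs longer than five, which some rule variants disallow.

```python
def has_five(board, player, length=5):
    """Return True if `player` has `length` stones in a row on a square board."""
    size = len(board)
    # Right, down, down-right diagonal, down-left diagonal.
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(size):
        for c in range(size):
            for dr, dc in directions:
                end_r = r + (length - 1) * dr
                end_c = c + (length - 1) * dc
                # Skip runs that would leave the board.
                if not (0 <= end_r < size and 0 <= end_c < size):
                    continue
                if all(board[r + i * dr][c + i * dc] == player
                       for i in range(length)):
                    return True
    return False


# Place five stones of player 1 on a diagonal of an empty 15x15 board.
board = [[0] * 15 for _ in range(15)]
for i in range(5):
    board[2 + i][3 + i] = 1
print(has_five(board, 1))  # True
print(has_five(board, 2))  # False
```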

Related Works

Framework Overview

ANN-Based One-Hot Encoding Vector Combination Stage

CNN Training Stage

Experiments

Experimental Environment

GuPyEngine

Number of Winning Games

Findings

Next Best Answer Selection

Journal: Applied Sciences
Publication Date: Feb 1, 2021
Citations: 26
License type: CC BY 4.0

Similar Papers

Sample-Based Tree Search with Fixed and Adaptive State Abstractions
Jesse Hostetler ... Alan Fern
Journal of Artificial Intelligence Research | VOL. 60
14 Dec 2017

Reinforcement learning and simulation-based search in computer go
01 Jan 2009

Development of rehabilitation system (RehabGame) through Monte-Carlo tree search algorithm using kinect and Myo sensor interface
Shabnam Sadeghi Esfahlani ... George Wilson
01 Jul 2017

Manipulating the Distributions of Experience used for Self-Play Learning in Expert Iteration
Dennis J N J Soemers ... Eric Piette
01 Aug 2020
