AlphaSnake: Policy Iteration on a Nondeterministic NP-Hard Markov Decision Process (Student Abstract)

Kevin Du,Yingying Wu,Ian Gemp,Yi Wu

doi:10.1609/aaai.v37i13.26962

Abstract

Reinforcement learning has been used to approach well-known NP-hard combinatorial problems in graph theory. Among these, Hamiltonian cycle problems are exceptionally difficult to analyze, even when restricted to individual instances of structurally complex graphs. In this paper, we use Monte Carlo Tree Search (MCTS), the search algorithm behind many state-of-the-art reinforcement learning algorithms such as AlphaZero, to create autonomous agents that learn to play Snake, a game centered on properties of Hamiltonian cycles on grid graphs. Snake can be formulated as a single-player discounted Markov Decision Process (MDP) in which the agent must behave optimally in a stochastic environment. We define the optimal policy for Snake by two prioritized criteria: it first maximizes the probability of winning (the win rate) and, with lower priority, minimizes the expected number of time steps to win. Determining this policy is conjectured to be NP-hard. Compared to prior work on Snake, our algorithm is the first to achieve a win rate over 0.5 (a uniform random policy achieves a win rate below 2.57 × 10^{-15}), demonstrating the versatility of AlphaZero in tackling NP-hard problems.
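To make the MDP formulation and the link to Hamiltonian cycles concrete, the following is a minimal sketch of a toy Snake environment together with a fixed policy that follows a Hamiltonian cycle of the grid. Everything here is an illustrative assumption rather than the setup of the paper: the class `SnakeMDP`, the function names, the 4 x 4 board, the reward values, and the absence of any time limit are all hypothetical, and the paper's environment may differ in its rules and termination conditions. The sketch uses neither MCTS nor learning.

```python
import random
from collections import deque

# Hypothetical toy Snake MDP (not the paper's environment).
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

class SnakeMDP:
    def __init__(self, n=4, seed=0):
        self.n = n
        self.rng = random.Random(seed)
        self.snake = deque([(0, 0)])  # snake[0] is the head
        self.done = False
        self.won = False
        self.apple = self._spawn_apple()

    def _spawn_apple(self):
        # Stochastic part of the transition: uniform over free cells.
        occupied = set(self.snake)
        free = [(r, c) for r in range(self.n) for c in range(self.n)
                if (r, c) not in occupied]
        return self.rng.choice(free) if free else None

    def step(self, action):
        """Apply one action and return (reward, done)."""
        if self.done:
            raise RuntimeError("episode already finished")
        dr, dc = MOVES[action]
        r, c = self.snake[0]
        head = (r + dr, c + dc)
        eats = head == self.apple
        # The tail cell is vacated this step unless the snake grows.
        body = set(self.snake) if eats else set(list(self.snake)[:-1])
        off_board = not (0 <= head[0] < self.n and 0 <= head[1] < self.n)
        if off_board or head in body:
            self.done = True
            return -1.0, True  # assumed loss reward
        self.snake.appendleft(head)
        if eats:
            if len(self.snake) == self.n * self.n:
                self.done = self.won = True
                return 1.0, True  # assumed win reward: board is full
            self.apple = self._spawn_apple()
        else:
            self.snake.pop()
        return 0.0, False

def cycle_action(head, n):
    """Next move along one fixed Hamiltonian cycle; assumes n is even."""
    r, c = head
    if r == 0:                      # top row: sweep right, then turn down
        return "R" if c < n - 1 else "D"
    if c == 0:                      # left column: return up to (0, 0)
        return "U"
    if r % 2 == 1:                  # odd rows sweep left
        return "L" if (c > 1 or r == n - 1) else "D"
    return "R" if c < n - 1 else "D"  # even rows sweep right

def run_cycle_policy(env, max_steps=10_000):
    """Follow the cycle until the episode ends; return the step count."""
    steps = 0
    while not env.done and steps < max_steps:
        env.step(cycle_action(env.snake[0], env.n))
        steps += 1
    return steps

if __name__ == "__main__":
    env = SnakeMDP(n=4, seed=0)
    steps = run_cycle_policy(env)
    print("won:", env.won, "steps:", steps)
```

In this toy setting the cycle-following policy cannot collide with itself, because the snake always occupies consecutive cells of the cycle, but it ignores the apple's position and so can take many more steps than necessary. That gap is one way to picture the two priorities in the objective described above: winning first, and winning in few steps second.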

Published in: Proceedings of the AAAI Conference on Artificial Intelligence


