Abstract

UCT, a state-of-the-art algorithm for Monte Carlo tree search (MCTS) in games and Markov decision processes, is based on UCB, a sampling policy for the Multi-armed Bandit problem (MAB) that minimizes the cumulative regret. However, search differs from MAB in that in MCTS it is usually only the final ``arm pull'' (the actual move selection) that collects a reward, rather than all ``arm pulls''. Therefore, it makes more sense to minimize the simple regret, as opposed to the cumulative regret. We begin by introducing policies for multi-armed bandits with lower finite-time and asymptotic simple regret than UCB, using them to develop a two-stage scheme (SR+CR) for MCTS which outperforms UCT empirically. Optimizing the sampling process is itself a metareasoning problem, a solution of which can use value of information (VOI) techniques. Although the theory of VOI for search exists, applying it to MCTS is non-trivial, as typical myopic assumptions fail. Lacking a complete working VOI theory for MCTS, we nevertheless propose a sampling scheme that is ``aware'' of VOI, yielding an algorithm that outperforms both UCT and the other proposed algorithms in empirical evaluation.
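To make the contrast between the two sampling objectives concrete, below is a minimal Python sketch (not taken from the paper) of a two-stage selection rule in the spirit of SR+CR: the root, where only the final move choice matters, is sampled with a simple-regret-oriented policy (here a 1/2-greedy rule over the empirically best arm), while interior nodes keep the UCB1 rule used by UCT. The node representation and function names are illustrative assumptions.

```python
import math
import random

def ucb1(counts, values, c=math.sqrt(2)):
    """Cumulative-regret policy (as in UCT): pick the arm with the
    highest empirical mean plus an exploration bonus."""
    total = sum(counts.values())
    best, best_score = None, float("-inf")
    for arm, n in counts.items():
        if n == 0:
            return arm  # sample every arm at least once
        score = values[arm] / n + c * math.sqrt(math.log(total) / n)
        if score > best_score:
            best, best_score = arm, score
    return best

def half_greedy(counts, values):
    """Simple-regret-oriented root policy: with probability 1/2 sample
    the empirically best arm, otherwise a uniformly random other arm."""
    unvisited = [a for a, n in counts.items() if n == 0]
    if unvisited:
        return random.choice(unvisited)
    best = max(counts, key=lambda a: values[a] / counts[a])
    others = [a for a in counts if a != best]
    if not others or random.random() < 0.5:
        return best
    return random.choice(others)

def select_child(node, is_root):
    """Two-stage scheme: simple-regret sampling at the root,
    cumulative-regret (UCB1) sampling at all deeper nodes."""
    if is_root:
        return half_greedy(node.counts, node.values)
    return ucb1(node.counts, node.values)
```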
