PPB-MCTS: A novel distributed-memory parallel partial-backpropagation Monte Carlo tree search algorithm

Yashar Naderzadeh,Daniel Grosu,Ratna Babu Chinnam

doi:10.1016/j.jpdc.2024.104944

Abstract

Monte-Carlo Tree Search (MCTS) is an adaptive and heuristic tree-search algorithm designed to uncover sub-optimal actions at each decision-making point. This method progressively constructs a search tree by gathering samples throughout its execution. Predominantly applied within the realm of gaming, MCTS has exhibited exceptional achievements. Additionally, it has displayed promising outcomes when employed to solve NP-hard combinatorial optimization problems. MCTS has been adapted for distributed-memory parallel platforms. The primary challenges associated with distributed-memory parallel MCTS are the substantial communication overhead and the necessity to balance the computational load among various processes. In this work, we introduce a novel distributed-memory parallel MCTS algorithm with partial backpropagations, referred to as Parallel Partial-Backpropagation MCTS (PPB-MCTS). Our design approach aims to significantly reduce the communication overhead while maintaining, or even slightly improving, the performance in the context of combinatorial optimization problems. To address the communication overhead challenge, we propose a strategy involving transmitting an additional backpropagation message. This strategy avoids attaching an information table to the communication messages exchanged by the processes, thus reducing the communication overhead. Furthermore, this approach contributes to enhancing the decision-making accuracy during the selection phase. The load balancing issue is also effectively addressed by implementing a shared transposition table among the parallel processes. Furthermore, we introduce two primary methods for managing duplicate states within distributed-memory parallel MCTS, drawing upon techniques utilized in addressing duplicate states within sequential MCTS. Duplicate states can transform the conventional search tree into a Directed Acyclic Graph (DAG). To evaluate the performance of our proposed parallel algorithm, we conduct an extensive series of experiments on solving instances of the Job-Shop Scheduling Problem (JSSP) and the Weighted Set-Cover Problem (WSCP). These problems are recognized for their complexity and classified as NP-hard combinatorial optimization problems with considerable relevance within industrial applications. The experiments are performed on a cluster of computers with many cores. The empirical results highlight the enhanced scalability of our algorithm compared to that of the existing distributed-memory parallel MCTS algorithms. As the number of processes increases, our algorithm demonstrates increased rollout efficiency while maintaining an improved load balance across processes.

Full Text