Abstract

Cooperative action planning for Connected and Autonomous Vehicles (CAVs) in emergency scenarios is an important task in the autonomous driving domain. Reinforcement learning algorithms such as Monte Carlo Tree Search (MCTS) have been widely used to solve this problem with some success. MCTS relies on performing many simulations of CAV actions to learn their expected reward values, so a well-designed reward function is a precondition for high success rates. Most MCTS-based algorithms traditionally apply a predefined reward function with fixed parameters across all CAV scenarios. This paper presents a novel MCTS-based algorithm that dynamically modifies the reward function parameters to encourage or discourage particular CAV actions. The proposed dynamic reward function significantly improves reliability over MCTS with a fixed reward function. We evaluate the algorithm in a large-scale multi-agent traffic simulation system, and experimental results show that it significantly improves upon current state-of-the-art centralized and decentralized algorithms.
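To make the core idea concrete, the following is a minimal Python sketch of MCTS-style root action selection with dynamically rescaled reward parameters. Everything here is an illustrative assumption rather than the paper's method: the `urgency` signal, the weight names, the toy rollout simulator, and the UCB1 selection rule are stand-ins to show how changing reward parameters can shift which action the search prefers.

```python
import math
import random

# Candidate CAV actions and a fixed-reward baseline (illustrative values only).
ACTIONS = ["keep_lane", "change_left", "change_right", "brake"]
BASE_WEIGHTS = {"collision": -100.0, "progress": 1.0}

def dynamic_weights(urgency: float) -> dict:
    """Rescale reward parameters with a scenario urgency signal in [0, 1].

    Higher urgency penalizes risky actions more heavily and values forward
    progress less, discouraging aggressive maneuvers in emergencies.
    """
    return {
        "collision": BASE_WEIGHTS["collision"] * (1.0 + urgency),
        "progress": BASE_WEIGHTS["progress"] * (1.0 - 0.5 * urgency),
    }

def simulate(action: str, weights: dict) -> float:
    """Toy rollout: return a noisy reward for one simulated trajectory."""
    collision_prob = {"keep_lane": 0.05, "change_left": 0.15,
                      "change_right": 0.15, "brake": 0.01}[action]
    progress = {"keep_lane": 1.0, "change_left": 0.8,
                "change_right": 0.8, "brake": 0.2}[action]
    collided = random.random() < collision_prob
    return weights["collision"] * collided + weights["progress"] * progress

def mcts_select(urgency: float, n_sims: int = 2000, c: float = 1.4) -> str:
    """UCB1 over root actions, standing in for the selection step of MCTS."""
    weights = dynamic_weights(urgency)
    counts = {a: 0 for a in ACTIONS}
    totals = {a: 0.0 for a in ACTIONS}
    for t in range(1, n_sims + 1):
        # Pick the action maximizing the UCB1 score (untried actions first).
        def ucb(a):
            if counts[a] == 0:
                return float("inf")
            return totals[a] / counts[a] + c * math.sqrt(math.log(t) / counts[a])
        a = max(ACTIONS, key=ucb)
        totals[a] += simulate(a, weights)
        counts[a] += 1
    return max(ACTIONS, key=lambda a: totals[a] / counts[a])

print(mcts_select(urgency=0.9))  # emergency: tends toward a cautious action
```

With a fixed reward function, the same weights apply in calm and emergency traffic alike; rescaling them by urgency, as sketched above, lets the search discourage risky lane changes precisely when the scenario demands it.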
