Abstract
This paper investigates the multi-robot efficient search (MuRES) problem, focusing on maximizing the probability of capturing a moving target within a predefined time constraint. Given the complexity of the MuRES problem, traditional optimization algorithms often incur significant computational overhead. As a result, learning-assisted intelligent optimization, particularly reinforcement learning (RL), has emerged as a prominent research trend, offering more efficient and adaptive solutions. However, the non-additive nature of the capture-probability-maximization objective complicates the direct application of canonical RL-based algorithms. To address this challenge, we propose the probabilistically factorized multi-agent actor-critic (PF-MAAC) algorithm, a lightweight, probability-theory-compliant solution designed specifically for the maximal capture probability objective. PF-MAAC is composed of (1) a generalized temporal difference (GTD) module that establishes the temporal-difference relationship of the central value function, (2) a probability-based factorization (P-FAC) module that decomposes the central value function into individual ones in a probability-compliant manner, and (3) an extended policy gradient (EPG) module that updates each robot's actor network based on its decomposed individual value function. Comparative simulations across various MuRES test environments demonstrate that PF-MAAC outperforms state-of-the-art methods. Furthermore, we deployed PF-MAAC on a real multi-robot system for moving-target search in a self-constructed indoor environment, achieving satisfactory results under different time constraints.
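To make the "probability-compliant" factorization concrete, the sketch below illustrates one natural way such a decomposition could combine individual values: if each robot i produces an individual capture-probability estimate q_i, the central capture probability under an independence assumption is 1 - prod_i(1 - q_i), i.e., the probability that at least one robot captures the target. This is purely an illustrative assumption on our part; the paper's actual P-FAC operator may differ, and the class name `PFACMixer` is hypothetical.

```python
import torch
import torch.nn as nn


class PFACMixer(nn.Module):
    """Illustrative probability-compliant mixer (hypothetical, not the
    paper's exact P-FAC module): combines per-robot capture-probability
    estimates into a central capture probability via the complement
    product, assuming independent capture events."""

    def forward(self, individual_qs: torch.Tensor) -> torch.Tensor:
        # individual_qs: (batch, n_robots), each entry in [0, 1]
        # Probability that no robot captures the target:
        miss_prob = torch.prod(1.0 - individual_qs, dim=-1)
        # Central value: probability that at least one robot succeeds.
        return 1.0 - miss_prob


if __name__ == "__main__":
    q = torch.tensor([[0.3, 0.5, 0.2]])  # three robots' individual estimates
    print(PFACMixer()(q))  # tensor([0.7200]) = 1 - 0.7 * 0.5 * 0.8
```

Note that this combination stays in [0, 1] and is monotone in each q_i, which is the kind of consistency with probability theory that a central-to-individual decomposition needs before individual values can safely drive per-robot policy updates.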