Abstract
This paper investigates the multi-robot efficient search (MuRES) problem, focusing on maximizing the probability of capturing a moving target within a predefined time constraint. Given the complexity of the MuRES problem, traditional optimization algorithms often incur significant computational overhead. As a result, learning-assisted intelligent optimization, particularly reinforcement learning (RL), has emerged as a prominent research trend, offering more efficient and adaptive solutions. However, the non-additive nature of the capture-probability-maximization objective complicates the direct application of canonical RL-based algorithms. To address this challenge, we propose the probabilistically factorized multi-agent actor-critic (PF-MAAC) algorithm, a lightweight, probability-theory-compliant solution designed specifically for the maximal capture probability objective. PF-MAAC is composed of (1) a generalized temporal difference (GTD) module that establishes the temporal-difference relationship of the central value function, (2) a probability-based factorization (P-FAC) module that decomposes the central value function into individual ones in a probability-compliant manner, and (3) an extended policy gradient (EPG) module that updates each robot's actor network based on its decomposed individual value function. Comparative simulations across various MuRES test environments demonstrate that PF-MAAC outperforms state-of-the-art methods. Furthermore, we deployed PF-MAAC on a real multi-robot system for moving-target search in a self-constructed indoor environment, achieving satisfactory results under different time constraints.
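To make the "probability-compliant" factorization concrete, the sketch below illustrates one natural way such a decomposition could combine individual values: if each robot i produces an individual capture-probability estimate q_i, the central capture probability under an independence assumption is 1 - prod_i(1 - q_i), i.e., the probability that at least one robot captures the target. This is purely an illustrative assumption on our part; the paper's actual P-FAC operator may differ, and the class name `PFACMixer` is hypothetical.

```python
import torch
import torch.nn as nn


class PFACMixer(nn.Module):
    """Illustrative probability-compliant mixer (hypothetical, not the
    paper's exact P-FAC module): combines per-robot capture-probability
    estimates into a central capture probability via the complement
    product, assuming independent capture events."""

    def forward(self, individual_qs: torch.Tensor) -> torch.Tensor:
        # individual_qs: (batch, n_robots), each entry in [0, 1]
        # Probability that no robot captures the target:
        miss_prob = torch.prod(1.0 - individual_qs, dim=-1)
        # Central value: probability that at least one robot succeeds.
        return 1.0 - miss_prob


if __name__ == "__main__":
    q = torch.tensor([[0.3, 0.5, 0.2]])  # three robots' individual estimates
    print(PFACMixer()(q))  # tensor([0.7200]) = 1 - 0.7 * 0.5 * 0.8
```

Note that this combination stays in [0, 1] and is monotone in each q_i, which is the kind of consistency with probability theory that a central-to-individual decomposition needs before individual values can safely drive per-robot policy updates.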