A team of autonomous decision-making robots can be employed for critical tasks such as disaster detection, plant protection, and military reconnaissance. Getting such a team to perform well is a challenging problem. To address this challenge, this paper investigates a decentralized multi-robot patrolling problem in uncertain environments without prior knowledge. First, the patrolling environment is modeled as an undirected graph in which the information attached to each vertex evolves according to an unknown discrete Markov chain. Each robot patrols independently within its own area, and these areas may overlap. The robots coordinate their actions periodically to gather as much valuable information as possible. The patrolling problem is then cast as a Bayes-adaptive transition-decoupled partially observable Markov decision process. Solving this multi-robot patrolling problem with its complex interactions is, however, intractable. We therefore propose a scalable decentralized online learning and planning algorithm that extends the Monte Carlo tree search method and computes boundedly optimal patrols. In the proposed algorithm, each local look-ahead tree has a low branching factor and the coordination scheme is simple. In addition, we benchmark the proposed algorithm against several state-of-the-art online solvers to assess its performance empirically. The results show that the proposed algorithm achieves high performance in typical decentralized patrolling scenarios.
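As a rough illustration of the environment model described above, the following Python sketch builds an undirected patrol graph whose per-vertex information state evolves as a discrete Markov chain. It is only a minimal sketch under stated assumptions and not the paper's implementation: the names `PatrolGraph`, `step_chain`, and `visit`, the single transition matrix shared by all vertices, and the per-state `values` are hypothetical. The transition matrix is known to the simulator here, whereas in the problem studied it is unknown to the robots.

```python
import random


def step_chain(state, transition, rng):
    """Sample the next state from row `state` of a row-stochastic matrix."""
    r = rng.random()
    cumulative = 0.0
    for next_state, p in enumerate(transition[state]):
        cumulative += p
        if r < cumulative:
            return next_state
    return len(transition[state]) - 1  # guard against floating-point rounding


class PatrolGraph:
    """Undirected graph; each vertex carries a Markov-chain information state."""

    def __init__(self, edges, transition, values, seed=0):
        self.adj = {}
        for u, v in edges:
            # Undirected: record the edge in both directions.
            self.adj.setdefault(u, set()).add(v)
            self.adj.setdefault(v, set()).add(u)
        self.transition = transition  # assumption: one chain shared by all vertices
        self.values = values          # assumption: information value of each state
        self.rng = random.Random(seed)
        self.state = {v: 0 for v in self.adj}  # assumption: all vertices start in state 0

    def neighbors(self, v):
        """Vertices a robot standing at v could move to next."""
        return sorted(self.adj[v])

    def step(self):
        """Advance every vertex's information state by one time step."""
        for v in self.state:
            self.state[v] = step_chain(self.state[v], self.transition, self.rng)

    def visit(self, v):
        """Return the information value currently observable at vertex v."""
        return self.values[self.state[v]]
```

A planner such as Monte Carlo tree search would call `neighbors` to enumerate a robot's moves and would have to estimate the transition matrix from the values returned by `visit`, since the robots do not see it directly.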