Cooperative offensive decision-making for soccer robots based on bi-channel Q-value evaluation MADDPG

Lingli Yu, Keyi Li, Shuxin Huo, Kaijun Zhou

doi:10.1016/j.engappai.2023.105994

Abstract

Applications that require decision-making over hybrid discrete–continuous actions are common in real life. However, few studies address multi-robot deep reinforcement learning in parameterized action spaces, for which cooperative decision-making for soccer robots is a representative task. In this paper, a reward function is designed to guide soccer robots in learning cooperative offense: a shooting-angle reward is added to the basic reward function to improve the scoring rate. Moreover, a MADDPG network structure based on bi-channel Q-value estimation (BI-MAPDDPG) is proposed. The two channels of the Critic network, combined with a discrete action weight, handle the coupling between the discrete action and the continuous action parameters. Finally, simulation results show that cooperative offensive decision-making for soccer robots based on BI-MAPDDPG is robust and scalable.
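
To make the notion of a parameterized action space concrete, the following is a minimal sketch in Python. It is an illustration, not the authors' implementation. The action names (`dash`, `turn`, `kick`), their parameter counts, and the helper `select_action` are hypothetical and chosen only for this example. The sketch shows the coupling the abstract refers to: the chosen discrete action determines which slice of the continuous parameter vector is actually used.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# Hypothetical action set (an assumption for illustration): each discrete
# action owns a fixed number of continuous parameters.
ACTION_PARAM_SIZES: Dict[str, int] = {"dash": 2, "turn": 1, "kick": 2}


@dataclass(frozen=True)
class ParameterizedAction:
    name: str                   # discrete action type
    params: Tuple[float, ...]   # continuous parameters belonging to that type


def select_action(scores: Sequence[float],
                  flat_params: Sequence[float]) -> ParameterizedAction:
    """Pick the highest-scoring discrete action and its own parameter slice.

    `scores` holds one value per discrete action; `flat_params` is the
    concatenation of the parameters of all actions, in dictionary order.
    """
    names = list(ACTION_PARAM_SIZES)
    if len(scores) != len(names):
        raise ValueError("one score per discrete action is required")
    if len(flat_params) != sum(ACTION_PARAM_SIZES.values()):
        raise ValueError("flat_params has the wrong length")

    # Greedy choice of the discrete action.
    best = max(range(len(names)), key=lambda i: scores[i])

    # Only the parameters that belong to the chosen action are used.
    offset = sum(ACTION_PARAM_SIZES[n] for n in names[:best])
    size = ACTION_PARAM_SIZES[names[best]]
    return ParameterizedAction(names[best],
                               tuple(flat_params[offset:offset + size]))


if __name__ == "__main__":
    action = select_action([0.1, 0.2, 0.9], [10.0, 20.0, 30.0, 40.0, 50.0])
    print(action.name, action.params)
```

In this sketch the parameters of the unselected actions are ignored, which is why a value estimate has to account for the discrete choice and the continuous parameters together.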

Full Text