Abstract

Decision-making problems with hybrid discrete–continuous actions are common in real-world applications. However, few studies address multi-robot deep reinforcement learning in parameterized action spaces. Cooperative decision-making for soccer robots is a representative task for studying this problem. In this paper, a reward function is designed to guide soccer robots in learning cooperative offense. Specifically, a shooting angle reward is added to the basic reward function to improve the scoring rate. Moreover, a MADDPG network structure based on bi-channel Q-value estimation (BI-MAPDDPG) is proposed. The two channels of the Critic network, combined with a discrete action weight, handle the coupling between the discrete actions and the continuous action parameters. Finally, simulation results show that cooperative offensive decision-making for soccer robots based on BI-MAPDDPG is robust and scalable.
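
The abstract does not give the form of the shooting angle reward, so the following is only a minimal sketch of one plausible reading: the angle subtended by the goal mouth as seen from the ball, normalized and added to the basic reward. The function names `shooting_angle` and `shaped_reward`, the `weight` parameter, and the normalization by pi are all assumptions made for illustration, not the paper's definitions.

```python
import math


def shooting_angle(ball_xy, left_post_xy, right_post_xy):
    """Angle in radians subtended by the two goal posts, seen from the ball.

    Hypothetical helper: the paper's exact definition is not given in the abstract.
    """
    bx, by = ball_xy
    # Bearing from the ball to each goal post.
    a_left = math.atan2(left_post_xy[1] - by, left_post_xy[0] - bx)
    a_right = math.atan2(right_post_xy[1] - by, right_post_xy[0] - bx)
    diff = abs(a_left - a_right)
    # Take the smaller of the two angles between the bearings (handles wrap-around).
    return min(diff, 2.0 * math.pi - diff)


def shaped_reward(basic_reward, ball_xy, left_post_xy, right_post_xy, weight=0.1):
    """Basic reward plus a weighted, normalized shooting angle term.

    `weight` and the division by pi are illustrative assumptions.
    """
    angle = shooting_angle(ball_xy, left_post_xy, right_post_xy)
    return basic_reward + weight * angle / math.pi


if __name__ == "__main__":
    # Ball at the origin, goal posts at (1, 1) and (1, -1).
    print(shaped_reward(1.0, (0.0, 0.0), (1.0, 1.0), (1.0, -1.0)))
```

Under this reading, a wider view of the goal mouth yields a larger bonus, so positions directly in front of the goal are rewarded more than positions far away or at a tight angle.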
