Abstract
Generating collision-free, time-efficient paths for followers is a challenging problem in formation control with collision avoidance, since the followers must handle formation maintenance and collision avoidance simultaneously. Recent works have shown the potential of deep reinforcement learning (DRL) for learning collision avoidance policies, but they consider only the collision factor. In this paper, we extend the learning-based policy to the area of formation control by learning a comprehensive task. In particular, a two-stage training scheme is adopted, combining imitation learning and reinforcement learning, and a fusion reward function is proposed to guide the training. In addition, a formation-oriented network architecture is presented for environment perception, and long short-term memory (LSTM) is applied to process the information of an arbitrary number of obstacles. Extensive simulations show that the proposed algorithm is able to anticipate the dynamic information of the environment and outperforms traditional methods.
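The abstract does not specify the network dimensions or obstacle state encoding, but the following minimal sketch illustrates the general idea of using an LSTM to summarize a variable number of obstacle states into a fixed-size feature vector; the class name, input layout (relative position and velocity per obstacle), and hidden size are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ObstacleLSTMEncoder(nn.Module):
    """Illustrative sketch: encode a variable number of obstacle states
    into a fixed-size feature vector for a downstream policy network."""

    def __init__(self, obstacle_dim=4, hidden_dim=64):
        super().__init__()
        # Assumed obstacle state: (relative x, relative y, vx, vy).
        self.lstm = nn.LSTM(obstacle_dim, hidden_dim, batch_first=True)

    def forward(self, obstacles):
        # obstacles: (batch, num_obstacles, obstacle_dim);
        # num_obstacles may vary between scenarios.
        _, (h_n, _) = self.lstm(obstacles)
        # The final hidden state summarizes all observed obstacles.
        return h_n[-1]  # shape: (batch, hidden_dim)

# Usage: 3 obstacles in one scenario, 7 in another, same-sized output.
encoder = ObstacleLSTMEncoder()
few = encoder(torch.randn(1, 3, 4))   # -> (1, 64)
many = encoder(torch.randn(1, 7, 4))  # -> (1, 64)
```

Because the LSTM consumes obstacles sequentially, the resulting fixed-size summary can be concatenated with formation-related features regardless of how many obstacles are present.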