Abstract

Deep reinforcement learning, by taking advantage of neural networks, has made great strides in the continuous control of robots. However, in scenarios where multiple robots must collaborate to accomplish a task, building an efficient and scalable multi-agent control system remains challenging because of the increasing complexity. In this paper, we regard each unmanned aerial vehicle (UAV) with its manipulator as one agent and leverage multi-agent deep deterministic policy gradient (MADDPG) for the cooperative navigation and manipulation of a load. We propose solutions to the navigation-to-grasping-point problem in targeted and flexible scenarios, focusing on how to develop model-free policies for the UAVs without relying on a trajectory planner. To overcome the challenges of learning in scenarios with an increasing number of grasping points, we incorporate demonstrations from an Optimal Reciprocal Collision Avoidance (ORCA) algorithm into our framework to guide policy training and adapt two novel techniques into the architecture of MADDPG. Furthermore, we use curriculum learning with an attention mechanism, reusing knowledge from loads with fewer grasping points to facilitate training on a load with more points. We validated our methods on loads with three, four, and six grasping points in the CoppeliaSim simulator and then transferred the policies to the real world with Crazyflie quadrotors. Our results show that the average tracking deviation between the desired grasping point and the final position of the UAV can be less than 10 cm in some real-world experiments. Compared with state-of-the-art model-free reinforcement learning and swarm optimization algorithms, our proposed methods outperform the baselines with a reasonable success rate, especially in scenarios with more grasping points. Furthermore, the learned optimal policies enable the UAVs to reach and hover over all the grasping points before manipulation without any collision. We also provide a comprehensive analysis of targeted and flexible navigation, highlighting their respective advantages and disadvantages.
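
As an illustration of the tracking-deviation metric mentioned above, the following minimal Python sketch computes the mean Euclidean distance between final UAV positions and grasping points. It is not the paper's evaluation code. The function names are hypothetical, and the sketch assumes that "targeted" means UAV i is pre-assigned to grasping point i, while "flexible" means any UAV may take any point, so the best one-to-one pairing is scored. That reading of the two scenarios is an interpretation, not something the abstract states.

```python
from itertools import permutations
from math import dist


def mean_deviation(final_positions, grasp_points):
    """Mean Euclidean distance with UAV i paired to grasping point i.

    Assumed "targeted" scoring: the pairing is fixed in advance.
    Units follow the inputs (e.g. metres).
    """
    if not grasp_points or len(final_positions) != len(grasp_points):
        raise ValueError("need equal, non-zero numbers of UAVs and grasping points")
    total = sum(dist(p, g) for p, g in zip(final_positions, grasp_points))
    return total / len(grasp_points)


def mean_deviation_flexible(final_positions, grasp_points):
    """Smallest mean deviation over all one-to-one UAV-to-point pairings.

    Assumed "flexible" scoring. Brute force over permutations is only
    practical for small teams (six points give 720 pairings).
    """
    return min(
        mean_deviation(final_positions, pairing)
        for pairing in permutations(grasp_points)
    )
```

With made-up positions in which two UAVs have swapped targets, the fixed pairing reports a large deviation while the flexible pairing reports none, which is one way the two scenarios can differ when scored. A check such as `mean_deviation(final, grasp) < 0.10` would then correspond to the 10 cm figure quoted above, assuming positions are given in metres.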
