Abstract

Reinforcement learning learns policies through trial and error. Because policies are unstable in the early stage of training, training directly in the real environment is expensive and time-consuming and can have disastrous consequences. A popular solution is to train the policy in a simulator and then deploy it in the real environment. However, modeling errors and external disturbances between simulation and reality can cause the physical deployment to fail, known as the sim2real transfer problem. In this letter, we propose a novel robust adversarial reinforcement learning framework that uses ensemble training of multiple adversarial agents and adaptively adjusts the adversaries' strength to enhance the robustness of the RL policy. More specifically, we take the accumulated reward as feedback and construct a PID controller that adjusts the adversary's output magnitude so that adversarial training proceeds effectively. Experiments in simulated and real environments show that our algorithm simultaneously improves the policy's generalization to modeling errors and uncertain disturbances, outperforming the next best prior methods across all domains. The algorithm is further shown to be effective in a sim2real transfer task through a load experiment on a real racing drone, where its tracking performance exceeds that of a PID-based flight controller.
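To give a concrete picture of the feedback loop described above, the sketch below shows one possible way a PID controller could map accumulated-reward feedback to an adversary strength scale. This is a minimal illustration under assumed details: the class name, gains, target return, and clipping range are not taken from the paper.

```python
# Hedged sketch (not the paper's exact formulation): a PID controller that
# scales the adversary's action magnitude from accumulated-reward feedback.
# Gains, target return, and the [0, 1] clip are illustrative assumptions.

class AdversaryStrengthPID:
    def __init__(self, kp=0.05, ki=0.001, kd=0.01, target_return=200.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.target_return = target_return   # desired protagonist return
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, episode_return):
        # Error: how far the protagonist's return exceeds the target.
        # A high return suggests the adversary is too weak, so strength rises.
        error = episode_return - self.target_return
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        strength = (self.kp * error
                    + self.ki * self.integral
                    + self.kd * derivative)
        # Keep the adversary's output scale in a bounded range.
        return min(max(strength, 0.0), 1.0)


# Usage: scale each adversary action by the PID output before applying it.
pid = AdversaryStrengthPID()
adversary_scale = pid.update(episode_return=250.0)
# perturbed_action = protagonist_action + adversary_scale * adversary_action
```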
