Abstract
Robust adversarial reinforcement learning is an effective method for training agents to handle uncertain disturbances and modeling errors in real environments. However, for systems that are sensitive to disturbances or difficult to stabilize, it is easier to learn a powerful adversary than to establish a stable control policy. An improperly strong adversary can destabilize the system, bias the sampling process, make learning unstable, and even reduce the robustness of the policy. In this study, we consider the problem of ensuring system stability during training in the adversarial reinforcement learning architecture. We extend the dissipative principle of robust H-infinity control to the Markov Decision Process and derive robust stability constraints based on L2-gain performance in the reinforcement learning system. On this basis, we propose a dissipation-inequation-constraint-based adversarial reinforcement learning architecture, which ensures the stability of the system during training by imposing constraints on both the normal and adversarial agents. In theory, this architecture can be applied to a large family of deep reinforcement learning algorithms. Experiments in the MuJoCo and GymFc environments show that our architecture effectively improves the robustness of the controller against environmental changes and adapts to more powerful adversaries. Flight experiments on a real quadcopter indicate that a policy trained in simulation can be deployed directly to the real environment, and that our controller outperforms the PID controller based on hardware-in-the-loop. Both our theoretical and empirical results provide new and critical insights into the adversarial reinforcement learning architecture from a rigorous robust control perspective.
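For readers unfamiliar with the L2-gain condition mentioned in the abstract, the following is a minimal sketch of the standard textbook discrete-time dissipation inequality, not the paper's own formulation, which may differ. The symbols are assumptions introduced here for illustration: x_t is the state, w_t the disturbance (the adversary's action), z_t the penalized output, V a nonnegative storage function, and gamma the gain bound.

```latex
% Textbook dissipation inequality for L2 gain <= \gamma (illustrative notation,
% not taken from the paper): the storage function V may grow by no more than
% the supplied disturbance energy minus the output energy at each step.
\[
  V(x_{t+1}) - V(x_t) \;\le\; \gamma^{2}\,\lVert w_t \rVert^{2} - \lVert z_t \rVert^{2},
  \qquad V(x) \ge 0 .
\]
% Summing over t = 0, \dots, T-1 with V(x_0) = 0 and using V(x_T) \ge 0
% gives the finite-horizon L2-gain bound:
\[
  \sum_{t=0}^{T-1} \lVert z_t \rVert^{2}
  \;\le\; \gamma^{2} \sum_{t=0}^{T-1} \lVert w_t \rVert^{2} .
\]
```

Read this way, a constraint of this type limits how much output energy any bounded-energy adversary can induce, which is the sense in which such a condition can keep training stable.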
More From: Proceedings of the AAAI Conference on Artificial Intelligence