Abstract

In this paper, a novel model-free, off-policy reinforcement learning method is introduced to solve nonzero-sum games of discrete-time linear systems. Unlike the traditional policy iteration (PI) method, which requires knowledge of the system dynamics, the proposed method can be trained directly on state data. Moreover, the traditional PI method is proved to be affected by probing noise. In the analysis of the proposed method, probing noise is explicitly taken into account and proved to have no influence on convergence. The optimal Nash equilibrium solution is derived. It is also proved that the proposed algorithm can be applied both online and offline. A simulation of a nonzero-sum game control problem for a discrete-time F-16 aircraft system is presented, and the results verify the effectiveness of the proposed algorithm.
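
To make the off-policy idea concrete, here is a minimal sketch in Python/NumPy of one standard model-free scheme for a two-player nonzero-sum linear-quadratic game. The scheme is off-policy policy iteration on quadratic Q-functions Q_i(x, u1, u2) = z' H_i z, with z = [x; u1; u2] and feedback policies u_i = -K_i x. It is an illustration under stated assumptions, not necessarily the algorithm derived in the paper.

Several pieces are hypothetical and serve only this example:

* the plant matrices A, B1 and B2;
* the stage weights W1 and W2;
* the helper names collect_data, evaluate_q, stage_nash and offpolicy_pi.

The plant is a toy 2-state system, not the F-16 model. A, B1 and B2 are used only to generate data. The learner sees only the samples (x_k, u1_k, u2_k, x_{k+1}). Because the target gains are applied at the next state in the Bellman equation, the probing noise in the behaviour inputs enters only as data and does not bias the least-squares fit.

```python
import numpy as np

# Hypothetical two-player plant x_{k+1} = A x + B1 u1 + B2 u2 (NOT the F-16 model).
# A, B1, B2 are used only to simulate data; the learner never reads them.
A = np.array([[0.9, 0.1], [0.0, 0.8]])   # open-loop stable, so K = 0 is a valid start
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[1.0], [0.5]])
n, m1, m2 = 2, 1, 1
# Hypothetical stage weights W_i = diag(Q_i, R_i1, R_i2) for player i.
W1 = np.diag([1.0, 1.0, 1.0, 0.5])
W2 = np.diag([2.0, 2.0, 0.5, 1.0])


def collect_data(seed, N=200):
    """Run the plant once with a behaviour policy that is pure probing noise."""
    rng = np.random.default_rng(seed)
    X = np.zeros((N + 1, n))
    X[0] = [1.0, -1.0]
    U1 = rng.normal(size=(N, m1))
    U2 = rng.normal(size=(N, m2))
    for k in range(N):
        X[k + 1] = A @ X[k] + B1 @ U1[k] + B2 @ U2[k]
    return np.hstack([X[:-1], U1, U2]), X[1:]   # rows z_k = [x_k, u1_k, u2_k]; x_{k+1}


def evaluate_q(Z, Xn, K1, K2, W):
    """Off-policy evaluation of Q_i(z) = z' H z under the target gains (K1, K2).

    Fits z_k' H z_k - zt' H zt = z_k' W z_k by least squares, where
    zt = [x_{k+1}, -K1 x_{k+1}, -K2 x_{k+1}] applies the TARGET policies at
    the next state; the behaviour inputs appear only through the data z_k.
    """
    Zt = np.hstack([Xn, -Xn @ K1.T, -Xn @ K2.T])
    Phi = np.einsum("ki,kj->kij", Z, Z) - np.einsum("ki,kj->kij", Zt, Zt)
    Phi = Phi.reshape(len(Z), -1)
    r = np.einsum("ki,ij,kj->k", Z, W, Z)        # stage costs r_i(x_k, u_k)
    h, *_ = np.linalg.lstsq(Phi, r, rcond=None)  # min-norm fit splits the duplicate (i,j)/(j,i) columns
    H = h.reshape(W.shape)
    return 0.5 * (H + H.T)


def stage_nash(H1, H2):
    """Policy improvement: solve dQ1/du1 = 0 and dQ2/du2 = 0 jointly (u_i = -K_i x)."""
    x, u1, u2 = slice(0, n), slice(n, n + m1), slice(n + m1, None)
    lhs = np.block([[H1[u1, u1], H1[u1, u2]],
                    [H2[u2, u1], H2[u2, u2]]])
    rhs = np.vstack([H1[u1, x], H2[u2, x]])
    K = np.linalg.solve(lhs, rhs)
    return K[:m1], K[m1:]


def offpolicy_pi(Z, Xn, iters=50):
    """Alternate off-policy evaluation and joint policy improvement."""
    K1, K2 = np.zeros((m1, n)), np.zeros((m2, n))
    for _ in range(iters):          # the same stored batch is reused every iteration
        H1 = evaluate_q(Z, Xn, K1, K2, W1)
        H2 = evaluate_q(Z, Xn, K1, K2, W2)
        K1, K2 = stage_nash(H1, H2)
    return K1, K2


Z, Xn = collect_data(seed=0)
K1, K2 = offpolicy_pi(Z, Xn)
print("K1 =", K1, "K2 =", K2)
```

Because one stored batch is reused at every iteration, this loop runs offline. An online variant would refresh Z and Xn as new samples arrive. Calling collect_data with a different seed should return the same gains up to rounding, because each least-squares fit is exact for data generated without process noise.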
