Abstract

This paper develops a novel off-policy Q-learning method that finds the optimal observer gain and the optimal controller for network-based linear discrete-time systems using only measured data. The primary advantage of this off-policy Q-learning method is that it works for linear discrete-time systems with an inaccurate system model, unmeasurable system states, and network-induced delays. To this end, an optimization problem is first formulated for networked control systems composed of a plant, a state observer, and a Smith predictor. The Smith predictor not only compensates for network-induced delays but also makes the separation principle hold, so the observer and the controller can be designed separately. Off-policy Q-learning is then used to learn the optimal observer gain and the optimal controller in combination with the Smith predictor, yielding a novel off-policy Q-learning algorithm that relies only on the input, output, and delayed estimated state of the system rather than the inaccurate system matrices. The convergence of both the iterative observer gain and the iterative controller gain is rigorously proven. Finally, simulation results verify the effectiveness of the proposed method.
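
For intuition, the sketch below shows the core idea of off-policy Q-learning applied to a plain discrete-time LQR problem: data are generated by an exploratory behaviour policy, the quadratic Q-function of the current target policy is evaluated by least squares, and the feedback gain is improved from the learned Q-kernel. This is a minimal illustration only, not the paper's algorithm: it assumes full state measurement and no network-induced delay, so the state observer, the Smith predictor, and the observer-gain learning are omitted, and the system matrices A, B and weights Qc, R are hypothetical placeholders used solely to generate data.

    import numpy as np

    np.random.seed(0)
    A = np.array([[0.9, 0.1],
                  [0.0, 0.95]])        # hypothetical plant dynamics (not from the paper)
    B = np.array([[0.0],
                  [0.1]])
    Qc = np.eye(2)                     # state cost weight
    R = np.array([[1.0]])              # input cost weight
    n, m = B.shape

    def svec(M):
        # Stack the upper triangle of a symmetric matrix, doubling off-diagonal
        # entries, so that svec(z z^T) . theta equals z^T H z when theta holds
        # the upper triangle of H.
        i, j = np.triu_indices(M.shape[0])
        return np.where(i == j, 1.0, 2.0) * M[i, j]

    # Collect one batch of data with an exploratory behaviour policy (off-policy:
    # the probing noise never enters the learned target policy).
    K = np.zeros((m, n))               # initial admissible controller gain
    xs, us, xps = [], [], []
    x = np.array([1.0, -1.0])
    for _ in range(400):
        u = -K @ x + 0.5 * np.random.randn(m)
        xp = A @ x + B @ u
        xs.append(x); us.append(u); xps.append(xp)
        x = xp

    # Policy iteration on the quadratic Q-function, reusing the same batch.
    for _ in range(20):
        Phi, y = [], []
        for x, u, xp in zip(xs, us, xps):
            z = np.concatenate([x, u])
            zp = np.concatenate([xp, -K @ xp])          # target policy at the next state
            Phi.append(svec(np.outer(z, z)) - svec(np.outer(zp, zp)))
            y.append(x @ Qc @ x + u @ R @ u)
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)

        # Rebuild the symmetric kernel H of Q(x, u) = [x; u]^T H [x; u].
        Hu = np.zeros((n + m, n + m))
        Hu[np.triu_indices(n + m)] = theta
        H = Hu + Hu.T - np.diag(np.diag(Hu))

        K_new = np.linalg.solve(H[n:, n:], H[n:, :n])   # improved gain: u = -K x
        if np.linalg.norm(K_new - K) < 1e-8:
            K = K_new
            break
        K = K_new

    print("learned feedback gain K:", K)

In the paper's data-driven setting, the corresponding Bellman equations would instead be built from the measured input, output, and delayed estimated state, so the matrices A and B above would never be used explicitly by the learning algorithm.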
