Abstract

This paper develops a novel off-policy Q-learning method to find the optimal observer gain and the optimal controller gain for network-based linear discrete-time systems using only measured data. The primary advantage of this off-policy Q-learning method is that it handles linear discrete-time systems with an inaccurate system model, unmeasurable system states, and network-induced delays. To this end, an optimization problem is first formulated for networked control systems composed of a plant, a state observer, and a Smith predictor. The Smith predictor not only compensates for the network-induced delays but also makes the separation principle hold, so the observer and the controller can be designed separately. Then, off-policy Q-learning is applied to learn the optimal observer gain and the optimal controller gain in combination with the Smith predictor, yielding a novel off-policy Q-learning algorithm that uses only the input, the output, and the delayed estimated state of the system, rather than the inaccurate system matrices. Convergence of the iterative observer gain and the iterative controller gain is rigorously proven. Finally, simulation results are given to verify the effectiveness of the proposed method.

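The abstract only summarizes the approach, so the following is a minimal, hedged sketch of the underlying model-free idea it builds on: off-policy Q-learning (batch least-squares policy iteration on a quadratic Q-function kernel) for a toy state-feedback LQR problem. The matrices A, B, Qc, Rc, the initial gain, and the data-collection loop are all hypothetical illustrations, not the paper's setup; in particular, the sketch omits the state observer, the Smith predictor, and the delay compensation that the paper adds, and A and B are used only to simulate data, never by the learning step.

```python
import numpy as np

# Hypothetical toy plant used ONLY to generate data; the learning step below
# never reads A or B, mirroring the model-free, data-driven idea.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Qc = np.eye(2)          # state weighting (assumed)
Rc = np.eye(1)          # input weighting (assumed)
n, m = B.shape
p = n + m               # dimension of z = [x; u]

def quad_features(z):
    """Quadratic basis for a symmetric p x p kernel: terms z_i*z_j with i <= j."""
    f = []
    for i in range(p):
        for j in range(i, p):
            f.append(z[i] * z[j] * (1.0 if i == j else 2.0))
    return np.array(f)

def kernel_from_params(theta):
    """Rebuild the symmetric Q-function kernel H from its upper-triangular parameters."""
    H = np.zeros((p, p))
    idx = 0
    for i in range(p):
        for j in range(i, p):
            H[i, j] = H[j, i] = theta[idx]
            idx += 1
    return H

# Collect one batch of data under an exploratory behavior policy (off-policy data).
rng = np.random.default_rng(0)
K = np.zeros((m, n))            # initial gain, stabilizing for this toy plant
xs, us = [], []
x = np.array([1.0, -1.0])
for _ in range(200):
    u = -K @ x + 0.5 * rng.standard_normal(m)   # behavior input with probing noise
    xs.append(x); us.append(u)
    x = A @ x + B @ u
xs.append(x)

# Policy iteration on the Q-function kernel, reusing the same batch of data.
for it in range(10):
    Phi, y = [], []
    for k in range(len(us)):
        zk  = np.concatenate([xs[k], us[k]])
        zk1 = np.concatenate([xs[k + 1], -K @ xs[k + 1]])   # target-policy action
        Phi.append(quad_features(zk) - quad_features(zk1))  # Bellman residual features
        y.append(xs[k] @ Qc @ xs[k] + us[k] @ Rc @ us[k])    # stage cost
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = kernel_from_params(theta)
    Hux, Huu = H[n:, :n], H[n:, n:]
    K = np.linalg.solve(Huu, Hux)               # improved gain, u = -K x

print("learned gain K:", K)
```

In this sketch the behavior policy that generates the data differs from the target policy being evaluated (the next-state action is recomputed as -K x), which is what makes the learning off-policy; the paper's algorithm applies the same principle while additionally learning an observer gain and compensating delays with the Smith predictor.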