Abstract
An optimal robust control solution for general nonlinear systems with unknown but observable dynamics is advanced here. The underlying Hamilton-Jacobi-Isaacs (HJI) equation of the corresponding zero-sum two-player game (ZS-TP-G) is learned using a Q-learning-based approach that employs only input-output system measurements. An equivalent virtual state-space model is built from the system's input-output samples, and it is shown that controlling the virtual model implies controlling the original system. Since the existence of a saddle-point solution to the ZS-TP-G is assumed unverifiable, the solution is derived in terms of upper-optimal and lower-optimal controllers. Learning convergence is theoretically ensured, while the practical implementation uses neural networks that provide scalability with the control problem dimension and automatic feature selection. The learning strategy is validated on an active suspension system, a good candidate for the robust control problem with respect to road profile disturbance rejection.
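The "equivalent virtual state-space model" idea can be illustrated with a minimal sketch: for an observable system, a state-like vector is assembled by stacking a window of the most recent input-output samples. The window length `N`, the toy first-order dynamics, and all names below are illustrative assumptions, not the paper's actual construction.

```python
# Hedged sketch of a "virtual state" built from past input-output samples.
# N and the toy dynamics are illustrative assumptions.
import numpy as np

N = 3  # assumed observability-horizon length

def virtual_state(u_hist, y_hist, n=N):
    """Stack the most recent n inputs and n outputs into one vector z_k."""
    return np.concatenate([np.asarray(u_hist[-n:]), np.asarray(y_hist[-n:])])

# toy single-input single-output simulation to populate the IO histories
rng = np.random.default_rng(1)
u_hist, y_hist, y = [], [], 0.0
for k in range(10):
    u = rng.uniform(-1.0, 1.0)
    y = 0.8 * y + 0.5 * u          # illustrative first-order dynamics
    u_hist.append(u)
    y_hist.append(y)

z = virtual_state(u_hist, y_hist)  # a 2N-dimensional virtual state
```

A controller designed on `z` then needs no direct state measurement, which is the practical payoff of the virtual-model viewpoint.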
Highlights
Feedback control systems that are robust to external disturbances are a common challenge and frequently appear as a direct or indirect design specification
Not using the entire state for learning optimal control poses great challenges to the learning process, since the system then behaves as a partially observable one. This is why some Approximate Dynamic Programming (ADP) approaches for solving the HJI zero-sum two-player game (ZS-TP-G) were devised to handle observable systems, relying only on input-output (IO) samples collected from the system
We summarize the neural network (NN)-based solutions to the ZS-TP-G, aiming at computing the upper-optimal and lower-optimal Q-functions and the upper-optimal and lower-optimal controllers, respectively, in a batch-fitted Q-learning style
Summary
Feedback control systems that are robust to external disturbances are a common challenge and frequently appear as a direct or indirect design specification. Not using the entire state for learning optimal control poses great challenges to the learning process, since the system then behaves as a partially observable one. This is why some ADP approaches for solving the HJI ZS-TP-G were devised to handle observable systems, relying only on input-output (IO) samples collected from the system. The main contribution is an extension of the Q-learning approach to solve the optimal robust control problem, as a ZS-TP-G solution to the HJI equation, for general unknown nonlinear observable systems. The active suspension system is a well-suited candidate for learning robust control since it inherently deals with road profile disturbance rejection when employed on a variety of transportation vehicles (cars, trains, etc.), and it is a naturally underdamped system stemming from the two-mass-spring-damper class of systems. On the other hand, the suspension system is a high-order one (it has six natural states when the active hydraulic actuator dynamics is considered), which makes measuring all states costly.
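The batch-fitted Q-learning style referred to above can be sketched on a toy problem. The following is a hedged illustration, not the paper's implementation: the scalar dynamics, the attenuation level, the discount factor, the discretized action sets, and the quadratic Q-parametrization are all assumptions made for the sake of a runnable example of fitting the upper Q-function of a zero-sum game from sampled transitions.

```python
# Hedged sketch: batch-fitted Q-learning for a zero-sum two-player game
# on an assumed toy scalar system x' = a*x + b*u + e*w.
import numpy as np

rng = np.random.default_rng(0)
a, b, e = 0.9, 0.5, 0.2            # illustrative dynamics coefficients
gamma2, discount = 4.0, 0.8        # assumed attenuation level gamma^2, discount
U = np.linspace(-1.0, 1.0, 11)     # discretized control actions
W = np.linspace(-0.5, 0.5, 11)     # discretized disturbance actions

def phi(x, u, w):
    # quadratic features: Q(x, u, w) = theta . phi(x, u, w)
    return np.array([x*x, u*u, w*w, x*u, x*w, u*w])

def upper_value(theta, x):
    # upper game value: controller minimizes, disturbance then maximizes
    return min(max(phi(x, u, w) @ theta for w in W) for u in U)

def fitted_q_iteration(samples, iters=15):
    theta = np.zeros(6)
    for _ in range(iters):
        Phi = np.array([phi(x, u, w) for x, u, w, _ in samples])
        y = np.array([x*x + u*u - gamma2*w*w          # stage cost of the game
                      + discount * upper_value(theta, xn)
                      for x, u, w, xn in samples])
        theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # batch regression step
    return theta

# collect random transitions, then fit the upper Q-function from the batch
samples = [(x, u, w, a*x + b*u + e*w)
           for x, u, w in zip(rng.uniform(-2.0, 2.0, 200),
                              rng.choice(U, 200), rng.choice(W, 200))]
theta = fitted_q_iteration(samples)
```

In the paper's setting, the linear regression step would be replaced by NN training on virtual IO states, and a lower Q-function (max-min ordering) would be fitted analogously to bracket the unverifiable saddle point.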