Abstract

An optimal robust control solution for general nonlinear systems with unknown but observable dynamics is advanced here. The underlying Hamilton-Jacobi-Isaacs (HJI) equation of the corresponding zero-sum two-player game (ZS-TP-G) is learned via a Q-learning-based approach that uses only input-output system measurements. An equivalent virtual state-space model is built from the system's input-output samples, and it is shown that controlling the virtual model implies controlling the original system. Since the existence of a saddle-point solution to the ZS-TP-G cannot be verified, the solution is derived in terms of upper-optimal and lower-optimal controllers. Learning convergence is theoretically ensured, while the practical implementation uses neural networks that scale with the control problem dimension and provide automatic feature selection. The learning strategy is validated on an active suspension system, a well-suited candidate for robust control with respect to road-profile disturbance rejection.
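For concreteness, one common way to write such an input-output virtual state and the two game values is sketched below; the window length N and the notation (y output, u control, w disturbance) are our assumptions, not necessarily the paper's exact construction:

```latex
% Virtual state built from the last N input-output samples (assumed construction):
z_k = \begin{bmatrix} y_{k-N+1}^{\top} & \cdots & y_k^{\top} & u_{k-N+1}^{\top} & \cdots & u_{k-1}^{\top} \end{bmatrix}^{\top}.
% Upper and lower values of the ZS-TP-G over the Q-function:
\overline{V}(z_k) = \min_{u_k}\max_{w_k} Q(z_k, u_k, w_k), \qquad
\underline{V}(z_k) = \max_{w_k}\min_{u_k} Q(z_k, u_k, w_k).
```

In general the lower value does not exceed the upper value; the two coincide only when a saddle point exists, which motivates deriving separate upper-optimal and lower-optimal controllers when existence cannot be verified.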

Highlights

  • Feedback control systems that remain robust in the face of external disturbances are a common challenge, and robustness frequently appears as a direct or indirect design specification

  • Learning optimal control without access to the entire state poses great challenges, since the system is then partially observable. For this reason, several Approximate Dynamic Programming (ADP) approaches for solving the HJI zero-sum two-player game (ZS-TP-G) were devised for observable systems and rely only on input-output (IO) samples collected from the system

  • We summarize the neural network (NN)-based solutions to the ZS-TP-G, computing the upper-optimal and lower-optimal Q-functions and the corresponding upper-optimal and lower-optimal controllers in a batch-fitted Q-learning style (a minimal sketch follows this list)
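As a hypothetical illustration of the batch-fitted Q-learning style, a minimal sketch follows; the toy transition data, quadratic utility, discretized action grids, and network sizes are all assumptions for illustration, not the paper's implementation:

```python
# Minimal batch-fitted Q-iteration sketch for the ZS-TP-G (illustrative only).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy transition batch: virtual state z, control u, disturbance w,
# stage cost r, next virtual state z' (all assumed for illustration).
N, z_dim = 500, 4
Z = rng.normal(size=(N, z_dim))
U = rng.uniform(-1.0, 1.0, size=(N, 1))
W = rng.uniform(-0.2, 0.2, size=(N, 1))
R = (Z ** 2).sum(axis=1) + U[:, 0] ** 2 - 5.0 * W[:, 0] ** 2  # assumed utility
Z_next = 0.9 * Z + 0.1 * np.hstack([U, W, np.zeros((N, z_dim - 2))])

gamma = 0.95
u_grid = np.linspace(-1.0, 1.0, 7)     # discretized control candidates
w_grid = np.linspace(-0.2, 0.2, 7)     # discretized disturbance candidates

q_net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)
q_net.fit(np.hstack([Z, U, W]), R)     # Q_0 regresses the one-step cost

for _ in range(20):                    # batch-fitted Q-iteration loop
    targets = np.empty(N)
    for i in range(N):
        # Q(z', u', w') on the action grids; upper value = min over u of max over w.
        q_vals = np.array([[q_net.predict(
            np.hstack([Z_next[i], [u], [w]]).reshape(1, -1))[0]
            for w in w_grid] for u in u_grid])
        targets[i] = R[i] + gamma * q_vals.max(axis=1).min()
    q_net.fit(np.hstack([Z, U, W]), targets)  # refit the NN on the new targets
```

Swapping the order of the max/min in the target computation yields the lower Q-function and, by greedy extraction over the same grids, the lower-optimal controller.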

Summary

INTRODUCTION

Feedback control systems that are robust to external disturbances are a common challenge, and robustness frequently appears as a direct or indirect design specification. Learning optimal control without access to the entire state poses great challenges, since the system is then partially observable. For this reason, several ADP approaches for solving the HJI ZS-TP-G were devised for observable systems and rely only on input-output (IO) samples collected from the system. The main contribution is an extension of the Q-learning approach that solves the optimal robust control problem, as a ZS-TP-G solution to the HJI equation, for general unknown nonlinear observable systems. The active suspension system is a well-suited candidate for learning robust control: it inherently deals with road-profile disturbance rejection when employed on a variety of transportation vehicles (cars, trains, etc.), and it is a naturally underdamped system belonging to the two-mass-spring-damper class. On the other hand, the suspension system is a high-order one (six natural states when the active hydraulic actuator dynamics are considered), which makes it costly to measure all the states.
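For reference, a minimal quarter-car model of the two-mass-spring-damper class the suspension belongs to is sketched below; the parameter values are generic placeholders, and the hydraulic-actuator states that raise the order to six are omitted:

```python
# Illustrative quarter-car (two-mass-spring-damper) model; parameter values
# are placeholders, not the paper's suspension parameters.
import numpy as np

m_s, m_u = 300.0, 40.0        # sprung / unsprung masses [kg] (assumed)
k_s, k_t = 16000.0, 160000.0  # suspension / tire stiffness [N/m] (assumed)
c_s = 1000.0                  # suspension damping [N s/m] (assumed)

def quarter_car_deriv(x, u, w):
    """x = [z_s, z_s_dot, z_u, z_u_dot]; u = actuator force; w = road height."""
    z_s, zs_d, z_u, zu_d = x
    f_s = k_s * (z_u - z_s) + c_s * (zu_d - zs_d)     # passive suspension force
    zs_dd = (f_s + u) / m_s                           # sprung-mass acceleration
    zu_dd = (-f_s - u + k_t * (w - z_u)) / m_u        # unsprung-mass acceleration
    return np.array([zs_d, zs_dd, zu_d, zu_dd])

# Usage: x_dot = quarter_car_deriv(np.zeros(4), u=0.0, w=0.05)
```

In this game formulation, the actuator force u plays the minimizing control and the road height w the maximizing disturbance.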

THE UNKNOWN OBSERVABLE SYSTEM
THE ZS CONTROL PROBLEM DEFINITION AND SOLUTION
ZS-TP-G NN IMPLEMENTATION
A STATE FEEDBACK OPTIMAL CONTROLLER NN IMPLEMENTATION
THE ACTIVE SUSPENSION SYSTEM
ACTIVE SUSPENSION SYSTEM OBSERVABILITY DISCUSSION
COLLECTING TRANSITION SAMPLES FOR THE LEARNING PROCESS
CONTROLLER LEARNING SETTINGS AND RESULTS
COMPARISONS AND DISCUSSIONS OF THE RESULTS
Findings
CONCLUSION