Off-policy Q-learning Algorithm Research Articles

Traditional model-based control methods are often not applicable to industrial processes, where model parameters are typically unknown, coefficient matrices are difficult to obtain, and system states are unpredictable. Accordingly, an output feedback fault-tolerant control method based on zero-sum game theory and off-policy Q-learning is presented in this study, with the aim of achieving smooth operation and good tracking performance for industrial processes that often contain sensor faults and disturbances. The specific steps are as follows. First, the tracking error is incorporated into the system to form a novel extended model. Second, by establishing a performance index function and combining it with minimax theory, the fault-tolerant tracking control problem is converted into a zero-sum game problem. The Bellman and Riccati equations are then established by analyzing the relationship between the performance index and the value function. Next, the Q-function is introduced, and an off-policy Q-learning algorithm is combined with the Kronecker product, without knowledge of the system model parameters, to design an optimal controller that is unbiased to detection noise. Finally, the effectiveness of the algorithm is verified using the injection molding process as an example. The experimental results show that the designed controller achieves good control and extends the range of tolerable faults while maintaining good tracking performance.
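For orientation, the kind of zero-sum formulation this abstract refers to can be sketched in generic textbook form. The symbols below ($A$, $B$, $E$, $Q$, $R$, $\gamma$, $H$, $z_k$) are assumptions introduced here for illustration and are not taken from the paper, whose extended model and weighting matrices may differ.

$$
x_{k+1} = A x_k + B u_k + E w_k, \qquad
J = \sum_{k=0}^{\infty} \left( x_k^{\top} Q x_k + u_k^{\top} R u_k - \gamma^{2} w_k^{\top} w_k \right), \qquad
V^{*}(x_0) = \min_{u} \max_{w} J .
$$

With a quadratic Q-function $Q(x_k, u_k, w_k) = z_k^{\top} H z_k$, where $z_k = [x_k^{\top}\; u_k^{\top}\; w_k^{\top}]^{\top}$, the Kronecker product makes the unknown matrix $H$ appear linearly, so that it can be estimated from data by least squares:

$$
z_k^{\top} H z_k = \left( z_k^{\top} \otimes z_k^{\top} \right) \operatorname{vec}(H) .
$$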

This paper presents a novel off-policy game Q-learning algorithm to solve the $H_\infty$ control problem for discrete-time linear multi-player systems with completely unknown system dynamics. The primary contribution is that the Q-learning strategy in the proposed algorithm is implemented through off-policy policy iteration rather than on-policy learning, since off-policy learning has some well-known advantages over on-policy learning. All players work together to minimize their common performance index while countering the disturbance, which tries to maximize that index, and they finally reach the Nash equilibrium of the game, so that the disturbance attenuation condition is satisfied. To find the Nash equilibrium solution, the $H_\infty$ control problem is first transformed into an optimal control problem. Then an off-policy Q-learning algorithm is put forward within the typical adaptive dynamic programming (ADP) and game architecture, so that the control policies of all players can be learned using only measured data. More importantly, a rigorous proof is presented that the Nash equilibrium solution obtained with the proposed off-policy game Q-learning algorithm is unbiased. Comparative simulation results are provided to verify the effectiveness and demonstrate the advantages of the proposed method.
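The idea of learning a controller from measured data alone can be illustrated on a much simpler problem than the one in the abstract. The following is a minimal sketch of off-policy Q-learning policy iteration for a single-player discrete-time LQR problem, with no disturbance player and no $H_\infty$ term; it is not the paper's algorithm. The system matrices, weights, function names, and sample counts are all hypothetical choices made for this example. The matrices `A` and `B` are used only to simulate data and to compute a reference gain; the learning steps see only the recorded samples.

```python
import numpy as np

def collect_data(A, B, steps, rng):
    """Simulate one trajectory under a purely exploratory behaviour policy."""
    n, m = B.shape
    x = rng.standard_normal(n)
    X, U, Xn = [], [], []
    for _ in range(steps):
        u = rng.standard_normal(m)      # behaviour policy: random inputs
        x_next = A @ x + B @ u
        X.append(x); U.append(u); Xn.append(x_next)
        x = x_next
    return np.array(X), np.array(U), np.array(Xn)

def quad_features(Z):
    """Row-wise Kronecker product z (x) z, reduced to the upper triangle."""
    p = Z.shape[1]
    iu, ju = np.triu_indices(p)
    scale = np.where(iu == ju, 1.0, 2.0)  # off-diagonal terms appear twice
    return Z[:, iu] * Z[:, ju] * scale

def evaluate_policy(K, X, U, Xn, Qc, Rc):
    """Least-squares solve of z_k' H z_k - z_{k+1}' H z_{k+1} = stage cost."""
    Z = np.hstack([X, U])                 # actions actually applied
    Zn = np.hstack([Xn, -Xn @ K.T])       # next action from target policy u = -K x
    Phi = quad_features(Z) - quad_features(Zn)
    cost = np.sum((X @ Qc) * X, axis=1) + np.sum((U @ Rc) * U, axis=1)
    h, *_ = np.linalg.lstsq(Phi, cost, rcond=None)
    p = Z.shape[1]
    iu, ju = np.triu_indices(p)
    H = np.zeros((p, p))
    H[iu, ju] = h
    H[ju, iu] = h
    return H

def off_policy_q_learning(X, U, Xn, Qc, Rc, K0, iters=15):
    """Policy iteration that reuses the same recorded data in every iteration."""
    n = X.shape[1]
    K = K0
    for _ in range(iters):
        H = evaluate_policy(K, X, U, Xn, Qc, Rc)
        K = np.linalg.solve(H[n:, n:], H[n:, :n])  # greedy gain for u = -K x
    return K

def riccati_gain(A, B, Qc, Rc, iters=1000):
    """Model-based reference gain via fixed-point iteration of the Riccati equation."""
    P = Qc.copy()
    for _ in range(iters):
        K = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
        P = Qc + A.T @ P @ A - A.T @ P @ B @ K
    return np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)

# Hypothetical stable example system (so that K0 = 0 is a stabilizing start).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Qc, Rc = np.eye(2), np.eye(1)

rng = np.random.default_rng(0)
X, U, Xn = collect_data(A, B, steps=200, rng=rng)
K_learned = off_policy_q_learning(X, U, Xn, Qc, Rc, K0=np.zeros((1, 2)))
K_star = riccati_gain(A, B, Qc, Rc)
print("learned gain:  ", K_learned)
print("reference gain:", K_star)
```

Because the Q-function Bellman equation holds for whatever input was actually applied, the exploratory inputs in the data do not bias the least-squares estimate in this noise-free setting, which is the single-player analogue of the unbiasedness property the abstract claims for the game setting.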

Articles published on Off-policy Q-learning Algorithm

Quadratic Tracking Control of Linear Stochastic Systems with Unknown Dynamics Using Average Off-Policy Q-Learning Method

Zero-sum game-based optimal control for discrete-time Markov jump systems: A parallel off-policy Q-learning method

[Formula omitted] output feedback fault-tolerant control of industrial processes based on zero-sum games and off-policy Q-learning

Reinforcement learning based proportional–integral–derivative controllers design for consensus of multi-agent systems

A novel dynamic selection approach using on-policy SARSA algorithm for accurate wind speed prediction

Off-policy Q-learning: Solving Nash equilibrium of multi-player games with network-induced delay and unmeasured state

Novel data-driven two-dimensional Q-learning for optimal tracking control of batch process with unknown dynamics

H∞ Control for Discrete-Time Multi-Player Systems via Off-Policy Q-Learning

Off-Policy Q-Learning for Anti-Interference Control of Multi-Player Systems

Output feedback reinforcement learning based optimal output synchronisation of heterogeneous discrete‐time multi‐agent systems

Networked controller and observer design of discrete-time systems with inaccurate model parameters

Policy iteration based Q-learning for linear nonzero-sum quadratic differential games

Gaussian Process Based Model-free Control with Q-Learning

Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems

Off-Policy Q-Learning: Set-Point Design for Optimizing Dual-Rate Rougher Flotation Operational Processes
