Abstract

Reinforcement learning (RL) for continuous state/action space systems has remained a challenge for nonlinear multivariate dynamical systems, even at the simulation level. Implementing such schemes for real-time control is more difficult still and remains largely unaddressed. In this study, several critical strategies for the practical implementation of RL are developed, and a multivariable, multi-modal, hybrid three-tank (HTT) physical process is used to illustrate them; a successful real-time implementation of RL is reported. The first step is a meta-heuristic optimization of the first-principles model parameters, in which a custom pseudo random binary signal (PRBS) is used to obtain open-loop experimental data. This is followed by in silico asynchronous advantage actor–critic (A3C/A-A2C) based policy learning. In the second step, three different approaches (namely proximal learning, single-trajectory learning, and multiple-trajectory learning) are used to explore the state/action space. In the final step, online learning (A2C) with the best in silico policy is carried out on the real process over a socket connection. The extent of exploration (EoE) is proposed as a metric for quantifying exploration of the state/action space. To enhance the online sample efficiency of the RL application, a soft-constraint-based constrained learning scheme is proposed and validated. Taken together, the proposed strategies demonstrate the feasibility of applying RL to practical control problems.
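
The abstract does not reproduce the authors' custom PRBS design. Purely as a generic illustration of the kind of excitation signal used for open-loop identification data collection, the following is a minimal sketch of a maximal-length PRBS-7 generator with a zero-order hold between bit transitions; all names, tap choices, and parameter values here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def prbs7(n_bits: int, hold: int = 1, low: float = 0.0, high: float = 1.0,
          seed_state: int = 0x7F) -> np.ndarray:
    """Maximal-length PRBS-7 (feedback polynomial x^7 + x^6 + 1) mapped to two
    actuator levels, with each bit held for `hold` sample periods.

    Illustrative sketch only; the paper's custom PRBS may differ in order,
    taps, levels, and hold time.
    """
    state = seed_state & 0x7F            # 7-bit shift register; any nonzero seed works
    bits = np.empty(n_bits)
    for i in range(n_bits):
        new = ((state >> 6) ^ (state >> 5)) & 1   # XOR of taps 7 and 6
        state = ((state << 1) | new) & 0x7F       # shift left, feed new bit in at LSB
        bits[i] = new
    levels = np.where(bits > 0, high, low)        # map {0,1} to actuator levels
    return np.repeat(levels, hold)                # zero-order hold between transitions

# Example: excitation for one inlet pump, full 127-bit period, 10-sample hold
u = prbs7(n_bits=127, hold=10, low=0.2, high=0.8)
```

The hold time sets the lowest frequency content of the excitation; for slow processes such as tank levels, it is typically chosen so that each level is held long enough for the dynamics of interest to respond.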
