Off-Policy Reinforcement Learning: Optimal Operational Control for Two-Time-Scale Industrial Processes.

Jinna Li, Bahare Kiumarsi, Tianyou Chai, Frank L. Lewis, Jialu Fan

doi:10.1109/tcyb.2017.2761841

Abstract

Industrial flow lines are composed of unit processes operating on a fast time scale and performance measurements, known as operational indices, measured on a slower time scale. This paper presents a model-free optimal solution for a class of two-time-scale industrial processes using off-policy reinforcement learning (RL). First, the lower-layer unit process control loop with a fast sampling period and the upper-layer operational index dynamics on a slow time scale are modeled. Second, a general optimal operational control problem is formulated to optimally prescribe the set-points for the unit industrial process. Then, a zero-sum game off-policy RL algorithm is developed to find the optimal set-points using data measured in real time. Finally, a simulation experiment on an industrial flotation process is used to show the effectiveness of the proposed method.
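The two-layer timing structure described in the abstract can be pictured with a small sketch. It is only an illustration of how a slow set-point update wraps a fast tracking loop. It is not the paper's off-policy RL algorithm, and the scalar plant, the gains, the target value, and the names `simulate_two_time_scale`, `n_slow`, and `ratio` are all assumptions chosen for the example.

```python
def simulate_two_time_scale(n_slow=50, ratio=10, r_target=1.0):
    """Toy two-time-scale loop (all models and gains are hypothetical).

    The upper layer updates the set-point once per slow period; the lower
    layer runs `ratio` fast control steps within each slow period.
    """
    y = 0.0  # unit-process output (fast state)
    w = 0.0  # set-point prescribed by the upper layer
    r = 0.0  # operational index (slow state)

    for _ in range(n_slow):
        # Upper layer (slow): hand-tuned integral update of the set-point.
        # In the paper this role is played by the learned optimal policy.
        w += 0.5 * (r_target - r)

        for _ in range(ratio):
            # Lower layer (fast): feedforward plus proportional feedback
            # on an assumed first-order plant y_next = 0.8*y + 0.2*u.
            u = w + 2.0 * (w - y)
            y = 0.8 * y + 0.2 * u

        # Assumed operational-index dynamics, sampled on the slow time scale.
        r = 0.5 * r + 0.5 * y

    return y, w, r


if __name__ == "__main__":
    y, w, r = simulate_two_time_scale()
    print(f"output={y:.4f} set-point={w:.4f} index={r:.4f}")
```

With these assumed gains the fast loop settles well within one slow period, so the upper layer effectively sees the unit process as tracking its set-point, which is the separation the two-layer formulation relies on.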

Full Text