Abstract

Flatness is a crucial indicator of strip quality, and it is difficult to regulate because of the high-speed rolling process and the nonlinear relationship between flatness and process parameters. Conventional flatness control methods rely on first principles, empirical models, and predesigned rules, which adapt poorly to changing rolling conditions. To address this limitation, this paper proposes a data-driven flatness control method based on offline reinforcement learning (RL). Using data collected from a factory, the offline RL method learns the process dynamics from data and generates a control policy. Unlike online RL methods, the proposed method does not require a simulator for training; the resulting policy can therefore be safer and more accurate, since a simulator involves simplifications that can introduce bias. To achieve stable performance, the proposed method incorporates an ensemble of Q-functions into policy evaluation to improve uncertainty estimation. To address distributional shift, a behavior cloning term, based on the Q-values from the ensemble Q-functions, is added to policy improvement. Simulation and comparison results show that the proposed method outperforms state-of-the-art offline RL methods and achieves the best performance in producing strips with lower flatness.
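The abstract does not give the exact update rules, but the two ingredients it names (ensemble Q-functions for policy evaluation and a behavior cloning term in policy improvement) can be sketched as follows. This is an illustrative sketch only, assuming a TD3+BC-style objective with a conservative minimum over an ensemble of Q-networks; all module names and hyperparameters are hypothetical and may differ from the paper's formulation.

```python
# Illustrative sketch (assumed formulation): ensemble-Q policy evaluation
# plus a behavior cloning term in policy improvement, in the spirit of the
# method described in the abstract. Not the paper's exact algorithm.
import torch
import torch.nn as nn
import torch.nn.functional as F


def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))


class EnsembleQ(nn.Module):
    """N independent Q-networks; taking the minimum over the ensemble gives
    a conservative, uncertainty-aware value estimate."""
    def __init__(self, state_dim, action_dim, n_heads=5):
        super().__init__()
        self.heads = nn.ModuleList(
            [mlp(state_dim + action_dim, 1) for _ in range(n_heads)])

    def forward(self, state, action):
        sa = torch.cat([state, action], dim=-1)
        return torch.cat([q(sa) for q in self.heads], dim=-1)  # shape (B, N)


def critic_loss(q_ens, q_target, policy, batch, gamma=0.99):
    """Policy evaluation: regress every ensemble head toward a conservative
    bootstrapped target (minimum over the target ensemble)."""
    s, a, r, s_next, done = batch
    with torch.no_grad():
        a_next = policy(s_next)
        target_q = r + gamma * (1 - done) * q_target(
            s_next, a_next).min(dim=-1, keepdim=True).values
    q_values = q_ens(s, a)
    return F.mse_loss(q_values, target_q.expand_as(q_values))


def actor_loss(q_ens, policy, batch, bc_weight=2.5):
    """Policy improvement: maximize the conservative Q-value while a
    behavior cloning term keeps the policy close to the dataset actions,
    mitigating distributional shift."""
    s, a_data, *_ = batch
    a_pi = policy(s)
    q_pi = q_ens(s, a_pi).min(dim=-1, keepdim=True).values
    # Scale the Q term by its magnitude so it is balanced against the
    # behavior cloning loss (a common TD3+BC-style normalization).
    lam = bc_weight / q_pi.abs().mean().detach()
    return -(lam * q_pi).mean() + F.mse_loss(a_pi, a_data)
```

In this sketch, `policy` is any deterministic actor network mapping states to process set-points; both losses would be minimized alternately on minibatches drawn from the fixed factory dataset, with no environment interaction during training.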
