Abstract
The plant-wide production process is composed of multiple unit processes, in which the operational indices of each unit process are assigned and adjusted according to product quality, yield, and the actual operating modes. Because the operating conditions of the production process keep changing, the operational indices cannot be effectively adjusted by most model-based methods or by evolutionary computation. In this article, the decision making of operational indices is formulated as a continuous-state, continuous-action reinforcement learning (RL) problem, and a model-free RL algorithm is proposed that learns a decision policy for determining the operational indices according to the actual operating conditions. Unlike existing methods, this article presents a multiactor-network ensemble algorithm within an actor-critic framework with a stochastic policy to avoid falling into local optima. An overall near-optimal policy is obtained by extracting the best result from the parallel training of the multiple actor networks, which safeguards the optimality of the obtained policy. In addition, experience replay is employed to mitigate the scarcity of sampled data in model-free RL. Simulation studies are conducted on actual data from a mineral processing plant, and the results demonstrate the effectiveness of the proposed algorithm.
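The abstract names three ingredients: an actor-critic framework with a stochastic (Gaussian) policy, an ensemble of actor networks trained in parallel with the best one extracted, and an experience-replay buffer. The following is a minimal PyTorch sketch of how these pieces can fit together; it is not the authors' implementation, and all network sizes, hyperparameters, the one-step TD critic target, and the critic-based rule for extracting the best actor are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): multi-actor ensemble,
# actor-critic with a stochastic policy, and experience replay.
import random
from collections import deque

import torch
import torch.nn as nn


class Actor(nn.Module):
    """Stochastic policy: maps a continuous state to a Gaussian over actions."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state):
        h = self.body(state)
        return torch.distributions.Normal(self.mean(h), self.log_std.exp())


class Critic(nn.Module):
    """State-action value estimate Q(s, a)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


# Ensemble of actors trained in parallel from a shared replay buffer.
# Dimensions and learning rates below are placeholder assumptions.
state_dim, action_dim, n_actors = 8, 3, 5
actors = [Actor(state_dim, action_dim) for _ in range(n_actors)]
critic = Critic(state_dim, action_dim)
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
replay = deque(maxlen=100_000)  # experience replay: (s, a, r, s') transitions
gamma = 0.99


def train_step(batch_size=64):
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    s, a, r, s2 = (torch.stack(x) for x in zip(*batch))
    # Critic update: one-step TD target (an assumed, standard choice).
    with torch.no_grad():
        a2 = actors[0](s2).sample()
        target = r.unsqueeze(-1) + gamma * critic(s2, a2)
    critic_loss = ((critic(s, a) - target) ** 2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()
    # Each actor is updated independently; their stochastic policies
    # explore differently, which is what makes the ensemble useful.
    for actor, opt in zip(actors, actor_opts):
        action = actor(s).rsample()  # reparameterized sample
        actor_loss = -critic(s, action).mean()
        opt.zero_grad()
        actor_loss.backward()
        opt.step()


def best_action(state):
    """Extract the ensemble's decision: the actor whose mean action
    the critic currently values most (an illustrative selection rule)."""
    with torch.no_grad():
        candidates = [a(state).mean for a in actors]
        values = [critic(state, c).item() for c in candidates]
        return candidates[max(range(n_actors), key=lambda i: values[i])]


if __name__ == "__main__":
    # Smoke test on synthetic transitions (illustrative only).
    for _ in range(256):
        replay.append((torch.randn(state_dim), torch.randn(action_dim),
                       torch.randn(()), torch.randn(state_dim)))
    train_step()
    print(best_action(torch.randn(state_dim)))
```

In this sketch the replay buffer lets each gradient step reuse past transitions, which is the abstract's answer to scarce sampling data, while the per-actor stochastic policies and the final extraction step stand in for the parallel training and policy-selection mechanism the paper describes.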