Abstract

The plant-wide production process is composed of multiple unit processes, in which the operational indices of each unit process are assigned and adjusted according to product quality, yield, and the actual operating modes. Because the operating conditions of the production process change over time, the operational indices cannot be adjusted effectively by most model-based methods or by evolutionary computation. In this article, the decision making of operational indices is formulated as a continuous-state, continuous-action reinforcement learning (RL) problem, and a model-free RL algorithm is proposed that learns a decision policy to determine the operational indices according to the actual operating conditions. In contrast to existing methods, this article presents a multiactor-network ensemble algorithm within an actor-critic framework with a stochastic policy to avoid falling into local optima. A relatively global optimal policy is obtained by extracting the best result from the parallel training of the multiple actor networks, which helps ensure the optimality of the obtained policy. In addition, experience replay is used to effectively deal with the scarcity of sampled data in model-free RL. Simulation studies are conducted on actual data from a mineral processing plant, and the results demonstrate the effectiveness of the proposed algorithm.
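
The sketch below illustrates, in generic terms, the main ingredients named in the abstract: an ensemble of actor networks with a stochastic (Gaussian) policy trained in parallel against a critic, a shared experience-replay buffer, and decision making that extracts the best candidate action across the ensemble. It is not the authors' implementation; the network sizes, hyperparameters, and the critic-based selection rule are illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of a multi-actor ensemble with a
# stochastic policy, a Q-value critic, and shared experience replay.
# All dimensions and hyperparameters below are illustrative assumptions.
import random
from collections import deque

import torch
import torch.nn as nn
from torch.distributions import Normal

STATE_DIM, ACTION_DIM, K_ACTORS = 8, 3, 4  # hypothetical sizes

class Actor(nn.Module):
    """Stochastic policy: outputs a Gaussian over continuous actions."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU())
        self.mu = nn.Linear(64, ACTION_DIM)
        self.log_std = nn.Linear(64, ACTION_DIM)

    def dist(self, state):
        h = self.body(state)
        return Normal(self.mu(h), self.log_std(h).clamp(-5, 2).exp())

class Critic(nn.Module):
    """Action-value function Q(s, a)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

buffer = deque(maxlen=100_000)  # shared experience-replay buffer
actors = [Actor() for _ in range(K_ACTORS)]
critic = Critic()
actor_opts = [torch.optim.Adam(a.parameters(), lr=1e-3) for a in actors]
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)
GAMMA = 0.99

def update(batch_size=64):
    """One training step over a random replay minibatch."""
    if len(buffer) < batch_size:
        return
    batch = random.sample(buffer, batch_size)
    s, a, r, s2 = (torch.stack(x) for x in zip(*batch))
    # Critic: one-step TD target (next action from one actor's mean, for brevity).
    with torch.no_grad():
        a2 = actors[0].dist(s2).mean
        target = r.unsqueeze(-1) + GAMMA * critic(s2, a2)
    critic_loss = nn.functional.mse_loss(critic(s, a), target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Each actor independently ascends the critic via a reparameterized sample.
    for actor, opt in zip(actors, actor_opts):
        act = actor.dist(s).rsample()
        loss = -critic(s, act).mean()
        opt.zero_grad(); loss.backward(); opt.step()

def decide(state):
    """Ensemble decision: keep the candidate action the critic values most."""
    with torch.no_grad():
        cands = [actor.dist(state).mean for actor in actors]
        return max(cands, key=lambda a: critic(state, a).item())

# Illustrative usage with synthetic transitions (s, a, r, s').
for _ in range(256):
    buffer.append((torch.randn(STATE_DIM), torch.randn(ACTION_DIM),
                   torch.randn(()), torch.randn(STATE_DIM)))
update()
print(decide(torch.randn(1, STATE_DIM)))
```

Selecting the final action by querying the critic across all actors is one plausible way to "extract" an overall policy from parallel training; the paper's actual extraction rule may differ.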
