Optimal trajectory tracking control of batch crystallization process based on reinforcement learning

Brahim Benyahia, Chris Rielly, Paul Danny Anandan

doi:10.3390/iocc_2020-07731

Abstract

The control of the particle size distribution, crystal habit, and crystal purity is crucial in most crystallization processes to meet the targeted critical quality attributes of the final product. Over the last two decades, several model-free strategies, such as supersaturation control and direct nucleation control, and model-based techniques, such as model predictive control, have been developed and implemented in a wide range of scientific and industrial sectors, particularly the pharmaceutical industry. Despite this significant progress, there is still an increasing demand for more advanced, versatile, robust, and cost-effective optimization and control technologies for both batch and continuous crystallization processes to ensure high product quality, through real-time trajectory tracking control, and to reduce batch-to-batch variation and waste. A novel optimal trajectory tracking control strategy for a batch cooling crystallization process, based on reinforcement learning, is presented. The cooling crystallization of paracetamol in water was used as a case study. A model-based technique is implemented, using a dynamic mathematical model validated elsewhere, to reduce the experimental burden and explore wider design and operating spaces. The main objective is to achieve a large crystal size and reduce the deviation from a targeted final yield and coefficient of variation. Several training strategies and reward functions were tested to help achieve robust optimal training of the agent. The agents with the best training features were validated against different particle size, temperature, and supersaturation trajectories, then compared to benchmark control techniques, such as supersaturation control and model predictive control.

Highlights

Introduction and Motivation: The control of the particle size distribution, crystal habit, and crystal purity is crucial in most crystallization processes to meet the targeted critical quality attributes of the final product [1]. This work addresses the increasing demand for more advanced, versatile, robust, and cost-effective control technologies for crystallization processes [2]. Reinforcement learning (RL) has gained considerable interest for process control and optimization, with a positive impact on both research and industry [3]. This work proposes a novel RL method for optimal trajectory tracking and control of batch crystallization processes, reducing quality variation and waste.
Several RL training strategies were implemented to enhance the performance of the reward functions and achieve robust and optimal training of the agent.
Rt: the reward given to the RL agent at time step ‘t’ of the simulation
Ts: the time interval after which the RL agent takes a control action
Tf: the final time step at which the simulation ends
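To make the roles of Rt, Ts, and Tf concrete, the following is a minimal Python sketch of how they might fit together in one simulated batch. It is not the authors' implementation: the function names, the squared-error form of the reward, and the example numbers are all assumptions introduced for illustration. The source states only that several reward functions were tested, not which ones.

```python
from typing import List, Sequence


def control_times(Ts: float, Tf: float) -> List[float]:
    """Times at which the agent acts: 0, Ts, 2*Ts, ... strictly before Tf.

    Assumes Tf is (close to) an integer multiple of Ts.
    """
    n_steps = int(round(Tf / Ts))
    return [k * Ts for k in range(n_steps)]


def tracking_reward(measured: float, target: float) -> float:
    """Hypothetical Rt: negative squared deviation from the reference value.

    This is an assumed reward shape for illustration only.
    """
    return -((measured - target) ** 2)


def episode_return(measured: Sequence[float], target: Sequence[float]) -> float:
    """Sum of Rt over all control steps of one simulated batch."""
    return sum(tracking_reward(m, r) for m, r in zip(measured, target))


# Example with made-up values: one action every 10 time units, batch ends at 60.
times = control_times(Ts=10.0, Tf=60.0)
```

Under these assumptions the agent takes Tf/Ts control actions per batch, and a perfectly tracked trajectory yields an episode return of zero, with larger deviations giving more negative returns.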

Summary


Problem Formulation:

Overview of the Reward Function

Results and Validation:

Overview of an RL Training Outcome:

Conclusion and Future