Background
The significant time delay between the manipulated and controlled variables complicates system identification when implementing model predictive control (MPC) for an industrial process. Recently, deep reinforcement learning (DRL), with its model-free characteristics, has attracted considerable attention from the process control community. However, the model-free assumption in DRL rests on the Markov decision process (MDP) property that all state variables are observed. This assumption does not hold for an industrial process.

Methods
In this study, a sequence-to-sequence (Seq2seq) network was employed to build a surrogate model from industrial Claus process data. Meanwhile, the hidden state output by the encoder of the Seq2seq network, which represents the dynamic features of the process, is connected to the DRL-based controller to compensate for the partial observability of a real process.

Significant findings
The results show that the standard deviation of the controlled variable, the H2S-to-SO2 concentration ratio in the tail gas, can be reduced by up to 55% using the proposed DRL-based controller compared with the current control strategy.
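The encoder-to-controller coupling described in the Methods can be sketched as follows. This is a purely illustrative toy, not the paper's implementation: the GRU-style encoder, its dimensions, and the random weights are all assumptions, and the real Seq2seq surrogate would be trained on the Claus process data. The point is only the data flow, i.e. a window of delayed past measurements is folded into an encoder hidden state, which then augments the current observation given to the DRL policy.

```python
# Hypothetical sketch (not the paper's code): a toy GRU-style encoder
# compresses a window of past process measurements into a hidden state,
# which augments the current observation fed to a DRL policy.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ToyGRUEncoder:
    """Minimal GRU cell; dimensions and weights are illustrative only."""
    def __init__(self, obs_dim, hidden_dim):
        self.hidden_dim = hidden_dim
        k = obs_dim + hidden_dim
        self.Wz = rng.normal(0.0, 0.1, (k, hidden_dim))  # update gate
        self.Wr = rng.normal(0.0, 0.1, (k, hidden_dim))  # reset gate
        self.Wh = rng.normal(0.0, 0.1, (k, hidden_dim))  # candidate state

    def encode(self, window):
        """Fold a (T, obs_dim) window of past measurements into h_T."""
        h = np.zeros(self.hidden_dim)
        for x in window:
            xh = np.concatenate([x, h])
            z = sigmoid(xh @ self.Wz)
            r = sigmoid(xh @ self.Wr)
            h_cand = np.tanh(np.concatenate([x, r * h]) @ self.Wh)
            h = (1.0 - z) * h + z * h_cand
        return h

obs_dim, hidden_dim, T = 4, 8, 20
encoder = ToyGRUEncoder(obs_dim, hidden_dim)
past_window = rng.normal(size=(T, obs_dim))  # delayed process history
current_obs = rng.normal(size=obs_dim)

h = encoder.encode(past_window)
# The DRL controller observes [current measurement, encoder hidden state],
# compensating for the partial observability caused by the time delay.
policy_input = np.concatenate([current_obs, h])
print(policy_input.shape)
```

In a trained setup the encoder weights would come from fitting the Seq2seq surrogate to plant data, so the hidden state summarizes the delayed process dynamics rather than random noise.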