Reinforcement learning optimization of a water resource recovery facility: Evaluating the impact of reward function design on agent training, control optimization, and treatment risk

Henry C Croll, Kaoru Ikuma, Say Kee Ong, Soumik Sarkar

doi:10.1016/j.jwpe.2024.106658

Abstract

This study applied reinforcement learning (RL) optimization to a simulation of a water resource recovery facility (WRRF) to evaluate the impact of reward function design under varying effluent requirements. Several mathematical structures were evaluated for the effluent quality index (EQI) portion of the reward function for the case of current treatment requirements. Of these, a fraction-based structure produced the highest level of optimization, as well as the best mix of results along an optimal risk-reward tradeoff line. The study also found that the training success rate could be tuned by changing the weight given to the EQI. Given the simplicity of the current treatment requirements, agents trained for this case showed a very clear risk-reward tradeoff. The most cost-effective agent reduced operational costs by 10.9% compared to current operation, equivalent to yearly savings of $267,000. RL agents were also evaluated for a future case in which treatment requires nutrient removal. Because the future case was more complex than the current case, relative risk was evaluated using a combination of basic indicators, such as maximum effluent value and instantaneous limit exceedance; correlation matrices to uncover state-action relationships; and challenge testing.
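To make the idea of a weighted, fraction-based EQI term concrete, the following is a minimal sketch under stated assumptions. It is not the reward function used in the study: the function names, the linear `1 - effluent / limit` form, the cost normalization, and the example values are all hypothetical and are meant only to illustrate how an EQI weight shifts the balance between cost savings and effluent quality.

```python
def fraction_eqi_term(effluent: float, limit: float) -> float:
    """Hypothetical fraction-style EQI score.

    Returns 1.0 at zero effluent concentration, 0.0 exactly at the limit,
    and a negative value when the limit is exceeded.
    """
    return 1.0 - effluent / limit


def reward(cost: float, baseline_cost: float,
           effluent: float, limit: float,
           eqi_weight: float = 1.0) -> float:
    """Hypothetical reward: fractional cost saving plus a weighted EQI term."""
    # Positive when the agent operates more cheaply than the baseline.
    cost_term = 1.0 - cost / baseline_cost
    # eqi_weight is the assumed tuning knob for the EQI contribution.
    return cost_term + eqi_weight * fraction_eqi_term(effluent, limit)


# Illustrative values only: a 10.9% cost reduction with effluent at half the limit.
example = reward(cost=89.1, baseline_cost=100.0,
                 effluent=5.0, limit=10.0, eqi_weight=2.0)
```

In this sketch, raising `eqi_weight` makes a limit exceedance more costly relative to the same cost saving, which is one simple way a risk-reward tradeoff could be expressed in a reward signal.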

Full Text