Abstract

Reinforcement Learning (RL) algorithms have been shown to achieve superhuman performance on several challenging computer games (e.g., Chess, Go). More recently, RL algorithms have been used to solve a variety of control problems in applied engineering fields, including manufacturing, renewable energy generation, and fluid dynamics drag control. These algorithms essentially learn a deep neural network (the policy network) that maps system states or observations to optimal actions or controls. In contrast, standard frameworks for solving nonlinear control problems use a two-stage approach. The first stage is a model calibration step, where all available observations are used to fit the model. The second is an optimization step, where predictions obtained from the calibrated model are used to find optimal actions. This two-step approach relies on repeated model calibration whenever a new set of observations becomes available. As a result, it is vulnerable to overfitting, where the calibrated model underestimates the uncertainties in the model parameters. RL algorithms side-step the model calibration phase and learn optimal control policies through repeated interactions with the un-calibrated flow models. This is conceptually similar to exploration-exploitation based optimization algorithms and requires a large number of flow simulations. In this work, we combine low-fidelity proxy models with high-fidelity simulations to reduce the computational cost of RL algorithms. Intuitively, proxy models are used for the initial sampling (exploration phase) of the RL algorithm; the control policies are then refined using high-fidelity simulations, resulting in significant computational gains. The combined use of proxy models and full-scale simulations is formulated as a multi-fidelity RL framework for optimal control of physical systems governed by a set of partial differential equations (PDEs). A novel technique based on domain randomization and clustering is developed to account for uncertainties in the model parameters (e.g., subsurface properties such as the permeability field). The proposed framework is demonstrated using a state-of-the-art, model-free, policy-based RL algorithm, proximal policy optimization (PPO), on two subsurface flow test cases representing two distinct uncertainty distributions of the permeability field. The results are benchmarked against optimization results obtained using the differential evolution (DE) algorithm. The robustness of the learned control policy is demonstrated on unseen model parameter samples that were not used during training. In terms of computational efficiency, we observe significant savings in simulation runtime (approximately 60 to 70%) when using the proposed multi-fidelity RL framework compared to RL with high-fidelity simulations alone.
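
The following is a minimal sketch, not the authors' code, of the two-stage multi-fidelity PPO workflow described above: the policy is first trained against a cheap low-fidelity proxy simulator (exploration), then fine-tuned on the expensive high-fidelity simulator. The FlowControlEnv class, its reward, and the timestep budgets are hypothetical placeholders standing in for a PDE-based flow simulator; the sketch assumes the gymnasium and stable-baselines3 libraries.

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class FlowControlEnv(gym.Env):
    """Hypothetical flow-control environment wrapping a simulator of a given fidelity."""

    def __init__(self, fidelity="proxy", n_obs=8, n_controls=2, horizon=20):
        super().__init__()
        self.fidelity = fidelity          # "proxy" (cheap) or "high" (expensive)
        self.horizon = horizon
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(n_obs,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(n_controls,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Domain randomization: each episode draws a new model-parameter realization
        # (e.g., a permeability-field sample or cluster representative from the prior).
        self._params = self.np_random.normal(size=self.observation_space.shape)
        self._step = 0
        return self._simulate(np.zeros(self.action_space.shape)), {}

    def step(self, action):
        self._step += 1
        obs = self._simulate(action)
        reward = float(-np.sum((obs - 1.0) ** 2))   # placeholder control objective
        terminated = self._step >= self.horizon
        return obs, reward, terminated, False, {}

    def _simulate(self, action):
        # Stand-in for a call to the proxy or full-physics PDE simulator.
        noise = 0.1 if self.fidelity == "proxy" else 0.0
        x = self._params + action.mean() + noise * self.np_random.normal(size=self._params.shape)
        return x.astype(np.float32)


# Stage 1: exploration on the cheap proxy model.
proxy_env = FlowControlEnv(fidelity="proxy")
agent = PPO("MlpPolicy", proxy_env, verbose=0)
agent.learn(total_timesteps=50_000)

# Stage 2: refine the same policy on the high-fidelity simulator with far fewer steps,
# which is where the reported runtime savings would come from.
hifi_env = FlowControlEnv(fidelity="high")
agent.set_env(hifi_env)
agent.learn(total_timesteps=10_000, reset_num_timesteps=False)
agent.save("multifidelity_ppo_policy")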
