Abstract

Stochastic dynamic programming (SDP) is a widely used method for optimizing reservoir operations under uncertainty, but it suffers from the dual curses of dimensionality and modeling. Reinforcement learning (RL), a simulation-based stochastic optimization approach, eliminates the curse of modeling, which arises from the need to calculate a very large transition probability matrix. RL mitigates the curse of dimensionality but cannot remove it entirely, as it remains computationally intensive for complex multi-reservoir systems. This paper presents a multi-agent RL approach combined with an aggregation/decomposition method (AD-RL) for reducing the curse of dimensionality in multi-reservoir operation optimization problems. In this model, each reservoir is managed by a dedicated operator (agent) that cooperates systematically with the other agents to find a near-optimal operating policy for the whole system. Each agent decides on a release based on its own current state and the feedback it receives from the states of all upstream and downstream reservoirs. The method, together with an efficient and robust artificial neural network-based procedure for tuning the Q-learning parameters, has been applied to a real-world five-reservoir problem, the Parambikulam–Aliyar Project (PAP) in India. We demonstrate that the proposed AD-RL approach derives operating policies that are better than, or comparable to, those obtained by other stochastic optimization methods, at a lower computational cost.
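As a rough illustration of the per-reservoir learning step described above, the sketch below implements a tabular Q-learning agent whose state combines its own (discretized) storage with coarse information about upstream and downstream reservoirs. The state encoding, reward, and parameter values are illustrative assumptions, not the formulation used in the paper.

```python
import numpy as np


class ReservoirAgent:
    """One Q-learning agent per reservoir (illustrative sketch, not the paper's exact method)."""

    def __init__(self, n_states, n_releases, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.Q = np.zeros((n_states, n_releases))  # Q-table: state x discretized release
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state, rng):
        # Epsilon-greedy choice of a discretized release
        if rng.random() < self.epsilon:
            return int(rng.integers(self.Q.shape[1]))
        return int(np.argmax(self.Q[state]))

    def update(self, state, action, reward, next_state):
        # Standard Q-learning temporal-difference update
        td_target = reward + self.gamma * np.max(self.Q[next_state])
        self.Q[state, action] += self.alpha * (td_target - self.Q[state, action])


def encode_state(own_bin, upstream_bin, downstream_bin, n_bins=10):
    # The agent's state couples its own storage bin with coarse (aggregated)
    # upstream/downstream storage bins; this particular encoding is an assumption.
    return (own_bin * n_bins + upstream_bin) * n_bins + downstream_bin


# Example usage with made-up bins and reward
rng = np.random.default_rng(42)
agent = ReservoirAgent(n_states=10 ** 3, n_releases=5)
s = encode_state(own_bin=4, upstream_bin=7, downstream_bin=2)
a = agent.act(s, rng)
agent.update(s, a, reward=1.0, next_state=encode_state(5, 6, 2))
```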

Highlights

  • Multi-reservoir optimization models are generally non-linear, non-convex, and large-scale in terms of the number of variables and constraints

  • This paper presents a reinforcement learning (RL)-based model combined with an aggregation–decomposition (AD) approach that reduces the dimensionality problem, enabling the efficient solution of a stochastic multi-reservoir operation optimization problem

  • We present the results of the proposed aggregation–decomposition reinforcement learning (AD-RL) approach for optimizing the operations of the Parambikulam–Aliyar Project (PAP) multi-reservoir system and compare them with those of three other stochastic optimization methods: MAM-dynamic programming (DP), FP, and aggregation–decomposition dynamic programming (AD-DP)

Introduction

Multi-reservoir optimization models are generally non-linear, non-convex, and large-scale in terms of the number of variables and constraints. Uncertainties in stochastic variables such as inflows, evaporation, and demands make it difficult to find even a sub-optimal operating policy. Two types of stochastic programming approaches are used to optimize multi-reservoir system operations under uncertainty, i.e., implicit and explicit. In implicit stochastic optimization (ISO), a large number of historical or synthetically generated sequences of random variables, such as streamflows, are generated and used as input to a deterministic optimization model. These sequences represent different aspects of the underlying stochastic process, such as spatial or temporal correlations among the random variables involved. Optimal operating policies are then derived by post-processing the outputs of the deterministic optimization model solved for the different input sequences (samples).
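For concreteness, the following sketch illustrates the generic ISO workflow just described: synthetic inflow sequences drive a (deliberately simplified) deterministic release computation, and the resulting storage–inflow–release samples are post-processed by linear regression into an operating rule. The lognormal inflow model, the toy single-reservoir solver, and the regression-based rule are illustrative assumptions, not a specific method from the literature discussed here.

```python
import numpy as np


def generate_inflow_sequences(n_sequences, horizon, mean=100.0, sigma=0.3, seed=0):
    # Lognormal synthetic inflows (placeholder for a proper streamflow generator)
    rng = np.random.default_rng(seed)
    return mean * rng.lognormal(mean=0.0, sigma=sigma, size=(n_sequences, horizon))


def deterministic_release_policy(inflows, capacity=500.0, demand=90.0):
    # Toy deterministic rule standing in for a full deterministic optimization run:
    # release up to the demand when water is available, spill above capacity.
    storage, records = 0.5 * capacity, []
    for q in inflows:
        release = min(demand, storage + q)
        storage = min(storage + q - release, capacity)
        records.append((storage, q, release))
    return np.array(records)


# Post-processing step: fit a linear operating rule, release ~ a*storage + b*inflow + c,
# across all sequences (samples) produced by the deterministic model.
samples = np.vstack([deterministic_release_policy(seq)
                     for seq in generate_inflow_sequences(50, 120)])
X = np.column_stack([samples[:, 0], samples[:, 1], np.ones(len(samples))])
coeffs, *_ = np.linalg.lstsq(X, samples[:, 2], rcond=None)
print("fitted rule coefficients (storage, inflow, intercept):", coeffs)
```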
