Abstract

Although several variants of stochastic dynamic programming have been applied to the optimal operation of multireservoir systems, they have been plagued by a high‐dimensional state space and the inability to accurately incorporate the stochastic environment as characterized by temporally and spatially correlated hydrologic inflows. Reinforcement learning has emerged as an effective approach to solving sequential decision problems by combining concepts from artificial intelligence, cognitive science, and operations research. A reinforcement learning system has a mathematical foundation similar to dynamic programming and Markov decision processes, with the goal of maximizing the long‐term return conditioned on the state of the system environment and the immediate rewards obtained from operational decisions. Reinforcement learning can employ Monte Carlo simulation when transition probabilities and rewards are not explicitly known a priori. The Q‐Learning method in reinforcement learning is demonstrated on the two‐reservoir Geum River system, South Korea, and is shown to outperform implicit stochastic dynamic programming and sampling stochastic dynamic programming methods.
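Since the abstract names tabular Q‐Learning without further detail, the following minimal sketch illustrates the method on a hypothetical single‐reservoir toy problem. The storage discretization, inflow distribution, release set, and reward function below are illustrative assumptions, not the paper's Geum River formulation. The agent repeatedly applies the standard update Q(s,a) ← Q(s,a) + α[r + γ max_a′ Q(s′,a′) − Q(s,a)], learning from sampled transitions rather than from explicitly tabulated transition probabilities, which is the model‐free property the abstract attributes to reinforcement learning.

```python
# Minimal, self-contained sketch of tabular Q-Learning on a toy
# single-reservoir problem. All dimensions, the inflow distribution,
# and the reward function are illustrative assumptions, not the
# paper's two-reservoir Geum River formulation.
import random

N_STORAGE = 11           # discretized storage levels 0..10 (assumed)
RELEASES = [0, 1, 2, 3]  # feasible release decisions (assumed)
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

# Q-table: Q[s][a] estimates the long-term return of release a at storage s.
Q = [[0.0 for _ in RELEASES] for _ in range(N_STORAGE)]

def step(storage, release):
    """Simulate one period: sample an inflow, apply the mass balance,
    and return (next_storage, reward). Transition probabilities are
    never tabulated; the agent learns from sampled experience."""
    inflow = random.choice([0, 1, 2])      # stochastic inflow (assumed)
    release = min(release, storage)        # cannot release more than stored
    next_storage = storage - release + inflow
    spill = max(0, next_storage - (N_STORAGE - 1))
    next_storage = min(next_storage, N_STORAGE - 1)
    reward = 2.0 * release - 1.0 * spill   # toy benefit minus spill penalty
    return next_storage, reward

def policy(storage):
    """Epsilon-greedy action selection over the Q-table."""
    if random.random() < EPS:
        return random.randrange(len(RELEASES))
    return max(range(len(RELEASES)), key=lambda a: Q[storage][a])

s = N_STORAGE // 2
for t in range(50_000):                    # simulated operating periods
    a = policy(s)
    s_next, r = step(s, RELEASES[a])
    # Q-Learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a').
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
    s = s_next

# Greedy (learned) release for each storage level:
print([RELEASES[max(range(len(RELEASES)), key=lambda a: Q[lvl][a])]
       for lvl in range(N_STORAGE)])
```

In a multireservoir setting such as the Geum River system, the state would instead encode the joint storages (and possibly correlated hydrologic information) of both reservoirs, which is where the dimensionality concerns noted above arise.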
