Abstract
A discrete-time Markov decision process is studied and a minimum reward risk model is established for long-term reservoir generation optimization. Different from the commonly used optimization criterion of maximum expected reward in reservoir scheduling, the optimization target for this stochastic process is to minimize the probability that the expected generation reward of the whole period does not exceed the predetermined reward target. Because hydropower tends to operate in a peak-clipping mode under a market-based model to gain more profit, a function relating electricity price and output can be constructed by analyzing the typical daily load curve of the power system. Compared with the commonly used maximum expected power generation model, this model suits decision-making in which risk must be limited, reflecting the risk preference of the policy makers. A stochastic dynamic programming method is adopted to solve the model, and the model is tested on the Three Gorges Hydropower Station.
Highlights
A discrete-time Markov decision process is studied and a minimum reward risk model is established for long-term reservoir generation optimization
This paper presents a minimum reward risk model for reservoir optimization scheduling
The specific principle of this model can be described as follows: for a predetermined level of total hydropower profit over the scheduling period, the optimal reservoir scheduling control strategy is sought that minimizes the risk probability that the expected generation reward falls below the predetermined value
Summary
This paper focuses on a discrete-time, non-time-homogeneous Markov decision model (Liu 2004) with the structure $(S_t, A_t, p_t, R_t)$, $t = 0, \ldots, T$. For a state $i \in S_t$, a set $A_t(i)$ of allowable actions can be taken during period $t$; $p_t = \{p^a_{ij},\ i \in S_t,\ j \in S_{t+1},\ a \in A_t(i)\}$ is the system state transition probability matrix, which satisfies $\sum_{j \in S_{t+1}} p^a_{ij} = 1$. The function $R_t = \{R_t(\cdot \mid i, a, j)\}$ is adopted as the reward distribution and satisfies $R_t([M_1, M_2] \mid i, a, j) = 1$, where the constants $M_1$ and $M_2$ denote the lower and upper bounds of the reward, respectively. At each time $t$, the system state is observed, an action is taken, and the period reward $r_t$ is obtained, a random variable obeying the probability distribution $R_t(\cdot \mid i, a, j)$.
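The minimum reward risk criterion above can be sketched as a backward recursion: augment the state with the accumulated reward and, in each period, choose the action that minimizes the probability of finishing below the target. The sketch below is a toy illustration only; the two-state chain, the transition probabilities, the integer reward function and the target value are all hypothetical assumptions, not data from the paper.

```python
# Toy illustration of the minimum reward risk criterion:
# minimize P(total reward < TARGET) over a finite-horizon MDP.
# All numbers here are made up for demonstration.

T = 3                      # number of scheduling periods
STATES = [0, 1]            # S_t: two hypothetical system states
ACTIONS = [0, 1]           # A_t(i): two allowable actions

# p[i][a][j] = transition probability from state i to j under action a
p = {
    0: {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}},
    1: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}},
}

def reward(i, a, j):
    """Deterministic integer period reward (a simplification of R_t)."""
    return 1 + a + j

TARGET = 8  # predetermined total-reward target

def risk(t, i, w):
    """Minimum achievable probability that the final accumulated
    reward is below TARGET, given period t, state i, accumulated w."""
    if t == T:
        return 1.0 if w < TARGET else 0.0
    return min(
        sum(p[i][a][j] * risk(t + 1, j, w + reward(i, a, j))
            for j in STATES)
        for a in ACTIONS
    )

if __name__ == "__main__":
    print(f"minimum risk from state 0: {risk(0, 0, 0):.4f}")
```

In a real reservoir model the accumulated reward would be continuous and would have to be discretized, which is where the stochastic dynamic programming solution method mentioned in the abstract comes in.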