Online pricing of demand response based on long short-term memory and reinforcement learning

Xiangyu Kong, Deqian Kong, Jingtao Yao, Linquan Bai, Jie Xiao

doi:10.1016/j.apenergy.2020.114945

Abstract

Incentive-based demand response is playing an increasingly important role in ensuring the safe operation of the power grid and reducing system costs, and advances in information and communications technology have made it possible to implement it online. However, in regions where incentive-based demand response has never been implemented, the response behavior of customers is unknown. In this case, setting the incentive price quickly and accurately is a challenge for service providers. This paper proposes a pricing method that combines long short-term memory networks and reinforcement learning to solve the pricing problem of service providers when the customers’ response behavior is unknown. Taking the total profit over all response time slots in one day as the optimization goal, long short-term memory networks are used to learn the relationship between customers’ response behavior and the incentive price, and reinforcement learning is used to explore and determine the optimal price. The results show that combining the two methods enables virtual exploration of the optimal price, which overcomes the drawback that reinforcement learning in a real setting can explore only through delayed rewards, and thereby speeds up the process of setting the optimal price. In addition, because the method accounts for how the combination of incentive prices across time slots affects the service provider’s profit, the negative effect of myopic optimization is avoided.
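The idea of virtual exploration can be illustrated with a minimal sketch. It is not the paper's implementation: the function surrogate_response is a hand-written toy that stands in for the trained long short-term memory network, and the three time slots, the four candidate prices, the per-unit value and the fatigue term are all assumptions chosen for illustration. Tabular Q-learning then queries this surrogate, instead of real customers, to find the price combination that maximizes total daily profit, and a slot-by-slot myopic plan is computed for comparison.

```python
import itertools
import random

# All numbers below are illustrative assumptions, not values from the paper.
N_SLOTS = 3                      # response time slots in one day
PRICES = [0.0, 0.1, 0.2, 0.3]    # candidate incentive prices
UNIT_VALUE = 0.5                 # assumed provider value per unit of load reduced


def surrogate_response(slot, price, prev_price):
    """Toy stand-in for the trained LSTM: predicted load reduction in a slot.

    Assumed behaviour: reduction grows with the current price and shrinks
    when the previous slot's price was high (customers already responded).
    """
    base = (4.0, 6.0, 5.0)[slot]
    return max(0.0, base * (price / 0.3) ** 0.5 - 8.0 * prev_price)


def slot_profit(slot, price, prev_price):
    return (UNIT_VALUE - price) * surrogate_response(slot, price, prev_price)


def plan_profit(plan):
    """Total daily profit of a sequence of prices, one per slot."""
    total, prev = 0.0, 0.0
    for slot, price in enumerate(plan):
        total += slot_profit(slot, price, prev)
        prev = price
    return total


def q_learning(episodes=5000, alpha=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning; the state is (slot, index of previous price)."""
    rng = random.Random(seed)
    n = len(PRICES)
    q = [[[0.0] * n for _ in range(n)] for _ in range(N_SLOTS)]
    for _ in range(episodes):
        prev = 0  # index of price 0.0: no incentive before the first slot
        for slot in range(N_SLOTS):
            if rng.random() < epsilon:
                a = rng.randrange(n)                              # explore
            else:
                a = max(range(n), key=lambda i: q[slot][prev][i])  # exploit
            # The reward comes from the surrogate, not from real customers.
            reward = slot_profit(slot, PRICES[a], PRICES[prev])
            future = max(q[slot + 1][a]) if slot + 1 < N_SLOTS else 0.0
            q[slot][prev][a] += alpha * (reward + future - q[slot][prev][a])
            prev = a
    return q


def greedy_plan(q):
    """Read the learned price sequence out of the Q-table."""
    plan, prev = [], 0
    for slot in range(N_SLOTS):
        a = max(range(len(PRICES)), key=lambda i: q[slot][prev][i])
        plan.append(PRICES[a])
        prev = a
    return plan


def myopic_plan():
    """Pick the price that maximizes each slot's own profit, ignoring later slots."""
    plan, prev = [], 0.0
    for slot in range(N_SLOTS):
        price = max(PRICES, key=lambda p: slot_profit(slot, p, prev))
        plan.append(price)
        prev = price
    return plan


def best_plan_by_enumeration():
    """Brute-force reference; feasible only because the toy problem is tiny."""
    return list(max(itertools.product(PRICES, repeat=N_SLOTS), key=plan_profit))


if __name__ == "__main__":
    learned = greedy_plan(q_learning())
    print("learned plan:", learned, "profit:", round(plan_profit(learned), 4))
    print("myopic plan: ", myopic_plan(), "profit:", round(plan_profit(myopic_plan()), 4))
```

In this toy setup the plan learned from total daily profit earns more than the myopic plan, because a high price in one slot reduces the assumed response in the next. Every query to the surrogate is free, which is the sense in which exploration is virtual.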

Full Text