The behind-the-meter (BTM) energy management problem has attracted considerable attention in recent years, driven by the growing number of residential photovoltaic (PV)-battery energy storage systems (BESSs). This work investigates deep reinforcement learning (DRL) combined with a novel heuristic model for real-time control of home batteries. The control problem is formulated as a finite Markov decision process (MDP) with discrete time steps, and a proximal policy optimization (PPO) algorithm is used to train a DRL agent with a discrete action space. The agent is trained on real-world measured data to learn a policy for sequential charge/discharge decisions that minimizes daily electricity costs. The battery power is then calculated by the heuristic model from the agent's decision and the battery's available capacity, which ensures demand-supply balance through PV self-consumption and load demand shifting. The model is evaluated against four RL agents and two benchmark models, one using a rule-based strategy and one using scenario-based stochastic optimization. The results show that the presented model outperforms its counterparts, saving €80.38 on electricity bills over the 46-day test data set. This exceeds the savings of the rule-based and stochastic models by €15.64 and €19.38, respectively.
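To make the division of labour between the agent and the heuristic concrete, the following Python sketch shows one way a discrete decision could be turned into a feasible battery power. It is not the heuristic proposed in this work, which the abstract does not specify. The action encoding (0 = idle, 1 = charge, 2 = discharge), the `Battery` fields, the 15-minute step, the lossless battery, and the rule that the battery discharges only to cover load not met by PV are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical action encoding for the discrete action space (an assumption).
IDLE, CHARGE, DISCHARGE = 0, 1, 2


@dataclass
class Battery:
    capacity_kwh: float = 10.0  # assumed usable capacity
    max_power_kw: float = 5.0   # assumed charge/discharge power limit
    soc_kwh: float = 5.0        # energy currently stored


def battery_power(action: int, pv_kw: float, load_kw: float,
                  bat: Battery, dt_h: float = 0.25) -> float:
    """Return battery power in kW (positive = charging, negative = discharging).

    The agent only picks the direction; this function clips the magnitude
    to what the battery can accept or deliver within one time step.
    Conversion losses are ignored (assumption).
    """
    if action == CHARGE:
        headroom_kw = (bat.capacity_kwh - bat.soc_kwh) / dt_h
        return min(bat.max_power_kw, headroom_kw)
    if action == DISCHARGE:
        available_kw = bat.soc_kwh / dt_h
        residual_load_kw = max(load_kw - pv_kw, 0.0)  # load not covered by PV
        return -min(bat.max_power_kw, available_kw, residual_load_kw)
    return 0.0  # IDLE or unknown action: leave the battery untouched


def step(action: int, pv_kw: float, load_kw: float,
         bat: Battery, dt_h: float = 0.25) -> float:
    """Apply the action, update the stored energy, and return grid power in kW.

    Positive grid power is import, negative is export; the power balance
    load + battery = pv + grid always holds.
    """
    p_bat = battery_power(action, pv_kw, load_kw, bat, dt_h)
    bat.soc_kwh += p_bat * dt_h
    return load_kw - pv_kw + p_bat
```

Under these assumptions, calling `step(DISCHARGE, pv_kw=0.5, load_kw=3.0, bat=Battery(soc_kwh=0.5))` discharges only the energy that is actually stored and imports the remainder from the grid, so the agent can never request an infeasible set-point.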