Abstract

Deep reinforcement learning (DRL)-based methods have become predominant in energy management strategy (EMS) research. However, DRL methods often suffer from long training times and a tendency to converge to local optima. To address these drawbacks, this work builds an IL-embedded DRL framework for EMS that uses Imitation Learning (IL) to exploit the training-guidance potential of optimization-based methods. First, an offline globally optimal trajectory is extracted, and IL is then used to imitate this trajectory. The network learned by IL subsequently serves as the initial policy network for DRL training. The EMS based on this framework accounts for hydrogen consumption, fuel cell degradation, and power battery aging, with the goal of reducing the total driving cost. In this study, Behavioral Cloning and Proximal Policy Optimization (PPO) are used as the IL and DRL algorithms, respectively, to evaluate the framework. Simulation results show that, compared to standalone PPO, the proposed framework reduces the number of training steps by 51.69% while improving the total reward by 4.57%. On the test cycle, the proposed framework attains 95.59% of the globally optimal driving cost, exceeding standalone PPO by 5.79%.
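
The following is a minimal sketch (not the authors' code) of the IL-embedded DRL pipeline the abstract describes: a policy network is pretrained with Behavioral Cloning on (state, action) pairs from the offline optimal trajectory, then handed to PPO as its initial policy instead of a random initialization. The state/action dimensions, network sizes, and data below are hypothetical placeholders.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 6, 1  # hypothetical EMS state/action sizes


class PolicyNet(nn.Module):
    """Shared policy network: pretrained by BC, then fine-tuned by PPO."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 64), nn.Tanh(),
            nn.Linear(64, 64), nn.Tanh(),
            nn.Linear(64, ACTION_DIM),
        )

    def forward(self, state):
        return self.net(state)


def behavioral_cloning(policy, states, expert_actions, epochs=200, lr=1e-3):
    """Supervised regression of the policy onto the optimal trajectory."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(policy(states), expert_actions)
        loss.backward()
        opt.step()
    return policy


if __name__ == "__main__":
    # Synthetic stand-in for the offline globally optimal trajectory
    # (e.g., obtained from dynamic programming); real data would replace this.
    states = torch.randn(1024, STATE_DIM)
    expert_actions = torch.randn(1024, ACTION_DIM)

    policy = PolicyNet()
    policy = behavioral_cloning(policy, states, expert_actions)
    # `policy` now serves as the initial actor for PPO training,
    # replacing the usual random initialization.
```

Warm-starting PPO this way is what the abstract credits for the reduced training steps: the actor begins near the expert's behavior rather than exploring from scratch.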
