Deep reinforcement learning (DRL)-based energy management strategies (EMSs) are attractive for fuel cell vehicles (FCVs). Nevertheless, the fuel economy and the durability of the proton exchange membrane fuel cell (PEMFC) stack and lithium-ion battery (LIB) may not be optimized simultaneously, since DRL-based EMSs generally do not account for transient degradation of the PEMFC stack and LIB. Furthermore, an inappropriate action space and an overestimated value function can lead to a suboptimal EMS for on-line control. To this end, this research formulates a twin delayed deep deterministic policy gradient (TD3)-based EMS that integrates durability information of the PEMFC stack and LIB. The EMS interacts with the vehicle operating states to control the hybrid powertrain continuously and limits overestimation of the DRL value function, so as to maximize the multi-objective reward at each moment. Unlike traditional DRL-based EMSs, the multi-objective reward function in this study is extended to incorporate hydrogen consumption, a state of charge (SOC)-sustaining penalty, and transient lifespan degradation information of the PEMFC stack and LIB in both off-line training and on-line control. The results demonstrate that the proposed EMS can substantially reduce training time and computational burden. Meanwhile, compared with deep Q-network (DQN)-based and deep deterministic policy gradient (DDPG)-based EMSs on various real-world urban and standard driving cycles, the proposed EMS reduces hydrogen consumption by at least 9.76% and 1.07%, and slows total powertrain degradation by at least 9.11% and 2.62%, respectively.
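
As a rough illustration of the kind of multi-objective reward described above, the following minimal sketch combines the four terms named in the abstract into a single scalar. The weighted-sum form, the quadratic SOC penalty, the weight values, and all names (`RewardWeights`, `multi_objective_reward`, `soc_ref`) are assumptions made for illustration; the abstract does not specify the actual formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical trade-off weights; the values are placeholders, not from the paper."""
    hydrogen: float = 1.0
    soc: float = 1.0
    fc_degradation: float = 1.0
    lib_degradation: float = 1.0


def multi_objective_reward(h2_consumption, soc, soc_ref, fc_degradation,
                           lib_degradation, weights=RewardWeights()):
    """Return the negative weighted cost for one control step.

    h2_consumption  : hydrogen used in this step
    soc, soc_ref    : current and target battery state of charge
    fc_degradation  : transient PEMFC stack degradation in this step
    lib_degradation : transient LIB degradation in this step
    """
    # Assumed quadratic penalty for drifting away from the SOC target.
    soc_penalty = (soc - soc_ref) ** 2
    cost = (weights.hydrogen * h2_consumption
            + weights.soc * soc_penalty
            + weights.fc_degradation * fc_degradation
            + weights.lib_degradation * lib_degradation)
    # The agent maximizes reward, so every cost term enters with a negative sign.
    return -cost
```

Under these assumptions, a step that uses more hydrogen, drifts further from the SOC target, or degrades either component more receives a lower reward, which is the trade-off the agent is trained to balance.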