Abstract

This study elaborates a curiosity-inspired asynchronous advantage actor-critic (A3C+) based, model-free, self-learning energy management strategy (EMS) for a hybrid powertrain. The fuel optimality of the standard A3C based EMS is strongly influenced by the target strategy: relative to a dynamic programming (DP) based EMS, it reaches around 92% for charge sustenance (CS) and 83% for charge depletion (CD) under the training cycle. The corresponding performance of the A3C derived CD policy declines further to 75% under testing cycles. Similar tendencies are verified for model predictive control (MPC) and deep deterministic policy gradient (DDPG) based EMSs. To address this, random network distillation (RND) and inverse dynamics model (IDM) techniques are incorporated to form a novelty-seeking module (NSM), whose output is used as an intrinsic reward to improve exploration. As a result, the proposed A3C+ based EMS achieves significant improvements in global optimality, generalization ability and adaptivity. Compared with the DP based EMS, the fuel optimality of the A3C+ derived CS and CD policies is guaranteed to be at least 92% and 88% under training and testing cycles. In addition, the training and running efficacy of the A3C+ based EMS is clearly superior to that of the MPC based EMS, demonstrating its potential for real-time implementation.
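
The abstract does not give the NSM architecture or say how the RND and IDM signals are combined, so the following is only a minimal sketch of the RND idea, not the authors' implementation. It assumes a fixed random target network, a linear predictor trained to imitate it, and a prediction error used as the exploration bonus. The class name `RNDBonus`, the function `shaped_reward`, the weight `beta`, and the three-element state vector are all hypothetical choices made for illustration; the IDM part is omitted.

```python
import numpy as np


class RNDBonus:
    """Illustrative RND bonus: error of a predictor imitating a fixed random net."""

    def __init__(self, state_dim, embed_dim=8, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed, randomly initialised target network (never trained).
        self.w_target = rng.normal(size=(state_dim, embed_dim))
        # Linear predictor (an assumption; real RND uses a neural network).
        self.w_pred = np.zeros((state_dim, embed_dim))
        self.lr = lr

    def _error(self, state):
        target = np.tanh(state @ self.w_target)
        return state @ self.w_pred - target

    def bonus(self, state):
        # Mean squared prediction error: large for rarely visited states.
        return float(np.mean(self._error(state) ** 2))

    def update(self, state):
        # One gradient-descent step on the mean squared error for this state.
        err = self._error(state)
        grad = 2.0 * np.outer(state, err) / err.size
        self.w_pred -= self.lr * grad


def shaped_reward(extrinsic, intrinsic, beta=0.1):
    # Total reward = task reward (e.g. negative fuel cost) + weighted bonus.
    return extrinsic + beta * intrinsic
```

In use, the agent would call `bonus(state)` at each step, add it to the fuel-related reward through `shaped_reward`, and then call `update(state)`, so that the bonus for frequently visited states shrinks while unfamiliar states keep a higher bonus.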
