Abstract

Accounting for physical constraints during online optimization and ensuring training safety remain challenges in implementing deep reinforcement learning (DRL) algorithms. For non-linear systems in particular, the mapping between the agent's output actions and the control signals is difficult to obtain. This paper proposes a novel DRL framework for online optimization of the energy management of a power-split hybrid electric vehicle (HEV), combining a neural-network (NN)-based multi-constraint optimal strategy with a rule-based restraints system (RBRS). The proposed method, named Reward Directed Policy Optimization (RDPO), adopts the exterior point method (EPM) and curriculum learning (CL) to direct the agent to recognize and avoid irrational control signals while optimizing fuel economy. The energy management strategy (EMS), which targets both fuel consumption minimization and avoidance of irrational control signals, is optimized by training the agent on the WLTC. A competitive fuel economy of 4.495 L/100 km is achieved with no irrational control signals. In the online adaptability evaluation, the fuel consumption of the vehicle under NEDC and CTUDC is reduced to 4.113 L/100 km and 3.221 L/100 km, respectively, again with no irrational control signals. Comparisons with various DRL agents verify the method's superiority in optimization, computational efficiency, and safety.
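
The abstract does not state the reward formula used in RDPO, so the following is only a generic sketch of how an exterior-point penalty and a curriculum schedule can be combined in a reward signal. It is not the authors' implementation. The function names, the quadratic penalty form, the geometric growth of the penalty weight, and the example signal name and bounds are all assumptions made for illustration.

```python
def constraint_violation(value, lower, upper):
    """Distance by which `value` lies outside [lower, upper]; 0.0 when feasible."""
    return max(lower - value, 0.0) + max(value - upper, 0.0)


def penalty_weight(stage, base=1.0, growth=10.0):
    """Assumed curriculum schedule: the penalty weight grows geometrically per stage."""
    return base * growth ** stage


def shaped_reward(fuel_rate, signals, bounds, stage):
    """Negative fuel use minus a quadratic exterior penalty on out-of-bounds signals.

    Feasible signals incur no penalty, so the reward reduces to -fuel_rate.
    Infeasible signals are allowed but penalized, more heavily at later stages.
    """
    penalty = sum(
        constraint_violation(signals[name], lo, hi) ** 2
        for name, (lo, hi) in bounds.items()
    )
    return -fuel_rate - penalty_weight(stage) * penalty


# Hypothetical signal name and limits, chosen only to exercise the functions.
bounds = {"engine_speed": (0.0, 10.0)}
feasible = shaped_reward(0.5, {"engine_speed": 5.0}, bounds, stage=1)
infeasible = shaped_reward(0.5, {"engine_speed": 12.0}, bounds, stage=1)
```

In a sketch like this, early curriculum stages tolerate constraint violations so the agent can explore, while later stages make violations increasingly costly, which pushes the learned policy toward control signals that stay inside the bounds.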
