Hierarchical Q-learning network for online simultaneous optimization of energy efficiency and battery life of the battery/ultracapacitor electric vehicle

Bin Xu, Quan Zhou, Junzhe Shi, Sixu Li

doi:10.1016/j.est.2021.103925

Abstract

Reinforcement learning (RL) has been gaining attention in energy management of hybrid power systems for its low computation cost and strong energy-saving performance. However, its potential has not been fully explored in electric vehicle (EV) applications because most RL studies have focused on a single design target. This paper studies online optimization of the supervisory control system of an EV powered by a battery and an ultracapacitor, with two design targets: maximizing energy efficiency and maximizing battery life. Based on a widely used reinforcement learning method, Q-learning, a hierarchical learning network is proposed. Within the hierarchical Q-learning network, two independent Q tables, Q1 and Q2, are allocated to two control layers. In addition to the baseline power-split layer, which determines the power split ratio between the battery and the ultracapacitor based on the knowledge stored in Q1, an upper layer is developed to trigger the engagement of the ultracapacitor based on Q2. During learning, Q1 and Q2 are updated in real driving using the measured states, actions, and rewards. The hierarchical Q-learning network is developed and evaluated using a full propulsion system model. With a single-layer Q-learning method and a rule-based method as two baselines, the performance of the EV under the three control methods (two baselines and the proposed one) is simulated over different driving cycles. The results show that adding an ultracapacitor to the electric vehicle reduces battery capacity loss by 12%. The proposed hierarchical Q-learning network outperforms the two baseline methods, reducing battery capacity loss by 8%. The vehicle range is slightly extended along with the battery life. Moreover, the proposed strategy is validated under different driving cycles and measurement noise.
The proposed hierarchical strategy can be adapted and applied to reinforcement learning based energy management in different hybrid power systems.
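
The two-layer structure described above can be illustrated with a minimal tabular sketch. This is not the authors' implementation. The state discretization, the candidate split ratios, the epsilon-greedy exploration, the use of one shared reward for both layers, and all names (`HierarchicalQ`, `act`, `learn`) are assumptions made for illustration. The split ratio is taken here, hypothetically, as the fraction of demanded power supplied by the battery.

```python
import random


class HierarchicalQ:
    """Illustrative two-layer tabular Q-learning agent (hypothetical design)."""

    def __init__(self, n_states, split_ratios, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.split_ratios = list(split_ratios)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        # Q2 (upper layer): action 0 = battery only, 1 = engage ultracapacitor.
        self.q2 = [[0.0, 0.0] for _ in range(n_states)]
        # Q1 (power-split layer): one action per candidate split ratio.
        self.q1 = [[0.0] * len(self.split_ratios) for _ in range(n_states)]

    def _pick(self, row):
        # Epsilon-greedy selection (assumed exploration scheme).
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(row))
        return max(range(len(row)), key=row.__getitem__)

    def act(self, state):
        """Return (engage, split_index, battery_share) for a discrete state."""
        engage = self._pick(self.q2[state])
        if engage == 0:
            # Ultracapacitor not engaged: the battery supplies all power.
            return engage, None, 1.0
        idx = self._pick(self.q1[state])
        return engage, idx, self.split_ratios[idx]

    def _update(self, table, s, a, r, s_next):
        # Standard Q-learning update toward r + gamma * max_a' Q(s', a').
        target = r + self.gamma * max(table[s_next])
        table[s][a] += self.alpha * (target - table[s][a])

    def learn(self, s, engage, idx, reward, s_next):
        # Assumption: both layers learn from the same scalar reward.
        self._update(self.q2, s, engage, reward, s_next)
        if engage == 1:
            self._update(self.q1, s, idx, reward, s_next)
```

In use, each control step would call `act` with the current discretized state, apply the returned battery share to the power demand, and then call `learn` with the measured reward and next state. Q1 is updated only when the upper layer has engaged the ultracapacitor, which is one plausible reading of how the two tables stay independent.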

Full Text