Abstract

This paper considers the real-time spatio-temporal electric vehicle charging navigation problem in a dynamic environment using a shortest-path-based reinforcement learning approach. In a data-sharing system comprising the transportation network, an electric vehicle (EV), and EV charging stations (EVCSs), the aim is to determine the most convenient EVCS and the optimal path that minimize travel, charging, and waiting costs. To estimate waiting times at EVCSs, a Gaussian process regression algorithm is integrated, trained on a real-time dataset of EV state-of-charge and arrival-departure times. The optimization problem is modelled as a Markov decision process with unknown transition probabilities to handle the uncertainties arising from time-varying variables. A recently proposed on-policy actor–critic method, phasic policy gradient (PPG), which extends the proximal policy optimization (PPO) algorithm with an auxiliary optimization phase that improves training by distilling features from the critic into the actor network, is used to make the EVCS decision; the EV then travels along the optimal path from its origin node to the chosen EVCS, accounting for dynamic traffic conditions, the unit time value of the EV owner, and time-of-use charging prices. Three case studies are carried out on the 24-node Sioux Falls benchmark network. PPG is shown to achieve, on average, a 9% higher reward than PPO, and the total time decreases by 7–10% when the EV owner's cost is considered.
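To illustrate the waiting-time estimation step mentioned above, the following is a minimal sketch of Gaussian process regression with a squared-exponential kernel, implemented from scratch in NumPy. The feature (arrival hour of day) and the synthetic observations are assumptions for illustration only; the paper's actual model uses real-time state-of-charge and arrival-departure data from EVs.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=2.0, variance=1.0):
    """Squared-exponential (RBF) kernel between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(x_train, y_train, x_test, noise=0.1):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = rbf_kernel(x_train, x_train) + noise**2 * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    # Cholesky factorization for a numerically stable solve of K alpha = y.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss - v.T @ v)
    return mean, var

# Hypothetical observations: arrival hour -> waiting time at an EVCS (minutes).
hours = np.array([7.0, 9.0, 12.0, 17.0, 19.0, 22.0])
waits = np.array([5.0, 18.0, 10.0, 25.0, 20.0, 4.0])

# Predicted waiting time (with uncertainty) for two candidate arrival hours.
mean, var = gp_predict(hours, waits, np.array([8.0, 18.0]))
```

The posterior variance is what makes GP regression attractive here: the navigation agent receives not just a point estimate of the waiting time but a confidence measure that degrades gracefully for arrival hours far from observed data.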