Abstract

To address the problems of sample underutilization and unstable training when training intelligent vehicles with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, a TD3 algorithm based on a Composite Prioritized Experience Replay mechanism (CPR-TD3) is proposed. The method constructs two separate priorities for each experience, one based on its immediate reward value and one based on its Temporal Difference error (TD-error), and ranks the samples under each criterion. The two rankings are then combined into a composite average rank, from which the sampling priorities are recalculated, and the experiences drawn under these priorities are used to train the target network. The reward function is further improved by introducing a minimum lane-change distance and a variable headway time distance. Finally, comparison with the traditional TD3 algorithm in a highway scenario shows that the improved algorithm is effective and that CPR-TD3 improves the training efficiency of intelligent vehicles.
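The abstract does not give implementation details, so the following is a minimal sketch of the composite rank-based prioritization it describes, under assumed conventions: samples are ranked separately by immediate reward and by absolute TD-error, the two ranks are averaged, and sampling probabilities are taken proportional to the inverse composite rank raised to a prioritization exponent (the exponent `alpha` and all function names here are illustrative, not from the paper).

```python
import numpy as np

def composite_rank_priorities(rewards, td_errors, alpha=0.6):
    """Combine reward-based and TD-error-based rankings into sampling probabilities.

    rewards   : immediate reward stored with each transition
    td_errors : TD-error of each transition (absolute value is used)
    alpha     : prioritization exponent (assumed hyperparameter, not from the paper)
    """
    rewards = np.asarray(rewards, dtype=float)
    td_errors = np.abs(np.asarray(td_errors, dtype=float))
    n = len(rewards)

    # Rank each sample under both criteria (rank 1 = largest value).
    reward_rank = np.empty(n)
    reward_rank[np.argsort(-rewards)] = np.arange(1, n + 1)
    td_rank = np.empty(n)
    td_rank[np.argsort(-td_errors)] = np.arange(1, n + 1)

    # Composite priority from the average of the two ranks
    # (smaller average rank -> larger priority).
    composite_rank = (reward_rank + td_rank) / 2.0
    priorities = (1.0 / composite_rank) ** alpha
    return priorities / priorities.sum()

def sample_batch(probabilities, batch_size, rng=None):
    """Draw transition indices according to the composite priorities."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.choice(len(probabilities), size=batch_size, replace=False, p=probabilities)
```

In a TD3 training loop, the indices returned by `sample_batch` would select the minibatch used to update the critic and (on delayed steps) the actor and target networks; how often the priorities are refreshed is a design choice not specified in the abstract.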
