Abstract

Deep Deterministic Policy Gradient (DDPG) is a promising reinforcement learning technique with the potential to solve complicated tasks and handle high-dimensional state/action spaces. However, it suffers from sample inefficiency, requiring a large number of training samples. To speed up training, we propose ε-annealing and Q-learning switching methods that aid the training of DDPG with a Nonlinear Model Predictive Control (NMPC) controller to solve priority calculation and merging of autonomous vehicles at roundabouts. We further extend the Q-learning switch with double replay memory and Nash Q-value updates. The performance of these switching methods is compared to DDPG, demonstrating that the Nash switch outperforms the other methods. To reduce conservativeness, we train with variable traffic density. We test three selection methods inside Q-learning and show that the constant-threshold switch achieves at least ten times higher mean reward over 50 training episodes. We also compare Q-learning with NMPC and PID assistance and show that NMPC yields 114% higher mean reward. We compare the Q-learning switch and the novel Nash switch method under noise-free and noisy input conditions, showing a 35% increase in mean reward and a 4% decrease in standard deviation for Nash updates. We analyze the efficacy of the Q-learning and Nash switch approaches with respect to NMPC and demonstrate comparable performance between the Nash switch and NMPC. We compare the driving results of the Q-learning switch and the Nash switch with the DDPG algorithm to show that the Nash switch strategy has higher overall performance. Finally, we compare the Nash switch's performance with DDPG in a highway merging scenario, where it achieves 159% higher mean reward.
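To make the core idea of controller-assisted training concrete, the following is a minimal sketch of an ε-annealed switch that defers to an NMPC controller early in training and hands control to the learned DDPG policy as ε decays. The function names (`ddpg_action`, `nmpc_action`) and the linear annealing schedule are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of epsilon-annealed switching between a learned DDPG
# policy and an NMPC controller; names and schedule are assumptions.
import random


def epsilon(episode, eps_start=1.0, eps_end=0.05, decay_episodes=500):
    """Linearly anneal epsilon from eps_start to eps_end over decay_episodes."""
    frac = min(episode / decay_episodes, 1.0)
    return eps_start + frac * (eps_end - eps_start)


def select_action(episode, state, ddpg_action, nmpc_action):
    """With probability epsilon, use the NMPC controller; otherwise use DDPG.

    `ddpg_action` and `nmpc_action` are callables mapping a state to a control
    input (e.g., an acceleration command for the merging vehicle).
    """
    if random.random() < epsilon(episode):
        return nmpc_action(state)   # controller-guided sample early in training
    return ddpg_action(state)       # learned policy takes over as epsilon anneals
```

The Q-learning and Nash switch variants described in the abstract replace this fixed probabilistic rule with a learned switching policy, but the overall structure of choosing between the controller and the DDPG actor at each step is the same.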
