Abstract

The application of deep reinforcement learning to driving policy learning for automated vehicles is limited by the difficulty of designing reward functions. Most existing inverse reinforcement learning (IRL) methods impose structural assumptions on the reward function and ignore the uncertainty of neural networks. To address this, this paper proposes a novel deep Bayesian IRL method that tackles both reward function design and uncertainty quantification by learning an approximate posterior distribution over the reward function. Furthermore, we propose to train uncertainty-aware, human-like driving policies by maximising the predicted reward while penalising its uncertainty. Finally, the proposed methods were validated in simulated highway driving scenarios. The results show that the proposed method, AVRL, models uncertainty and learns reward functions that significantly outperform the existing IRL method applied in automated driving. It was also found that penalising the uncertainty of the reward function during policy training improves the success rate and human likeness of the learned policy.
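
The following is a minimal sketch, not the paper's implementation, of the uncertainty-penalised reward idea described above. It assumes the Bayesian reward model is approximated with Monte Carlo dropout; the network architecture, the penalty weight `beta`, the sample count, and the feature format are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class BayesianRewardModel(nn.Module):
    """Reward network whose dropout layers are kept active at inference
    so repeated forward passes act as samples from an approximate posterior."""

    def __init__(self, feature_dim: int, hidden: int = 64, p_drop: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features).squeeze(-1)


def uncertainty_penalised_reward(model: BayesianRewardModel,
                                 features: torch.Tensor,
                                 n_samples: int = 30,
                                 beta: float = 1.0) -> torch.Tensor:
    """Return E[r] - beta * std(r) over stochastic forward passes,
    i.e. the predicted reward minus an uncertainty penalty."""
    model.train()  # keep dropout on so each pass gives a different reward sample
    with torch.no_grad():
        samples = torch.stack([model(features) for _ in range(n_samples)])
    return samples.mean(dim=0) - beta * samples.std(dim=0)
```

In this sketch the penalised value would be used as the reward signal during policy training, so that states whose reward estimate is uncertain contribute less, which is one plausible way to realise the "maximise predicted reward, penalise its uncertainty" objective stated in the abstract.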
