Abstract

Human error in driving decisions is the leading cause of road fatalities. Autonomous driving can eliminate such erroneous decisions and thereby improve traffic safety and efficiency. Deep reinforcement learning (DRL) has shown great potential for learning complex tasks, and researchers have recently investigated various DRL-based approaches to autonomous driving. However, exploiting multi-modal fusion to produce joint perception and motion prediction, and then leveraging these predictions to train a latent DRL agent, has not yet been explored. To that end, we propose enhancing urban autonomous driving with multi-modal fusion and latent DRL. A single LIDAR sensor is used to extract bird's-eye view (BEV), range view (RV), and residual input images. These images are passed into LiCaNext, a real-time multi-modal fusion network, to produce accurate joint perception and motion prediction. The predictions, together with an additional simple BEV image, are then fed into the latent DRL agent to learn a complex end-to-end driving policy that ensures safety, efficiency, and comfort. A sequential latent model is deployed to learn compact representations of the inputs, improving the sampling efficiency of reinforcement learning. Our experiments are conducted in the CARLA simulator and evaluated against state-of-the-art DRL models. The results show that our method learns a better driving policy and outperforms prevailing models. Further experiments demonstrate the effectiveness of the proposed approach in different environments and under varying weather conditions.
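To make the described pipeline concrete, the following is a minimal, hypothetical sketch of the data flow: single-LIDAR views are fused into a joint perception and motion-prediction map, which is combined with a simple BEV image and compressed by a sequential latent model before a policy head outputs driving actions. All module names (FusionNetStandIn, SequentialLatentPolicy), tensor shapes, and hyperparameters are illustrative assumptions, not the authors' actual LiCaNext or latent-DRL implementation.

```python
# Hypothetical sketch of the abstract's pipeline; shapes and modules are placeholders.
import torch
import torch.nn as nn


class FusionNetStandIn(nn.Module):
    """Stand-in for the multi-modal fusion network (e.g., LiCaNext).

    Consumes BEV, range-view, and residual images from a single LIDAR and
    emits a joint perception / motion-prediction feature map.
    """
    def __init__(self, out_channels: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),  # 3 stacked single-channel views
            nn.ReLU(),
            nn.Conv2d(32, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, bev, rv, residual):
        x = torch.cat([bev, rv, residual], dim=1)  # fuse views along the channel axis
        return self.encoder(x)


class SequentialLatentPolicy(nn.Module):
    """Stand-in for the sequential latent model plus policy head.

    Compresses the fused prediction map and a simple BEV image into a compact
    recurrent latent state, then maps that state to driving actions.
    """
    def __init__(self, in_channels: int = 17, latent_dim: int = 64, action_dim: int = 2):
        super().__init__()
        self.obs_encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.rnn = nn.GRUCell(32, latent_dim)            # sequential latent transition
        self.policy = nn.Linear(latent_dim, action_dim)  # e.g., steering and throttle

    def forward(self, prediction_map, simple_bev, latent):
        obs = torch.cat([prediction_map, simple_bev], dim=1)
        latent = self.rnn(self.obs_encoder(obs), latent)
        action = torch.tanh(self.policy(latent))
        return action, latent


if __name__ == "__main__":
    fusion, agent = FusionNetStandIn(), SequentialLatentPolicy()
    bev = rv = residual = torch.randn(1, 1, 64, 64)   # single-LIDAR input views
    simple_bev = torch.randn(1, 1, 64, 64)
    latent = torch.zeros(1, 64)
    prediction = fusion(bev, rv, residual)             # joint perception + motion prediction
    action, latent = agent(prediction, simple_bev, latent)
    print(action.shape)  # torch.Size([1, 2])
```

In an actual training loop, the policy head would be optimized with a DRL objective against the paper's safety, efficiency, and comfort rewards in CARLA; the sketch above only illustrates how the inputs could be wired together.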
