End-to-end self-driving is a method that directly maps raw visual images to vehicle control signals using a deep convolutional neural network (CNN). Although steering angle prediction achieves good results as a single task, current approaches do not effectively predict the steering angle and the speed simultaneously. In this paper, various end-to-end multi-task deep learning networks that combine a deep convolutional neural network with a long short-term memory recurrent neural network (CNN-LSTM) are designed and compared. These networks capture not only the visual spatial information but also the dynamic temporal information in driving scenarios, and they improve both steering angle and speed predictions. Furthermore, two auxiliary tasks based on semantic segmentation and object detection are proposed to improve the understanding of driving scenarios. Experiments are conducted on the public Udacity dataset and a newly collected Guangzhou Automotive Cooperate dataset. The results show that the proposed network architecture can predict steering angle and vehicle speed accurately. In addition, the impact of the auxiliary tasks on network performance is analyzed by visualizing the saliency maps of the network. Finally, the proposed network architecture has been verified on the autonomous driving simulation platform Grand Theft Auto V (GTAV) and on an experimental road, with an average takeover rate of two takeovers per 10 km.
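As an illustration of what predicting the steering angle and the speed jointly can mean for training, the following minimal sketch combines two regression errors into a single objective. The abstract does not state the loss function, so the use of mean squared error, the function names `mean_squared_error` and `multi_task_loss`, and the weights `w_steer` and `w_speed` are all assumptions made for this example, not the method of the paper.

```python
from typing import Sequence


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average squared difference between predictions and ground truth."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multi_task_loss(
    steer_pred: Sequence[float],
    steer_true: Sequence[float],
    speed_pred: Sequence[float],
    speed_true: Sequence[float],
    w_steer: float = 1.0,  # hypothetical task weight, not taken from the paper
    w_speed: float = 1.0,  # hypothetical task weight, not taken from the paper
) -> float:
    """Weighted sum of the steering and speed regression losses."""
    steer_loss = mean_squared_error(steer_pred, steer_true)
    speed_loss = mean_squared_error(speed_pred, speed_true)
    return w_steer * steer_loss + w_speed * speed_loss
```

In a sketch like this, the two weights control how strongly each task drives the shared features. Auxiliary tasks such as segmentation or detection would add further weighted terms to the same sum.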