Abstract

Current intelligent driving systems do not model the selective attention mechanism of drivers and therefore cannot fully replace the driver in extracting effective road information. To address this problem, we propose a Driver Visual Attention Network (DVAN) based on a deep-learning attention model. The DVAN extracts the key information affecting the driver's operation by predicting the driver's attention points; it rapidly localizes and extracts the road information of greatest interest to the driver by merging local appearance features with contextual visual information. In addition, a Cross Convolutional Neural Network (C-CNN) is proposed to ensure the integrity of the extracted information. We validate the network on the KITTI dataset, currently the world's largest computer vision benchmark for autonomous driving scenarios. Our results show that the DVAN can quickly locate and identify the targets the driver is most interested in within an image, with an average prediction accuracy of 96.3%. This work provides a theoretical basis and technical methods related to visual perception for future intelligent driving vehicles, driver training, and assisted driving systems.
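The architecture is only named in this abstract, not specified, so the following is a minimal PyTorch sketch of the fusion idea described above (merging local appearance features with contextual visual information). It is not the authors' DVAN or C-CNN; the branch names local_branch and context_branch, the layer sizes, and the use of dilated convolutions as a stand-in for context are illustrative assumptions.

# Minimal sketch (not the authors' code): fuse a local-appearance branch
# with a wider-context branch, as the abstract describes at a high level.
import torch
import torch.nn as nn

class FusionSketch(nn.Module):
    def __init__(self, num_classes: int = 3):  # Car, Cyclist, Pedestrian
        super().__init__()
        # Local branch: small receptive field for object appearance (assumed).
        self.local_branch = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Context branch: dilated convolutions widen the receptive field to
        # take in the surrounding scene ("contextual visual information").
        self.context_branch = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
        )
        # Concatenated features feed a per-pixel class/attention score map.
        self.head = nn.Conv2d(128, num_classes, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.local_branch(x), self.context_branch(x)], dim=1)
        return self.head(fused)

if __name__ == "__main__":
    scores = FusionSketch()(torch.randn(1, 3, 128, 384))
    print(scores.shape)  # torch.Size([1, 3, 128, 384])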

Highlights

  • In recent years, many social and livelihood issues, such as traffic safety, congestion, pollution, and energy consumption, have accompanied the continuous increase in car ownership and traffic flow

  • To better conform to the driver's visual attention mechanism, our method redistributes the dataset's classification labels into three categories: Car, Van, and Truck are merged into Car; Pedestrian and Person (Sitting) are merged into Pedestrian; Cyclist is kept; and Tram and Misc are removed entirely (a code sketch follows this list)

  • Graph (a) clearly shows that the Frames Per Second (FPS) of our method is slightly lower than that of Yolov2 and about the same as that of Yolov3, which is sufficient to meet real-time requirements

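The relabeling rule in the second highlight is concrete enough to sketch directly. Below is a small Python example of the mapping; the class strings follow the official KITTI label files, and dropping DontCare along with Tram and Misc is our assumption, since the highlight does not mention it.

# Remap KITTI annotation classes per the highlight above:
# Car/Van/Truck -> Car; Pedestrian/Person_sitting -> Pedestrian; Cyclist kept;
# Tram and Misc removed (DontCare also dropped here, by assumption).
KITTI_REMAP = {
    "Car": "Car",
    "Van": "Car",
    "Truck": "Car",
    "Pedestrian": "Pedestrian",
    "Person_sitting": "Pedestrian",
    "Cyclist": "Cyclist",
}

def remap_labels(objects):
    """Keep only objects whose class maps to Car, Cyclist, or Pedestrian.

    objects: iterable of (class_name, bounding_box) pairs.
    """
    return [(KITTI_REMAP[cls], box) for cls, box in objects if cls in KITTI_REMAP]

print(remap_labels([("Van", (0, 0, 10, 10)), ("Tram", (5, 5, 20, 20))]))
# -> [('Car', (0, 0, 10, 10))]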

Summary

Introduction

Many social and livelihood issues, such as traffic safety, congestion, pollution, and energy consumption, have accompanied the continuous increase in car ownership and traffic flow. Kim H et al. [17] proposed a target detection model for the road driving environment, transferred from SSD and trained on the KITTI dataset. A related dataset of traffic driving videos records the eye movement data of eight drivers during real driving, with each video containing only one driver's eye movement data; in subsequent work, the authors trained different computer vision models on this dataset to predict the driver's attention [19,20]. For studying the driver's visual attention mechanism in driving scenes, however, eye movement data covering only a single driver per video may lose some traffic-relevant image information because of individual differences among drivers. We analyze our prediction results on the KITTI dataset.

Driver Visual Attention Network
Dataset Description
Experimental Details
The Setting of Loss Function
Experimental Results and Analysis
Validation
Conclusions