Abstract

Accurate recognition of nondriving activity (NDA) is important for designing intelligent human-machine interfaces that achieve a smooth and safe control transition in conditionally automated vehicles. However, some characteristics of such activities, such as limited-extent movement and similar backgrounds, pose a challenge to existing action recognition methods based on 3-D convolutional neural networks. In this article, we propose a dual-stream 3-D residual network (ResNet), named DS3D ResNet, to enhance the learning of spatio-temporal representations and improve activity recognition performance. Specifically, a parallel two-stream structure is introduced to focus on learning short-time spatial representations and small-region temporal representations. A two-feed driver behavior monitoring framework is further built to classify four types of NDAs and two types of driving behavior based on the driver's head and hand movements. A novel NDA dataset has been constructed for the evaluation, on which the proposed DS3D ResNet achieves 83.35% average accuracy, at least 5% higher than three selected state-of-the-art methods. Furthermore, this study uses saliency maps to investigate the spatio-temporal features learned in the hidden layer, which explains the superiority of the proposed model on the selected NDAs.
