Abstract

Understanding driver activity is vital for in-vehicle systems that aim to reduce the incidence of car accidents rooted in cognitive distraction. Automating real-time behavior recognition while ensuring high classification accuracy is, however, challenging, given the multitude of circumstances surrounding drivers, the unique traits of individuals, and the computational constraints imposed by in-vehicle embedded platforms. Prior work fails to jointly meet these runtime/accuracy requirements and mostly relies on a single sensing modality, which in turn can become a single point of failure. In this paper, we harness the exceptional feature extraction abilities of deep learning and propose a dedicated Interwoven Deep Convolutional Neural Network (InterCNN) architecture to tackle the problem of accurately classifying driver behaviors in real-time. The proposed solution exploits information from multi-stream inputs, i.e., in-vehicle cameras with different fields of view and optical flows computed from the recorded images, and merges the abstract features it extracts through multiple fusion layers. This builds a tight ensembling system, which significantly improves the robustness of the model. In addition, we introduce a temporal voting scheme based on historical inference instances to enhance classification accuracy. Experiments conducted with a dataset that we collected in a mock-up car environment demonstrate that the proposed InterCNN with MobileNet convolutional blocks can classify 9 different behaviors with 73.97% accuracy, and 5 'aggregated' behaviors with 81.66% accuracy. We further show that our architecture is highly computationally efficient, performing inferences within 15 ms, which satisfies the real-time constraints of intelligent cars. Finally, our InterCNN is robust to lossy input, as classification remains accurate even when two input streams are occluded.
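To make the multi-stream fusion idea concrete, below is a minimal, hypothetical sketch of a CNN that extracts features from several input streams and merges them through a concatenation-based fusion layer, in the spirit of the InterCNN described above. The number of streams, the channel widths, and the MobileNet-style depthwise-separable block are illustrative assumptions, not the authors' exact configuration; each stream is treated as a 3-channel image for simplicity.

import torch
import torch.nn as nn


class DepthwiseSeparableBlock(nn.Module):
    """MobileNet-style block: depthwise conv followed by pointwise conv."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1, groups=in_ch),
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, out_ch, 1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class InterwovenCNN(nn.Module):
    """Hypothetical multi-stream CNN with a concatenation fusion layer."""
    def __init__(self, n_streams=4, n_classes=9):
        super().__init__()
        # One lightweight extractor per input stream (camera / optical flow).
        self.stems = nn.ModuleList(
            nn.Sequential(DepthwiseSeparableBlock(3, 32, stride=2),
                          DepthwiseSeparableBlock(32, 64, stride=2))
            for _ in range(n_streams))
        # Fusion: merge per-stream features channel-wise, then keep convolving.
        self.fusion = nn.Sequential(
            DepthwiseSeparableBlock(64 * n_streams, 128, stride=2),
            DepthwiseSeparableBlock(128, 128, stride=2))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(128, n_classes))

    def forward(self, streams):
        # streams: list of n_streams tensors, each shaped (batch, 3, H, W)
        feats = [stem(x) for stem, x in zip(self.stems, streams)]
        fused = torch.cat(feats, dim=1)  # channel-wise fusion of all streams
        return self.head(self.fusion(fused))


if __name__ == "__main__":
    model = InterwovenCNN()
    frames = [torch.randn(1, 3, 224, 224) for _ in range(4)]
    print(model(frames).shape)  # torch.Size([1, 9])

Because the classifier consumes all streams jointly, an occluded stream only degrades part of the fused representation rather than disabling the model outright, which is consistent with the robustness claim in the abstract.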

Highlights

  • Drivers’ cognitive distraction is a major cause of unsafe driving, which leads to severe car accidents every year [1]

  • We demonstrate that our InterCNNs with MobileNet blocks and a temporal voting scheme, which enhances accuracy by leveraging historical inferences (see the sketch after this list), can classify 9 different behaviors with 73.97% accuracy, and 5 aggregated behaviors with 81.66% accuracy

  • Training is performed for approximately 10 days on a computing cluster with 18 nodes, each equipped with two Intel Xeon E5-2620 processors (24 logical cores) clocked at 2.1 GHz, 64 GB of RAM, and a mix of NVIDIA TITAN X and Tesla K40M graphics processing units (GPUs), each with 12 GB of memory
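The temporal voting scheme referenced above can be illustrated with a short sketch: smooth noisy per-frame classifications by taking a majority vote over the most recent k inference outputs. The window size and the tie-breaking behavior here are assumptions for illustration, not the paper's exact parameters.

from collections import Counter, deque


class TemporalVoter:
    """Majority vote over a sliding window of recent per-frame predictions."""
    def __init__(self, window: int = 15):
        self.history = deque(maxlen=window)  # most recent predicted labels

    def vote(self, label: int) -> int:
        """Record the latest prediction and return the most frequent label
        within the sliding window (ties broken by first occurrence)."""
        self.history.append(label)
        return Counter(self.history).most_common(1)[0][0]


voter = TemporalVoter(window=15)
for raw in [2, 2, 7, 2, 2, 3, 2]:  # noisy per-frame predictions
    smoothed = voter.vote(raw)      # majority label over the window
print(smoothed)  # 2 — spurious single-frame errors are voted away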


Summary

INTRODUCTION

Drivers’ cognitive distraction is a major cause of unsafe driving, which leads to severe car accidents every year [1]. Recent works that adopt deep learning to solve the driver activity recognition problem, including [15]–[21], suffer from at least one of the following limitations: (i) they do not quantify the inference times of the proposed solution, which is critical to real-life car systems, or exhibit runtimes that are not affordable in practice; (ii) they often struggle to classify individual actions with very high accuracy; and (iii) they rely on a single sensing modality (video feed) for detection, which can become a single point of failure, challenging the practical efficacy of the classifier. To tackle these problems, in this paper we design a driver behavior recognition system that uniquely combines different convolutional-type neural models, through which we accurately perform this task in real-time, relying on multiple inputs.

RELATED WORK
THE INTERCNN ARCHITECTURE
CNN BLOCKS EMPLOYED
EXPERIMENTS
TEMPORAL VOTING
Findings
CONCLUSIONS AND FUTURE WORK

