Abstract

Automatic recognition and prediction of in-vehicle human activities has a significant impact on the next generation of driver assistance systems and intelligent autonomous vehicles. In this article, we present a novel single-image driver action recognition algorithm inspired by human perception, which often focuses selectively on parts of an image to acquire information at places that are distinctive for a given task. Unlike existing approaches, we argue that human activity is a combination of pose and semantic contextual cues. Specifically, we model the configuration of body joints and their interactions with objects as pairwise relations in order to capture structural information. The resulting body-pose and body-object interaction representation is semantically rich and highly discriminative, even when coupled with a basic linear SVM classifier. We also propose a Multi-stream Deep Fusion Network (MDFN) for combining these high-level semantics with CNN features. Our experimental results demonstrate that the proposed approach significantly improves driver action recognition accuracy on two challenging datasets.
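To illustrate the fusion idea described above, the following is a minimal sketch of a multi-stream fusion head that concatenates a CNN appearance feature with a semantic feature vector (e.g., encoding body-joint configuration and body-object pairwise relations) before classification. It is a hypothetical illustration, not the authors' MDFN implementation; the module name, feature dimensions, and number of action classes are assumptions.

```python
# Hypothetical sketch of a multi-stream fusion head (not the authors' released code).
# Assumed inputs: a CNN appearance feature vector and a semantic feature vector
# encoding body-joint configuration and body-object pairwise relations.
import torch
import torch.nn as nn


class MultiStreamFusion(nn.Module):
    def __init__(self, cnn_dim=2048, semantic_dim=128, hidden_dim=512, num_actions=10):
        super().__init__()
        # Per-stream embedding layers project each modality to a common space.
        self.cnn_branch = nn.Sequential(nn.Linear(cnn_dim, hidden_dim), nn.ReLU())
        self.semantic_branch = nn.Sequential(nn.Linear(semantic_dim, hidden_dim), nn.ReLU())
        # Fusion by concatenation, followed by a classifier over driver actions.
        self.classifier = nn.Linear(2 * hidden_dim, num_actions)

    def forward(self, cnn_feat, semantic_feat):
        fused = torch.cat([self.cnn_branch(cnn_feat),
                           self.semantic_branch(semantic_feat)], dim=1)
        return self.classifier(fused)


# Example usage with a dummy batch of 4 images.
model = MultiStreamFusion()
logits = model(torch.randn(4, 2048), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 10])
```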
