Abstract

It is of great importance to monitor the driver's status to achieve an intelligent and safe take-over transition in the level 3 automated driving vehicle. We present a camera-based system to recognise the non-driving activities (NDAs) which may lead to different cognitive capabilities for take-over based on a fusion of spatial and temporal information. The region of interest (ROI) is automatically selected based on the extracted masks of the driver and the object/device interacting with. Then, the RGB image of the ROI (the spatial stream) and its associated current and historical optical flow frames (the temporal stream) are fed into a two-stream convolutional neural network (CNN) for the classification of NDAs. Such an approach is able to identify not only the object/device but also the interaction mode between the object and the driver, which enables a refined NDA classification. In this paper, we evaluated the performance of classifying 10 NDAs with two types of devices (tablet and phone) and 5 types of tasks (emailing, reading, watching videos, web-browsing and gaming) for 10 participants. Results show that the proposed system improves the averaged classification accuracy from 61.0% when using a single spatial stream to 90.5%.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call