JHPFA-Net: Joint Head Pose and Facial Action Network for Driver Yawning Detection Across Arbitrary Poses in Videos

Chunsheng Liu, Hui Liu, Faliang Chang, Yansha Lu, Hengqiang Huan

doi:10.1109/tits.2023.3285923

Abstract

Yawning detection is a key component of driver fatigue detection, but it is hampered by variations in head pose, facial expression, and illumination, as well as by occlusion. Most previous methods focus on frontal faces and cope poorly with different facial actions under arbitrary poses in real driving environments. In this study, we propose a novel Joint Head Pose and Facial Action Network (JHPFA-Net) for driver yawning detection across arbitrary poses in videos. It has three main parts: a Geometric-based Key-frame Selection Module (GK-Module), a Face Frontalization with Warp Attention Module (FF-Module), and a dual-channel classifier, the Head Pose & Facial Action Fusion Module (HF-Module). First, the GK-Module extracts geometric vectors and constructs a two-stage judgment mechanism to reduce frame redundancy and improve the efficiency of JHPFA-Net. Second, in contrast to existing methods, the FF-Module synthesizes photo-realistic frontal faces, which are used to capture facial actions under arbitrary poses. Finally, the HF-Module fuses head pose attributes and facial modalities to achieve pose-invariant detection and improve accuracy. Extensive experiments show that the proposed JHPFA-Net achieves state-of-the-art results compared with representative methods on the public YawDD benchmark and performs well in real-time applications.
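
The abstract does not specify which geometric vectors or judgment rules the GK-Module uses. Purely as an illustration of what a geometry-based, two-stage key-frame filter might look like, the sketch below assumes a mouth aspect ratio computed from four mouth landmarks and two hypothetical thresholds (`open_threshold`, `min_change`); none of these names or values come from the paper.

```python
import math


def mouth_aspect_ratio(top, bottom, left, right):
    """Ratio of mouth opening height to mouth width, from four (x, y) landmarks.

    Assumption: this ratio stands in for the paper's unspecified geometric vectors.
    """
    height = math.dist(top, bottom)
    width = math.dist(left, right)
    return height / width if width > 0 else 0.0


def select_key_frames(ratios, open_threshold=0.5, min_change=0.05):
    """Return indices of frames that pass a hypothetical two-stage check.

    Stage 1: discard frames whose ratio is below open_threshold (mouth likely closed).
    Stage 2: discard frames whose ratio differs from the last kept frame
             by less than min_change (redundant frame).
    """
    kept = []
    last_ratio = None
    for index, ratio in enumerate(ratios):
        if ratio < open_threshold:
            continue  # stage 1: mouth not open enough
        if last_ratio is not None and abs(ratio - last_ratio) < min_change:
            continue  # stage 2: too similar to the previous key frame
        kept.append(index)
        last_ratio = ratio
    return kept


if __name__ == "__main__":
    per_frame_ratios = [0.1, 0.55, 0.56, 0.7, 0.2, 0.72]
    print(select_key_frames(per_frame_ratios))
```

In this sketch only the frames that survive both stages would be passed on to the more expensive frontalization and classification steps, which is one way frame redundancy could be reduced.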

Full Text