Efficient face detection and tracking in video sequences based on deep learning

Guangyong Zheng, Yuming Xu

doi:10.1016/j.ins.2021.03.027

Abstract

Video-based face detection and tracking technology is widely used in video surveillance, safe driving, and medical diagnosis. In video sequences, most existing face detection and tracking methods suffer from interference caused by occlusion, ambient illumination, and changes in human posture. To track human faces accurately in video sequences, we propose an efficient face detection and tracking framework based on deep learning, which comprises a SENResNet face detection model and a Regression Network-based Face Tracking (RNFT) model. First, the SENResNet model integrates the Squeeze and Excitation Network (SEN) with the Residual Neural Network (ResNet). Because deep neural networks are difficult to train, we use ResNet to mitigate the vanishing gradient problem in deep network training. To fuse the features of each channel during the convolution operation, we further integrate the SEN module into the SENResNet model. SENResNet accurately detects facial information in each frame and extracts the position of the target face, thereby providing an initialization window for face tracking. Second, the RNFT model extracts facial features from adjacent frames and predicts the position of the target face in the next frame. To address the problem of feature scaling, we add a correction network to the RNFT model. The improved RNFT model extracts the rectangular frame of the target face in the previous frame and strengthens the perception of feature scaling, thereby improving its accuracy. Extensive experimental results on public facial and video datasets show that the proposed SENResNet and RNFT models are superior to the state-of-the-art comparison methods in terms of accuracy and performance.
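To make the combination of an SEN module with a residual connection more concrete, the following is a minimal sketch in plain Python. It is not the authors' SENResNet: the abstract gives no layer sizes, reduction ratio, or placement of the SEN module, so the sketch assumes the commonly used arrangement in which the channel gates rescale the residual branch before the skip connection is added. All names (`squeeze`, `excite`, `se_residual_block`, `residual_fn`) and all weights are hypothetical, and an identity function stands in for the convolutional layers.

```python
import math

def squeeze(feature_map):
    """Squeeze step: global average pooling, one scalar per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]

def dense(vec, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def excite(z, w1, b1, w2, b2):
    """Excitation step: bottleneck MLP (ReLU, then sigmoid) giving one
    gate in (0, 1) per channel."""
    hidden = [max(0.0, h) for h in dense(z, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]

def se_residual_block(x, residual_fn, w1, b1, w2, b2):
    """Compute gate * F(x) + x, where F is the residual branch.

    `x` is a feature map stored as channels x height x width nested lists.
    Assumption: F preserves the shape of x, so the skip addition is valid.
    """
    f = residual_fn(x)
    gates = excite(squeeze(f), w1, b1, w2, b2)
    return [[[g * fv + xv for fv, xv in zip(f_row, x_row)]
             for f_row, x_row in zip(f_ch, x_ch)]
            for g, f_ch, x_ch in zip(gates, f, x)]

# Toy input: 2 channels of size 2 x 2 (illustrative values only).
x = [[[1.0, 2.0], [3.0, 4.0]],
     [[0.0, 2.0], [4.0, 6.0]]]

# Hypothetical weights: 2 channels -> 1 hidden unit -> 2 channels.
# All zeros, so every gate equals sigmoid(0) = 0.5.
w1, b1 = [[0.0, 0.0]], [0.0]
w2, b2 = [[0.0], [0.0]], [0.0, 0.0]

# Identity stands in for the convolutional residual branch.
out = se_residual_block(x, lambda t: t, w1, b1, w2, b2)
```

With the zero weights chosen here, each gate is 0.5, so the block returns 0.5 * x + x. In a trained network the gates would differ per channel, which is how the module reweights channel features, while the skip connection keeps a direct path for gradients through the block.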
