Abstract

Lightweight convolutional neural networks are widely used for face detection because their spatial inductive bias and translation invariance allow them to learn local representations efficiently. However, convolutional face detectors struggle to detect faces under challenging conditions such as occlusion, blurring, or changes in facial pose, largely because of their fixed-size receptive fields and lack of global modeling. Transformer-based models, in contrast, excel at learning global representations but are less effective at capturing local patterns. To address these limitations, we propose an efficient face detector that combines convolutional neural network (CNN) and transformer architectures. We introduce a bi-stream structure that integrates CNN and transformer blocks within the backbone network, preserving local pattern features while extracting global context. To further preserve the local details captured by the CNN, we propose a feature enhancement convolution block within a hierarchical backbone structure. In addition, we devise a multiscale feature aggregation module to enhance occluded and blurred facial features. Experimental results demonstrate that our method improves lightweight face detection accuracy, achieving average precision of 95.30%, 94.20%, and 87.56% on the easy, medium, and hard subsets of WIDER FACE, respectively. We therefore believe our method is a useful addition to current artificial intelligence models and will benefit engineering applications of face detection.
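
To make the bi-stream idea concrete, the following is a minimal sketch (not the authors' implementation) of a block that runs a convolutional branch for local patterns in parallel with a self-attention branch for global context and fuses the two. The channel width, number of attention heads, depthwise-separable convolution, and fusion by element-wise addition are illustrative assumptions; the paper's exact block design may differ.

    # Minimal sketch of a bi-stream CNN + transformer block (assumed design).
    import torch
    import torch.nn as nn

    class BiStreamBlock(nn.Module):
        def __init__(self, channels: int = 64, num_heads: int = 4):
            super().__init__()
            # Local branch: depthwise-separable convolution preserves fine detail.
            self.local = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
                nn.Conv2d(channels, channels, 1),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            # Global branch: self-attention over flattened spatial positions.
            self.norm = nn.LayerNorm(channels)
            self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            b, c, h, w = x.shape
            local_feat = self.local(x)
            # Flatten the H*W positions into a token sequence for attention.
            tokens = self.norm(x.flatten(2).transpose(1, 2))      # (B, H*W, C)
            global_feat, _ = self.attn(tokens, tokens, tokens)
            global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
            # Fuse the two streams; addition is an assumed fusion rule here.
            return local_feat + global_feat

    if __name__ == "__main__":
        block = BiStreamBlock(channels=64)
        out = block(torch.randn(1, 64, 40, 40))
        print(out.shape)  # torch.Size([1, 64, 40, 40])

Stacking such blocks hierarchically, as the abstract describes, lets each stage keep convolutional detail while attending over the whole feature map.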
