A Fast and Lightweight Method with Feature Fusion and Multi-Context for Face Detection

Lei Zhang, Xiaoli Zhi

doi:10.3390/fi10080080

Abstract

Convolutional neural networks (CNNs) have made great progress in face detection. Most CNN-based detectors use computation-intensive networks as the backbone to obtain high precision, and they cannot reach a good detection speed without high-performance GPUs (Graphics Processing Units). This limits CNN-based face detection algorithms in real applications, especially speed-dependent ones. To alleviate this problem, we propose a lightweight face detector that takes a fast residual network as its backbone. Our method runs fast even on cheap, ordinary GPUs. To preserve detection precision, multi-scale features and multi-context are exploited in efficient ways. Specifically, feature fusion is first used to obtain semantically strong multi-scale features. Multi-context, comprising both local and global context, is then added to these multi-scale features without extra computational burden. The local context is added through an approach based on depthwise separable convolution, and the global context through simple global average pooling. Experimental results show that our method runs at about 110 fps on VGA (Video Graphics Array)-resolution images while maintaining competitive precision on the WIDER FACE and FDDB (Face Detection Data Set and Benchmark) datasets compared with its state-of-the-art counterparts.
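To make the efficiency argument concrete, the following minimal sketch compares the multiply-add count of a standard convolution with that of a depthwise separable one, and shows global average pooling on a toy feature map. It is not the authors' implementation: the function names and the layer size (40 x 40 x 128) are hypothetical and chosen only for illustration, and stride 1 with "same" padding is assumed.

```python
def standard_conv_madds(h, w, c_in, c_out, k=3):
    """Multiply-adds of a k x k standard convolution (stride 1, 'same' padding assumed)."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_madds(h, w, c_in, c_out, k=3):
    """Multiply-adds of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def global_average_pool(feature_map):
    """Reduce a [channels][rows][cols] nested list to one mean value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


if __name__ == "__main__":
    # Hypothetical layer size, not taken from the paper.
    h, w, c_in, c_out = 40, 40, 128, 128
    std = standard_conv_madds(h, w, c_in, c_out)
    sep = depthwise_separable_madds(h, w, c_in, c_out)
    print(f"standard: {std}  separable: {sep}  ratio: {sep / std:.3f}")

    # Toy 2-channel, 2 x 2 feature map: each channel collapses to a single value.
    toy = [[[1.0, 3.0], [5.0, 7.0]], [[2.0, 2.0], [2.0, 2.0]]]
    print(global_average_pool(toy))
```

Under these assumptions the cost ratio equals 1/c_out + 1/k^2, so a 3 x 3 separable layer needs roughly one ninth of the multiply-adds of its standard counterpart when the channel count is large. Global average pooling has no learned weights and only reads each activation once, which fits the description of a cheap source of global context.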

Highlights

Face detection is a key step in many visual applications, such as face verification and face tracking.
Although some recent methods run in real time [6,11], they reach this speed only with the support of high-performance GPUs.
To obtain detection precision comparable to that of methods based on computation-intensive convolutional networks, we employ multi-scale features and multi-context in efficient ways.

Summary

Introduction

Face detection is a key step in many visual applications, such as face verification and face tracking. Recent CNN-based methods mainly take computation-intensive convolutional networks such as VGG-16 [7] or ResNet (Residual Network)-101 [8] as the backbone. Although these networks are very powerful, their heavy computational workload causes poor detection speed and constrains their applicability in real life. Nvidia has introduced the Pascal-powered Jetson TX2 computer for real-life applications, but its computing power is 1.5 TFLOPs compared with the Titan X's 11 TFLOPs, which is still too weak to support recent CNN-based methods without much accuracy loss. To address this issue, we propose a fast and lightweight face detector. To obtain detection precision comparable to that of methods based on computation-intensive convolutional networks, we employ multi-scale features and multi-context in efficient ways.
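The compute gap quoted above can be put in numbers with a short calculation. The sketch below uses only the two TFLOPs figures from the text and the roughly 110 fps reported in the abstract. The function names are hypothetical, and nothing here implies that the 110 fps figure was measured on either device.

```python
JETSON_TX2_TFLOPS = 1.5   # figure quoted in the text
TITAN_X_TFLOPS = 11.0     # figure quoted in the text
REPORTED_FPS = 110.0      # approximate speed reported in the abstract


def compute_ratio(small_tflops, large_tflops):
    """Fraction of the larger device's peak compute available on the smaller one."""
    return small_tflops / large_tflops


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    ratio = compute_ratio(JETSON_TX2_TFLOPS, TITAN_X_TFLOPS)
    print(f"TX2 has {ratio:.1%} of the Titan X peak compute")
    print(f"{REPORTED_FPS:.0f} fps leaves {frame_budget_ms(REPORTED_FPS):.2f} ms per frame")
```

Peak TFLOPs is only a rough proxy for detection speed, since memory bandwidth and layer types also matter, but the ratio illustrates why a backbone sized for a desktop GPU is hard to deploy on an embedded board.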

Methods

Results

Conclusion