Abstract

Face detection and alignment in unconstrained environments is often deployed on edge devices, which have limited memory and low computing power. This paper proposes a one-stage, anchor-free method named CenterFace that simultaneously predicts the facial box and landmark locations with real-time speed and high accuracy. This is achieved by (a) learning the probability that a face exists from semantic maps, and (b) learning the bounding box, offsets, and five landmarks for each position that potentially contains a face. Specifically, the method runs in real time on a single CPU core and at 200 FPS on an NVIDIA 2080 Ti for VGA-resolution images, while achieving superior accuracy (WIDER FACE Val/Test-Easy: 0.935/0.932, Medium: 0.924/0.921, Hard: 0.875/0.873, and FDDB discontinuous: 0.980 and continuous: 0.732).

Highlights

  • Face detection and alignment is one of the fundamental problems in computer vision and pattern recognition and is often deployed on mobile and embedded devices. These devices typically have limited memory and low computing power. Therefore, it is necessary to predict the position of the face box and the landmarks at the same time, and to do so with high speed and precision.

  • Compared with the two-stage method, the one-stage method is more efficient and has a higher recall rate, but it tends to produce a higher false positive rate and to compromise localization accuracy. Then, Hu and Ramanan [3] used a two-stage approach based on Region Proposal Networks (RPN) [1] to detect faces directly, while SSH [4] and S3FD [5] developed scale-invariant single networks that detect multiscale faces from different layers.

  • CenterFace represents a face by the center point of its face box, and the face size and facial landmarks are regressed directly from the image features at the center location. Therefore, only one layer in the pyramid is used for face detection and alignment.
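
The center-point representation described above can be sketched as a small decoding routine. This is a minimal illustration rather than the authors' implementation: the function name `decode_faces`, the output stride of 4, the score threshold, the 3x3 peak test, and the exact encodings of the size, offset, and landmark maps are all assumptions made for the example and are not specified in the text.

```python
def decode_faces(heatmap, size, offset, landmarks, stride=4, threshold=0.5):
    """Turn per-position predictions into face boxes and landmarks.

    Assumed (hypothetical) layout, all as nested lists:
      heatmap[y][x]       face-center score in [0, 1]
      size[c][y][x]       c=0 box width, c=1 box height, in input pixels
      offset[c][y][x]     c=0 x, c=1 y sub-cell correction of the center
      landmarks[c][y][x]  10 channels: (dx, dy) from the center for 5 points
    """
    rows, cols = len(heatmap), len(heatmap[0])
    faces = []
    for y in range(rows):
        for x in range(cols):
            score = heatmap[y][x]
            if score < threshold:
                continue
            # Keep only local peaks of the 3x3 neighborhood as face centers.
            neighbors = [heatmap[j][i]
                         for j in range(max(y - 1, 0), min(y + 2, rows))
                         for i in range(max(x - 1, 0), min(x + 2, cols))]
            if score < max(neighbors):
                continue
            # Map the refined center from feature-map cells to input pixels.
            cx = (x + offset[0][y][x]) * stride
            cy = (y + offset[1][y][x]) * stride
            w, h = size[0][y][x], size[1][y][x]
            box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
            # Landmarks are read from the same center location.
            points = [(cx + landmarks[2 * k][y][x],
                       cy + landmarks[2 * k + 1][y][x]) for k in range(5)]
            faces.append((box, score, points))
    return faces
```

Because every quantity is read from a single feature map at the center location, no anchor boxes need to be generated or matched under these assumptions.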


Summary

Introduction

Face detection and alignment is one of the fundamental problems in computer vision and pattern recognition and is often deployed on mobile and embedded devices. These devices typically have limited memory and low computing power. Therefore, it is necessary to predict the position of the face box and the landmarks at the same time, and to do so with high speed and precision. With the great breakthrough of convolutional neural networks (CNN), face detection has achieved remarkable progress in recent years. Previous face detection methods have inherited the paradigm of anchor-based generic object detection frameworks, which can be divided into two categories: two-stage methods (Faster-RCNN [1]) and one-stage methods (SSD [2]). Compared with the two-stage method, the one-stage method is more efficient and has a higher recall rate, but it tends to produce a higher false positive rate and to compromise localization accuracy. To improve the overlap between anchor boxes and ground truth, a face detector usually requires a large number of dense anchors to achieve a good recall rate. Anchors are hyperparameters derived statistically from a particular dataset, so they do not always transfer to other applications, which limits generality.
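
The dependence of recall on anchor design can be illustrated with a small overlap calculation. The sketch below is only an illustration under stated assumptions: the helper names `iou` and `best_anchor_iou`, the square anchors, the stride of 16, and the toy 32x32 image are hypothetical choices for the example, not values taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def best_anchor_iou(face, anchor_sizes, stride, width, height):
    """Best overlap between one face box and a grid of square anchors.

    Anchors of every size in anchor_sizes are centered on each grid cell
    (hypothetical layout chosen for illustration).
    """
    best = 0.0
    for s in anchor_sizes:
        for cy in range(stride // 2, height, stride):
            for cx in range(stride // 2, width, stride):
                anchor = (cx - s / 2, cy - s / 2, cx + s / 2, cy + s / 2)
                best = max(best, iou(face, anchor))
    return best


# A 10x10 face centered on a grid cell of a 32x32 image.
face = (3.0, 3.0, 13.0, 13.0)
single = best_anchor_iou(face, [16], stride=16, width=32, height=32)
denser = best_anchor_iou(face, [8, 16], stride=16, width=32, height=32)
```

With only 16-pixel anchors the best overlap is 100/256, which would fall below a commonly used matching threshold such as 0.5 (the threshold is an assumption here); adding an 8-pixel anchor raises the best overlap to 0.64. This is the kind of dataset-specific tuning that an anchor-free design avoids.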
