Abstract

Facial landmark detection is a crucial preprocessing step in many applications that process facial images. Deep-learning-based methods have become mainstream and achieve outstanding performance on this task. However, accurate models typically have a large number of parameters, which leads to high computational complexity and long execution times. A simple but effective facial landmark detection model that balances accuracy and speed is therefore needed. To this end, this article proposes a lightweight and efficient model called the efficient face alignment network (EfficientFAN). EfficientFAN adopts an encoder-decoder structure, with a simple EfficientNet-B0 backbone as the encoder and three upsampling layers and convolutional layers as the decoder. Moreover, deep dark knowledge is extracted from a teacher network through feature-aligned distillation and patch similarity distillation; this knowledge contains pixel distribution information in the feature space and multiscale structural information in the affinity space of feature maps. Absorbing this dark knowledge further improves the accuracy of EfficientFAN. Extensive experimental results on public datasets, including 300 Faces in the Wild (300W), Wider Facial Landmarks in the Wild (WFLW), and Caltech Occluded Faces in the Wild (COFW), demonstrate the superiority of EfficientFAN over state-of-the-art methods.
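
The abstract does not give the loss formulas, so the following is only a minimal NumPy sketch of what the two distillation terms could look like, not the article's implementation. It assumes that feature alignment is a learned channel projection (the role a 1x1 convolution would play) followed by a mean squared error, and that patch similarity compares cosine-affinity matrices of average-pooled patches at several patch sizes. The function names, the `proj` matrix, and the patch sizes are all hypothetical.

```python
import numpy as np


def feature_aligned_loss(student_feat, teacher_feat, proj):
    """MSE between teacher features and channel-projected student features.

    student_feat: (C_s, H, W), teacher_feat: (C_t, H, W), proj: (C_t, C_s).
    The projection stands in for a 1x1 convolution (an assumption).
    """
    aligned = np.einsum("ts,shw->thw", proj, student_feat)
    return float(np.mean((aligned - teacher_feat) ** 2))


def patch_affinity(feat, patch):
    """Cosine-similarity matrix between average-pooled patches of a feature map."""
    c, h, w = feat.shape
    assert h % patch == 0 and w % patch == 0, "patch size must divide H and W"
    # Average-pool non-overlapping patch x patch windows: (C, H/patch, W/patch).
    pooled = feat.reshape(c, h // patch, patch, w // patch, patch).mean(axis=(2, 4))
    vecs = pooled.reshape(c, -1).T  # (num_patches, C)
    vecs = vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-8)
    return vecs @ vecs.T  # (num_patches, num_patches)


def patch_similarity_loss(student_feat, teacher_feat, patch_sizes=(1, 2, 4)):
    """Mean squared difference of affinity matrices, averaged over patch sizes.

    The patch sizes are illustrative; they are not taken from the article.
    """
    losses = []
    for p in patch_sizes:
        a_s = patch_affinity(student_feat, p)
        a_t = patch_affinity(teacher_feat, p)
        losses.append(np.mean((a_s - a_t) ** 2))
    return float(np.mean(losses))
```

Because each affinity matrix has shape (num_patches, num_patches), this form of the patch similarity term does not require the student and teacher to have the same number of channels; only their spatial sizes must match. The feature-aligned term, by contrast, needs the projection to bridge the channel mismatch.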
