Facial landmark detection is an essential task in face-processing techniques. Traditional methods, however, require expensive pixel-level labels. Semi-supervised facial landmark detection has been explored as an alternative, but previous approaches focus only on training-oriented issues (e.g., noisy pseudo-labels in semi-supervised learning) and neglect task-oriented issues (i.e., the quantization error in landmark detection). We argue that semi-supervised landmark detectors should resolve both technical issues simultaneously. Through a simple experiment, we found that task- and training-oriented solutions may negatively influence each other, so eliminating their negative interactions is important. To this end, we devise a new heatmap regression framework based on a hybrid representation, namely HybridMatch. We utilize both 1-D and 2-D heatmap representations: the 1-D and 2-D heatmaps help alleviate the task-oriented and the training-oriented issues, respectively. To exploit the advantages of our hybrid representation, we introduce curriculum learning, relying more on the 2-D heatmap at the early training stage and gradually increasing the effect of the 1-D heatmap. By resolving the two issues simultaneously, we can capture more precise landmark points than existing methods with only a small amount of annotated data. Extensive experiments show that HybridMatch achieves state-of-the-art performance on three benchmark datasets, notably a 26.3% NME improvement over the existing method on the 300-W full set at a 5% data ratio. Surprisingly, our method records a comparable performance of 5.04 on the challenging set of 300-W, against 5.03 for the fully-supervised facial landmark detector.
The remarkable performance of HybridMatch shows its potential as a practical alternative to the fully-supervised model.
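To make the two ideas in the abstract concrete, the following is a minimal sketch, not the HybridMatch implementation. It illustrates (a) the quantization error that arises when a landmark is decoded from a downsampled 2-D heatmap, compared with a pair of finer-grained 1-D heatmaps, and (b) a curriculum weight that shifts emphasis from the 2-D to the 1-D representation over training. The function names, the stride of 4, the two bins per pixel, and the linear schedule are all assumptions chosen for illustration; the abstract does not specify them.

```python
import numpy as np

def decode_2d(x, y, size=256, stride=4):
    """Place a peak for (x, y) on a coarse 2-D heatmap, then decode by argmax.

    The heatmap has (size // stride) cells per side, so decoded coordinates
    are multiples of `stride` (assumed value, not from the paper).
    """
    h = size // stride
    heat = np.zeros((h, h))
    col = min(int(round(x / stride)), h - 1)
    row = min(int(round(y / stride)), h - 1)
    heat[row, col] = 1.0
    r, c = np.unravel_index(np.argmax(heat), heat.shape)
    return c * stride, r * stride

def decode_1d(x, y, size=256, bins_per_pixel=2):
    """Place peaks on two 1-D heatmaps (one per axis), then decode by argmax.

    Two vectors of length size * bins_per_pixel are far cheaper than a 2-D
    map of the same resolution, so finer bins are affordable.
    """
    n = size * bins_per_pixel
    vx, vy = np.zeros(n), np.zeros(n)
    vx[min(int(round(x * bins_per_pixel)), n - 1)] = 1.0
    vy[min(int(round(y * bins_per_pixel)), n - 1)] = 1.0
    return np.argmax(vx) / bins_per_pixel, np.argmax(vy) / bins_per_pixel

def curriculum_weights(step, total_steps):
    """Hypothetical linear schedule: returns (w_2d, w_1d).

    Starts fully on the 2-D heatmap loss and ends fully on the 1-D loss.
    """
    t = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - t, t

if __name__ == "__main__":
    gt = (101.3, 57.8)  # a sub-pixel ground-truth landmark
    print("2-D decode:", decode_2d(*gt))
    print("1-D decode:", decode_1d(*gt))
    for step in (0, 50, 100):
        print(step, curriculum_weights(step, 100))
```

In this toy setting the 2-D decode can be off by up to half a stride per axis, while the 1-D decode is off by at most half a bin. A training loop under these assumptions would combine the two losses as `w_2d * loss_2d + w_1d * loss_1d`.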