Abstract

Many in-the-wild face processing approaches suffer degraded performance caused by, among other factors, facial expressions, occlusions, accessories, and variations in lighting and head pose. To better understand these issues, four face parts (each eye, the nose, and the mouth) are studied for performing face processing tasks in challenging environments. Additionally, an automatic pipeline based on convolutional neural networks is proposed that detects the available regions, processes each one, and combines the per-region results into a robust solution. The pipeline is evaluated on two common face processing tasks: head pose estimation and gender recognition. Experiments use two object detectors; six convolutional network architectures for the classification step, five popular and one custom; and two datasets, one per task, of different overall difficulty, covering a wide range of the unconstrained scenario spectrum. Results are detailed for each region and for their combination, comparisons are made against the state of the art, and in-depth discussions are provided. In particular, experiments indicate that the nose and the mouth play a major role in challenging scenarios because of their robustness to self-occlusion. The complete pipeline outperforms state-of-the-art works when estimating head pose in unconstrained scenarios and achieves competitive performance for gender recognition. Because each region is evaluated separately, degraded parts are excluded from processing, favoring the use of reliable face information and resulting in increased performance.
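
The combination step described above can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so everything here is an assumption made for illustration: the names `RegionResult` and `fuse_regions`, the `min_score` threshold, and the choice of a detection-score-weighted average of per-region class probabilities are all hypothetical. The sketch only shows the general idea of excluding undetected or unreliable regions before combining the rest.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence

# The four face parts studied in the abstract.
REGIONS = ("left_eye", "right_eye", "nose", "mouth")


@dataclass
class RegionResult:
    """Hypothetical per-region output: detector result plus classifier probabilities."""
    detected: bool
    detection_score: float          # assumed detector confidence in [0, 1]
    class_probs: Sequence[float]    # assumed per-class probabilities from the region's CNN


def fuse_regions(
    results: Dict[str, RegionResult],
    min_score: float = 0.5,         # assumed reliability threshold, not from the paper
) -> Optional[List[float]]:
    """Combine per-region predictions, skipping missing or unreliable regions.

    Returns the weighted-average class probabilities, or None when no
    region is usable.
    """
    usable = [
        r for name, r in results.items()
        if name in REGIONS and r.detected and r.detection_score >= min_score
    ]
    if not usable:
        return None

    n_classes = len(usable[0].class_probs)
    total_weight = sum(r.detection_score for r in usable)
    # Weight each region's probabilities by its detection score (an assumption).
    return [
        sum(r.detection_score * r.class_probs[c] for r in usable) / total_weight
        for c in range(n_classes)
    ]
```

In this sketch, a self-occluded eye that the detector misses, or detects with low confidence, contributes nothing to the final prediction, which then rests on the remaining regions such as the nose and the mouth.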
