Automatic Facial Aesthetic Prediction Based on Deep Learning with Loss Ensembles

Adnan Mohsin Abdulazeez, Jwan Najeeb Saeed, Dheyaa Ahmed Ibrahim

doi:10.3390/app13179728

Abstract

Deep data-driven methodologies, particularly convolutional neural networks (CNNs), have significantly enhanced automatic facial beauty prediction (FBP). However, despite their wide use in classification-based applications, the adoption of CNNs in regression research is still limited. In addition, biases in the beauty scores assigned to facial images, such as preferences for specific ethnicities or age groups, hinder effective model generalization and may not be adequately addressed by a conventional single loss function. Furthermore, regression problems commonly employ the L2 loss to measure error; because this function penalizes errors quadratically, it is sensitive to outliers, and generalization suffers as the number of outliers in the training data grows. The L1 loss, by contrast, penalizes errors linearly and is less sensitive to outliers. The Log-cosh loss is a flexible and robust regression loss that offers a good compromise between the L1 and L2 losses. Ensembles of multiple loss functions have been shown to improve the performance of deep-learning models in various tasks. In this work, we propose an ensemble of three regression-loss functions, namely L1, L2, and Log-cosh, which are averaged to create a new composite cost function. This strategy capitalizes on the distinct traits of each loss function, yielding a unified framework that balances outlier tolerance, precision, and adaptability. The effectiveness of the proposed loss function was demonstrated by incorporating it into three pretrained CNNs (AlexNet, VGG16-Net, and FIAC-Net) and evaluating it on three FBP benchmarks (SCUT-FBP, SCUT-FBP5500, and MEBeauty). Integrating FIAC-Net with the proposed loss function yields notably strong results across the datasets, owing to its pretraining on facial-attractiveness classification. The approach is effective at handling uncertain noise distributions, resulting in a strong correlation between machine- and human-rated aesthetic scores, along with low error rates.
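
As an illustration of the composite cost function described in the abstract, the following is a minimal sketch that averages the L1, L2, and Log-cosh losses with equal weights. It assumes each loss is first taken as a mean over the batch; the function names (`l1_loss`, `l2_loss`, `log_cosh_loss`, `ensemble_loss`) are hypothetical, and the sketch uses plain Python rather than the authors' training framework.

```python
import math

def l1_loss(y_true, y_pred):
    # Mean absolute error: penalizes errors linearly.
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def l2_loss(y_true, y_pred):
    # Mean squared error: penalizes errors quadratically.
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def log_cosh(x):
    # Overflow-safe form of log(cosh(x)):
    # log(cosh(x)) = |x| + log(1 + exp(-2|x|)) - log(2)
    a = abs(x)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def log_cosh_loss(y_true, y_pred):
    # Roughly quadratic for small errors, roughly linear for large ones.
    return sum(log_cosh(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)

def ensemble_loss(y_true, y_pred):
    # Assumption: equal-weight average of the three batch-mean losses.
    return (l1_loss(y_true, y_pred)
            + l2_loss(y_true, y_pred)
            + log_cosh_loss(y_true, y_pred)) / 3.0

if __name__ == "__main__":
    human_scores = [3.2, 2.5, 4.1]   # illustrative values, not from the paper
    model_scores = [3.0, 2.9, 4.0]
    print(ensemble_loss(human_scores, model_scores))
```

In this sketch, the L2 term dominates the average when a sample has a large error, while the L1 and Log-cosh terms grow only linearly, which tempers the influence of outliers on the combined loss.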

Full Text