Abstract
Deep learning-based feature extraction methods and transfer learning have become common approaches in the field of pattern recognition. Deep convolutional neural networks trained using triplet-based loss functions allow for the generation of face embeddings, which can be directly applied to face verification and clustering. Knowledge about the ground truth of face identities might improve the effectiveness of the final classification algorithm; however, it is also possible to use ground truth clusters previously discovered using an unsupervised approach. The aim of this paper is to evaluate the potential improvement of classification results of state-of-the-art supervised classification methods trained with and without ground truth knowledge. In this study, we use two sufficiently large data sets containing more than 200,000 “taken in the wild” images, each with various resolutions, visual quality, and face poses which, in our opinion, guarantee the statistical significance of the results. We examine several clustering and supervised pattern recognition algorithms and find that knowledge about the ground truth has a very small influence on the Fowlkes–Mallows score (FMS) of the classification algorithm. In the case of the classification algorithm that obtained the highest accuracy in our experiment, the FMS improved by only 5.3% (from 0.749 to 0.791) in the first data set and by 6.6% (from 0.652 to 0.718) in the second data set. Our results show that, apart from highly secure systems in which face verification is a key component, face identities discovered by unsupervised approaches can be safely used for training supervised classifiers. We also found that the Silhouette Coefficient (SC) of unsupervised clustering is positively correlated with the Adjusted Rand Index, V-measure score, and Fowlkes–Mallows score, so the SC can be used as an indicator of clustering performance when the ground truth of face identities is not known.
All of these conclusions are important findings for large-scale face verification problems, because skipping the verification of people’s identities before supervised training saves a lot of time and resources.
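The Fowlkes–Mallows score reported above can be illustrated with a minimal sketch, assuming the usual pair-counting definition FMS = TP / sqrt((TP + FP) * (TP + FN)), where TP counts pairs of samples grouped together in both the ground truth and the prediction. The function names `pairs` and `fowlkes_mallows` are hypothetical helpers written for this illustration and are not taken from the paper; scikit-learn provides an equivalent `fowlkes_mallows_score` in `sklearn.metrics`.

```python
from collections import Counter
from math import sqrt


def pairs(n):
    """Number of unordered pairs that can be formed from n items."""
    return n * (n - 1) // 2


def fowlkes_mallows(labels_true, labels_pred):
    """Pair-counting Fowlkes-Mallows score between two labelings (illustrative helper)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    # Pairs grouped together in both labelings (true positives).
    tp = sum(pairs(c) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs grouped together by the prediction (TP + FP) and by the truth (TP + FN).
    same_pred = sum(pairs(c) for c in Counter(labels_pred).values())
    same_true = sum(pairs(c) for c in Counter(labels_true).values())
    if same_pred == 0 or same_true == 0:
        return 0.0  # assumption: score is 0 when either labeling has no same-group pairs
    return tp / sqrt(same_pred * same_true)


# Toy example: five faces, two true identities, two predicted groups.
print(fowlkes_mallows(["A", "A", "A", "B", "B"], [0, 0, 1, 1, 1]))
```

Because the score only compares which pairs are grouped together, the predicted group ids do not have to match the identity names, which is what makes it usable for labels discovered by clustering.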
Highlights
Mobile devices supply users with the possibility of instantly taking photos and uploading them to social media platforms
The aim of this paper is to evaluate the potential improvement of classification results of state-of-the-art supervised classification methods trained with and without knowledge about ground truth and to answer the question “Is ground truth data required to train an effective face verification system?” This issue is very important in practice: manual or even semi-manual face image labelling is very time-consuming, as a face data set might consist of hundreds of thousands of images
Among the most important packages that were used are TensorFlow 2.1 for machine learning with configured GPU support, in order to speed up network training, mtcnn 0.1.0 for face detection and segmentation, the deep neural networks (DNN)
Summary
Mobile devices supply users with the possibility of instantly taking photos and uploading them to social media platforms. In real-life scenarios, due to the large amount of data, face verification systems deal with unlabeled images in which identities are unknown. Convolutional Neural Networks (CNN) are the state-of-the-art approach for generating numerical vectors that represent faces (so-called embeddings), which are later used as input for clustering and classification algorithms. Although the process of training such a network is a supervised procedure [2] (i.e., the input data need to have labeled identities), recent papers have introduced some heuristics that allow this important limitation to be partially overcome [3]. After generating a face embedding, face verification systems utilize classification algorithms to assign faces to identities. Among the most common classification approaches are k-nearest neighbors (KNN) [4,5], fully connected Neural Networks (NN) [6], and Support Vector Machines (SVM) [7].
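The comparison at the heart of the study, a classifier trained on ground truth identities versus one trained on identities discovered by clustering, can be sketched as follows. This is only an illustration under loud assumptions: synthetic Gaussian blobs stand in for CNN face embeddings, k-means stands in for the clustering algorithms examined in the paper, the number of identities is assumed to be known, and the function name `compare_label_sources` and all parameter values are hypothetical. It uses scikit-learn, which is not among the packages named in the text above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import fowlkes_mallows_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def compare_label_sources(n_identities=5, n_samples=500, seed=0):
    """Train KNN on true vs. cluster-discovered labels and score both with FMS."""
    # Assumption: synthetic 16-dimensional blobs replace real face embeddings.
    X, y = make_blobs(n_samples=n_samples, centers=n_identities,
                      n_features=16, cluster_std=1.0, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)

    # Identities "discovered" without looking at the ground truth labels.
    discovered = KMeans(n_clusters=n_identities, n_init=10,
                        random_state=seed).fit_predict(X_train)

    scores = {}
    for name, labels in (("ground_truth", y_train), ("discovered", discovered)):
        clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, labels)
        # FMS ignores label names, so cluster ids need not match identity ids.
        scores[name] = fowlkes_mallows_score(y_test, clf.predict(X_test))
    return scores


print(compare_label_sources())
```

Both classifiers are scored against the true test identities, so the gap between the two values plays the role of the FMS difference discussed in the abstract; on real embeddings the gap would depend on the data set and the clustering method.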