Face recognition with deep learning is generally approached as a problem of capacity: the field has pursued progressively deeper and more complex models, or larger and more varied data sets. The data sets can be problematic, as they are often scraped indiscriminately from the internet. This results in an uncertain and often heavily unbalanced distribution of race, gender, age and other attributes of the subjects, which is then reflected in the decisions of the models trained on them. The carbon footprint of machine learning is also a concern, and there is a growing push to reduce its energy consumption as we strive for a more eco-friendly society. In addition, due to many instances of misuse by law enforcement and other agencies, unbiased models for face recognition are now fundamental to the practical application of the field. We present an approach that uses the state-of-the-art Vision Transformer with early exits to reduce the compute budget without significantly affecting performance. First, we develop a system for face recognition and identification with a closed-set gallery and show that, with a small reduction in performance, a reasonable reduction in compute cost can be obtained using our method. Second, we investigate how these early exits interact with model bias through a robust evaluation of matching scores on a racially balanced data set. We show that matching scores vary heavily between cohorts, and that these variations are magnified at the earlier exits.
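
To make the idea of early exits concrete, the following is a minimal sketch of closed-set identification with an exit rule. It is not the authors' implementation: the abstract does not state the exit criterion, so the confidence threshold used here is an assumption, and the names `early_exit_identify`, `exit_scores` and `threshold` are hypothetical. Each element of `exit_scores` stands for the matching scores between a probe and every gallery identity, as computed at one exit of the network.

```python
from typing import Iterable, Sequence, Tuple


def early_exit_identify(
    exit_scores: Iterable[Sequence[float]],
    threshold: float,
) -> Tuple[int, int, float]:
    """Return (exit_index, gallery_index, score) for the exit that was used.

    exit_scores yields, for each exit in order of depth, the matching scores
    of the probe against every gallery identity (hypothetical inputs).
    Stopping at the first exit whose best score reaches `threshold` is an
    assumed rule, not one taken from the paper.
    """
    result = None
    for k, scores in enumerate(exit_scores):
        if len(scores) == 0:
            raise ValueError("gallery must not be empty")
        # Closed-set identification: pick the best-matching gallery identity.
        best_j = max(range(len(scores)), key=scores.__getitem__)
        result = (k, best_j, scores[best_j])
        if scores[best_j] >= threshold:
            # Confident enough: stop here and skip the deeper exits.
            break
    if result is None:
        raise ValueError("need at least one exit")
    # If no exit was confident, this is the decision of the final exit.
    return result


def demo_scores():
    """Toy scores for three exits and a three-identity gallery."""
    yield [0.2, 0.4, 0.3]
    yield [0.1, 0.9, 0.2]
    yield [0.0, 0.95, 0.1]


if __name__ == "__main__":
    print(early_exit_identify(demo_scores(), threshold=0.8))
```

Because `exit_scores` may be a generator, scores for deeper exits are only computed when the shallower ones are not confident, which is where the compute saving would come from under this assumed rule.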