Demographic Fairness in Multimodal Biometrics: A Comparative Analysis on Audio-Visual Speaker Recognition Systems

Gianni Fenu, Mirko Marras

doi:10.1016/j.procs.2021.12.236

Abstract

In urban scenarios, biometric recognition technologies are increasingly adopted to give citizens secure and usable access to personalized services. Given the challenging environmental conditions, combining evidence from multiple biometrics at a certain step of the recognition pipeline has often been shown to increase the performance of the recognition system. Despite the accuracy achieved so far, it remains under-explored how the adopted biometric fusion policy affects the quality of the decisions made by the system, depending on the demographic characteristics of the citizen under consideration. In this paper, we investigate the extent to which state-of-the-art multimodal recognition systems based on facial and vocal biometrics are susceptible to unfairness towards legally protected groups of individuals, each characterized by a common sensitive attribute. Specifically, we present a comparative analysis of the performance across groups for two deep learning architectures tailored for facial and vocal recognition, under seven fusion policies that cover different pipeline steps (feature, model, score and decision). Experiments show that, compared to the unimodal systems alone and to the other fusion policies, the multimodal system obtained via fusion at the model step leads to the highest overall accuracy and the lowest disparity across groups.
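To make the terminology concrete, the following is a minimal sketch of two of the pipeline steps named above (score-level and decision-level fusion) and of one simple way to quantify disparity across groups. It is not the authors' implementation: the function names, the equal weighting, the 0.5 threshold, the AND rule, the max-min error-rate gap, and the toy trials are all assumptions chosen for illustration.

```python
from collections import defaultdict


def fuse_scores(face_score, voice_score, face_weight=0.5):
    """Score-level fusion (assumed rule): weighted average of two similarity scores."""
    return face_weight * face_score + (1.0 - face_weight) * voice_score


def fuse_decisions(face_score, voice_score, threshold=0.5):
    """Decision-level fusion (assumed AND rule): accept only if both modalities accept."""
    return face_score >= threshold and voice_score >= threshold


def error_rate_by_group(trials, threshold=0.5, face_weight=0.5):
    """Share of wrong accept/reject decisions per group under score-level fusion.

    Each trial is a tuple (group, face_score, voice_score, is_genuine).
    """
    errors = defaultdict(int)
    totals = defaultdict(int)
    for group, face, voice, is_genuine in trials:
        accepted = fuse_scores(face, voice, face_weight) >= threshold
        totals[group] += 1
        if accepted != is_genuine:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


def disparity(rates):
    """Assumed disparity measure: gap between the worst and best group error rates."""
    return max(rates.values()) - min(rates.values())


# Hypothetical verification trials for two illustrative groups "A" and "B".
trials = [
    ("A", 0.9, 0.8, True),   # genuine pair, accepted
    ("A", 0.2, 0.3, False),  # impostor pair, rejected
    ("B", 0.6, 0.3, True),   # genuine pair, wrongly rejected
    ("B", 0.1, 0.2, False),  # impostor pair, rejected
]
rates = error_rate_by_group(trials)
gap = disparity(rates)
print(rates, gap)
```

In this toy setting a fusion policy would be preferred when it lowers both the overall error rate and the gap between groups, which is the kind of comparison the abstract describes.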

Full Text