Abstract

Systemic inequity in biometric systems arising from racial and gender disparities has recently received considerable attention. These disparities have been explored in existing biometric systems such as facial biometrics (identifying individuals based on facial attributes). However, such ethical issues remain largely unexplored in voice biometric systems, which are popular and extensively used worldwide. Using a corpus of non-speech voice records featuring a diverse group of 300 speakers by race (75 each from the White, Black, Asian, and Latinx subgroups) and gender (150 each from the female and male subgroups), we find that racial subgroups share similar voice characteristics, whereas gender subgroups exhibit significantly different voice characteristics. Moreover, by analyzing the performance of one commercial product and five research products, we show that non-negligible racial and gender disparities exist in speaker identification accuracy. The average accuracy for Latinx speakers can be 12% lower than for White speakers (p < 0.05, 95% CI 1.58%, 14.15%) and can be significantly higher for female speakers than for male speakers (3.67% higher, p < 0.05, 95% CI 1.23%, 11.57%). We further discover that racial disparities result primarily from the neural network-based feature extraction within the voice biometric product, while gender disparities result from both inherent differences in voice characteristics and the neural network-based feature extraction. Finally, we point out strategies (e.g., feature extraction optimization) to incorporate fairness and inclusivity considerations into biometric technology.

Highlights

  • Systemic inequity in biometric systems arising from racial and gender disparities has recently received considerable attention

  • To evaluate inherent voice characteristics under demographic factors, we investigate the essential voice properties for each race and gender in our matched datasets using 15 representative fundamental voice metrics: Formants Frequency[9], Mel Frequency Cepstral Coefficients (MFCC)[10], Pitch Onsets[11], Root Mean Square (RMS)[12], Roll-Off[13], Centroid[14], Spectral Entropy[15], PDF Entropy[16], Permutation Entropy[17], and SVD Entropy[18]

  • 300 different speakers are selected, including 150 female speakers and 150 male speakers, with an identical racial distribution (details can be found in the “Matched dataset” section). In this way, we reveal the disparities in inherent voice characteristics among racial and gender subgroups and the corresponding degree of disparity


Introduction

Systemic inequity in biometric systems arising from racial and gender disparities has recently received considerable attention. To evaluate inherent voice characteristics under demographic factors (race and gender), we investigate the essential voice properties for each race and gender in our matched datasets using 15 representative fundamental voice metrics: Formants Frequency[9], Mel Frequency Cepstral Coefficients (MFCC)[10], Pitch Onsets[11], Root Mean Square (RMS)[12], Roll-Off[13], Centroid[14], Spectral Entropy[15], PDF Entropy[16], Permutation Entropy[17], and SVD Entropy[18]. These fundamental metrics represent the essential and primary characteristics of the voice, which form the basis of voice biometric systems (see details in the “Voice fundamental metrics” section). A matched dataset means the data samples are paired so that speakers in different subgroups share similar characteristics except for the one factor under investigation, which controls for confounding factors
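To make the metrics above concrete, here is a minimal NumPy sketch of three of them (RMS, spectral Centroid, and Spectral Entropy) computed on a synthetic tone. The function names and the pure-sine test signal are illustrative choices of ours, not the paper's actual extraction pipeline, which may use framed and windowed audio.

```python
import numpy as np

def rms(signal):
    # Root Mean Square: overall energy of the waveform
    return np.sqrt(np.mean(signal ** 2))

def spectral_centroid(signal, sr):
    # Amplitude-weighted mean frequency of the magnitude spectrum
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return np.sum(freqs * mag) / np.sum(mag)

def spectral_entropy(signal):
    # Normalize the power spectrum into a probability
    # distribution, then take its Shannon entropy (bits)
    psd = np.abs(np.fft.rfft(signal)) ** 2
    psd = psd / np.sum(psd)
    psd = psd[psd > 0]
    return -np.sum(psd * np.log2(psd))

# Example: 1 second of a 220 Hz sine sampled at 16 kHz
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 220 * t)

print(rms(tone))                    # ≈ 0.3536 (i.e., 0.5 / sqrt(2))
print(spectral_centroid(tone, sr))  # ≈ 220.0 Hz
print(spectral_entropy(tone))       # near 0: energy sits in one bin
```

A pure tone yields a low spectral entropy because its power concentrates in a single frequency bin; noisy or breathy voice segments spread power across bins and score higher, which is why entropy-style metrics help characterize speakers.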

