Abstract

Automated speaker recognition is enabling personalized interactions with the voice-based interfaces and assistants that are part of modern cyber-physical-social systems. Unfortunately, prior studies have uncovered disparate impacts of speaker recognition systems across demographic groups and have consequently proposed a range of countermeasures. Understanding why a speaker recognition system may exhibit this disparate performance for different (groups of) individuals, going beyond mere data-imbalance explanations and black-box countermeasures, is an essential yet under-explored perspective. In this paper, we propose an explanatory framework that aims to provide a better understanding of how speaker recognition models perform as the underlying voice characteristics on which they are tested change. With our framework, we evaluate two state-of-the-art speaker recognition models, comparing their fairness in terms of security, through a systematic analysis of the impact of more than twenty voice characteristics. Our findings include important takeaways for enabling voice-controlled cyber-physical-social systems for everyone. Source code and data are available at https://bit.ly/EA-PRLETTERS.
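To make the notion of "fairness in terms of security" concrete, the sketch below illustrates one way such an analysis can be set up: grouping speaker verification trials by a voice characteristic and comparing a security-oriented error (the false acceptance rate) across groups at a fixed decision threshold. This is a minimal, hypothetical illustration, not the authors' framework; the trial table, the `pitch_bin` characteristic, and the threshold are assumptions for demonstration only.

```python
# Minimal sketch (not the paper's implementation): compare a security-oriented
# error of a speaker verification model across groups defined by a voice
# characteristic. All data below are hypothetical.
import numpy as np
import pandas as pd

def false_acceptance_rate(scores: np.ndarray, is_target: np.ndarray, threshold: float) -> float:
    """Fraction of impostor (non-target) trials accepted at the given threshold."""
    impostor = ~is_target
    if impostor.sum() == 0:
        return float("nan")
    return float((scores[impostor] >= threshold).mean())

# Hypothetical trial table: one row per verification trial, with the model's
# similarity score, whether the pair is genuine (target), and a voice
# characteristic of the test speaker (here, a coarse pitch bin).
trials = pd.DataFrame({
    "score":     [0.91, 0.12, 0.45, 0.88, 0.30, 0.77, 0.05, 0.66],
    "is_target": [True, False, False, True, False, True, False, True],
    "pitch_bin": ["low", "low", "high", "high", "low", "high", "high", "low"],
})

threshold = 0.5  # assumed operating point; in practice tuned on a development set

# A large gap in false acceptance rate between groups would indicate that the
# model's security guarantees are not distributed evenly across speakers.
for group, sub in trials.groupby("pitch_bin"):
    far = false_acceptance_rate(sub["score"].to_numpy(),
                                sub["is_target"].to_numpy(),
                                threshold)
    print(f"pitch_bin={group}: FAR={far:.2f}")
```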
