Abstract
Automated speech recognition (ASR) converts speech into text and is used across a variety of applications to assist us in everyday life, from powering virtual assistants and natural language conversations to enabling dictation services. While recent work suggests that there are racial disparities in the performance of ASR systems for speakers of African American Vernacular English, little is known about the psychological and experiential effects of these failures. This paper provides a detailed examination of the behavioral and psychological consequences of ASR voice errors and the difficulty African American users have in getting their intents recognized. The results demonstrate that ASR failures have a detrimental impact on African American users. Specifically, African Americans feel othered when using technology powered by ASR: errors surface thoughts about identity, namely about race and geographic location, leaving them feeling that the technology was not made for them. As a result, African Americans accommodate their speech to have better success with the technology. We incorporate insights and lessons learned from sociolinguistics into our suggestions for linguistically responsive ways to build more inclusive voice systems that consider African American users' needs, attitudes, and speech patterns. Our findings suggest that a diary study can enable researchers to best understand the experiences and needs of communities who are often misunderstood by ASR. We argue this methodological framework could enable researchers concerned with fairness in AI to better capture the needs of all speakers who are traditionally misheard by voice-activated, artificially intelligent (voice-AI) digital systems.
Highlights
With advances in deep learning for speech and natural language processing, automated speech recognition (ASR) systems have improved dramatically over the past several years and have become ubiquitous in everyday life
There is a gap in our understanding of how the insights, concepts, and methods from sociolinguistics and social psychology ought to inform ASR research, which this study aims to fill
As we go through our daily lives, we encounter virtual assistants, automatic translators, digital dictation, and hands-free computing powered by ASR systems that are rarely free of the effects of bias
Summary
With advances in deep learning for speech and natural language processing, ASR systems have improved dramatically over the past several years and have become ubiquitous in everyday life. Examples of ASR include virtual assistants, automatic translation, digital dictation, and hands-free computing. Given the rising popularity of these voice-based systems, failures of ASR systems can pose serious risks to users. In crisis management situations, poor quality of speech input can pose real challenges for speech recognition systems (Vetulani et al., 2010). In the health context, being misunderstood by ASR systems can lead to patient harm (Topaz et al., 2018). The importance of being understood by speech recognition (and the consequences of being misunderstood) requires a closer investigation.