Abstract
Speech technology harnessing information from a speaker's voice promises to enhance security and assist in everyday tasks. Automated speech recognition (ASR) converts spoken words into text, facilitating interaction with electronic devices. ASR is also increasingly used in educational programs that assess students' learning through interaction with computers. However, ASR may not work equally well for underrepresented accent groups. Multiple studies over the last several years (e.g., Tatman, 2017; Koenecke et al., 2020) have shown that ASR performs particularly poorly on African American English (AAE). This performance gap is likely due to imbalances in accent representation in training data. Here, we assess vocal tract length adjustment as a data augmentation method for increasing representation of AAE speech in the training data, with the aim of improving ASR performance on AAE. We compare this method to standard data augmentation methods (e.g., environmental augmentation).