Abstract

The transcription accuracy of automatic speech recognition (ASR) systems may suffer when recognizing accented speech. This bias against a specific accent arises from the under-representation of that accent in the training dataset. Accent recognition of existing speech samples can help with the preparation of training datasets, which is an important step toward closing the accent gap and eliminating biases in ASR systems. To that end, we built a system to recognize accents from speech data. In this study, we explore prosodic and vocal speech features, as well as speaker embeddings, for accent recognition on our custom English speech dataset, which covers speakers from around the world with varying accents. We demonstrate that our selected speech features are more effective in recognizing non-native accents. Additionally, we experimented with a hierarchical classification model for multi-level accent classification. To establish an accent hierarchy, we employed a bottom-up approach, combining regional accents and categorizing them as either native or non-native at the top level. Furthermore, we conducted a comparative study between flat classification and hierarchical classification using the accent hierarchy structure.
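
As an illustration of the two-level hierarchy described above, the following minimal Python sketch shows how regional accents might roll up into native and non-native groups, how a sample could be routed through a top-level classifier and then a group-specific one, and how flat and hierarchical predictions could be scored at both levels. The accent labels, the `ACCENT_PARENT` mapping, and the function names are hypothetical placeholders chosen for illustration. They are not the labels, models, or evaluation protocol used in the study.

```python
# Hypothetical two-level accent hierarchy: regional accent -> top-level group.
# These labels are illustrative placeholders, not the study's label set.
ACCENT_PARENT = {
    "us": "native",
    "uk": "native",
    "australia": "native",
    "india": "non-native",
    "germany": "non-native",
    "china": "non-native",
}


def top_level(accent):
    """Map a regional (leaf) accent label to its top-level group."""
    return ACCENT_PARENT[accent]


def hierarchical_predict(features, top_clf, leaf_clfs):
    """Two-stage prediction: choose the group first, then the regional accent.

    top_clf is any callable returning "native" or "non-native";
    leaf_clfs maps each group name to a callable returning a regional label.
    """
    group = top_clf(features)
    return group, leaf_clfs[group](features)


def level_accuracy(true_leaves, pred_leaves):
    """Return (leaf-level accuracy, top-level accuracy) for leaf predictions.

    A flat classifier's leaf predictions are rolled up through the hierarchy,
    so flat and hierarchical models can be compared at both levels.
    """
    pairs = list(zip(true_leaves, pred_leaves))
    leaf_acc = sum(t == p for t, p in pairs) / len(pairs)
    top_acc = sum(top_level(t) == top_level(p) for t, p in pairs) / len(pairs)
    return leaf_acc, top_acc
```

In this sketch, a prediction that confuses two regional accents within the same group counts as an error at the leaf level but as correct at the top level, which is one way a flat and a hierarchical classifier could be compared on the same hierarchy.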
