Abstract
Automatic speech recognition (ASR) systems achieve increasingly high accuracy. However, accuracy drops significantly when an ASR system is used by a non-native speaker of the language being recognized, mainly because of specific pronunciation and accent features. The limited volume of labeled non-native speech data makes it difficult to train new ASR systems for non-native speakers. In our research, we address this problem and its influence on the accuracy of ASR systems using a style transfer methodology. We designed a pipeline that modifies the speech of a non-native speaker so that it more closely resembles native speech. Our methodology can be used as a wrapper around any existing ASR system, which reduces the need to train new algorithms for non-native speech: the modification is performed before the data is passed to the speech recognition system itself.
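The wrapper idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `wrap_asr`, `recognize`, and `to_native_style` are hypothetical, and both the style transfer model and the ASR system are assumed to be plain callables supplied by the user. The sketch only shows the order of operations, with style transfer applied first and the unmodified ASR system applied second.

```python
from typing import Callable, Sequence

# Assumption: audio is represented as a sequence of float samples.
Waveform = Sequence[float]


def wrap_asr(
    recognize: Callable[[Waveform], str],
    to_native_style: Callable[[Waveform], Waveform],
) -> Callable[[Waveform], str]:
    """Return a recognizer that runs style transfer before an existing ASR system.

    Both arguments are hypothetical placeholders: `recognize` stands for any
    existing ASR system, `to_native_style` for a style transfer model.
    """

    def recognize_with_transfer(audio: Waveform) -> str:
        # Step 1: modify the non-native speech so it resembles native speech.
        native_like = to_native_style(audio)
        # Step 2: pass the modified audio to the unchanged ASR system.
        return recognize(native_like)

    return recognize_with_transfer
```

Because the ASR system is passed in as an opaque callable, it does not need to be retrained or modified in this arrangement; only the preprocessing step is specific to non-native speech.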