Accent neutralization for speech recognition of non-native speakers

Kacper Radzikowski, Le Wang, Osamu Yoshie, Mateusz Forc, Robert Nowak

doi:10.1145/3366030.3366083

Abstract

Automatic speech recognition (ASR) systems achieve increasingly high accuracy rates. Accuracy drops significantly, however, when an ASR system is used by a non-native speaker of the language being recognized. The main reason is the speaker's specific pronunciation and accent features. The limited volume of labeled non-native speech data makes it difficult to train new ASR systems for non-native speakers. In our research, we tackle this problem and its influence on the accuracy of ASR systems using a style transfer methodology. We designed a pipeline that modifies the speech of a non-native speaker so that it more closely resembles native speech. Our methodology can be used as a wrapper around any existing ASR system, which reduces the need to train new algorithms for non-native speech: the modification is performed before the data is passed to the speech recognition system itself.
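The wrapper idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `wrap_asr`, `identity_neutralizer`, and `dummy_asr` are hypothetical, and both the accent-modification step and the recognizer are replaced by trivial stand-ins. The only point shown is the composition order, in which the speech is modified first and the existing ASR system is then called unchanged.

```python
from typing import Callable, List, Sequence

Waveform = Sequence[float]  # assumed representation: a sequence of audio samples


def wrap_asr(
    neutralize: Callable[[Waveform], Waveform],
    recognize: Callable[[Waveform], str],
) -> Callable[[Waveform], str]:
    """Return a recognizer that modifies the audio before recognition."""

    def pipeline(audio: Waveform) -> str:
        # Step 1: modify the non-native speech (style transfer in the paper).
        modified = neutralize(audio)
        # Step 2: pass the modified audio to the unchanged ASR system.
        return recognize(modified)

    return pipeline


def identity_neutralizer(audio: Waveform) -> List[float]:
    # Hypothetical stand-in: a real system would apply a trained model here.
    return list(audio)


def dummy_asr(audio: Waveform) -> str:
    # Hypothetical stand-in for an existing ASR system.
    return f"<{len(audio)} samples>"


asr_for_non_native = wrap_asr(identity_neutralizer, dummy_asr)
```

Because the wrapper only depends on two callables, any existing recognizer could in principle be substituted for `dummy_asr` without retraining it, which is the property the abstract emphasizes.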

Full Text