Abstract
Cross-lingual knowledge sharing is commonly used for acoustic modeling in Automatic Speech Recognition (ASR) of languages that lack sufficient transcribed speech for acoustic model (AM) training. Conventional methods such as IPA-based universal acoustic modeling have proved effective under matched acoustic conditions, but they usually perform poorly when there is a mismatch between the target language and the source languages. This paper proposes a multi-lingual unsupervised AM training method for zero-resourced languages under mismatched conditions. The proposed method consists of two main steps. In the first step, an initial AM for the target low-resourced language is obtained with multi-task training, in which original source-language data and mapped source-language data are used jointly. In the second step, the AM of the target language is trained on automatically transcribed target-language data by iteratively training new AMs and adapting the initial AM. Experiments were conducted on a corpus with 100 hours of untranscribed Japanese speech and 300 hours of transcribed speech in other languages. The best result achieved in this paper is a character error rate (CER) of 51.75%, a 24.78% absolute reduction compared to the baseline IPA system.
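The second step amounts to an iterative self-training loop: the current model transcribes the untranscribed target speech, a new AM is trained on those automatic transcriptions, and the initial AM is adapted before the next pass. The following Python sketch is only a conceptual illustration of that loop under assumptions about the workflow; it is not the authors' implementation, and the helpers decode, train_am, and adapt_am are hypothetical placeholders rather than a real toolkit API.

# Hypothetical sketch of the iterative unsupervised AM training loop
# described in the abstract; all helpers are assumed placeholders.

def decode(am, utterance):
    """Placeholder: return an automatic transcription of one utterance."""
    raise NotImplementedError

def train_am(audio, transcripts):
    """Placeholder: train a new acoustic model on (audio, transcript) pairs."""
    raise NotImplementedError

def adapt_am(initial_am, new_am):
    """Placeholder: adapt the initial AM using the newly trained AM."""
    raise NotImplementedError

def iterative_unsupervised_training(initial_am, untranscribed_audio, num_iterations=3):
    """Bootstrap a target-language AM from untranscribed speech (step 2)."""
    am = initial_am
    for _ in range(num_iterations):
        # Automatically transcribe the target-language speech with the current AM.
        pseudo_labels = [decode(am, utt) for utt in untranscribed_audio]
        # Train a new AM on the automatically transcribed data.
        new_am = train_am(untranscribed_audio, pseudo_labels)
        # Adapt the initial AM with the newly trained model before the next iteration.
        am = adapt_am(initial_am, new_am)
    return am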