LRSpeech

Jin Xu, Sheng Zhao, Xu Tan, Tie-Yan Liu, Tao Qin, Yi Ren, Jian Li

doi:10.1145/3394486.3403331

Abstract

Speech synthesis (text to speech, TTS) and recognition (automatic speech recognition, ASR) are important speech tasks, and both require a large amount of paired text and speech data for model training. However, there are more than 6,000 languages in the world and most of them lack speech training data, which poses significant challenges when building TTS and ASR systems for extremely low-resource languages. In this paper, we develop LRSpeech, a TTS and ASR system for the extremely low-resource setting, which can support rare languages at low data cost. LRSpeech consists of three key techniques: 1) pre-training on rich-resource languages and fine-tuning on low-resource languages; 2) dual transformation between TTS and ASR to iteratively boost the accuracy of each other; 3) knowledge distillation to customize the TTS model on a high-quality target-speaker voice and improve the ASR model on multiple voices. We conduct experiments on an experimental language (English) and a truly low-resource language (Lithuanian) to verify the effectiveness of LRSpeech. Experimental results show that LRSpeech 1) achieves high quality for TTS in terms of both intelligibility (an intelligibility rate of more than 98%) and naturalness (a mean opinion score (MOS) above 3.5) of the synthesized speech, levels that satisfy the requirements for industrial deployment; 2) achieves promising recognition accuracy for ASR; and 3) last but not least, uses extremely low-resource training data. We also conduct comprehensive analyses of LRSpeech with different amounts of data resources, and provide valuable insights and guidance for industrial deployment. We are currently deploying LRSpeech into a commercial cloud speech service to support TTS for more rare languages.
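
Technique 2, dual transformation, can be pictured as back-translation between the two models: ASR transcribes unpaired speech to create pseudo pairs for training TTS, and TTS synthesizes unpaired text to create pseudo pairs for training ASR. The following is a simplified sketch of that data flow under stated assumptions, not LRSpeech's actual training procedure. The names `dual_transformation`, `train_tts`, and `train_asr` are hypothetical, models are represented as plain callables, and each round simply retrains both models on the real paired data plus the pseudo pairs produced by the other model.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical aliases for illustration: "speech" can be any waveform or
# feature object; text is a plain string. A training pair is (text, speech).
Speech = object
Pair = Tuple[str, Speech]


def dual_transformation(
    tts_synthesize: Callable[[str], Speech],
    asr_transcribe: Callable[[Speech], str],
    train_tts: Callable[[List[Pair]], Callable[[str], Speech]],
    train_asr: Callable[[List[Pair]], Callable[[Speech], str]],
    paired: List[Pair],
    unpaired_text: Sequence[str],
    unpaired_speech: Sequence[Speech],
    rounds: int = 2,
) -> Tuple[Callable[[str], Speech], Callable[[Speech], str]]:
    """Iteratively improve TTS and ASR with each other's pseudo-labeled data."""
    for _ in range(rounds):
        # ASR labels real but untranscribed speech: (pseudo text, real speech),
        # which becomes extra training data for TTS.
        pseudo_for_tts = [(asr_transcribe(s), s) for s in unpaired_speech]
        # TTS voices real but unspoken text: (real text, synthetic speech),
        # which becomes extra training data for ASR.
        pseudo_for_asr = [(t, tts_synthesize(t)) for t in unpaired_text]
        # Both pseudo sets are built with the current models before either
        # model is retrained, so each round uses a consistent snapshot.
        tts_synthesize = train_tts(paired + pseudo_for_tts)
        asr_transcribe = train_asr(paired + pseudo_for_asr)
    return tts_synthesize, asr_transcribe
```

In a real system the hypothetical `train_tts` and `train_asr` callables would presumably continue training the models pre-trained under technique 1 rather than build new ones; the sketch leaves that choice to the caller.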

Similar Papers

Lead the way for us

Similar Papers

DNN-HMM Acoustic Modeling for Large Vocabulary Telugu Speech Recognition
Vishnu Vidyadhara Raju Vegesna ... Hari Krishna Vydana
01 Jan 2017

Automatic Speech Recognition Systems for Regional Languages in India
International Journal of Recent Technology and Engineering | VOL. 8
10 Aug 2019

Incorporating VAD into ASR System by Multi-task Learning
Meng Li ... Yan Xia
11 Dec 2022

Uneven success: automatic speech recognition and ethnicity-related dialects
Alicia Beckford Wassink ... Isabel Bartholomew
Speech Communication | VOL. 140
12 Mar 2022
