Abstract
This study presents transfer learning techniques for cross-lingual speech recognition that mitigate the limited availability of data in a target language by using data from richly resourced source languages. First, a maximum likelihood (ML) based regularization criterion is used to learn the parameters of context-dependent Gaussian mixture model (GMM) based hidden Markov models (HMMs) for phones in the target language, using data from both the target and source languages. Recognition results indicate improved HMM state alignments. Second, the hidden layers of a deep neural network (DNN) are initialized using unsupervised pre-training of a multilingual deep belief network (DBN). The DNN is then fine-tuned jointly using a modified cross-entropy criterion that uses HMM state alignments from both the target and source languages. Third, another DNN fine-tuning technique is explored in which training is performed sequentially, first on the source language and then on the target language. Experiments conducted with varying amounts of target data indicate that joint and sequential training of the DNN yield further improvements in performance over existing techniques. Turkish and English were chosen as the target and source languages, respectively.
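The abstract does not give the exact form of the modified cross-entropy criterion. As a rough illustration only, one plausible reading is a weighted sum of the frame-level cross-entropies computed on target-language and source-language data. In the sketch below, the weight `rho`, the function names, and the use of separate output logits per language are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the HMM-state dimension (axis 1).
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def cross_entropy(logits, states):
    # Mean negative log-probability of the aligned HMM state for each frame.
    logp = log_softmax(logits)
    return -logp[np.arange(len(states)), states].mean()

def joint_cross_entropy(tgt_logits, tgt_states, src_logits, src_states, rho=0.5):
    # Hypothetical joint criterion: target loss plus a down-weighted source loss.
    # rho is an assumed hyperparameter; rho = 0 reduces to target-only training.
    return (cross_entropy(tgt_logits, tgt_states)
            + rho * cross_entropy(src_logits, src_states))

# Toy example: 4 target frames over 3 states, 5 source frames over 2 states.
tgt_logits = np.zeros((4, 3))
tgt_states = np.array([0, 1, 2, 1])
src_logits = np.zeros((5, 2))
src_states = np.array([0, 1, 1, 0, 1])
loss = joint_cross_entropy(tgt_logits, tgt_states, src_logits, src_states, rho=0.5)
```

Under this reading, sequential training would correspond to minimizing the source-language term first and the target-language term afterwards, rather than both at once.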