Data-pooling and multi-task learning for enhanced performance of speech recognition systems in multiple low resourced languages

A Madhavaraj, A G Ramakrishnan

doi:10.1109/ncc.2019.8732237

Abstract

We present two approaches to improve the performance of automatic speech recognition (ASR) systems for Gujarati, Tamil and Telugu. The first approach, data-pooling with phone mapping (DP-PM), trains a deep neural network (DNN) to predict the senones of the target language, then uses the feature vectors and alignments of the other (source) languages to map source-language phones to target-language phones. The lexicons of the source languages are modified using this phone mapping, and an ASR system for the target language is trained on both the target data and the modified source data. DP-PM gives relative improvements in word error rate (WER) of 5.1% for Gujarati, 3.1% for Tamil and 3.4% for Telugu over the corresponding baselines. The second approach, multi-task DNN (MT-DNN) modeling, uses feature vectors from all the languages to train a DNN with three output layers, each predicting the senones of one language. The objective functions of the output layers are modified so that, during training, a feature vector updates only those DNN layers responsible for predicting the senones of the language it belongs to. MT-DNN achieves relative improvements in WER of 5.7%, 3.3% and 5.2% for Gujarati, Tamil and Telugu, respectively.
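The two ideas in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation: the abstract does not give the mapping rule or the exact objective, so the sketch assumes (a) a source phone is mapped to the target phone that collects the most posterior mass from the target-language DNN over the frames aligned to that source phone, and (b) the multi-task objective is a cross-entropy applied only to the output layer of the frame's own language, with zero gradient for the other output layers. All function names and data layouts here are hypothetical.

```python
import math
from collections import defaultdict


def map_source_phones(frame_posteriors, source_alignment, target_phones):
    """DP-PM sketch (assumed rule): map each source phone to the target
    phone with the largest accumulated posterior.

    frame_posteriors: one list per source-language frame, holding the
        target-language DNN posteriors over target_phones (assumed to be
        senone posteriors already summed per phone).
    source_alignment: the source phone label aligned to each frame.
    """
    totals = defaultdict(lambda: [0.0] * len(target_phones))
    for posterior, src_phone in zip(frame_posteriors, source_alignment):
        acc = totals[src_phone]
        for i, p in enumerate(posterior):
            acc[i] += p
    return {
        src: target_phones[max(range(len(acc)), key=acc.__getitem__)]
        for src, acc in totals.items()
    }


def rewrite_lexicon(lexicon, phone_map):
    """Rewrite source-language pronunciations in target-language phones."""
    return {word: [phone_map[p] for p in phones]
            for word, phones in lexicon.items()}


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multitask_loss_and_grads(head_logits, frame_language, senone_label):
    """MT-DNN sketch (assumed objective): cross-entropy on the output layer
    of the frame's language; the other output layers get zero gradient,
    so they are not updated by this frame.

    head_logits: dict mapping language -> list of output-layer logits.
    Returns (loss, dict of gradients w.r.t. each head's logits).
    """
    loss = 0.0
    grads = {}
    for lang, logits in head_logits.items():
        if lang == frame_language:
            probs = softmax(logits)
            loss = -math.log(probs[senone_label])
            # Gradient of softmax cross-entropy: probs minus one-hot label.
            grads[lang] = [p - (1.0 if i == senone_label else 0.0)
                           for i, p in enumerate(probs)]
        else:
            grads[lang] = [0.0] * len(logits)
    return loss, grads
```

In this sketch, the non-zero gradient of the active output layer would also be backpropagated into whatever hidden layers the three output layers share, which is one way to read "only those DNN layers responsible for predicting the senones of a language are updated".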
