Abstract

Automatic Speech Recognition (ASR) has achieved its best results for English with end-to-end, supervised neural network models. These supervised models need huge amounts of labeled speech data to generalize well, which is a challenge for low-resource languages like Urdu. Most models proposed for Urdu ASR are based on Hidden Markov Models (HMMs). This paper proposes an end-to-end neural network model for Urdu ASR, regularized with dropout, ensemble averaging, and Maxout units. Dropout and ensembling are averaging techniques over multiple neural network models, while Maxout units adapt their activation functions during training. Because labeled data is limited, Semi-Supervised Learning (SSL) techniques are also incorporated to improve model generalization. Speech features are projected onto a lower-dimensional manifold using an unsupervised dimensionality-reduction technique called Locally Linear Embedding (LLE), and the transformed data is used alongside the original higher-dimensional features to train the neural networks. The proposed model also utilizes label-propagation-based self-training of the initially trained models and achieves a Word Error Rate (WER) 4% lower than the benchmark reported on the same Urdu corpus using HMMs. The decrease in WER after incorporating SSL becomes more pronounced as the validation data size grows.
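The LLE step described in the abstract can be sketched as follows. This is a minimal illustration using scikit-learn's `LocallyLinearEmbedding`, with randomly generated feature vectors standing in for real speech features; the frame count, feature dimensionality, neighbor count, and embedding size are all hypothetical choices, not values from the paper.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

# Hypothetical stand-in for extracted speech features: 200 frames x 39 dims
# (e.g. MFCC-like vectors); real features would come from the speech corpus.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 39))

# Project the features onto a lower-dimensional manifold with LLE.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=12)
low_dim = lle.fit_transform(features)

# Use the transformed data alongside the original higher-dimensional
# features, as the abstract describes, by concatenating per frame.
combined = np.hstack([features, low_dim])
print(combined.shape)  # (200, 51)
```

The concatenated matrix would then serve as input to the neural network; the neighbor and component counts are the main knobs controlling how much local manifold structure LLE preserves.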

Highlights

  • Automatic Speech Recognition (ASR) can be a vital component in artificially-intelligent interactive systems

  • Word Error Rate (WER) down to 22% is achieved by the proposed Semi-Supervised Learning (SSL)-Neural Network (SSL-NN) model in a speaker-independent setup, compared to 25.42% achieved by Hidden Markov Models (HMMs) on the same corpus [3]

Summary

Introduction

Automatic Speech Recognition (ASR) can be a vital component in artificially-intelligent interactive systems. Unsupervised learning generally performs clustering, density estimation, and dimensionality-reduction tasks; utilizing both supervised and unsupervised techniques for data classification is called Semi-Supervised Learning (SSL). A benchmark of 25.42% Word Error Rate (WER) has been reported using HMM models for speech recognition on the corpus in a speaker-independent setup, with 90% of the speech used as training data and 10% as test data [3]. This paper describes the performance of an end-to-end neural network-based speech recognition model tested on the same corpus. The model is tested using as little as 50% of the available corpus as training data for the first time, and because of SSL its performance does not deteriorate drastically with the limited training portion. This is quite significant for low-resource languages like Urdu. The conclusion and scope for future work are presented at the end.

Deep Learning
Semi-Supervised Learning
System Model
Neural
Results and Analysis
Neural Network Architecture Analysis
Evaluation of LLE and Self-Training
Discussion and Conclusions
