Abstract
This paper implements a continuous Hindi Automatic Speech Recognition (ASR) system using the proposed integrated feature vector with Recurrent Neural Network (RNN) based Language Modeling (LM). The proposed system also implements speaker adaptation using Maximum-Likelihood Linear Regression (MLLR) and Constrained Maximum-Likelihood Linear Regression (C-MLLR). The system is discriminatively trained with the Maximum Mutual Information (MMI) and Minimum Phone Error (MPE) techniques, using 256 Gaussian mixtures per Hidden Markov Model (HMM) state. The baseline system has been trained on a phonetically rich Hindi dataset. The results show that discriminative training improves baseline performance by up to 3%, and a further improvement of ~7% has been recorded by applying the RNN LM. The proposed Hindi ASR system shows significant performance improvement over other current state-of-the-art techniques.
Highlights
Automatic Speech Recognition (ASR) is the process of taking a speech utterance and converting it into the corresponding text sequence as accurately as possible
We found Gammatone Frequency Cepstral Coefficients (GFCC) features to be more robust than Mel Frequency Cepstral Coefficient (MFCC) and Perceptual Linear Predictive Analysis (PLP) features [15]
The results clearly show that the combination of MFCC + GFCC + Wavelet packet based ERB Cepstral (WERBC) features with Heteroscedastic Linear Discriminant Analysis (HLDA) transformation outperforms all other feature combinations
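The integrated feature vector described above is formed by combining several per-frame feature streams. A minimal sketch of that frame-wise combination, assuming random stand-in arrays in place of real MFCC/GFCC/WERBC features (the dimensions and names are illustrative, not the paper's configuration):

```python
import numpy as np

# Hypothetical per-frame feature streams; random stand-ins for illustration only.
n_frames = 100
mfcc_feats = np.random.randn(n_frames, 13)   # stand-in MFCC features
gfcc_feats = np.random.randn(n_frames, 13)   # stand-in GFCC features
werbc_feats = np.random.randn(n_frames, 13)  # stand-in WERBC features

# Frame-wise concatenation yields one integrated feature vector per frame.
integrated = np.concatenate([mfcc_feats, gfcc_feats, werbc_feats], axis=1)
print(integrated.shape)  # (100, 39)
```

In the paper the concatenated vector is then projected with an HLDA transformation, which is a learned decorrelating/dimension-reducing linear map; that estimation step is not shown here.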
Summary
ASR is the process of taking a speech utterance and converting it into the corresponding text sequence as accurately as possible. Several techniques are available to extract speech features, such as Mel Frequency Cepstral Coefficients (MFCC) [12], Perceptual Linear Predictive Analysis (PLP) [20], Gammatone Frequency Cepstral Coefficients (GFCC) [43, 44], Linear Prediction Cepstral Coefficients (LPCC) [49], and wavelet-based feature extraction techniques [45]. Among these, MFCC is the most popular because it shows promising results in clean environmental conditions, but its performance deteriorates in noisy conditions. The MPE and MMI discriminative techniques were used to train the acoustic model, which gave a significant performance gain. The integrated acoustic features significantly improve accuracy over traditional features; the system discriminatively trains the integrated feature vector using the MMI and MPE techniques. The remainder of the paper is organized as follows: Section 2 explains the concepts of the different feature extraction techniques, speaker adaptation, discriminative training, and the RNN LM.
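To make the MFCC pipeline mentioned above concrete, here is a minimal, self-contained sketch using only numpy: framing with a Hamming window, power spectrum, triangular mel filterbank, log compression, and a DCT-II. The window length, hop size, and filter count are common textbook defaults, not the paper's actual configuration:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters with centers evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_filters=26, n_ceps=13):
    # Frame the signal with a Hamming window.
    frames = np.array([signal[s:s + n_fft] * np.hamming(n_fft)
                       for s in range(0, len(signal) - n_fft + 1, hop)])
    # Per-frame power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies, log-compressed.
    fb = mel_filterbank(n_filters, n_fft, sr)
    log_energy = np.log(power @ fb.T + 1e-10)
    # DCT-II decorrelates the log energies; keep the first n_ceps coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return log_energy @ basis.T

# Usage on one second of a synthetic 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
feats = mfcc(np.sin(2 * np.pi * 440 * t), sr=sr)
print(feats.shape)  # (n_frames, 13)
```

GFCC replaces the mel filterbank with a gammatone filterbank on the ERB scale, and WERBC derives the subbands from a wavelet packet decomposition; the rest of the cepstral pipeline is analogous.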