Abstract

Speech recognition is a multidisciplinary field spanning signal processing, pattern recognition, acoustics, and artificial intelligence. It now plays a vital role in human-computer interfaces, and advances in deep learning (DL) have drawn significant research attention to applications such as mobile communication, voice recognition, and personal digital assistants. This paper presents an automated English speech recognition using dimensionality reduction and deep learning (AESR-DRDL) approach. The AESR-DRDL technique involves a series of operations, namely preprocessing, feature extraction, dimensionality reduction, and speech recognition. During feature extraction, a hybrid set of high-dimensional feature vectors is derived from the speech and glottal-waveform signals using the MFCC, PLPC, and MVDR techniques. The dimensionality of these features is then reduced by a quasioppositional poor and rich optimization algorithm (QOPROA). Finally, a Bidirectional Long Short-Term Memory (BiLSTM) model performs the speech recognition, with its hyperparameters tuned by the Adagrad optimizer. The performance of the AESR-DRDL technique was validated on benchmark datasets, and the results show that it outperforms recent approaches, including a superior average recovery time of 0.50 days.
The AESR-DRDL approach can therefore be used to recognize English speech.
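To illustrate the feature-extraction stage, the following is a minimal numpy-only MFCC sketch. The frame length, hop size, filterbank size, and sample rate are illustrative assumptions, not values from the paper, and the paper's actual pipeline additionally concatenates PLPC and MVDR features into the hybrid vector.

```python
import numpy as np

def mel_filterbank(n_filters, n_fft, sr):
    # Mel-spaced triangular filters spanning 0 Hz .. sr/2.
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    # Pre-emphasis boosts high frequencies before analysis.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + max(0, (len(sig) - frame_len) // hop)
    frames = np.stack([sig[i * hop: i * hop + frame_len] for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    n_fft = 512
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    fb = mel_filterbank(n_filters, n_fft, sr)
    energies = np.log(power @ fb.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies (cepstral coefficients).
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return energies @ dct.T

# Example: one second of synthetic audio at 16 kHz.
x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(x)
print(feats.shape)  # (98, 13): one 13-dimensional vector per frame
```

Each frame yields a 13-dimensional cepstral vector; in the paper's setting, such per-frame vectors from several extractors are concatenated into the high-dimensional hybrid representation that QOPROA later reduces.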

Highlights

  • To overcome this challenge, deep learning (DL) architectures have gained much recognition in speech recognition research. The DL method is a subdivision of machine learning (ML) that employs a group of processes attempting to model higher-level abstractions through a deep graph with various processing layers, consisting of many linear and nonlinear transformations [6]

  • An effective AESR-DRDL technique has been developed for the recognition of English speech signals. The proposed AESR-DRDL technique incorporates several stages of operation: preprocessing, feature extraction, quasioppositional poor and rich optimization algorithm (QOPROA)-based feature selection, Adagrad-based hyperparameter optimization, and Bidirectional Long Short-Term Memory (BiLSTM)-based speech recognition. The QOPROA model is designed to reduce the dimensionality of the features and improve recognition performance
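The QOPROA stage named above builds on quasi-opposition-based learning: alongside each random candidate, a quasi-opposite point (sampled between the search-space centre and the opposite point) is also evaluated, and the fitter of the pair is kept. The sketch below shows only this generic initialisation idea on a toy fitness function; the population size, bounds, and fitness are illustrative assumptions, not the paper's actual feature-selection objective.

```python
import numpy as np

rng = np.random.default_rng(42)

def quasi_opposite(x, low, high):
    """Quasi-opposite point: uniform sample between the search-space
    centre m = (low + high) / 2 and the opposite point o = low + high - x."""
    m = (low + high) / 2.0
    o = low + high - x
    return rng.uniform(np.minimum(m, o), np.maximum(m, o))

def qo_init(pop_size, dim, low, high, fitness):
    """Quasi-oppositional initialisation: evaluate each random candidate
    and its quasi-opposite, then keep the fittest pop_size of the pool."""
    pop = rng.uniform(low, high, size=(pop_size, dim))
    qpop = np.array([quasi_opposite(p, low, high) for p in pop])
    pool = np.vstack([pop, qpop])
    scores = np.array([fitness(c) for c in pool])
    return pool[np.argsort(scores)[:pop_size]]  # minimisation

# Toy fitness: sphere function (minimise the sum of squares).
pop = qo_init(pop_size=10, dim=5, low=-5.0, high=5.0,
              fitness=lambda c: float(np.sum(c ** 2)))
print(pop.shape)  # (10, 5)
```

In a feature-selection setting, each dimension would instead encode whether a feature is retained, and the fitness would score recognition accuracy against the reduced feature count.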


Summary

Introduction

Voice is widely used and is one of the most significant forms of data in human interaction. The DL method is a subdivision of machine learning (ML) that employs a group of processes attempting to model higher-level abstractions through a deep graph with various processing layers, consisting of many linear and nonlinear transformations [6]. Speech recognition, or voice-to-text technology, is the capability of a system or software program to identify spoken words and turn them into legible text. Transfer learning (TL) has been used to exploit large amounts of English and monolingual Mandarin data to compensate for the sparsity problem in code-switching tasks, and it is applied in hybrid automatic speech recognition (ASR) systems to transfer knowledge from one language to another. Weng et al. [10] presented an attention-based sequence-to-sequence method for end-to-end speech recognition, proposing an input-feeding framework that feeds the previous decoder hidden state and context vector as input to the decoder. In [15], a feature representation learning architecture was proposed, encompassing a combination of various extracted feature representations with Compact Bilinear Pooling (CBP), automatic speech recognition (ASR), a DNN as feature extractor, and final inference through optimized RNN classifiers
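The recognition model used by AESR-DRDL is a BiLSTM: one LSTM reads the frame sequence forward, another reads it backward, and their hidden states are concatenated per timestep. The following is a minimal numpy forward-pass sketch under assumed sizes (13 input features, hidden size 8); it omits training, the Adagrad optimizer, and the output layer, and the random weights are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(xs, W, U, b, h0, c0):
    """Single-direction LSTM over a sequence of input vectors xs."""
    h, c = h0, c0
    H = h0.shape[0]
    outs = []
    for x in xs:
        z = W @ x + U @ h + b       # stacked gate pre-activations
        i = sigmoid(z[:H])          # input gate
        f = sigmoid(z[H:2 * H])     # forget gate
        o = sigmoid(z[2 * H:3 * H]) # output gate
        g = np.tanh(z[3 * H:])      # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
        outs.append(h)
    return np.stack(outs)

def bilstm(xs, params_fwd, params_bwd, H):
    zeros = np.zeros(H)
    fwd = lstm_forward(xs, *params_fwd, zeros, zeros)
    bwd = lstm_forward(xs[::-1], *params_bwd, zeros, zeros)[::-1]
    # Concatenate per-timestep forward and backward hidden states.
    return np.concatenate([fwd, bwd], axis=1)

D, H, T = 13, 8, 5  # e.g. 13 MFCCs per frame, hidden size 8, 5 frames
def init():
    return (rng.normal(0, 0.1, (4 * H, D)),
            rng.normal(0, 0.1, (4 * H, H)),
            np.zeros(4 * H))

xs = rng.normal(size=(T, D))
out = bilstm(xs, init(), init(), H)
print(out.shape)  # (5, 16): hidden size doubles after concatenation
```

Because each timestep's output sees both past and future context, a BiLSTM is well suited to per-frame speech labelling, which is why the paper adopts it over a unidirectional LSTM.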

The Proposed Model
Feature Extraction
Dimensionality Reduction Using QOPROA
Speech Recognition Using Optimal BiLSTM Model
Experimental Validation
Findings
Conclusion
