Abstract

In today's society, speech recognition supports a variety of functions, such as executing voice commands, enabling speech processing and spoken language translation, and facilitating communication, so the study of speech recognition technology is of high value. However, current speech recognition techniques focus on clearly articulated speech, which poses great challenges for recognizing colloquial or dialectal pronunciation. Some scholars build speech recognition systems with a model combining time-delay neural networks and long short-term memory (LSTM) networks, but its acoustic recognition performance is poor. Therefore, by analyzing deep neural networks, the study proposes a composite English speech recognition model combining a convolutional neural network (CNN), a time-delay neural network (TDNN), and an output-gate projected gated recurrent unit (OPGRU). Introducing the CNN optimizes the acoustic model, allowing it to recognize pronunciation features accurately and giving the model a wider recognition range. The proposed composite model is compared with the TDNN-OPGRU model on word error rate (WER) and runtime on the Mozilla Common Voice dataset. The composite model achieves a WER of 23.42% with a running time of 1418 s, while the TDNN-OPGRU model achieves a WER of 24.61% with a running time of 1385 s. Compared with the TDNN-OPGRU model, the WER of the composite model thus decreases by 1.19 percentage points while the running time increases by 33 s, meaning the composite model is more accurate. Since recognition accuracy takes priority over running time, the composite model proposed in the study offers better overall performance.
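The abstract's description of the composite acoustic model (a CNN front end feeding TDNN layers, followed by a projected recurrent unit) can be illustrated with a minimal sketch. The layer sizes, kernel widths, pooling choices, feature dimensions, and the plain GRU-plus-projection used as a stand-in for the OPGRU below are all assumptions made for illustration, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class CompositeAcousticModel(nn.Module):
    """Hypothetical CNN + TDNN + projected-GRU acoustic model.

    All dimensions and the number of output targets are illustrative
    assumptions; they are not taken from the paper.
    """
    def __init__(self, n_mels=40, tdnn_dim=512, gru_dim=512,
                 proj_dim=256, n_targets=3000):
        super().__init__()
        # CNN front end: 2-D convolutions over (time, frequency) that
        # extract local spectral patterns before the TDNN layers.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),   # pool along frequency only
        )
        cnn_out = 32 * (n_mels // 2)
        # TDNN block: 1-D convolutions over time with increasing dilation,
        # giving a wide temporal context at modest cost.
        self.tdnn = nn.Sequential(
            nn.Conv1d(cnn_out, tdnn_dim, 3, dilation=1), nn.ReLU(), nn.BatchNorm1d(tdnn_dim),
            nn.Conv1d(tdnn_dim, tdnn_dim, 3, dilation=2), nn.ReLU(), nn.BatchNorm1d(tdnn_dim),
            nn.Conv1d(tdnn_dim, tdnn_dim, 3, dilation=3), nn.ReLU(), nn.BatchNorm1d(tdnn_dim),
        )
        # Recurrent block: a standard GRU followed by a linear projection,
        # standing in here for the paper's output-gate projected GRU (OPGRU).
        self.gru = nn.GRU(tdnn_dim, gru_dim, batch_first=True)
        self.proj = nn.Linear(gru_dim, proj_dim)
        self.out = nn.Linear(proj_dim, n_targets)

    def forward(self, feats):                     # feats: (batch, time, n_mels)
        x = self.cnn(feats.unsqueeze(1))          # (batch, 32, time, n_mels/2)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)     # flatten channels*freq
        x = self.tdnn(x.transpose(1, 2)).transpose(1, 2)   # TDNN over the time axis
        x, _ = self.gru(x)
        return self.out(torch.relu(self.proj(x)))  # frame-level target scores
```

Under these assumptions, passing a `(batch, time, 40)` tensor of log-mel features through the model yields frame-level acoustic scores that a downstream decoder could consume, which mirrors the abstract's point that the CNN stage refines the acoustic model before the TDNN-OPGRU stages.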
