Abstract

Advancements in deep learning have greatly improved the performance of speech recognition systems, and most recent systems are based on the Recurrent Neural Network (RNN). The RNN generally works well on short sequences but suffers from the vanishing gradient problem on long sequences. Transformer networks have mitigated this issue and have shown state-of-the-art results on sequential and speech-related data. In speech recognition, the input audio is typically converted into an image using a Mel-spectrogram, which depicts frequencies and their intensities over time. The image is then classified by a machine learning model to generate a transcript. However, the audio frequencies in the image have low resolution, which causes inaccurate predictions. This paper presents a novel end-to-end binary-view transformer-based architecture for speech recognition that copes with the frequency resolution problem. First, the input audio signal is transformed into a 2D image using a Mel-spectrogram. Second, modified universal transformers use multi-head attention to derive contextual information and extract different speech-related features. A feed-forward neural network is then used for classification. The proposed system produces robust results on Google's Speech Commands dataset, with an accuracy of 95.16% and minimal loss. The binary-view transformer mitigates the risk of over-fitting by deploying a multi-view mechanism that diversifies the input data, while multi-head attention captures multiple contexts from the data's feature map.
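
To make the described pipeline concrete, the following is a minimal, illustrative sketch (not the authors' implementation) of the high-level steps in the abstract: converting a waveform to a Mel-spectrogram "image", encoding its time frames with multi-head self-attention in a transformer encoder, and classifying with a feed-forward head. The layer sizes, the 16 kHz sample rate, and the 35-class output (as in Speech Commands v2) are assumptions for illustration only; the binary-view and universal-transformer modifications are not shown here.

```python
# Illustrative sketch only: Mel-spectrogram -> transformer encoder -> feed-forward classifier.
# Hyperparameters and class count are assumed, not taken from the paper.
import torch
import torch.nn as nn
import torchaudio

class SpectrogramTransformer(nn.Module):
    def __init__(self, n_mels=80, n_classes=35, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        # Audio -> 2D time-frequency image (Mel-spectrogram), then log scaling.
        self.mel = torchaudio.transforms.MelSpectrogram(
            sample_rate=16000, n_fft=400, hop_length=160, n_mels=n_mels)
        self.to_db = torchaudio.transforms.AmplitudeToDB()
        # Each time frame (one column of the spectrogram) becomes one token.
        self.proj = nn.Linear(n_mels, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Feed-forward classification head over the pooled sequence.
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, waveform):                      # waveform: (batch, samples)
        spec = self.to_db(self.mel(waveform))         # (batch, n_mels, frames)
        tokens = self.proj(spec.transpose(1, 2))      # (batch, frames, d_model)
        encoded = self.encoder(tokens)                # multi-head self-attention
        return self.classifier(encoded.mean(dim=1))   # (batch, n_classes)

if __name__ == "__main__":
    model = SpectrogramTransformer()
    dummy = torch.randn(2, 16000)   # two 1-second clips at 16 kHz
    print(model(dummy).shape)       # torch.Size([2, 35])
```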
