Abstract

Transformers have become popular for building end-to-end automatic speech recognition (ASR) systems. However, transformer ASR systems are usually trained to produce output sequences in the left-to-right order, disregarding the right-to-left context. Existing transformer-based ASR systems that employ two decoders for bidirectional decoding are complex in terms of computation and optimization, while the existing ASR transformer with a single decoder for bidirectional decoding requires extra mechanisms (such as a self-mask) to prevent information leakage in the attention mechanism. This paper explores different options for developing a speech transformer that uses a single decoder equipped with bidirectional context embedding (BCE) for bidirectional decoding. The decoding direction, which is set at the input level, enables the model to attend to different directional contexts without extra decoders and also avoids information leakage. The effectiveness of this method was verified with a bidirectional beam search that generates bidirectional output sequences and selects the best hypothesis according to the output score. We achieved a word error rate (WER) of 7.65%/18.97% on the clean/other LibriSpeech test sets, outperforming the left-to-right decoding style in our work by 3.17%/3.47%. The results are also close to, or better than, other state-of-the-art end-to-end models.
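The core idea, as described above, is that the decoding direction is injected at the input level of a single decoder. The minimal sketch below illustrates one way such a bidirectional context embedding could be realized; the class name, tensor shapes, and the additive way the direction signal is combined with the decoder input are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class BidirectionalContextEmbedding(nn.Module):
    """Sketch of a learned direction embedding (0 = left-to-right,
    1 = right-to-left) added to the decoder input so a single decoder
    can attend to either directional context."""

    def __init__(self, d_model: int):
        super().__init__()
        self.direction_emb = nn.Embedding(2, d_model)  # two decoding directions

    def forward(self, token_emb: torch.Tensor, direction: int) -> torch.Tensor:
        # token_emb: (batch, seq_len, d_model) decoder input embeddings
        dir_ids = torch.full((token_emb.size(0), 1), direction,
                             dtype=torch.long, device=token_emb.device)
        dir_vec = self.direction_emb(dir_ids)          # (batch, 1, d_model)
        return token_emb + dir_vec                     # broadcast over time steps


# Usage sketch: feed the same decoder input with both direction labels; a
# bidirectional beam search would keep the hypothesis with the best score.
bce = BidirectionalContextEmbedding(d_model=256)
decoder_input = torch.randn(2, 10, 256)                # dummy token embeddings
l2r_input = bce(decoder_input, direction=0)            # left-to-right context
r2l_input = bce(decoder_input, direction=1)            # right-to-left context
```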

Highlights

  • Automatic speech recognition (ASR) is the process whereby an algorithm is used to generate a sequence of words from a given speech signal

  • We propose to explore different options and implement an improved speech transformer that relies on a single decoder equipped with bidirectional context embedding (BCE) for bidirectional decoding

  • We explored these options and implemented the Bidirectional Context Embedding Transformer (Bi-CET), an improved speech transformer that performs bidirectional decoding with a single decoder



Introduction

Automatic speech recognition (ASR) is the process whereby an algorithm is used to generate a sequence of words from a given speech signal. Traditional ASR systems usually consist of independent parts, such as an acoustic model, a pronunciation model, and a language model. These parts are trained separately and combined for model inference. In contrast, end-to-end systems map speech directly to text with a single network, and the transformer has become a popular architecture for this purpose. The multi-head self-attention, which is a major component of the transformer, learns to directly connect related positions in the entire sequence. This allows the network to exploit long-range dependencies regardless of distance. This attention-based network has been found to be more parallelizable and can be trained faster than other end-to-end models, which are mostly based on recurrent neural networks (RNNs).
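To make the self-attention mechanism mentioned above concrete, the following minimal sketch computes single-head scaled dot-product self-attention, in which every output position is a weighted sum over all input positions in a single step. A real transformer additionally uses learned query/key/value projections and multiple heads, so this is purely an illustration.

```python
import math
import torch

def self_attention(x: torch.Tensor) -> torch.Tensor:
    # x: (batch, seq_len, d_model) sequence of feature vectors.
    # Similarity between every pair of positions, scaled for stability.
    scores = torch.matmul(x, x.transpose(-2, -1)) / math.sqrt(x.size(-1))
    weights = torch.softmax(scores, dim=-1)   # attention over all positions
    # Each output frame is a weighted sum of *every* input frame, so
    # long-range dependencies are connected in one step, regardless of distance.
    return torch.matmul(weights, x)


# Example: 2 utterances, 50 frames, 256-dimensional features.
out = self_attention(torch.randn(2, 50, 256))
print(out.shape)  # torch.Size([2, 50, 256])
```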

