Abstract

A sliding-window module with deformability, abbreviated as SWD, is proposed for local feature enhancement. In particular, the SWD module adopts windows whose size varies with the depth of the network layer in which it is embedded. The SWD module is then inserted into a Transformer network, referred to as LFEformer, for automatic speech recognition. The resulting network captures both local and global features, which is beneficial to model performance: the local features are extracted by the SWD module, while the global features are extracted by the attention mechanism of the Transformer. The effectiveness of LFEformer has been validated on three widely used datasets: Aishell-1, HKUST, and WSJ (dev93/eval92). Experimental results show improvements of 0.5% CER, 0.8% CER, and 0.7%/0.3% WER on the corresponding datasets.
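The abstract does not include code, so the sketch below is only a rough illustration of the depth-dependent window idea: a depthwise 1-D convolution whose kernel widens with layer depth, added residually so the attention path still carries global context. The class name, the base_window parameter, and the widening schedule are assumptions for illustration, and the deformable sampling of the actual SWD module is not reproduced here.

```python
import torch
import torch.nn as nn


class SlidingWindowLocal(nn.Module):
    """Hypothetical sketch of a depth-dependent sliding-window block.

    Deeper layers get a wider window, so later layers aggregate a
    larger local context. This is an assumed schedule, not the
    paper's SWD implementation (deformable offsets are omitted).
    """

    def __init__(self, d_model: int, layer_idx: int, base_window: int = 3):
        super().__init__()
        # Assumed schedule: odd window size grows by 2 per layer,
        # keeping the convolution centred on each frame.
        window = base_window + 2 * layer_idx
        self.conv = nn.Conv1d(
            d_model, d_model,
            kernel_size=window,
            padding=window // 2,   # "same" padding for odd kernels
            groups=d_model,        # depthwise: per-channel local mixing
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model) -> local features, same shape.
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return x + y  # residual add leaves the global (attention) path intact


# Usage: the output keeps the input shape, so the block can sit
# before or after an attention sublayer inside each encoder layer.
x = torch.randn(2, 100, 256)                      # (batch, time, d_model)
block = SlidingWindowLocal(d_model=256, layer_idx=4)
print(block(x).shape)                             # torch.Size([2, 100, 256])
```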
