Low Complexity Speech Enhancement Network Based on Frame-Level Swin Transformer

Weiqi Jiang, Qiaosheng Guo, Chengli Sun, Jiayi Sun, Yan Leng, Feilong Chen, Jiankun Peng

doi:10.3390/electronics12061330

Abstract

In recent years, Transformers have shown strong performance in speech enhancement by applying multi-head self-attention to capture long-term dependencies effectively. However, the computational cost of self-attention grows quadratically with the size of the input speech spectrogram, which makes it expensive for practical use. In this paper, we propose a low-complexity hierarchical frame-level Swin Transformer network (FLSTN) for speech enhancement. FLSTN takes several consecutive frames as a local window and restricts self-attention to that window, reducing the complexity to linear in the spectrogram size. A shifted window mechanism enhances information exchange between adjacent windows, so that window-based local attention becomes global attention in disguise. The hierarchical structure allows FLSTN to learn speech features at different scales. Moreover, we design a band merging layer and a band expanding layer to decrease and increase the spatial resolution of feature maps, respectively. We tested FLSTN on both 16 kHz wide-band speech and 48 kHz full-band speech. Experimental results demonstrate that FLSTN handles speech with different bandwidths well. With very few multiply–accumulate operations (MACs), FLSTN not only has a significant advantage in computational complexity but also achieves objective speech quality metrics comparable to those of current state-of-the-art (SOTA) models.
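
The complexity argument in the abstract can be illustrated with a small counting sketch. This is not the authors' implementation: the function names, the window size, and the cyclic shift are assumptions chosen for illustration, and the attention masks that a Swin-style model would apply to wrapped-around frames are omitted. It only counts query-key pairs along the frame axis and shows how a shifted partition places frames from two neighboring windows in the same window.

```python
def attention_pairs_global(num_frames):
    """Query-key pairs when every frame attends to every frame."""
    return num_frames * num_frames


def attention_pairs_windowed(num_frames, window):
    """Query-key pairs when attention is restricted to non-overlapping
    windows of `window` consecutive frames (hypothetical helper)."""
    full, rest = divmod(num_frames, window)
    # Each full window costs window**2 pairs; a shorter last window costs rest**2.
    return full * window * window + rest * rest


def window_ids(num_frames, window, shift=0):
    """Window index assigned to each frame after a cyclic shift of `shift` frames.

    Assumption: the shift is cyclic, as in the original Swin Transformer;
    masking of frames that wrap around the ends is not modeled here.
    """
    return [((t + shift) % num_frames) // window for t in range(num_frames)]


if __name__ == "__main__":
    # Doubling the number of frames quadruples the global cost
    # but only doubles the windowed cost.
    for frames in (1000, 2000):
        print(frames,
              attention_pairs_global(frames),
              attention_pairs_windowed(frames, window=4))

    # Regular partition versus a partition shifted by half a window.
    print(window_ids(8, 4))
    print(window_ids(8, 4, shift=2))
```

With 8 frames and a window of 4, the regular partition groups frames 0 to 3 and 4 to 7. The shifted partition groups frames 2 to 5, so frames 3 and 4, which sat in different windows before, can now attend to each other. Alternating the two partitions across layers is what lets information spread beyond a single window.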

Journal: Electronics
Publication Date: Mar 10, 2023
Citations: 4
License type: CC BY 4.0

Similar Papers

Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses
Shengkui Zhao ... Bin Ma
06 Jun 2021

Utilizing neural network and critical band processing for speech enhancement
Pei Chee Yong ... Kit Yan Chan
01 Dec 2017

Application of a perceptual speech quality metric for link adaptation in wireless systems
B Rohani ... H.-J Zepernick
20 Sep 2004

DPHT-ANet: Dual-path high-order transformer-style fully attentional network for monaural speech enhancement
Nasir Saleem ... Abeer D Algarni
Applied Acoustics | VOL. 224
08 Jul 2024
