A separation and interaction framework for causal multi-channel speech enhancement

Wenzhe Liu, Andong Li, Chengshi Zheng, Xiaodong Li

doi:10.1016/j.dsp.2022.103519

Abstract

Multi-channel speech enhancement aims to extract the desired speech using a microphone array and has many potential applications, such as video conferencing, automatic speech recognition, and hearing aids. Recently, deep learning-based spatial filters have achieved remarkable improvements over traditional beamformers, but the desired speech is often inferred directly from the noisy features without modeling the interference. In this work, a novel two-stage framework is proposed to extract the desired speech under the guidance of both the estimated interference and the desired signal. The resulting framework, called the Separation and Interaction Network (SI-Net), includes two components: the first module coarsely separates speech and interference, and the second sub-network serves as a post-processing module that simultaneously suppresses the residual noise and regenerates missing speech components under the guidance of the previously estimated speech and interference characteristics. Because both modules are differentiable, the proposed framework can be trained in an end-to-end manner. In addition, a causal spatial-temporal attention module is designed to model the inter-channel and inter-frame correlations simultaneously. Moreover, under this framework, we adopt channel shuffle and gated fusion strategies for the interaction between speech and interference components to convey knowledge about both "where to suppress" and "where to enhance". Experiments conducted on a simulated multi-channel speech dataset show that the proposed framework outperforms state-of-the-art baselines while still supporting real-time processing.
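The abstract names three mechanisms without giving details: channel shuffle, gated fusion, and causal attention. The following is a minimal NumPy sketch of generic versions of each, not the authors' implementation. The function names, the (channels, frames, frequency bins) tensor layout, the ShuffleNet-style shuffle, the convex-combination form of the gate, and the random stand-in weights are all assumptions made for illustration.

```python
import numpy as np


def channel_shuffle(x, groups):
    """Interleave channels across groups (generic ShuffleNet-style shuffle).

    x has shape (channels, frames, freq_bins); channels must divide by groups.
    """
    c, t, f = x.shape
    assert c % groups == 0, "channels must be divisible by groups"
    x = x.reshape(groups, c // groups, t, f)
    x = x.transpose(1, 0, 2, 3)  # swap the group axis and the per-group axis
    return x.reshape(c, t, f)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gated_fusion(speech, interference, w_gate, b_gate):
    """Blend two feature maps with a gate computed from both (assumed form).

    gate = sigmoid(W [speech; interference] + b), applied per element.
    """
    stacked = np.concatenate([speech, interference], axis=0)  # (2C, T, F)
    # Pointwise (1x1) projection over the channel axis.
    logits = np.einsum("oc,ctf->otf", w_gate, stacked) + b_gate[:, None, None]
    gate = sigmoid(logits)
    return gate * speech + (1.0 - gate) * interference


def causal_mask(num_frames):
    """Boolean mask: frame i may attend only to frames j <= i."""
    return np.tril(np.ones((num_frames, num_frames), dtype=bool))


# Hypothetical sizes and random stand-ins for learned features and weights.
rng = np.random.default_rng(0)
C, T, F = 4, 5, 3
speech = rng.standard_normal((C, T, F))
interference = rng.standard_normal((C, T, F))

# Shuffling the concatenated branches interleaves speech and interference.
mixed = channel_shuffle(np.concatenate([speech, interference], axis=0), groups=2)

w_gate = rng.standard_normal((C, 2 * C))
b_gate = np.zeros(C)
fused = gated_fusion(speech, interference, w_gate, b_gate)

mask = causal_mask(T)
```

With two groups, the shuffled tensor alternates speech and interference channels, so a following grouped layer sees both branches. The lower-triangular mask is the usual way to keep frame-wise attention causal, which is what real-time processing requires.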

Journal: Digital Signal Processing
Publication Date: Mar 10, 2022
Citations: 4

Similar Papers

A multi-channel speech enhancement framework for robust NMF-based speech recognition for speech-impaired users
Gert Dekkers ... Jort F Gemmeke
06 Sep 2015

A joint time-space-frequency filtering framework for multichannel speech enhancement via complex-valued tensor representations
Xiangyu Jia ... Zhongfu Ye
Applied Acoustics | VOL. 145
26 Oct 2018

MASS: Microphone Array Speech Simulator in Room Acoustic Environment for Multi-Channel Speech Coding and Enhancement
Rui Cheng ... Changchun Bao
Applied Sciences | VOL. 10
21 Feb 2020

Speaker Adaptation for Multichannel End-to-End Speech Recognition
Tsubasa Ochiai ... John Hershey
01 Apr 2018
