Sensing to Hear through Memory

Qian Zhang, Ke Liu, Dong Wang

doi:10.1145/3659598

Abstract

Speech enhancement on mobile devices is challenging because of complex environmental noise. Recent work that uses lip-induced ultrasound signals for speech enhancement opens up new possibilities for solving this problem. However, these multi-modal methods cannot be used in the many scenarios where ultrasound-based lip sensing is unreliable or completely absent. In this paper, we propose a novel paradigm that exploits previously learned ultrasound knowledge to perform multi-modal speech enhancement with only the audio input and an additional pre-enrollment speaker embedding. We design a memory network that stores ultrasound memory and learns the interrelationship between the audio and ultrasound modalities. During inference, the memory network recalls ultrasound representations from the audio input, achieving multi-modal speech enhancement without real ultrasound signals. Moreover, we introduce a speaker embedding module that further boosts enhancement performance and avoids degradation of the recall when the noise level is high. We train the proposed framework end-to-end in a multi-task manner and perform extensive evaluations on the collected dataset. The results show that our method performs comparably to audio-ultrasound methods and significantly outperforms audio-only methods.
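
The recall step described in the abstract can be illustrated with a generic key-value memory read. The abstract does not specify the addressing scheme, so everything below is an assumption made for illustration: cosine-similarity addressing, the softmax read, the slot count, the feature sizes, and the names `recall_ultrasound`, `keys`, and `values` are all hypothetical, and the random arrays stand in for parameters that would be learned during training. It is a minimal sketch of how audio features might retrieve stored ultrasound-like representations, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def recall_ultrasound(audio_feats, keys, values):
    """Read from a key-value memory using audio features as queries.

    audio_feats: (T, d_audio) per-frame audio features (queries)
    keys:        (S, d_audio) memory keys living in the audio feature space
    values:      (S, d_ultra) memory values holding ultrasound representations
    Returns the recalled representations (T, d_ultra) and the
    addressing weights (T, S).
    """
    eps = 1e-8
    # Assumption: cosine similarity between each query and each key.
    q = audio_feats / (np.linalg.norm(audio_feats, axis=1, keepdims=True) + eps)
    k = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + eps)
    weights = softmax(q @ k.T, axis=1)
    # Each frame's recalled vector is a weighted sum of the memory values.
    return weights @ values, weights

# Hypothetical sizes; random values stand in for learned parameters.
rng = np.random.default_rng(0)
T, S, d_audio, d_ultra = 5, 8, 16, 12
audio_feats = rng.standard_normal((T, d_audio))
keys = rng.standard_normal((S, d_audio))
values = rng.standard_normal((S, d_ultra))

recalled, weights = recall_ultrasound(audio_feats, keys, values)
# Recalled features can be fused with the audio features, for example by
# concatenation, before being passed to an enhancement network.
fused = np.concatenate([audio_feats, recalled], axis=1)
```

In a setup like this, real ultrasound features would only be needed at training time, for example to supervise the recalled vectors, while inference would use the audio queries alone.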


Journal: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Publication Date: May 13, 2024
Citations: 1
License type: mit

Similar Papers

Noise-management algorithm may improve speech intelligibility in noise
Francis K Kuk ... Carsten Paludan-Müller
The Hearing Journal | VOL. 59
01 Apr 2006

Speaker-Aware Target Speaker Enhancement by Jointly Learning with Speaker Embedding Extraction
Xuan Ji ... Dong Yu
01 May 2020

Audio-visual speech enhancement using deep neural networks
Jen-Cheng Hou ... Ying-Hui Lai
01 Dec 2016

Contextual deep learning-based audio-visual switching for speech enhancement in real-world environments
Ahsan Adeel ... Amir Hussain
Information Fusion | VOL. 59
19 Aug 2019
