Correlating subword articulation with lip shapes for embedding aware audio-visual speech enhancement

Hang Chen, Jun Du, Yu Hu, Li-Rong Dai, Bao-Cai Yin, Chin-Hui Lee

doi:10.1016/j.neunet.2021.06.003

Abstract

In this paper, we propose a visual embedding approach to improve embedding aware speech enhancement (EASE) by synchronizing visual lip frames at the phone and place of articulation levels. We first extract visual embedding from lip frames using a pre-trained phone or articulation place recognizer for visual-only EASE (VEASE). Next, we extract audio-visual embedding from noisy speech and lip frames in an information intersection manner, exploiting the complementarity of audio and visual features for multi-modal EASE (MEASE). Experiments on the TCD-TIMIT corpus corrupted by simulated additive noises show that our proposed subword-based VEASE approach is more effective than conventional embedding at the word level. Moreover, visual embedding at the articulation place level, which leverages the high correlation between place of articulation and lip shapes, performs even better than that at the phone level. Finally, the experiments establish that the proposed MEASE framework, incorporating both audio and visual embeddings, yields significantly better speech quality and intelligibility than the best visual-only and audio-only EASE systems.
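The abstract describes pairing lip-frame embeddings with noisy-speech features before enhancement. The toy sketch below illustrates only the general idea of bringing the two streams to a common frame rate and combining them frame by frame. Everything in it is an assumption made for illustration: the function names `upsample_visual` and `fuse_by_concatenation`, the 4:1 frame-rate ratio, the vector sizes, and the use of plain concatenation. The abstract does not specify the fusion operator, so this should not be read as the authors' "information intersection" method.

```python
from typing import List

Vector = List[float]


def upsample_visual(visual: List[Vector], factor: int) -> List[Vector]:
    """Repeat each visual embedding `factor` times to match the audio frame rate."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return [list(v) for v in visual for _ in range(factor)]


def fuse_by_concatenation(audio: List[Vector], visual: List[Vector]) -> List[Vector]:
    """Concatenate audio and visual vectors frame by frame.

    The longer stream is truncated to the length of the shorter one.
    Concatenation is a stand-in fusion chosen for this sketch only.
    """
    n = min(len(audio), len(visual))
    return [audio[t] + visual[t] for t in range(n)]


# Toy data (hypothetical sizes): 8 audio frames of dimension 2,
# and 2 video frames of dimension 3, i.e. an assumed 4:1 rate ratio.
audio_feats = [[0.1 * t, 0.2 * t] for t in range(8)]
visual_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

fused = fuse_by_concatenation(audio_feats, upsample_visual(visual_emb, 4))
print(len(fused), len(fused[0]))  # prints "8 5"
```

In a real system the fused sequence would feed an enhancement network, and the visual vectors would come from a pre-trained phone or articulation place recognizer rather than from hand-written one-hot values.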


Journal: Neural Networks
Publication Date: Jun 8, 2021
Citations: 15

