Speaker Extraction With Co-Speech Gestures Cue

Zexu Pan, Haizhou Li, Xinyuan Qian

doi:10.1109/lsp.2022.3175130

Open Access

https://doi.org/10.1109/lsp.2022.3175130

Abstract

Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker speech mixture. Previous studies have used a pre-recorded speech sample or a face image of the target speaker as the speaker cue. In human communication, co-speech gestures that are naturally timed with speech also contribute to speech perception. In this work, we explore the use of a co-speech gesture sequence, e.g., hand and body movements, as the speaker cue for speaker extraction. Such a cue can be easily obtained from low-resolution video recordings and is thus more widely available than face recordings. We propose two networks that use the co-speech gestures cue to perform attentive listening on the target speaker: one implicitly fuses the co-speech gestures cue in the speaker extraction process; the other performs speech separation first and then explicitly uses the co-speech gestures cue to associate a separated speech signal with the target speaker. The experimental results show that the co-speech gestures cue is informative in associating with the target speaker.

Journal: IEEE Signal Processing Letters	Publication Date: Jan 1, 2022
Citations: 9	License type: CC BY 4.0

Similar Papers

Selective Listening by Synchronizing Speech With Lips
Zexu Pan ... Ruijie Tao
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 30
01 Jan 2021

USEV: Universal Speaker Extraction With Visual Cue
Zexu Pan ... Meng Ge
IEEE/ACM Transactions on Audio, Speech, and Language Processing | VOL. 30
01 Jan 2021

Single Channel Target Speaker Extraction and Recognition with Speaker Beam
Marc Delcroix ... Keisuke Kinoshita
01 Apr 2018

L-SpEx: Localized Target Speaker Extraction
Meng Ge ... Haizhou Li
23 May 2022
