Abstract

We present a new approach to the source separation problem for multiple speech signals. The method is based on automatic lipreading: the objective is to extract an acoustic speech signal from other acoustic signals by exploiting its coherence with the speaker's lip movements. We consider the case of an additive stationary mixture of decorrelated sources, with no further assumptions of independence or non-Gaussianity. First, we present a theoretical framework showing that it is indeed possible to separate a source when some of its spectral characteristics are provided to the system. We then address the case of audio-visual sources and show that, if a statistical model of the joint probability of visual and spectral audio inputs is learnt to quantify the audio-visual coherence, separation can be achieved by maximizing this probability. Finally, we present separation results on a corpus of vowel-plosive-vowel sequences uttered by a single speaker and embedded in a mixture of other voices. Separation is quite good for mixtures of 2, 3, and 5 sources. These results, while preliminary, are encouraging and are discussed with respect to their potential complementarity with traditional audio-only separation or enhancement techniques.
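To make the "separation by maximizing audio-visual coherence" idea concrete, the following Python fragment is a minimal, illustrative sketch for a two-microphone instantaneous mixture: it searches over a demixing angle and keeps the output that best fits a previously learnt joint audio-visual density. All names here (the GaussianMixture density gmm, the lip parameters video_feats, the coarse band-energy features) are hypothetical stand-ins for this example; the paper's actual features, statistical model, and estimator are not specified in this excerpt.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # A joint audio-visual density would be fitted offline on vectors that
    # concatenate spectral audio features and lip (video) parameters, e.g.:
    #   gmm = GaussianMixture(n_components=32).fit(training_joint_features)

    def band_log_energies(sig, frame_len=512, n_bands=16):
        """Coarse log-spectral features per frame (stand-in for the paper's LP features)."""
        n_frames = len(sig) // frame_len
        frames = sig[:n_frames * frame_len].reshape(n_frames, frame_len)
        mag = np.abs(np.fft.rfft(frames * np.hamming(frame_len), axis=1))
        bands = np.array_split(mag, n_bands, axis=1)
        return np.log(np.stack([b.mean(axis=1) for b in bands], axis=1) + 1e-9)

    def separate_by_av_coherence(x1, x2, video_feats, gmm):
        """Pick the demixing angle whose output best fits the learnt joint AV density.

        `video_feats` must be frame-synchronous with the audio features
        (one row of lip parameters per analysis frame).
        """
        best_theta, best_ll = 0.0, -np.inf
        for theta in np.linspace(0.0, np.pi, 180, endpoint=False):
            est = np.cos(theta) * x1 + np.sin(theta) * x2      # candidate extracted source
            joint = np.hstack([band_log_energies(est), video_feats])
            ll = gmm.score(joint)                               # mean log-likelihood per frame
            if ll > best_ll:
                best_theta, best_ll = theta, ll
        return np.cos(best_theta) * x1 + np.sin(best_theta) * x2

A single-angle search only covers a two-source instantaneous mixture; the larger mixtures mentioned in the abstract would require optimizing a full demixing vector instead.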

Highlights

  • There exists an intrinsic coherence and even a complementarity between audition and vision for speech perception [1]

  • We propose a new approach in the case of speech signal separation

  • The audio spectral envelopes were estimated with 20th-order linear prediction (LP) models, computed on 32 ms frames synchronously with the video parameters (a minimal sketch follows these highlights)

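As a rough illustration of the last highlight, the sketch below estimates a 20th-order LP spectral envelope on 32 ms frames with the standard autocorrelation (Yule-Walker) method. The 16 kHz sampling rate and the non-overlapping framing are assumptions made for the example, not details taken from the paper.

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def lp_envelope(frame, order=20, nfft=512):
        """20th-order LP spectral envelope of one frame (autocorrelation method)."""
        frame = frame * np.hamming(len(frame))
        # Autocorrelation sequence up to lag `order`
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
        # Yule-Walker equations: solve the Toeplitz system for the predictor coefficients
        a = np.concatenate(([1.0], -solve_toeplitz((r[:-1], r[:-1]), r[1:])))
        err = r[0] + np.dot(a[1:], r[1:])          # prediction-error power
        gain = np.sqrt(max(err, 1e-12))
        # Envelope is gain / |A(e^{j omega})| evaluated on an FFT grid, in dB
        return 20 * np.log10(gain / (np.abs(np.fft.rfft(a, nfft)) + 1e-12))

    fs = 16000                                     # assumed sampling rate
    x = np.random.randn(fs)                        # stand-in for one second of speech
    frame_len = int(0.032 * fs)                    # 32 ms frames, as in the highlight
    envelopes = [lp_envelope(x[i:i + frame_len])
                 for i in range(0, len(x) - frame_len + 1, frame_len)]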


Introduction

There exists an intrinsic coherence, and even a complementarity, between audition and vision for speech perception [1]. Visual cues can compensate, to a certain extent, for deficiencies in the auditory ones. This explains why the fusion of auditory and visual information has met with great success in several speech applications, chiefly speech recognition in noisy environments [3]. In previous work [4], we tested a slightly different idea: we presented a prototype system able to exploit the visual input to enhance an audio signal corrupted by additive white acoustic noise. Here we propose a new approach to the speech signal separation problem. This approach is based on the bimodality of speech and on the intrinsic coherence between audio and video speech.

