Deep Variational Generative Models for Audio-Visual Speech Separation

Viet-Nhat Nguyen, Mostafa Sadeghi, Elisa Ricci, Xavier Alameda-Pineda

doi:10.1109/mlsp52302.2021.9596406

Abstract

In this paper, we address audio-visual speech separation given a single-channel audio recording and the visual information (lip movements) associated with each speaker. We propose an unsupervised technique based on audio-visual generative modeling of clean speech. More specifically, during training, a latent variable generative model is learned from clean speech spectra using a variational auto-encoder (VAE). To make better use of the visual information, the posteriors of the latent variables are inferred from mixed speech (instead of clean speech) together with the visual data. The visual modality also serves as a prior for the latent variables, through a visual network. At test time, the learned generative model (in both speaker-independent and speaker-dependent scenarios) is combined with an unsupervised non-negative matrix factorization (NMF) variance model for the background noise. All the latent variables and noise parameters are then estimated by a Monte Carlo expectation-maximization algorithm. Our experiments show that the proposed unsupervised VAE-based method yields better separation performance than NMF-based approaches as well as a supervised deep learning-based technique.
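The way per-speaker speech variances and an NMF noise variance combine at test time can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a zero-mean complex Gaussian model in which the sources are independent, so that the posterior mean of each source is a Wiener-like mask applied to the mixture. The speech variances, which in the paper would come from the VAE decoder, are replaced here by random placeholder arrays. The Monte Carlo expectation-maximization step is omitted, and the function names `nmf_variance` and `wiener_separate` are hypothetical.

```python
import numpy as np

def nmf_variance(W, H):
    """Noise variance as a low-rank non-negative product W @ H, shape (F, N)."""
    return W @ H

def wiener_separate(x, speech_vars, noise_var, eps=1e-12):
    """Split a mixture STFT x of shape (F, N) into per-source estimates.

    speech_vars: list of (F, N) non-negative arrays, one per speaker
                 (placeholders standing in for VAE decoder outputs).
    noise_var:   (F, N) non-negative array, e.g. from nmf_variance.
    Returns (list of speech estimates, noise estimate).
    """
    # Under the independence assumption the mixture variance is the sum
    # of all source variances; eps guards against division by zero.
    total = sum(speech_vars) + noise_var + eps
    speech_est = [v / total * x for v in speech_vars]
    noise_est = noise_var / total * x
    return speech_est, noise_est

# Toy example with random data (F frequency bins, N frames, K NMF components).
rng = np.random.default_rng(0)
F, N, K = 5, 4, 2
x = rng.standard_normal((F, N)) + 1j * rng.standard_normal((F, N))
v1 = rng.random((F, N)) + 0.1  # assumed variance for speaker 1
v2 = rng.random((F, N)) + 0.1  # assumed variance for speaker 2
W = rng.random((F, K))
H = rng.random((K, N))
speech_est, noise_est = wiener_separate(x, [v1, v2], nmf_variance(W, H))
```

Because the masks sum to (almost exactly) one, the speech and noise estimates add back up to the mixture. In the method described above, the variances themselves would be refined iteratively rather than fixed in advance.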
