Model-Based Expectation-Maximization Source Separation and Localization

M. I. Mandel, R. J. Weiss, D. Ellis

doi:10.1109/tasl.2009.2029711

Abstract

This paper describes a system, referred to as model-based expectation-maximization source separation and localization (MESSL), for separating and localizing multiple sound sources from an underdetermined reverberant two-channel recording. By clustering individual spectrogram points based on their interaural phase and level differences, MESSL generates masks that can be used to isolate individual sound sources. We first describe a probabilistic model of interaural parameters that can be evaluated at individual spectrogram points. By creating a mixture of these models over sources and delays, the multi-source localization problem is reduced to a collection of single-source problems. We derive an expectation-maximization algorithm for computing the maximum-likelihood parameters of this mixture model, and show that these parameters correspond well with interaural parameters measured in isolation. As a byproduct of fitting this mixture model, the algorithm creates probabilistic spectrogram masks that can be used for source separation. In simulated anechoic and reverberant environments, separations using MESSL produced on average a signal-to-distortion ratio 1.6 dB greater and perceptual evaluation of speech quality (PESQ) results 0.27 mean opinion score units greater than four comparable algorithms.
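
The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the MESSL model itself: the abstract describes a mixture over sources and delays that uses both interaural phase and level differences, whereas the sketch below fits a plain one-dimensional Gaussian mixture to the level difference alone. The function names `interaural_cues` and `em_soft_masks`, and all parameters, are hypothetical and chosen only for illustration. What it shows is how interaural cues can be computed at each spectrogram point, and how the E-step posteriors of an EM fit double as probabilistic spectrogram masks.

```python
import numpy as np


def interaural_cues(left, right, eps=1e-12):
    """Interaural level difference (dB) and phase difference (radians)
    at every point of two complex spectrograms of equal shape."""
    ild_db = 20.0 * np.log10((np.abs(left) + eps) / (np.abs(right) + eps))
    ipd = np.angle(left * np.conj(right))  # wrapped to (-pi, pi]
    return ild_db, ipd


def em_soft_masks(x, n_sources=2, n_iter=50):
    """Fit a 1-D Gaussian mixture to the values in x with EM.

    Simplified stand-in (assumption): one Gaussian per source, level cue only.
    Returns (masks, means); masks has shape (n_sources,) + x.shape and sums
    to 1 over the source axis.
    """
    flat = x.ravel()
    # Initialise the means at evenly spaced quantiles of the data.
    means = np.quantile(flat, (np.arange(n_sources) + 0.5) / n_sources)
    variances = np.full(n_sources, flat.var() + 1e-6)
    weights = np.full(n_sources, 1.0 / n_sources)

    for _ in range(n_iter):
        # E-step: posterior probability of each source at each point.
        log_p = (np.log(weights)
                 - 0.5 * np.log(2.0 * np.pi * variances)
                 - 0.5 * (flat[:, None] - means) ** 2 / variances)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and variances.
        nk = post.sum(axis=0) + 1e-12
        weights = nk / flat.size
        means = (post * flat[:, None]).sum(axis=0) / nk
        variances = (post * (flat[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6

    masks = post.T.reshape((n_sources,) + x.shape)
    return masks, means
```

Given left and right spectrograms `L` and `R`, calling `ild, ipd = interaural_cues(L, R)` followed by `masks, means = em_soft_masks(ild)` yields one soft mask per assumed source; multiplying a mask by either spectrogram gives a rough estimate of that source. Using the phase cue as well would require handling phase wrapping and frequency-dependent delays, which this sketch deliberately leaves out.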

Journal: IEEE Transactions on Audio, Speech, and Language Processing
Publication Date: Feb 1, 2010
Citations: 300
