Abstract

We present a novel non-iterative and rigorously motivated approach for estimating hidden Markov models (HMMs) and factorial hidden Markov models (FHMMs) of high-dimensional signals. Our approach utilizes the asymptotic properties of a spectral, graph-based approach for dimensionality reduction and manifold learning, namely the diffusion framework. We exemplify our approach by applying it to the problem of single-microphone speech separation, where the log-spectra of two unmixed speakers are modeled as HMMs, while their mixture is modeled as an FHMM. We derive two diffusion-based FHMM estimation schemes. The first is experimentally shown to provide separation results comparable with those of contemporary HMM-based speech separation approaches; the second reduces the computational burden.

Highlights

  • Single-channel speech separation (SCSS) is one of the most challenging tasks in speech processing, where the aim is to unmix two or more concurrently speaking subjects, whose audio mixture is acquired by a single microphone

  • The proposed hybrid FHMM (HFHMM) and dual FHMM (DFHMM) schemes were experimentally verified on common state-of-the-art speech separation tasks

  • The proposed schemes are compared to the separation scheme of Roweis [36], the iterative FHMM-based estimator of Hu and Wang [38], and the MIXMAX estimator of Radfar and Dansereau [35]

Introduction

Single-channel speech separation (SCSS) is one of the most challenging tasks in speech processing, where the aim is to unmix two or more concurrently speaking subjects whose audio mixture is acquired by a single microphone. Single-channel speech separation has been studied by several schools of thought, among which computational auditory scene analysis (CASA) proved to be one of the most effective. CASA-based methods are motivated by the ability of the human auditory system to separate acoustic events even when using a single ear (though binaural hearing is advantageous). CASA techniques imitate the human auditory filtering known as cochlear filtering, where time-frequency bins of the speech mixture are clustered using psychoacoustic cues such as the pitch period, temporal continuity, onsets and offsets, etc. The clustering associates each time-frequency bin with a particular source. The time-frequency bins associated with the desired source are retained, while those associated with interfering sources are discarded.
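The CASA-style retain/discard step described above amounts to applying a binary mask in the time-frequency domain. The following is a minimal illustrative sketch of such masking (not the paper's diffusion-based FHMM scheme): it assumes oracle access to the unmixed references, which here stand in for the psychoacoustic clustering cues, and the function name `ideal_binary_mask_separation` is hypothetical.

```python
import numpy as np
from scipy.signal import stft, istft

def ideal_binary_mask_separation(mixture, ref_a, ref_b, fs=8000, nperseg=256):
    """Separate a two-speaker mixture with an ideal binary mask (IBM).

    Each time-frequency bin of the mixture is assigned to whichever
    source dominates it; the bins of the desired source are retained
    and the rest are zeroed out, as in CASA-style masking.
    """
    _, _, Z = stft(mixture, fs=fs, nperseg=nperseg)
    _, _, A = stft(ref_a, fs=fs, nperseg=nperseg)
    _, _, B = stft(ref_b, fs=fs, nperseg=nperseg)
    mask = np.abs(A) >= np.abs(B)  # True where source A dominates the bin
    _, est_a = istft(Z * mask, fs=fs, nperseg=nperseg)
    _, est_b = istft(Z * ~mask, fs=fs, nperseg=nperseg)
    return est_a, est_b

# Toy demo: two "speakers" occupying disjoint frequency bands.
fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 440 * t)
s2 = np.sin(2 * np.pi * 1200 * t)
est1, est2 = ideal_binary_mask_separation(s1 + s2, s1, s2, fs=fs)
```

In practice the mask must be inferred from the mixture alone (e.g. via pitch tracking or, as in this work, statistical source models), which is what makes SCSS hard; the oracle mask above only illustrates the masking mechanism itself.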
