Abstract

Hands-free acquisition of speech is required in many human-machine interfaces and communication systems. The signals received by integrated microphones contain a desired speech signal, spatially coherent interfering signals, and background noise. In order to enhance the desired speech signal, state-of-the-art techniques apply data-dependent spatial filters which require the second order statistics (SOS) of the desired signal, the interfering signals and the background noise. As the number of sources and the reverberation time increase, the estimation accuracy of the SOS deteriorates, often resulting in insufficient noise and interference reduction. In this paper, a signal extraction framework with distributed microphone arrays is developed. An expectation maximization (EM)-based algorithm detects the number of coherent speech sources and estimates source clusters using time-frequency (TF) bin-wise position estimates. Subsequently, the second order statistics (SOS) are estimated using bin-wise speech presence probability (SPP) and a source probability for each source. Finally, a desired source is extracted using a minimum variance distortionless response (MVDR) filter, a multichannel Wiener filter (MWF) and a parametric multichannel Wiener filter (PMWF). The same framework can be employed for source separation, where a spatial filter is computed for each source considering the remaining sources as interferers. Evaluation using simulated and measured data demonstrates the effectiveness of the framework in estimating the number of sources, clustering, signal enhancement, and source separation.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call