Abstract

This paper describes frequency-domain non-linear beamformers that can extract a target speech source from among multiple interfering speech sources when there are fewer microphones than sources (the under-determined case). Our approach models the data in each frequency bin via Gaussian mixture distributions, which can be learnt using the expectation maximisation (EM) algorithm. A non-linear beamformer is then developed, based on this model. The proposed non-linear beamformer is a non-linear weighted sum of linear minimum mean square error (MMSE) or minimum variance distortionless response (MVDR) beamformers. The resulting beamformer requires the direction of arrival of the target speech source to be known in advance, but the number of interferers does not need to be known or estimated. Simulations of the non-linear beamformers in under-determined mixtures with room reverberation confirm its capability to successfully separate speech sources.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call