Abstract

The performance of speech beamformers relies on accurate estimation of the relative transfer function (RTF) between the clean speech signals captured at each microphone. Most existing RTF estimators either make assumptions about the clean speech statistics or require joint estimation of the RTF and the signal statistics. In this work we propose a minimum mean square error (MMSE) estimator of the RTF in an extended Kalman filter (eKF) framework. Our method exploits knowledge of the RTF and noise statistics and makes no assumptions about the clean speech statistics. The proposed approach is evaluated in combination with minimum variance distortionless response (MVDR) beamforming on a dual-microphone smartphone. To this end, a database of simulated dual-channel noisy speech recordings on a smartphone was used. Experimental results show that our approach achieves the most accurate RTF estimates among the evaluated methods, yielding less speech distortion and better intelligibility while remaining competitive in perceptual quality.
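
To illustrate why the RTF estimate matters for the beamformer, the following is a minimal sketch of the textbook MVDR weight computation for one frequency bin of a two-microphone array. It is not the eKF estimator proposed in this work; it assumes the RTF and the 2x2 noise covariance are already available. The function names, the choice of microphone 1 as the reference, and the example values are illustrative assumptions.

```python
def mvdr_weights_dual_mic(rtf, noise_cov):
    """Textbook MVDR weights for one frequency bin, two microphones.

    rtf:       complex RTF of microphone 2 relative to reference microphone 1
               (assumed given; estimating it is the subject of the paper).
    noise_cov: 2x2 Hermitian noise covariance matrix as nested lists.
    Returns the weight vector w = Phi^-1 d / (d^H Phi^-1 d).
    """
    # Steering vector: unit gain at the reference microphone.
    d = [1 + 0j, complex(rtf)]

    # Closed-form inverse of the 2x2 noise covariance.
    (a, b), (c, e) = noise_cov
    det = a * e - b * c
    if abs(det) < 1e-12:
        raise ValueError("noise covariance is singular")
    inv = [[e / det, -b / det], [-c / det, a / det]]

    # Numerator Phi^-1 d and normalizer d^H Phi^-1 d.
    num = [inv[0][0] * d[0] + inv[0][1] * d[1],
           inv[1][0] * d[0] + inv[1][1] * d[1]]
    den = d[0].conjugate() * num[0] + d[1].conjugate() * num[1]
    return [num[0] / den, num[1] / den]


def apply_beamformer(weights, mics):
    """Beamformer output y = w^H x for one time-frequency bin."""
    return sum(w.conjugate() * x for w, x in zip(weights, mics))


# Illustrative values only (not taken from the paper).
rtf = 0.5 + 0.2j
noise_cov = [[1.0, 0.1 + 0.05j], [0.1 - 0.05j, 1.5]]
w = mvdr_weights_dual_mic(rtf, noise_cov)
```

With an exact RTF, the distortionless constraint holds: a clean speech component `s` observed as `[s, rtf * s]` passes through `apply_beamformer` unchanged. An error in the RTF breaks this constraint, which is one way RTF estimation accuracy translates into speech distortion.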
