Abstract

Frequency-dependent and location-dependent components of a speech signal can be decoupled by processing the signal in the spherical harmonic domain. In this paper, a sparsity-based method for joint source localization and separation using online dictionary learning is proposed. Conventional sparsity-based methods utilize an overcomplete dictionary to find a sparse linear combination of dictionary atoms. The online dictionary learning discussed herein addresses the joint localization and separation problem by learning the dictionary atoms based on stochastic approximation. The location-dependent terms present in the dictionary atoms at various frequencies are then clustered to find a robust estimate of the number of sources and their locations. Using these estimates, the sources are separated from the mixture. Experiments on speech source localization and separation are conducted at various SNRs. Performance evaluation scores such as RMSE, log spectral distance, and perceptual mean opinion scores indicate reasonable improvement over conventional methods for speech source separation.
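
The clustering step can be illustrated with a minimal sketch. It assumes that each frequency bin has already yielded one azimuth estimate in degrees, and it uses a simple greedy angular clustering as a stand-in; the abstract does not state which clustering algorithm is used, and the full method works with locations in the spherical harmonic domain rather than azimuth alone. The function names and the parameters `max_spread_deg` and `min_support` are hypothetical.

```python
import math


def circular_mean_deg(angles):
    """Mean of angles in degrees, handling wrap-around at 360."""
    s = sum(math.sin(math.radians(a)) for a in angles)
    c = sum(math.cos(math.radians(a)) for a in angles)
    return math.degrees(math.atan2(s, c)) % 360.0


def angular_distance_deg(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def cluster_azimuths(estimates, max_spread_deg=10.0, min_support=3):
    """Greedy clustering of per-frequency azimuth estimates.

    Assumption: this is an illustrative stand-in, not the paper's algorithm.
    Returns the sorted cluster centres; their count is the estimated
    number of sources.
    """
    clusters = []  # each cluster is a list of azimuths
    for az in estimates:
        for members in clusters:
            # Join the first cluster whose current centre is close enough.
            if angular_distance_deg(az, circular_mean_deg(members)) <= max_spread_deg:
                members.append(az)
                break
        else:
            # No cluster was close enough, so start a new one.
            clusters.append([az])
    # Clusters supported by too few frequency bins are treated as outliers.
    kept = [m for m in clusters if len(m) >= min_support]
    return sorted(circular_mean_deg(m) for m in kept)


# Hypothetical per-frequency estimates: two sources plus one outlier bin.
estimates = [44, 46, 45, 120, 119, 121, 200, 47, 118]
locations = cluster_azimuths(estimates)
print(len(locations), locations)
```

Pooling estimates across frequency bins is what makes the source count robust in this sketch: an estimate that appears in only one bin never reaches `min_support` and is discarded.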
