Some of the most successful sound source separation methods are based on the assumption that the sources are sparse. Many of these solutions consist of two stages: mixing matrix estimation and separation. In the first stage, many sparsity-based methods rely on clustering techniques to identify which samples of the mixtures are due to each sound source. For certain types of sources, such as speech, the sparsity assumption is questionable, and so these stages do not perform correctly. In this paper, we present a new mixing matrix estimation procedure that overcomes the limitations of sparsity-based methods when separating speech sources. Our proposal establishes a geometric relationship between the mixing parameters using available information about the microphone array, such as the number and type of microphones or the distance between them. Using this relationship, the complex estimation of the level differences is avoided. Results demonstrate that our proposal outperforms other mixing matrix estimation solutions in terms of both speech separation quality and speech intelligibility.