Abstract

This paper introduces an approach for online speech source clustering and separation based on multichannel location information in a recursive expectation-maximization (EM) algorithm. Specifically, the normalized multichannel speech-recording vector is employed as the feature vector and is modeled using a Watson mixture model. The model parameters are determined by maximizing the data likelihood at every time-frequency slot in an online processing manner. Consequently, the proposed approach can continuously adjust the speech clusters as new observations arrive. Results for the case of two speakers with speaker movement are promising and show the advantage of the proposed approach over the batch EM algorithm.
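
The processing chain described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the update equations, so the shared fixed concentration `kappa`, the forgetting factor `step`, and the eigenvector-based mode update are all assumptions chosen for illustration, as are the function names. With a concentration shared by all clusters, the Watson normalizing constant cancels in the posteriors, which keeps the example short.

```python
import numpy as np

def normalize_features(x):
    """Scale each multichannel vector (last axis = channels) to unit norm."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, 1e-12)

def posteriors(z, modes, weights, kappa):
    """Cluster posteriors for one unit-norm observation z of shape (M,).

    Assumes every cluster shares the same concentration kappa, so the
    Watson normalizing constant cancels and only exp(kappa * |a_k^H z|^2)
    remains for each cluster mode a_k.
    """
    proj = np.abs(modes.conj() @ z) ** 2          # |a_k^H z|^2, shape (K,)
    log_p = np.log(weights) + kappa * proj
    log_p -= log_p.max()                          # numerical stability
    p = np.exp(log_p)
    return p / p.sum()

def recursive_step(z, modes, weights, covs, kappa=20.0, step=0.05):
    """One online update for a single time-frequency observation.

    Hypothetical update rule: exponentially forget old statistics, then
    take the principal eigenvector of each weighted covariance as the mode.
    The arrays modes and covs are updated in place.
    """
    gamma = posteriors(z, modes, weights, kappa)  # E-step
    weights = (1.0 - step) * weights + step * gamma
    outer = np.outer(z, z.conj())
    for k in range(len(weights)):
        rate = step * gamma[k]
        covs[k] = (1.0 - rate) * covs[k] + rate * outer
        _, vecs = np.linalg.eigh(covs[k])         # eigenvalues ascending
        modes[k] = vecs[:, -1]                    # principal eigenvector
    return gamma, modes, weights, covs
```

Calling `recursive_step` once per time-frequency observation lets the cluster modes follow a moving speaker, whereas a batch EM would estimate a single set of parameters from the whole recording. The posteriors `gamma` can then serve as soft separation masks.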
