Abstract
This paper presents a maximum-likelihood method for sound source (talker) localization using only a single microphone. In our previous work, we proposed GMM (Gaussian Mixture Model) separation for estimating the sound source direction, in which the observed (reverberant) speech is separated into the acoustic transfer function and the clean speech GMM, and showed its effectiveness on the single-talker localization task. In this paper, we discuss a multi-talker localization method that uses GMM separation and model composition. Model composition is used to represent speech signals observed in a reverberant environment for every conceivable combination of sound source positions; the composite models are obtained by composing the talkers' speech model with the acoustic transfer functions estimated using GMM separation. For each test data set, we find the maximum-likelihood model among the composite models, each of which corresponds to one combination of talkers' positions. The effectiveness of this method has been confirmed by two-talker localization experiments performed in a room environment.
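The selection step described above can be illustrated with a toy sketch. It is not the authors' implementation and makes several simplifying assumptions: features are treated as log-spectral vectors, both talkers share one clean speech GMM with diagonal covariances, each position's transfer function is an additive offset in that domain, and two sources are combined with a log-add approximation of the component means. The function names (`gmm_loglik`, `compose`, `localize_pair`) and all numeric values are hypothetical.

```python
import itertools
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Total log-likelihood of frames X (T, D) under a diagonal-covariance GMM."""
    diff = X[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_comp = log_comp + np.log(weights)                         # (T, K)
    m = log_comp.max(axis=1, keepdims=True)                       # log-sum-exp
    return float(np.sum(m[:, 0] + np.log(np.sum(np.exp(log_comp - m), axis=1))))

def compose(weights, means, variances, h_a, h_b):
    """Toy composition (assumption): shift the clean means by each position's
    transfer offset, then log-add every pair of components."""
    K = len(weights)
    w = np.outer(weights, weights).ravel()
    mu = np.logaddexp((means + h_a)[:, None, :],
                      (means + h_b)[None, :, :]).reshape(K * K, -1)
    # Averaging the variances is a placeholder, not a derived result.
    var = (0.5 * (variances[:, None, :] + variances[None, :, :])).reshape(K * K, -1)
    return w, mu, var

def localize_pair(X, gmm, transfer):
    """Return the position pair whose composite model maximizes the likelihood."""
    scores = {}
    for a, b in itertools.combinations(sorted(transfer), 2):
        scores[(a, b)] = gmm_loglik(X, *compose(*gmm, transfer[a], transfer[b]))
    return max(scores, key=scores.get), scores

# Hypothetical clean speech GMM: 2 components, 3-dimensional features.
gmm = (np.array([0.5, 0.5]),
       np.array([[0.0, 1.0, 2.0], [2.0, 0.0, 1.0]]),
       np.full((2, 3), 0.05))

# Hypothetical transfer offsets for three candidate directions (degrees).
transfer = {30: np.array([1.0, 0.0, -1.0]),
            90: np.array([0.0, 0.0, 0.0]),
            150: np.array([-1.0, 1.0, 0.5])}

# Synthetic observation: frames placed at the composite means for (30, 150).
X = compose(*gmm, transfer[30], transfer[150])[1]

best, scores = localize_pair(X, gmm, transfer)
print(best, scores)
```

In the paper's setting the transfer functions would come from GMM separation on training data for each position rather than from hand-picked offsets, and the number of composite models grows with the number of position combinations, so the exhaustive search shown here is only practical for a small set of candidate positions.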