Abstract

A hands-free speech interface must capture distant speech with high quality, and a microphone array is an ideal candidate for this purpose. However, this approach requires localizing the target talker. In environments with multiple sound sources, conventional talker localization methods have difficulty not only localizing the sources accurately, but also identifying the target talker among the localized source positions. To cope with these problems, we propose a new talker localization method consisting of two algorithms. The first performs multiple sound source localization based on CSP (cross-power spectrum phase) analysis. The second performs sound source identification among the localized sources in order to find the talker. We focus in particular on the second algorithm, which identifies each localized source statistically using speech and environmental sound models based on GMMs (Gaussian mixture models) together with a microphone array. We evaluate the performance of the proposed algorithms in real acoustic environments using the RWCP sound scene database (RWCP-DB).
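
As an illustration of the CSP analysis named above, the following is a minimal sketch of time-delay estimation for a single microphone pair. It is not the multiple-source localization algorithm proposed in the paper, only the basic CSP step such methods build on: the cross-power spectrum of the two channels is normalized to unit magnitude so that only phase remains, and the peak of its inverse transform gives the inter-microphone delay. The function name `csp_delay`, the parameter `max_lag`, and the synthetic white-noise signals are assumptions made for this example.

```python
import numpy as np


def csp_delay(x1, x2, max_lag=None):
    """Estimate how many samples x2 lags x1 using CSP analysis.

    Hypothetical helper for illustration; a positive result means the
    sound reached microphone 2 later than microphone 1.
    """
    n = len(x1) + len(x2)  # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)

    # Cross-power spectrum, whitened so that only phase information remains.
    cross = X2 * np.conj(X1)
    cross /= np.maximum(np.abs(cross), 1e-12)

    # CSP coefficients: inverse transform of the phase-only spectrum.
    csp = np.fft.irfft(cross, n=n)

    if max_lag is None:
        max_lag = n // 2
    # Reorder so that the array covers lags -max_lag .. +max_lag.
    csp = np.concatenate((csp[-max_lag:], csp[:max_lag + 1]))
    return int(np.argmax(csp)) - max_lag


# Synthetic check (assumed data): white noise reaching microphone 2
# seven samples after microphone 1.
rng = np.random.default_rng(0)
source = rng.standard_normal(4096 + 7)
mic1 = source[7:]
mic2 = source[:4096]
estimated_delay = csp_delay(mic1, mic2, max_lag=50)
```

Given the microphone spacing and the speed of sound, such a delay can be converted into a direction of arrival. Handling several simultaneous sources requires more than picking a single peak, which is the problem the first proposed algorithm addresses.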
