Abstract

This work presents a robust speaker location detection algorithm using a single linear microphone array that is capable of detecting multiple speech sources under the assumption that there exist nonoverlapped speech segments among sources. Namely, the overlapped speech segments are treated as uncertainty and are not used for detection. The location detection algorithm is derived from a previous work (2006), where Gaussian mixture models (GMMs) are used to model location-dependent but content- and speaker-independent phase difference distributions. The proposed algorithm is proven to be robust against complex vehicular acoustics, including noise, reverberation, near-field, far-field, line-of-sight, and non-line-of-sight conditions, as well as microphone mismatch. An adaptive system architecture is developed to adjust the Gaussian mixture (GM) location model to environmental noises. To deal with unmodeled speech sources as well as overlapped speech signals, a threshold adaptation scheme is proposed in this work. Experimental results demonstrate high detection accuracy in a noisy vehicular environment.

Highlights

  • Electronic systems, such as mobile phones, global positioning systems (GPS), CD or VCD players, air conditioners, and so forth, are becoming increasingly popular in vehicles

  • Speech recognition suffers from environmental noises, explaining why speech enhancement approaches using multiple microphones [4,5,6,7] have been introduced to purify speech signals in noisy environments

  • In addition to the issues mentioned above, a location detection method that can deal with the non-line-of-sight condition, which is common in vehicular environments, is necessary

Summary

INTRODUCTION

Electronic systems, such as mobile phones, global positioning systems (GPS), CD or VCD players, air conditioners, and so forth, are becoming increasingly popular in vehicles. The proposed system architecture can adapt the Gaussian mixture (GM) location models to changes in online environmental noise, even under low-SNR conditions. We may not want to, or may not be able to, model all positions. In this case, an unexpected speech signal that is not emitted from one of the modeled locations, such as a radio broadcast from the in-car audio system or a speaker’s voice from an unmodeled location, could trigger the voice activity detector (VAD) in the system architecture, resulting in an incorrect detection of the speaker location. This work proposes a threshold-based location detection approach that utilizes the training signals and the trained GM location model parameters to determine a suitable length of testing sequence and to obtain a threshold of the a posteriori probability for each location, in order to resolve the two issues of unmodeled speech sources and overlapped speech signals.
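The threshold-based idea described above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and everything in it is an assumption made for illustration: each location is modeled by a one-dimensional Gaussian mixture over a single scalar phase difference (the actual method works on phase differences across frequency bands and microphone pairs), the priors are equal, and the names `MODELS`, `THRESHOLDS`, `gmm_log_likelihood`, and `detect_location`, along with all numeric values, are hypothetical.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of one scalar phase difference x under a 1-D Gaussian mixture."""
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        total += w * math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
    # Guard against log(0) for points far from every component.
    return math.log(max(total, 1e-300))


def detect_location(frames, models, thresholds):
    """Return the most probable modeled location, or None if it is rejected.

    frames:     phase differences in radians, one per frame (phase wrapping ignored)
    models:     location name -> (weights, means, variances)
    thresholds: location name -> minimum a posteriori probability to accept
    """
    # Accumulate the log-likelihood of the whole test sequence per location.
    scores = {
        name: sum(gmm_log_likelihood(x, *params) for x in frames)
        for name, params in models.items()
    }
    # A posteriori probabilities under equal priors, computed with log-sum-exp.
    peak = max(scores.values())
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores.values()))
    posteriors = {name: math.exp(s - log_norm) for name, s in scores.items()}
    best = max(posteriors, key=posteriors.get)
    # Reject the sequence when the winner does not clear its own threshold.
    return best if posteriors[best] >= thresholds[best] else None


# Hypothetical two-location setup (illustrative values only).
MODELS = {
    "driver": ([0.6, 0.4], [0.4, 0.6], [0.04, 0.04]),
    "passenger": ([0.6, 0.4], [-0.4, -0.6], [0.04, 0.04]),
}
THRESHOLDS = {"driver": 0.9, "passenger": 0.9}

print(detect_location([0.45, 0.55, 0.50], MODELS, THRESHOLDS))
print(detect_location([0.0, 0.0, 0.0], MODELS, THRESHOLDS))
```

In this toy setup the first sequence lies close to the "driver" mixture and is accepted, while the second is equally likely under both models, so its best posterior stays below the threshold and the sequence is rejected. A longer test sequence sharpens the posteriors, which is one reason the sequence length and the per-location thresholds would be chosen together from the training data.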

Overall system architecture
Frequency band divisions based on a uniform linear microphone array
GM location model description
GM location models training procedure and parameters estimation
Location detection method
EXPERIMENTAL RESULTS
CONCLUSION
