Abstract
This work presents a robust speaker location detection algorithm using a single linear microphone array. The algorithm can detect multiple speech sources under the assumption that nonoverlapped speech segments exist among the sources; overlapped speech segments are treated as uncertain and are not used for detection. The location detection algorithm is derived from a previous work (2006), in which Gaussian mixture models (GMMs) are used to model phase difference distributions that are location-dependent but content- and speaker-independent. The proposed algorithm is shown to be robust against complex vehicular acoustics, including noise, reverberation, near-field, far-field, line-of-sight, and non-line-of-sight conditions, as well as microphone mismatch. An adaptive system architecture is developed to adjust the Gaussian mixture (GM) location model to environmental noise. To deal with unmodeled speech sources as well as overlapped speech signals, a threshold adaptation scheme is proposed. Experimental results demonstrate high detection accuracy in a noisy vehicular environment.
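To make the feature concrete, the following is a minimal sketch of how inter-microphone phase differences could be computed per frame and per frequency bin before a GMM is fitted to them. It is an illustration only, not the paper's front end: the function name `phase_differences`, the frame length, the Hann window, and the non-overlapping framing are all assumptions.

```python
import numpy as np

def phase_differences(x_ref, x_other, frame_len=256):
    """Return per-frame, per-bin phase differences in radians.

    Hypothetical helper: the framing, window, and frame length are
    assumptions chosen for illustration, not taken from the paper.
    Output shape is (n_frames, frame_len // 2 + 1), wrapped to [-pi, pi].
    """
    n_frames = min(len(x_ref), len(x_other)) // frame_len
    window = np.hanning(frame_len)
    feats = []
    for i in range(n_frames):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        spec_ref = np.fft.rfft(window * x_ref[seg])
        spec_other = np.fft.rfft(window * x_other[seg])
        # Phase of the cross-spectrum equals phase(other) - phase(ref).
        feats.append(np.angle(spec_other * np.conj(spec_ref)))
    return np.array(feats)
```

Each row of the returned array is one observation vector; vectors collected from a given seat position would form the training set for that location's mixture model.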
Highlights
Electronic systems, such as mobile phones, global positioning systems (GPS), CD or VCD players, air conditioners, and so forth, are becoming increasingly popular in vehicles
Speech recognition suffers from environmental noises, explaining why speech enhancement approaches using multiple microphones [4,5,6,7] have been introduced to purify speech signals in noisy environments
In addition to the issues mentioned above, a location detection method that can deal with the non-line-of-sight condition, which is common in vehicular environments, is necessary
Summary
Electronic systems, such as mobile phones, global positioning systems (GPS), CD or VCD players, air conditioners, and so forth, are becoming increasingly popular in vehicles. The proposed system architecture can adapt the Gaussian mixture (GM) location models online to changes in environmental noise, even under low-SNR conditions. It may be undesirable or infeasible to model every position. In that case, an unexpected speech signal that is not emitted from one of the modeled locations, such as a radio broadcast from the in-car audio system or a speaker’s voice from an unmodeled location, could trigger the voice activity detector (VAD) in the system architecture, resulting in an incorrect detection of the speaker location. This work proposes a threshold-based location detection approach that uses the training signals and the trained GM location model parameters to determine a suitable length of the testing sequence and to obtain a threshold on the a posteriori probability for each location, so that both unmodeled speech sources and overlapped speech can be rejected.
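The threshold-based decision described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's procedure: the GMMs use diagonal covariances, priors over locations are equal, the sequence length is whatever the caller passes in, and the thresholds are placeholder values rather than ones derived from training data. The names `gmm_log_likelihood` and `detect_location` are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Total log-likelihood of frames (T, D) under a diagonal-covariance GMM.

    weights: (M,), means: (M, D), variances: (M, D).
    """
    diff = frames[:, None, :] - means[None, :, :]            # (T, M, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_comp = log_comp + np.log(weights)                    # (T, M)
    peak = log_comp.max(axis=1, keepdims=True)               # log-sum-exp
    frame_ll = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return frame_ll.sum()

def detect_location(frames, models, thresholds):
    """Return the index of the detected location, or None if rejected.

    models: list of (weights, means, variances), one per modeled location.
    thresholds: per-location posterior thresholds (placeholders here).
    Equal priors over modeled locations are assumed.
    """
    log_liks = np.array([gmm_log_likelihood(frames, *m) for m in models])
    shifted = np.exp(log_liks - log_liks.max())
    posteriors = shifted / shifted.sum()
    best = int(np.argmax(posteriors))
    if posteriors[best] < thresholds[best]:
        return None  # treated as unmodeled or overlapped speech
    return best
```

A sequence whose posterior never clears the threshold for any location is discarded rather than assigned, which is the behaviour the summary describes for radio output, unmodeled seats, and overlapped speech. Note that a posterior normalised only over modeled locations tends to saturate as the sequence grows, which is one reason the choice of sequence length matters.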