Abstract

As a supporting technology in the field of human-computer interaction, speaker localization has been a research hotspot in recent years. However, existing single-modal speaker localization methods cannot meet the requirements of stability and speed. To improve the accuracy and effective positioning range of a speaker localization system, a multi-modal panoramic speaker localization method is proposed in this paper. First, a coarse position of the speaker is obtained by sound source localization. Then, based on this initial position, the method decides whether the images used for face recognition need to be stitched. Next, the position of the speaker is calculated by face detection and calibration. Finally, the results of voice and image localization are fused by coordinate transformation. The method is tested on a self-built experimental system, and the experimental results demonstrate its effectiveness.
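The four steps above can be illustrated with a minimal sketch. The abstract does not give the camera layout, the stitching criterion, or the fusion rule, so everything here is an assumption made for illustration: cameras with known centre azimuths and a shared horizontal field of view, a pinhole model for mapping a detected face's pixel column to an azimuth, and a weighted circular blend for fusion. All function names and parameters (`needs_stitching`, `pixel_to_azimuth`, `fuse_azimuths`, `margin_deg`, `visual_weight`) are hypothetical and are not taken from the paper.

```python
import math


def angular_diff(a_deg, b_deg):
    """Signed smallest difference a - b in degrees, in the range [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def needs_stitching(audio_azimuth_deg, camera_centers_deg, half_fov_deg,
                    margin_deg=5.0):
    """Assumed criterion: stitch when the coarse audio azimuth is not
    comfortably inside any single camera's field of view."""
    for center in camera_centers_deg:
        if abs(angular_diff(audio_azimuth_deg, center)) <= half_fov_deg - margin_deg:
            return False  # one camera alone covers the speaker
    return True  # speaker is near a seam between views


def pixel_to_azimuth(x_pixel, image_width, camera_center_deg, half_fov_deg):
    """Map a face's horizontal pixel position to an azimuth in the shared
    frame, assuming an ideal pinhole camera (no lens distortion)."""
    focal_px = (image_width / 2.0) / math.tan(math.radians(half_fov_deg))
    offset_deg = math.degrees(math.atan((x_pixel - image_width / 2.0) / focal_px))
    return (camera_center_deg + offset_deg) % 360.0


def fuse_azimuths(audio_deg, visual_deg, visual_weight=0.8):
    """Blend the two estimates along the shorter arc between them.
    The weight is an illustrative assumption, not a value from the paper."""
    return (audio_deg + visual_weight * angular_diff(visual_deg, audio_deg)) % 360.0


if __name__ == "__main__":
    cameras = [0.0, 90.0, 180.0, 270.0]  # assumed four-camera panoramic rig
    coarse = 80.0                        # coarse azimuth from sound source localization
    print("stitch:", needs_stitching(coarse, cameras, half_fov_deg=45.0))
    visual = pixel_to_azimuth(250, 640, camera_center_deg=90.0, half_fov_deg=45.0)
    print("fused azimuth:", fuse_azimuths(coarse, visual))
```

In this sketch the audio estimate only selects which view (or stitched pair of views) to search, and the visual estimate dominates the fused result; a real system would replace the fixed weight with one derived from the measured error of each modality.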
