A Multi-modal Panoramic Speaker Localization Method

Y Z Long, Z T Liu, Y W Jian

doi:10.1109/cac53003.2021.9728498

Abstract

As a supporting technology in the field of human-computer interaction, speaker localization has been a research hotspot in recent years. However, existing single-modal speaker localization methods cannot meet the requirements of stability and speed. To improve the accuracy and effective positioning range of speaker localization systems, a multi-modal panoramic speaker localization method is proposed in this paper. First, the coarse position of the speaker is obtained by sound source localization. Then, based on this initial position, the method decides whether the images used for face recognition need to be stitched. Next, the position of the speaker is calculated by face detection and calibration. Finally, the results of voice and image localization are fused by coordinate transformation. The proposed method is tested on a self-built experimental system, and the experimental results demonstrate its effectiveness.
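The stitching decision and the fusion step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the camera layout, the stitching criterion, or the fusion rule. The sketch therefore assumes that both modalities are reduced to an azimuth in a shared horizontal frame, that the panorama maps pixel columns linearly to azimuth, and that fusion is a weighted circular mean. All function names, parameters, and default values are hypothetical.

```python
import math


def angular_difference(a_deg, b_deg):
    """Signed smallest difference a - b in degrees, in [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


def needs_stitching(coarse_deg, camera_centers_deg, half_fov_deg, margin_deg):
    """Decide whether a stitched image is needed (assumed criterion).

    Returns False if the coarse audio azimuth lies inside one camera's
    field of view with at least `margin_deg` to spare, True otherwise.
    """
    for center in camera_centers_deg:
        if abs(angular_difference(coarse_deg, center)) <= half_fov_deg - margin_deg:
            return False
    return True


def pixel_to_azimuth(x_pixel, image_width, start_deg=0.0, span_deg=360.0):
    """Map a face's horizontal pixel position to an azimuth.

    Assumes a linear (equirectangular-style) mapping from column to
    angle, which stands in for the calibration step of the abstract.
    """
    return (start_deg + span_deg * x_pixel / image_width) % 360.0


def fuse_azimuths(audio_deg, visual_deg, audio_weight=0.3):
    """Fuse two azimuths with a weighted circular mean (assumed rule)."""
    visual_weight = 1.0 - audio_weight
    a, v = math.radians(audio_deg), math.radians(visual_deg)
    x = audio_weight * math.cos(a) + visual_weight * math.cos(v)
    y = audio_weight * math.sin(a) + visual_weight * math.sin(v)
    return math.degrees(math.atan2(y, x)) % 360.0


if __name__ == "__main__":
    cameras = [0.0, 90.0, 180.0, 270.0]  # hypothetical four-camera ring
    coarse = 45.0                        # hypothetical audio estimate
    print("stitch:", needs_stitching(coarse, cameras, 30.0, 5.0))
    visual = pixel_to_azimuth(250, 1920)
    print("fused azimuth:", fuse_azimuths(coarse, visual))
```

The circular mean is used instead of a plain average so that estimates on either side of the 0°/360° boundary are combined correctly. The weight given to each modality is a free parameter in this sketch.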

Full Text