Abstract

In video conferencing environments, localizing the active talker is essential. However, conventional audio‐based algorithms often suffer from acoustic interference, and conventional vision‐based algorithms fail in the presence of visual interference. To deal with these problems, this paper proposes a robust omnidirectional audio‐visual talker localization algorithm that primarily exploits audio feature parameters and subordinately uses visual feature parameters. To achieve omnidirectional audio‐visual talker localization, paired omnidirectional microphones are employed as the audio sensor and an omnidirectional camera as the visual sensor. For robust talker localization, audio feature parameters are extracted using weighted cross‐power spectrum phase (CSP) analysis and CSP coefficient subtraction, and visual feature parameters are extracted using background subtraction and skin‐color detection. The talker is finally located by fusing weighted audio and visual feature parameters, with the fusion weight controlled automatically according to a reliability criterion on the audio feature parameters. Localization experiments in an actual room revealed that the proposed audio‐visual algorithm outperforms conventional localizers that use only audio or only visual feature parameters. [Work supported by MEXT of Japan.]
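The abstract names CSP analysis as the core audio cue and a reliability-controlled weighting for fusion. The paper's exact weighted-CSP formulation, subtraction step, and reliability criterion are not given in the abstract, so the following is only a minimal sketch of the standard CSP (GCC-PHAT) delay estimator plus a hypothetical peak-based reliability weighting; the threshold and weights are illustrative assumptions, not the authors' values.

```python
import numpy as np

def csp_phat(x1, x2, fs):
    """Cross-power spectrum phase (CSP / GCC-PHAT) delay estimate.

    Whitens the cross-spectrum of the two microphone signals, then
    inverse-transforms it; the result peaks at the inter-microphone
    time delay. Returns (delay in seconds, peak height). A positive
    delay means x2 lags x1.
    """
    n = len(x1) + len(x2)                 # zero-pad for linear correlation
    X1 = np.fft.rfft(x1, n)
    X2 = np.fft.rfft(x2, n)
    cross = X2 * np.conj(X1)
    cross /= np.abs(cross) + 1e-12        # phase transform (whitening)
    csp = np.fft.irfft(cross, n)
    max_shift = n // 2
    # reorder so index max_shift corresponds to zero lag
    csp = np.concatenate((csp[-max_shift:], csp[:max_shift + 1]))
    lag = int(np.argmax(csp)) - max_shift
    return lag / fs, float(np.max(csp))

def fuse_scores(audio_score, visual_score, csp_peak, peak_threshold=0.3):
    """Hypothetical fusion: trust audio when the CSP peak is sharp,
    lean on the visual cue otherwise (weights are illustrative)."""
    w = 0.9 if csp_peak >= peak_threshold else 0.3
    return w * audio_score + (1.0 - w) * visual_score
```

For example, feeding the estimator one noise signal and a copy delayed by a few samples recovers that delay; the peak height then serves as the reliability cue that steers the fusion weight.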
