Abstract

While video analytics used in surveillance applications performs well under normal conditions, its accuracy can degrade under adverse circumstances. Exploiting the complementary aspects of video and audio can lead to a more effective analytics framework and increased system robustness. For example, sound scene analysis may indicate potential security risks outside the field of view and prompt the camera to point in that direction. This paper presents a robust low-complexity method for two-microphone estimation of sound direction. While the source localization problem has been studied extensively, a reliable low-complexity solution remains elusive. The proposed direction estimation is based on the Generalized Cross-Correlation with Phase Transform (GCC-PHAT) method. The novel aspects of our approach include band-selective processing and inter-frame filtering of the GCC-PHAT objective function prior to peak detection. The audio bandwidth, microphone spacing, angle resolution, processing delay, and complexity can all be adjusted to the application requirements. The described algorithm can be used in a multi-microphone configuration for spatial sound localization by combining estimates from microphone pairs. It has been implemented as a real-time demo on a modified TI DM8127 IP camera. At the default 16 kHz audio sampling frequency, our fixed-point implementation requires about 5 MIPS of processing power. The test results show robust sound direction estimation under a variety of background noise conditions.
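
The processing chain named in the abstract (GCC-PHAT, band selection, inter-frame filtering, peak detection) can be illustrated with a minimal floating-point sketch. This is not the paper's fixed-point implementation. The function names, the 300 to 4000 Hz band, the 0.2 m spacing, the frame length, the speed of sound, and the first-order recursive smoother with coefficient `alpha` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper


def gcc_phat_frame(x1, x2, fs, band, max_lag):
    """GCC-PHAT of one frame, limited to `band` (Hz) and lags in [-max_lag, max_lag]."""
    n = 2 * len(x1)                          # zero-pad so the correlation is linear, not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X2 * np.conj(X1)                 # peak lands at +d when x2 lags x1 by d samples
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase, discard magnitude
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    cross[(freqs < band[0]) | (freqs > band[1])] = 0.0   # band-selective processing
    cc = np.fft.irfft(cross, n=n)
    # Reorder so the first element is lag -max_lag and the last is lag +max_lag.
    return np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))


def estimate_direction(ch1, ch2, fs=16000, spacing_m=0.2, frame_len=512,
                       band=(300.0, 4000.0), alpha=0.8):
    """Return one angle per frame, in degrees from broadside.

    Positive angles mean the sound reaches microphone 1 first.
    """
    ch1 = np.asarray(ch1, dtype=float)
    ch2 = np.asarray(ch2, dtype=float)
    # Largest physically possible delay between the two microphones, in samples.
    max_lag = max(1, int(np.ceil(spacing_m / SPEED_OF_SOUND * fs)))
    lags = np.arange(-max_lag, max_lag + 1)
    window = np.hanning(frame_len)
    smoothed = np.zeros(len(lags))
    angles = []
    for start in range(0, len(ch1) - frame_len + 1, frame_len // 2):
        f1 = ch1[start:start + frame_len] * window
        f2 = ch2[start:start + frame_len] * window
        gcc = gcc_phat_frame(f1, f2, fs, band, max_lag)
        # Inter-frame filtering (assumed first-order recursive average) before peak picking.
        smoothed = alpha * smoothed + (1.0 - alpha) * gcc
        tau = lags[np.argmax(smoothed)] / fs             # peak detection, delay in seconds
        sin_theta = np.clip(tau * SPEED_OF_SOUND / spacing_m, -1.0, 1.0)
        angles.append(np.degrees(np.arcsin(sin_theta)))
    return np.array(angles)
```

Under these assumptions, a 5-sample inter-channel delay at 16 kHz with 0.2 m spacing corresponds to a direction of about 32 degrees from broadside. Because the sketch picks the peak on an integer-lag grid, its angle resolution is coarse; finer resolution would need interpolation of the correlation function or wider microphone spacing.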
