Abstract

This paper presents a general framework for localizing multiple sound sources in Cartesian coordinates with a perception sensor network (PSN). The PSN consists of Kinect sensors, each with a color camera, a depth camera, and an internal microphone array, together with our experimental pan-tilt-zoom camera with an attached microphone array. Sound localization with the PSN is based on a three-stage analysis. First, short-time narrowband directional localization based on the phase difference of arrival (PDOA) is obtained at every time-frequency point, exploiting the sparseness of the audio mixtures. The directional estimates from multiple sensors are then transformed to Cartesian coordinates by simple triangulation. These results are accumulated over all frequency bins for a block of frames and clustered to obtain a mid-term broadband localization. Furthermore, the framework can integrate any Bayesian filtering algorithm for long-term localization. Simulation results with four arrays (each with four microphones) show that the proposed framework successfully localizes three simultaneous sources spaced about one meter apart.
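
The abstract does not spell out the triangulation step. The following is a minimal sketch, assuming each array contributes a known 2-D position and a short-time azimuth (DOA) estimate, of how several directional estimates could be intersected in a least-squares sense to yield one Cartesian point; the array positions, function name, and least-squares formulation are illustrative assumptions rather than the paper's exact method.

```python
import numpy as np

def triangulate_2d(array_positions, azimuths_rad):
    """Least-squares intersection of bearing lines (illustrative sketch).

    array_positions : (N, 2) array of microphone-array locations in metres.
    azimuths_rad    : (N,) azimuth estimates, one per array.
    Returns the 2-D point minimizing the squared distance to all bearing lines.
    """
    A, b = [], []
    for (px, py), theta in zip(array_positions, azimuths_rad):
        # The bearing line through (px, py) with direction (cos t, sin t)
        # satisfies n . x = n . p, where n is the unit normal to the bearing.
        n = np.array([-np.sin(theta), np.cos(theta)])
        A.append(n)
        b.append(n @ np.array([px, py]))
    A, b = np.asarray(A), np.asarray(b)
    xy, *_ = np.linalg.lstsq(A, b, rcond=None)
    return xy

# Hypothetical example: three arrays observe a source near (2.0, 1.5) metres.
positions = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
true_src = np.array([2.0, 1.5])
bearings = np.arctan2(*(true_src - positions).T[::-1])  # azimuth seen from each array
print(triangulate_2d(positions, bearings))  # approximately [2.0, 1.5]
```

In the full framework, such per-time-frequency estimates would then be accumulated over a block of frames and clustered for the mid-term broadband localization; the sketch only illustrates the single triangulation step.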
