Abstract
Sound source localization from a flying drone is challenging because of the strong ego-noise from the rotating motors and propellers, as well as the movement of both the drone and the sound sources. To address this challenge, we propose a deep-learning-based framework that integrates single-channel noise reduction and multichannel source localization. In this framework, a single-channel deep neural network (DNN) suppresses the ego-noise by estimating a time–frequency soft ratio mask. We then design two downstream multichannel source localization algorithms, one based on steered response power (SRP-DNN) and one based on time–frequency spatial filtering (TFS-DNN). The main novelty lies in the proposed TFS-DNN approach. It estimates the presence probability of the target sound at individual time–frequency bins by combining the DNN-inferred soft ratio mask with the instantaneous direction of arrival (DOA) of the sound received by the microphone array. This time–frequency presence probability is then used to design a set of spatial filters that construct a spatial likelihood map for source localization. By jointly exploiting spectral and spatial information, TFS-DNN robustly processes short signal segments (e.g., 0.5 s) in dynamic, low signal-to-noise-ratio (SNR) scenarios (e.g., −20 dB SNR). Results on real and simulated data in a variety of scenarios (static sources, moving sources, and moving drones) indicate the advantage of TFS-DNN over competing methods, including SRP-DNN and the state-of-the-art TFS.
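The following NumPy sketch illustrates, under several assumptions, how a soft ratio mask can weight time–frequency (TF) bins when building a spatial likelihood map. It is not the authors' SRP-DNN or TFS-DNN. The DNN output is replaced by an oracle ratio mask computed from simulated signals. The array is a hypothetical 4-microphone circular array with a 5 cm radius. TFS-DNN's spatial filter design is replaced by a simpler mask-weighted vote over per-bin instantaneous DOAs. All function names (`steering_vectors`, `narrowband_power`, `srp_map`, `tf_vote_map`) are hypothetical.

```python
import numpy as np

C = 343.0  # speed of sound (m/s)


def steering_vectors(mic_xy, azimuths_deg, freqs):
    """Far-field steering vectors for a planar array, shape (n_freqs, n_az, n_mics)."""
    az = np.deg2rad(azimuths_deg)
    u = np.stack([np.cos(az), np.sin(az)], axis=1)  # (n_az, 2) look directions
    # A plane wave arriving from direction u reaches a mic at position p earlier by p.u / c.
    advance = (u @ mic_xy.T) / C  # (n_az, n_mics), in seconds
    return np.exp(2j * np.pi * freqs[:, None, None] * advance[None, :, :])


def narrowband_power(X, A):
    """PHAT-normalized steered response power in every TF bin.

    X: array STFT, shape (n_freqs, n_frames, n_mics).
    A: steering vectors, shape (n_freqs, n_az, n_mics).
    Returns P, shape (n_freqs, n_frames, n_az), with values in [0, 1].
    """
    Xn = X / np.maximum(np.abs(X), 1e-12)  # keep phase only (PHAT weighting)
    Y = np.einsum("fam,ftm->fta", A.conj(), Xn)  # delay-and-sum output a^H x per bin and azimuth
    return np.abs(Y) ** 2 / X.shape[-1] ** 2


def srp_map(P, mask):
    """Mask-weighted SRP: accumulate steered power over all TF bins."""
    return np.einsum("ft,fta->a", mask, P)


def tf_vote_map(P, mask):
    """Each TF bin votes for its instantaneous DOA, weighted by mask[f, t]."""
    inst_doa = P.argmax(axis=-1)  # (n_freqs, n_frames) azimuth indices
    return np.bincount(inst_doa.ravel(), weights=mask.ravel(), minlength=P.shape[-1])


# Synthetic scene: target at 60 deg, louder directional "ego-noise" at 200 deg.
rng = np.random.default_rng(0)
n_mics, radius = 4, 0.05
phi = 2 * np.pi * np.arange(n_mics) / n_mics
mic_xy = radius * np.stack([np.cos(phi), np.sin(phi)], axis=1)  # (4, 2) mic positions
azimuths = np.arange(360.0)  # 1-degree candidate grid
freqs = np.linspace(500.0, 2000.0, 32)  # kept low enough to avoid spatial aliasing
n_frames = 50
A = steering_vectors(mic_xy, azimuths, freqs)


def cplx(shape, scale=1.0):
    """Circular complex Gaussian samples with standard deviation `scale`."""
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


target_on = rng.random((len(freqs), n_frames)) < 0.3  # target occupies ~30% of TF bins
S = cplx(target_on.shape) * target_on  # target spectrum
N = 2.0 * cplx(target_on.shape) * ~target_on  # louder interferer in the remaining bins
X = (S[..., None] * A[:, 60][:, None, :]
     + N[..., None] * A[:, 200][:, None, :]
     + cplx(S.shape + (n_mics,), 0.01))  # weak spatially white sensor noise

# Oracle soft ratio mask, standing in for the DNN estimate (assumption).
mask = np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-4)

P = narrowband_power(X, A)
srp_peak = azimuths[srp_map(P, mask).argmax()]
vote_peak = azimuths[tf_vote_map(P, mask).argmax()]
raw_vote_peak = azimuths[tf_vote_map(P, np.ones_like(mask)).argmax()]
print(srp_peak, vote_peak, raw_vote_peak)
```

In this toy scene the two mask-weighted maps should peak close to the target at 60°. The unweighted vote is pulled toward the louder interferer at 200°, which is why spectral (mask) and spatial (DOA) information are combined. With real drone recordings, the mask would come from the noise-reduction network and would be far less clean than this oracle.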