Abstract
Sound source localization is an important task for several applications, and the use of deep learning for this task has recently become a popular research topic. While a number of previous works have focused on static sound sources, in this paper we evaluate the performance of a deep learning classification system for the localization of moving sound sources. In particular, we evaluate the effect of key parameters at the levels of feature extraction (e.g., STFT parameters) and model training (e.g., neural network architectures). We evaluate the performance of different settings in terms of precision and F-score, in a multi-class multi-label classification framework. In our previous work on the localization of moving sound sources, we investigated feedforward neural networks under different acoustic conditions and STFT parameters, and showed that the presence of some reverberation in the training dataset can improve detection of the direction of arrival of the sources. In this paper, we extend that work to show that (1) the window size does not affect localization performance for static sources but strongly affects it for moving sources, (2) the sequence length has a significant effect on the performance of recurrent architectures, and (3) a temporal convolutional neural network can outperform both recurrent and feedforward networks for moving sound sources.
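To make the feature-extraction stage concrete, the following is a minimal sketch (not the authors' implementation) of how multichannel STFT magnitude and phase features might be computed with a configurable window size, the STFT parameter whose effect on static versus moving sources is studied here. The function name, sampling rate, channel count, and 50% overlap are illustrative assumptions.

```python
# Minimal sketch, assuming scipy is available; names and defaults are
# illustrative, not taken from the paper.
import numpy as np
from scipy.signal import stft

def stft_features(audio, fs=44100, win_size=512):
    """Stack per-channel STFT magnitude and phase into one feature tensor.

    audio: (n_channels, n_samples) multichannel recording.
    win_size: STFT window length in samples -- the parameter whose effect
    on moving vs. static sources is evaluated in the paper.
    """
    feats = []
    for ch in audio:
        _, _, Z = stft(ch, fs=fs, nperseg=win_size, noverlap=win_size // 2)
        feats.append(np.abs(Z))    # magnitude spectrogram
        feats.append(np.angle(Z))  # phase spectrogram (carries inter-channel spatial cues)
    # Resulting shape: (2 * n_channels, n_freq_bins, n_frames).
    return np.stack(feats)

# Example: a shorter window yields more frames (finer time resolution),
# which matters when the source moves during the analysis interval.
audio = np.random.randn(4, 44100)  # 1 s of 4-channel noise as a stand-in
for w in (256, 1024):
    print(w, stft_features(audio, win_size=w).shape)
```

Including the phase alongside the magnitude reflects a common design choice in learning-based localization, since inter-channel phase differences encode the time delays from which direction of arrival can be inferred.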