Abstract

Sound source localization is an important task for several applications, and the use of deep learning for it has recently become a popular research topic. While a number of previous works have focused on static sound sources, in this paper we evaluate the performance of a deep learning classification system for the localization of moving sound sources. In particular, we evaluate the effect of key parameters at the levels of feature extraction (e.g., STFT parameters) and model training (e.g., neural network architectures). We evaluate the performance of different settings in terms of precision and F-score, in a multi-class multi-label classification framework. In our previous work on the localization of moving sound sources, we investigated feedforward neural networks under different acoustic conditions and STFT parameters, and showed that the presence of some reverberation in the training dataset can help achieve better detection of the direction of arrival of the sources. In this paper, we extend that work to show that (1) window size does not affect the performance for static sources but strongly affects the performance for moving sources, (2) sequence length has a significant effect on the performance of recurrent architectures, and (3) a temporal convolutional neural network can outperform both recurrent and feedforward networks for moving sound sources.
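
To make finding (1) concrete, the following minimal sketch shows the time-frequency trade-off that the STFT window size controls. The sampling rate, window sizes, and use of SciPy are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np
from scipy.signal import stft

# Illustrative parameters only; the paper's actual STFT settings
# are not reproduced here.
fs = 16000                  # assumed sampling rate (Hz)
x = np.random.randn(fs)     # stand-in for one second of microphone audio

for win in (256, 1024, 4096):
    f, t, Z = stft(x, fs=fs, nperseg=win, noverlap=win // 2)
    # A longer window yields finer frequency bins but coarser time frames,
    # which blurs the trajectory of a moving source; a shorter window does
    # the reverse. A static source is largely insensitive to this trade-off.
    print(f"window={win:5d}: {len(f)} frequency bins x {len(t)} time frames")
```

The evaluation described above scores direction-of-arrival detection as multi-class multi-label classification. The sketch below illustrates one possible encoding and how precision and F-score can be computed on it; the number of direction classes and the micro-averaging are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import precision_score, f1_score

# Hypothetical multi-label encoding: one binary indicator per direction
# class, so several simultaneous sources can be active in one frame.
y_true = np.array([[1, 0, 1, 0],   # two sources active
                   [0, 1, 0, 0]])  # one source active
y_pred = np.array([[1, 0, 0, 0],
                   [0, 1, 1, 0]])

print("precision:", precision_score(y_true, y_pred, average="micro"))
print("F-score:  ", f1_score(y_true, y_pred, average="micro"))
```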
