This paper presents a deep learning model for sound source localization that treats localization as a source-direction classification problem. The proposed approach feeds a convolutional neural network with a combination of sound intensity features and generalized cross-correlation with phase transform (GCC-PHAT) features. Datasets reflecting the modeling conditions were created to train, validate and test the model at spatial resolutions of 10° and 2°. Simulation results showed that the proposed model localizes the source with high accuracy in a closed, reverberant environment. Compared with a model that uses only sound intensity features as input, the proposed model at 10° resolution improved accuracy by 6.57% and prediction accuracy by 2.86%, while at 2° resolution it improved accuracy by 15.57% and prediction accuracy by 2.04%.
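As an illustration of the GCC-PHAT input features and the direction-classification framing, the sketch below uses the standard GCC-PHAT formulation, in which the cross-spectrum of two microphone signals is normalized to unit magnitude before the inverse FFT. It is not the authors' feature pipeline. The function names `gcc_phat`, `num_classes` and `angle_to_class` are hypothetical. The lag range, the full 0 to 360° azimuth grid and the floor-based binning of angles into classes are assumptions made only for this example.

```python
import numpy as np


def gcc_phat(x1, x2, n_lags):
    """GCC-PHAT between two microphone signals x1 and x2 (n_lags >= 1).

    Returns 2*n_lags + 1 correlation values for lags -n_lags..n_lags;
    a peak at positive lag d means x1 lags x2 by d samples.
    """
    n = len(x1) + len(x2)  # zero-pad so the correlation is not circular
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)
    # PHAT weighting: normalize magnitude so only phase (delay) information remains
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    # Negative lags sit at the end of the IFFT output; move lag 0 to the center
    return np.concatenate((cc[-n_lags:], cc[:n_lags + 1]))


def num_classes(resolution_deg):
    """Number of direction classes on an assumed 0-360 degree azimuth grid."""
    return int(round(360 / resolution_deg))


def angle_to_class(azimuth_deg, resolution_deg):
    """Hypothetical label mapping: bin the azimuth into [k*res, (k+1)*res)."""
    return int((azimuth_deg % 360) // resolution_deg) % num_classes(resolution_deg)
```

Under these assumptions, a 10° grid yields 36 direction classes and a 2° grid yields 180, which suggests why the finer resolution is the harder classification task. GCC-PHAT vectors from each microphone pair could then be stacked with the intensity features to form the network input.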