Abstract
Over the last decade, urban noise pollution has become a significant environmental concern. Audio detection algorithms can help mitigate it by classifying the different sources of noise and enabling more informative noise maps. In this context, machine learning, and transfer learning in particular, is a key technology for the accurate analysis of urban noise sources. However, the choice of the pre-trained model used to compute audio embeddings can significantly influence performance on downstream classification tasks. This paper compares the embeddings of various pre-trained models across different data collection campaigns of the Sons al balcó project and quantifies the robustness of the resulting audio representations. To this end, we develop metrics and statistically test for distribution shifts in the learned latent features. To evaluate the quality of the embeddings, we perform both qualitative and quantitative analyses using dimensionality reduction methods and assess performance on downstream tasks using data from the different collection campaigns. The results highlight major differences between general-purpose and specific models. Our findings suggest that the pre-trained model should be chosen with care in audio event detection applications.
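As an illustration of what statistically testing for a distribution shift between campaigns can look like, the following is a minimal sketch of a permutation test on the distance between the mean embeddings of two campaigns. It is not the metric developed in the paper, which the abstract does not specify. The function name `permutation_shift_test`, the array names `campaign_a` and `campaign_b`, and the synthetic Gaussian data standing in for real embeddings are all assumptions made for this example.

```python
import numpy as np


def permutation_shift_test(emb_a, emb_b, n_permutations=1000, seed=0):
    """Permutation test for a shift between two sets of embeddings.

    Hypothetical helper, not the paper's method. The test statistic is the
    Euclidean distance between the mean embedding of each set. Returns the
    observed statistic and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack([emb_a, emb_b])
    n_a = len(emb_a)

    observed = np.linalg.norm(emb_a.mean(axis=0) - emb_b.mean(axis=0))

    # Shuffle campaign labels and count how often the shuffled statistic
    # is at least as large as the observed one.
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        perm_a = pooled[perm[:n_a]]
        perm_b = pooled[perm[n_a:]]
        stat = np.linalg.norm(perm_a.mean(axis=0) - perm_b.mean(axis=0))
        if stat >= observed:
            count += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value


# Synthetic stand-ins for embeddings (rows = clips, columns = dimensions).
data_rng = np.random.default_rng(42)
campaign_a = data_rng.normal(loc=0.0, scale=1.0, size=(200, 16))
campaign_b = data_rng.normal(loc=0.5, scale=1.0, size=(200, 16))  # shifted mean

statistic, p_value = permutation_shift_test(campaign_a, campaign_b)
print(f"statistic={statistic:.3f}, p-value={p_value:.4f}")
```

A small p-value suggests that the two campaigns are embedded differently by the model under test. A mean-based statistic only detects shifts in location; a kernel-based statistic would be needed to detect changes in shape or spread.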
More From: INTER-NOISE and NOISE-CON Congress and Conference Proceedings