Artificial intelligence-based collaborative acoustic scene and event classification to support urban soundscape analysis and classification

Yuanbo Hou, Dick Botteldooren

doi:10.3397/in_2022_0974

Abstract

A human listener embedded in a sonic environment relies on the meaning given to sound events as well as on general acoustic features to analyse and appraise its soundscape. However, currently used measurable soundscape indicators mainly focus on the latter, and meaning is included only indirectly. Yet today's artificial intelligence (AI) techniques make it possible to recognise a variety of sounds and thus to assign meaning to them. Hence, we propose to combine a model for acoustic event classification, trained on the large-scale environmental sound database AudioSet, with a scene classification algorithm that couples direct identification of acoustic features with these recognised sounds for scene recognition. The combined model is trained on TUT2018, a database containing ten everyday scenes. Applying the resulting AI model to the Soundscapes of the World database without further training shows that the classification obtained correlates with the calmness and liveliness perceived by a test panel. The model also makes it possible to unravel why an acoustic environment sounds like a lively square or a calm park by analysing the types of sounds and their occurrence patterns over time. Moreover, disturbance, e.g. by traffic, of the acoustic environment that is expected based on visual cues can easily be recognised.
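As a rough illustration of the coupling described in the abstract, the following minimal sketch fuses per-clip sound event probabilities with an acoustic feature embedding and maps the result to scene probabilities. It is not the architecture used in the paper: the concatenation-plus-linear-layer fusion, the function name `classify_scene`, the embedding size, the event label count, and the random weights are all assumptions made for illustration only.

```python
import numpy as np

N_EVENTS = 527  # assumed size of the event label set (commonly used AudioSet release)
N_SCENES = 10   # ten everyday scenes, as stated for TUT2018
EMB_DIM = 128   # hypothetical size of the acoustic feature embedding


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def classify_scene(event_probs, acoustic_emb, W, b):
    """Hypothetical late fusion: concatenate both views, then one linear layer."""
    fused = np.concatenate([event_probs, acoustic_emb], axis=-1)
    return softmax(fused @ W + b)


# Random stand-ins for the outputs of a pretrained event classifier and an
# acoustic feature extractor (4 clips); no real audio or trained weights here.
rng = np.random.default_rng(0)
event_probs = rng.random((4, N_EVENTS))
acoustic_emb = rng.standard_normal((4, EMB_DIM))
W = rng.standard_normal((N_EVENTS + EMB_DIM, N_SCENES)) * 0.01
b = np.zeros(N_SCENES)

scene_probs = classify_scene(event_probs, acoustic_emb, W, b)
print(scene_probs.shape)  # one probability vector over scenes per clip
```

In such a setup the event probabilities remain available alongside the scene output, which is what would allow inspecting which sounds occur in a clip classified as, say, a park or a square.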

Full Text