In everyday life, people experience different soundscapes in which natural sounds, animal noises, and man-made sounds blend together. Although several studies have examined the importance of recurring sound patterns in music and language, the relevance of this phenomenon in natural soundscapes is still largely unexplored. In this article, we study the repetition patterns of harmonic and transient sound events as potential cues for acoustic scene classification (ASC). In the first part of our study, we aim to identify acoustic scene classes that exhibit characteristic repetition patterns of harmonic and transient sounds. We propose three metrics to measure the overall prevalence of sound repetitions as well as their repetition periods and temporal stability. In the second part, we evaluate three strategies to incorporate self-similarity matrices as an additional input feature to a convolutional neural network architecture for ASC. We observe characteristic repetitions of transient sounds in recordings of “park” and “street traffic” as well as harmonic sound repetitions in acoustic scene classes related to public transportation. In the ASC experiments, hybrid network architectures, which combine spectrogram features and features from sound recurrence analysis, show increased accuracy for those classes with prominent sound repetition patterns. Our findings offer an additional perspective on the distinctions among acoustic scenes, which the literature has so far attributed primarily to their spectral features.