Abstract

The automatic content-based classification of complex and dynamic urban sound is an important component of various emerging applications, such as surveillance, urban soundscape understanding, and noise source identification, and the topic has therefore gained considerable attention in recent years. The aim of this paper is to develop an efficient machine learning-based scheme for urban sound classification under real-life noise conditions. Unlike conventional sound event classification methods, which mainly address local temporal-spectral patterns, we propose an aggregation scheme that combines both local and global acoustic features. To characterize local patterns, we employ a feature learning method to extract class-dependent temporal-spectral structures; to exploit global features of sound events, e.g. variability and recurrence, which also carry rich discriminant information, we employ long-term descriptive statistics. To aggregate the heterogeneous acoustic information for classification, we introduce a mixture of experts (MoE) model, which effectively formulates the relationship between local and global information. For validation, we conduct experiments on the UrbanSound8K database, which consists of 8732 real-world clips in 10 categories of urban sound events. Notably, the 10 classes of crowdsourced recordings (air conditioner, car horn, children playing, dog bark, drilling, engine idling, gunshot, jackhammer, siren, and street music) are among the most common sounds of urban life. According to the experimental results, the proposed scheme achieves superior performance compared with three other recent approaches, and it can serve as a fundamental building block of various urban multimedia information processing systems that help to improve quality of life.
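The aggregation idea can be illustrated with a minimal sketch, assuming a two-expert mixture in which one expert scores a local feature vector, the other scores long-term statistics, and a softmax gate weights their class posteriors. The abstract does not specify the architecture, so every name, dimension, and the linear form of the experts and gate below are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()

def global_stats(frames):
    """Long-term descriptive statistics (per-band mean and standard
    deviation) over a (n_frames, n_bands) matrix. Which statistics the
    paper actually uses is an assumption here."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

def moe_predict(local_feat, global_feat, params):
    """Combine two linear-softmax experts with a softmax gate.
    Hypothetical structure, not the authors' model."""
    p_local = softmax(params["W_local"] @ local_feat)     # local expert posterior
    p_global = softmax(params["W_global"] @ global_feat)  # global expert posterior
    gate = softmax(params["W_gate"] @ np.concatenate([local_feat, global_feat]))
    posterior = gate[0] * p_local + gate[1] * p_global    # convex combination
    return posterior, gate

# Toy dimensions: 10 classes as in UrbanSound8K; the rest are illustrative.
rng = np.random.default_rng(0)
n_classes, n_frames, n_bands, d_local = 10, 128, 40, 64

frames = rng.normal(size=(n_frames, n_bands))  # stand-in for a spectrogram
local_feat = rng.normal(size=d_local)          # stand-in for a learned local feature
global_feat = global_stats(frames)             # length 2 * n_bands

params = {
    "W_local": rng.normal(size=(n_classes, d_local)),
    "W_global": rng.normal(size=(n_classes, 2 * n_bands)),
    "W_gate": rng.normal(size=(2, d_local + 2 * n_bands)),
}

posterior, gate = moe_predict(local_feat, global_feat, params)
predicted_class = int(np.argmax(posterior))
```

Because the gate weights are non-negative and sum to one, the combined output remains a valid probability distribution over the 10 classes; in a trained model the gate would learn, per clip, how much to trust the local versus the global expert.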
