Abstract

Acoustic Scene Classification (ASC) is the task of assigning a semantic label to an audio sample recorded in a particular acoustic environment. Sounds carry significant information about everyday scenes such as a bus, tram, airport, or concert hall. Extracting this information from the audio signal is therefore useful for detecting and classifying acoustic scenes. In this context, the Detection and Classification of Acoustic Scenes and Events (DCASE) 2018 challenge provides a common framework in which researchers can propose approaches for extracting the information present in different acoustic environments. In this paper, Teager energies on both mel and linear scales are used to capture the discriminative information between acoustic scenes. These features are computed by applying the Teager Energy Operator (TEO) to a narrowband filtered signal and are modeled with a convolutional neural network (CNN) to detect and classify the acoustic scenes or events. On the development set, the proposed approach achieves an overall accuracy of 67.3% using the recommended cross-validation setup, outperforming the baseline by 6.3%.
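As a rough illustration of the feature described above, the discrete TEO is commonly defined as Ψ[x(n)] = x(n)² − x(n−1)·x(n+1). The sketch below applies that standard definition to a single pure tone, which stands in for the output of one narrowband filter. The function name `teager_energy`, the test tone, and its parameters are assumptions made for illustration; the mel or linear filterbank and the CNN from the paper are not reproduced here.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    # Defined for interior samples only, so the output is 2 samples shorter.
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Hypothetical stand-in for one narrowband (subband) signal: a pure tone.
amplitude, omega = 2.0, 0.3  # omega is in radians per sample
n = np.arange(200)
tone = amplitude * np.cos(omega * n)

psi = teager_energy(tone)

# For a pure tone the Teager energy is constant: A^2 * sin^2(omega).
expected = amplitude ** 2 * np.sin(omega) ** 2
```

For a single sinusoid the operator returns a constant that depends on both amplitude and frequency, which is why it is usually applied to narrowband signals rather than to the full-band waveform.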
