Abstract

Audio event classification is an important task for several applications such as surveillance, audio, video and multimedia retrieval etc. There are approximately 340 million people with hearing loss who can’t perceive events happening around them. This paper establishes the CURE dataset which contains curated set of specific audio events most relevant for people with hearing loss. It is formatted as 5 sec sound recordings derived from the Freesound project. We propose a ladder network based audio event classifier. We adopted the state-of-the-art convolutional neural network (CNN) embeddings as audio features for this task. We start with signal and feature normalization that aims to reduce the mismatch between different recordings scenarios. Initially, a CNN is trained on weakly labeled Audioset data. Next, the pre-trained model is adopted as feature extractor for proposed CURE corpus. We also explore the performance of extreme learning machine (ELM) and use support vector machine (SVM) as baseline classifier. As a second evaluation set we incorporate ESC-50. Results and discussions validate the superiority of Ladder network over ELM and SVM classifier in terms of robustness and increased classification accuracy.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.