Abstract
In this paper, we propose a generic emergency detection system that relies only on the sound produced in the environment. For this task, we employ multiple audio feature extraction techniques, namely mel-frequency cepstral coefficients, gammatone frequency cepstral coefficients, the constant-Q transform and the chromagram. After feature extraction, a deep convolutional neural network (CNN) classifies each audio signal as either a potential emergency situation or not. The entire model is based on our previous work, which sets a new state of the art in the environment sound classification (ESC) task (that paper is under review in the IEEE/ACM Transactions on Audio, Speech and Language Processing and is also available at https://arxiv.org/abs/1908.11219). We combine the benchmark ESC datasets UrbanSound8K and ESC-50 (ESC-10 is a subset of ESC-50) and reduce the problem to binary classification by grouping sound classes such as sirens, fire crackling, glass breaking and gunshots into the emergency class and treating all other classes as normal. Even though there are only two classes to distinguish, they are highly imbalanced. To overcome this difficulty, we introduce class weights into the loss computation while training the model. Our model achieves 99.56% emergency detection accuracy.
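The abstract states that emergency classes are aggregated into one label and that class weights are added to the loss, but it does not give the weighting formula. The sketch below illustrates the general idea under stated assumptions: the label strings in EMERGENCY_LABELS are hypothetical, the weights follow the common inverse-frequency heuristic w_c = N / (K * n_c), and the loss is a weighted binary cross-entropy. This is not necessarily the exact scheme used in the paper.

```python
import math
from collections import Counter

# Hypothetical label strings for the emergency classes named in the abstract
# (sirens, fire crackling, glass breaking, gunshots); the real dataset labels may differ.
EMERGENCY_LABELS = {"siren", "crackling_fire", "glass_breaking", "gun_shot"}


def to_binary(label: str) -> int:
    """Map a multi-class ESC label to 1 (emergency) or 0 (normal)."""
    return 1 if label in EMERGENCY_LABELS else 0


def balanced_class_weights(targets: list[int]) -> dict[int, float]:
    """Assumed inverse-frequency weights w_c = N / (K * n_c).

    N is the number of samples, K the number of classes and n_c the count of
    class c, so the rarer (emergency) class receives the larger weight.
    """
    counts = Counter(targets)
    n, k = len(targets), len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def weighted_bce(probs: list[float], targets: list[int],
                 weights: dict[int, float], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy, each sample's term scaled by its class weight."""
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += weights[y] * loss
    return total / len(targets)


if __name__ == "__main__":
    labels = ["dog_bark", "siren", "street_music", "engine_idling",
              "gun_shot", "drilling", "rain", "car_horn"]
    y = [to_binary(l) for l in labels]
    w = balanced_class_weights(y)
    print(y)  # [0, 1, 0, 0, 1, 0, 0, 0]
    print(w)  # emergency class (1) gets the larger weight
    print(weighted_bce([0.1, 0.8, 0.2, 0.3, 0.6, 0.1, 0.05, 0.4], y, w))
```

With two emergency samples out of eight, the emergency class is weighted three times as heavily as the normal class, so misclassifying a rare emergency sound costs more than misclassifying a common background sound.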