Abstract

In recent decades, surveillance and home security systems based on video analysis have been proposed for the automatic detection of abnormal situations. Nevertheless, in several real applications, it may be easier to detect a given event from audio information, and the use of audio surveillance systems can greatly improve the robustness and reliability of event detection. In this paper, a novel system for the detection of polyphonic urban noise is proposed for on-campus audio surveillance. The system aggregates different acoustic features to improve the classification accuracy for urban noise. A combination model composed of a capsule neural network (CapsNet) and a recurrent neural network (RNN) is employed as the classifier. CapsNet overcomes some limitations of convolutional neural networks (CNNs), such as the loss of position information after max-pooling, while the RNN models the temporal dependencies of contextual information. The combination of these networks further improves the accuracy and robustness of polyphonic sound event detection. Moreover, a monitoring platform is designed to visualize noise maps and acoustic event information. The system was deployed in real environments, and experiments were also conducted on two public datasets. The results demonstrate that the proposed method is superior to existing state-of-the-art methods for the polyphonic sound event detection task.
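To make the described architecture concrete, the following is a minimal PyTorch sketch, not the authors' implementation, of the hybrid idea in the abstract: a simplified capsule-style front end (primary capsules with the "squash" nonlinearity, omitting the full dynamic-routing layer of a complete CapsNet) over log-mel spectrogram frames, followed by a bidirectional GRU for temporal context and frame-wise sigmoid outputs for polyphonic (multi-label) event detection. The class name CapsRNNSED, all layer sizes, the 64-bin mel input, and the 10 event classes are illustrative assumptions.

```python
# Minimal sketch of a CapsNet-style + RNN classifier for polyphonic
# sound event detection. Hyperparameters are illustrative, not the paper's.
import torch
import torch.nn as nn


def squash(x, dim=-1, eps=1e-8):
    """Capsule 'squash' nonlinearity: scales each vector's length into [0, 1)."""
    sq_norm = (x ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * x / torch.sqrt(sq_norm + eps)


class CapsRNNSED(nn.Module):
    def __init__(self, n_mels=64, n_events=10, caps_dim=8, n_caps_maps=4):
        super().__init__()
        # Ordinary conv feature extractor over (batch, 1, time, mel) input.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),  # pool frequency only, keep time
        )
        # Primary capsules: conv channels grouped into n_caps_maps capsules
        # of caps_dim units each, per time-frequency bin.
        self.primary = nn.Conv2d(32, n_caps_maps * caps_dim,
                                 kernel_size=3, padding=1)
        self.caps_dim = caps_dim
        self.n_caps_maps = n_caps_maps
        # GRU models the temporal dependency across frames.
        rnn_in = n_caps_maps * caps_dim * (n_mels // 2)
        self.rnn = nn.GRU(rnn_in, 128, batch_first=True, bidirectional=True)
        # Frame-wise multi-label (polyphonic) event activity.
        self.head = nn.Linear(256, n_events)

    def forward(self, x):              # x: (batch, 1, time, n_mels)
        h = self.conv(x)               # (batch, 32, time, n_mels/2)
        h = self.primary(h)            # (batch, maps*dim, time, n_mels/2)
        b, _, t, f = h.shape
        h = h.view(b, self.n_caps_maps, self.caps_dim, t, f)
        h = squash(h, dim=2)           # squash each capsule vector
        h = h.permute(0, 3, 1, 2, 4).reshape(b, t, -1)  # (batch, time, feat)
        h, _ = self.rnn(h)
        return torch.sigmoid(self.head(h))  # (batch, time, n_events)


# Usage sketch: 2 clips, 100 frames of 64 log-mel bins each.
model = CapsRNNSED()
probs = model(torch.randn(2, 1, 100, 64))
print(probs.shape)  # torch.Size([2, 100, 10])
```

Note the sigmoid (rather than softmax) output head: in the polyphonic setting, several events may be active in the same frame, so each class is scored independently.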

Highlights

  • In real environments, visual information is generally not sufficient to reliably convey what occurs in a city; for example, a car horn in a no-honking area that is undetectable from video streams can be detected by an audio analysis system

  • An analysis was conducted on the classification outcomes of the system, and information on the detected noise and events was mapped to the monitoring platform

  • A novel audio detection system was designed in this study, and a total of 100 wireless sensor nodes were deployed to monitor abnormal events on a university campus in real time


Introduction

Visual information is generally not sufficient to reliably convey what occurs in a city; for example, a car horn in a no-honking area that is undetectable from video streams can be detected by an audio analysis system. By using only one mono microphone and one camera to integrate visual and audio data into the scene analysis, the detection ability of automatic surveillance systems can be enhanced [1]. Although audio surveillance is critical for the detection of urban noise in real environments, numerous problems remain in anomalous sound detection. Abnormal sounds are often superimposed on high levels of background noise and, in some cases, occur far away from the microphone, leading to very low signal-to-noise ratios (SNRs).

[Figure: an RNN followed by fully connected (FC) layers producing outputs zi, zj, zk, and zl]
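The low-SNR condition can be made concrete with a small NumPy sketch that mixes an event signal into background noise at a chosen SNR, using the standard definition SNR_dB = 10 log10(P_event / P_noise). The function name mix_at_snr, the synthetic tone, and the white noise are illustrative stand-ins for real recordings.

```python
# Sketch: mix an event into background noise at a target SNR (in dB).
import numpy as np


def mix_at_snr(event, noise, snr_db):
    """Scale `noise` so the event-to-noise power ratio equals `snr_db`."""
    p_event = np.mean(event ** 2)
    p_noise = np.mean(noise ** 2)
    # Required noise power for the target SNR: SNR_dB = 10*log10(P_e / P_n)
    target_p_noise = p_event / (10.0 ** (snr_db / 10.0))
    noise_scaled = noise * np.sqrt(target_p_noise / p_noise)
    return event + noise_scaled


rng = np.random.default_rng(0)
event = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s tone, 16 kHz
noise = rng.standard_normal(16000)
mixture = mix_at_snr(event, noise, snr_db=-6)  # event 6 dB below the noise
```

At negative SNRs such as the -6 dB used here, the event's power is below that of the background, which is exactly the regime that makes anomalous sound detection difficult for distant microphones.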
