Abstract

Monitoring of human and social activities is becoming increasingly pervasive in our living environment for public security and safety applications. Recognizing suspicious events is important in both indoor and outdoor environments, such as child-care centers, smart homes, old-age homes, residential areas, office environments, elevators, and smart cities. Environmental audio scene recognition and sound event recognition are fundamental tasks in many audio surveillance applications. Although numerous approaches have been proposed, robust environmental audio surveillance remains a major challenge for several reasons, including overlapping sounds of many types, background noise, and the lack of universal and multi-modal datasets. The goal of this article is to review the features used to represent audio scenes and sound events and to present appropriate machine learning algorithms for audio surveillance tasks. Benchmark datasets are categorized according to the real-world scenarios of audio surveillance applications. To provide a quantitative understanding, some state-of-the-art approaches are evaluated on two benchmark datasets for audio scene and sound event recognition tasks. Finally, we outline possible future directions for improving the recognition of environmental audio scenes and sound events.
