Abstract

Machine Learning applied to Automatic Audio Surveillance has been attracting increasing attention in recent years. In spite of several investigations based on a large number of different approaches, little attention has been paid to the environmental temporal evolution of the input signal. In this work, we propose an exploration in this direction, comparing the temporal correlations extracted at the feature level with those learned by a representational structure. To this end, we analysed the prediction performance of a Recurrent Neural Network architecture while varying the length of the processed input sequence and the size of the time window used in the feature extraction. Results corroborated the hypothesis that sequential models work better when dealing with data characterized by temporal order. However, the optimization of the temporal dimension remains an open issue.

Highlights

  • Automatic Sound Recognition (ASR) can be subdivided into two main categories: speech and nonspeech problems

  • We report the real-time window I obtained by varying the window size w and the length s of the sequence processed by the encoding Long Short-Term Memory (LSTM) module

  • We report the Area Under the ROC Curve (AUC) obtained in each setting to evaluate the general behaviour of each model
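The two quantities varied in the highlights (window size w and sequence length s) and the AUC metric can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `frame_signal`, `make_sequences` and `roc_auc` are hypothetical, the windows are assumed to be non-overlapping, the sequences are assumed to slide by one frame, and the raw samples stand in for whatever features the paper actually extracts.

```python
from typing import List, Sequence


def frame_signal(signal: Sequence[float], w: int) -> List[List[float]]:
    """Split a signal into non-overlapping windows of w samples.

    Assumption: trailing samples that do not fill a whole window are dropped.
    """
    if w <= 0:
        raise ValueError("w must be positive")
    return [list(signal[i:i + w]) for i in range(0, len(signal) - w + 1, w)]


def make_sequences(frames: List[List[float]], s: int) -> List[List[List[float]]]:
    """Group consecutive frames into sliding sequences of s frames (step of one frame)."""
    if s <= 0:
        raise ValueError("s must be positive")
    return [frames[i:i + s] for i in range(len(frames) - s + 1)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative one.
    Ties count as half a win.
    """
    pos = [sc for y, sc in zip(labels, scores) if y == 1]
    neg = [sc for y, sc in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    frames = frame_signal(list(range(10)), w=3)   # three windows of 3 samples
    sequences = make_sequences(frames, s=2)       # two sequences of 2 frames
    print(len(frames), len(sequences))
    # Label 1 marks an abnormal event; higher score means "more novel".
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In this framing, each sequence of s frames would be one input to a recurrent model, and the AUC would be computed once per (w, s) setting from the model's novelty scores.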


Summary

Introduction

Automatic Sound Recognition (ASR) can be subdivided into two main categories: speech and nonspeech problems. In the Machine Learning literature the term ASR usually refers to the second area, whereas the first is a separate field of research that has been investigated in depth in recent years. Restricting our attention to classification problems, these fall into two major categories: tasks that aim to detect different events from a categorization decided in advance [6][7], and tasks that aim to distinguish normal events from abnormal ones. The second setting seems more general in typical surveillance and system monitoring scenarios, and it is usually referred to as Novelty Detection (ND) [8]. The general idea is to model a function z(x) by training a learning agent on data considered to be normal, for which a large number of examples is usually available. Different techniques have been investigated, employing different classifiers and possible sets
