Abstract

The detection and classification of acoustic events in various environments is an important task. Its applications range from multimedia analysis to the surveillance of humans or even animal life. Several of these tasks require the capability of online processing. Among the many approaches that tackle acoustic event detection, methods based on the well-known bag-of-features principle have also emerged in the field. Acoustic features are calculated for all frames in a given time window. Then, applying the bag-of-features concept, these features are quantized with respect to a learned codebook and a histogram representation is computed. Bag-of-features approaches are particularly interesting for online processing as they have a low computational cost. In this paper, the bag-of-features principle and various extensions are reviewed, including soft quantization, supervised codebook learning, and temporal modeling. Furthermore, Mel and Gammatone frequency cepstral coefficients, which originate from psychoacoustic models, are used as the underlying feature set for the bag-of-features representation. The possibility of fusing the results of multiple channels in order to improve robustness is shown. Two databases are used for the experiments: the DCASE 2013 office live dataset and the ITC-IRST multichannel dataset.
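As a rough illustration of the pipeline described above, the following Python sketch quantizes frame-level features (standing in for MFCC or GFCC vectors) against a codebook learned by k-means and builds the histogram representation, with an optional Gaussian-kernel soft assignment. This is a minimal sketch under assumed parameters, not the authors' implementation; the function names, codebook size, and kernel width are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_codebook(frame_features, n_words=64, seed=0):
    """Learn a codebook by clustering pooled frame-level features."""
    kmeans = KMeans(n_clusters=n_words, random_state=seed, n_init=10)
    kmeans.fit(frame_features)        # frame_features: (n_frames, n_dims)
    return kmeans.cluster_centers_    # (n_words, n_dims)

def bag_of_features(window_features, codebook, soft=False, sigma=1.0):
    """Quantize the frames of one time window against the codebook and
    return a normalized histogram (the bag-of-features representation)."""
    # Squared Euclidean distance between every frame and every codeword
    d2 = ((window_features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    if soft:
        # Soft quantization: each frame contributes to all codewords,
        # weighted by a Gaussian kernel on the distance
        w = np.exp(-d2 / (2.0 * sigma ** 2))
        w /= w.sum(axis=1, keepdims=True)
        hist = w.sum(axis=0)
    else:
        # Hard quantization: each frame votes for its nearest codeword
        nearest = d2.argmin(axis=1)
        hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()

# Toy usage with random vectors standing in for cepstral frame features
rng = np.random.default_rng(0)
train_frames = rng.normal(size=(2000, 13))   # frames pooled over training windows
codebook = learn_codebook(train_frames, n_words=32)
window = rng.normal(size=(100, 13))          # frames of one analysis window
print(bag_of_features(window, codebook, soft=True).shape)  # -> (32,)
```

The resulting fixed-length histogram can then be passed to any standard classifier, which is what keeps the per-window computational cost low for online processing.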
