Acoustic Event Detection Based on Feature-Level Fusion of Audio and Video Modalities

Taras Butko, Xavier Giró, Carlos Segura, Javier Hernando, Cristian Canton-Ferrer, Climent Nadeu, Josep R Casas

doi:10.1155/2011/485738

Open Access

https://doi.org/10.1155/2011/485738

Abstract

Acoustic event detection (AED) aims at determining the identity of sounds and their temporal position in audio signals. When applied to spontaneously generated acoustic events, AED based only on audio information shows a large number of errors, most of which are due to temporal overlaps. In fact, temporal overlaps accounted for more than 70% of the errors in the real-world interactive seminar recordings used in the CLEAR 2007 evaluations. In this paper, we improve the recognition rate of acoustic events using information from both audio and video modalities. First, the acoustic data are processed to obtain both a set of spectrotemporal features and the 3D localization coordinates of the sound source. Second, a number of features are extracted from video recordings by means of object detection, motion analysis, and multicamera person tracking to represent the visual counterpart of several acoustic events. A feature-level fusion strategy is used, and a parallel structure of binary HMM-based detectors is employed. The experimental results show that information from both the microphone array and the video cameras helps improve the detection rate of isolated as well as spontaneously generated acoustic events.
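
As a rough illustration of the two ideas named in the abstract, the sketch below concatenates per-frame audio and video feature vectors (feature-level fusion) and then runs one independent binary detector per event class on the fused frames. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy feature values, the event names, and the frame-rate alignment rule are hypothetical, and simple threshold rules stand in for the HMM-based detectors used in the paper.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[float]


def align_to_audio_rate(video_feats: Sequence[Frame], n_audio_frames: int) -> List[Frame]:
    """Repeat video frames so that every audio frame has a video counterpart.

    Assumption: video features arrive at a lower frame rate than audio
    features, so each video frame is reused for several audio frames.
    """
    n_video = len(video_feats)
    if n_video == 0:
        raise ValueError("at least one video frame is required")
    return [
        video_feats[min(i * n_video // n_audio_frames, n_video - 1)]
        for i in range(n_audio_frames)
    ]


def fuse_features(audio_feats: Sequence[Frame], video_feats: Sequence[Frame]) -> List[Frame]:
    """Feature-level fusion: concatenate audio and video vectors frame by frame."""
    aligned = align_to_audio_rate(video_feats, len(audio_feats))
    return [list(a) + list(v) for a, v in zip(audio_feats, aligned)]


def run_parallel_detectors(
    fused: Sequence[Frame],
    detectors: Dict[str, Callable[[Frame], bool]],
) -> Dict[str, List[bool]]:
    """Apply each binary detector independently to every fused frame.

    Because the detectors do not compete, several events may be flagged
    for the same frame, which is how temporal overlaps can be reported.
    """
    return {name: [detect(frame) for frame in fused] for name, detect in detectors.items()}


# Toy data: 4 audio frames with 2 features each, 2 video frames with 1 feature each.
audio = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
video = [[1.0], [0.0]]
fused = fuse_features(audio, video)

# Hypothetical threshold rules standing in for trained binary HMM detectors.
detectors = {
    "door_slam": lambda f: f[2] > 0.5,  # driven by the video-derived feature
    "speech": lambda f: f[0] > 0.2,     # driven by an audio feature
}
decisions = run_parallel_detectors(fused, detectors)
```

In this toy run, the second frame is flagged by both detectors, which illustrates why a parallel bank of binary detectors can report overlapping events where a single multi-class decision per frame could not.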

Highlights

The detection of the acoustic events (AEs) naturally produced in a meeting room may help to describe the human and social activity.
When applied to spontaneously generated acoustic events, acoustic event detection (AED) based only on audio information shows a large number of errors, most of which are due to temporal overlaps.
A number of features are extracted from video recordings by means of object detection, motion analysis, and multicamera person tracking to represent the visual counterpart of several acoustic events.

Summary

Introduction

The detection of the acoustic events (AEs) naturally produced in a meeting room may help to describe the human and social activity. In a meeting/lecture context, we may associate a chair moving or door noise with its start or end, cup clinking with a coffee break, or footsteps with somebody entering or leaving. Some of these AEs are tightly coupled with human behaviors or psychological states: paper wrapping may denote tension; laughing, cheerfulness; yawning in the middle of a lecture, boredom; keyboard typing, distraction from the main activity in a meeting; clapping during a speech, approval. The problem of temporally overlapping sounds may be tackled by developing more efficient algorithms at the signal level, using source separation techniques such as independent component analysis [8]; at the feature level, using specific features [9]; or at the model level [10]. Another approach is to use an additional modality that is less sensitive to the overlap phenomena present in the audio signal.

Database and Metrics

Audio Feature Extraction

Video Feature Extraction

Multimodal Acoustic Event Detection

Experiments

Findings

Conclusions and Future Work

Journal: EURASIP Journal on Advances in Signal Processing	Publication Date: Feb 13, 2011
Citations: 39	License type: cc-by
