Abstract

Acoustic event detection (AED) aims to identify both the timestamps and the types of multiple events and has proven very challenging. The cues for these events often exist in both audio and vision, but not necessarily in a synchronized fashion. We study improving the detection and classification of these events using cues from both modalities. We propose optical flow based spatial pyramid histograms as a generalizable visual representation that does not require training on labeled video data. Hidden Markov models (HMMs) are used for audio-only modeling, and multi-stream HMMs or coupled HMMs (CHMMs) are used for audio-visual joint modeling. To allow for audio-visual state asynchrony, we explore effective CHMM training via HMM state-space mapping, parameter tying, and different initialization schemes. The proposed methods improve acoustic event classification and detection on a multimedia meeting room dataset containing eleven types of general non-speech events, without using any data resources beyond the video stream accompanying the audio observations. Our systems perform favorably compared to previously reported systems that rely on ad-hoc visual cue detectors and localization information obtained from multiple microphones.
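The visual representation named above, an optical-flow-based spatial pyramid histogram, can be illustrated with a minimal sketch. The paper does not specify the exact cell layout or binning, so the pyramid depth, bin count, and magnitude weighting below are illustrative assumptions; the dense flow field itself is assumed to come from any standard optical-flow estimator.

```python
import numpy as np

def pyramid_flow_histogram(flow, levels=2, bins=8):
    """Spatial pyramid histogram of optical-flow orientations (illustrative sketch).

    flow: (H, W, 2) array of per-pixel (dx, dy) motion vectors, assumed
    precomputed by a dense optical-flow estimator. At each pyramid level l,
    the frame is split into 2**l x 2**l cells, and a magnitude-weighted
    orientation histogram is computed per cell; all histograms are
    concatenated and L1-normalized into one frame-level descriptor.
    """
    angles = np.arctan2(flow[..., 1], flow[..., 0])   # orientation in [-pi, pi]
    mags = np.linalg.norm(flow, axis=-1)              # flow magnitude per pixel
    h, w = flow.shape[:2]
    feats = []
    for level in range(levels + 1):
        n = 2 ** level
        ys = np.linspace(0, h, n + 1, dtype=int)      # cell row boundaries
        xs = np.linspace(0, w, n + 1, dtype=int)      # cell column boundaries
        for i in range(n):
            for j in range(n):
                a = angles[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
                m = mags[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
                hist, _ = np.histogram(a, bins=bins,
                                       range=(-np.pi, np.pi), weights=m)
                feats.append(hist)
    desc = np.concatenate(feats)
    total = desc.sum()
    return desc / total if total > 0 else desc
```

Because the descriptor depends only on motion statistics, not on object identity, it needs no labeled video data, which is the property the abstract highlights. With `levels=2` and `bins=8`, the descriptor has 8 × (1 + 4 + 16) = 168 dimensions.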
