A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from "Unscripted" Multimedia

Regunathan Radhakrishnan, Isao Otsuka, Ajay Divakaran, Ziyou Xiong

doi:10.1155/asp/2006/89013

Abstract

We propose a content-adaptive analysis and representation framework to discover events using audio features from multimedia such as sports and surveillance for summarization. The proposed analysis framework performs an inlier/outlier-based temporal segmentation of the content. It is motivated by the observation that interesting events in unscripted multimedia occur sparsely in a background of usual or uninteresting events. We treat the sequence of low/mid-level features extracted from the audio as a time series and identify subsequences that are outliers. The outlier detection is based on eigenvector analysis of the affinity matrix constructed from statistical models estimated from the subsequences of the time series. We define the confidence measure on each of the detected outliers as the probability that it is an outlier. Then, we establish a relationship between the parameters of the proposed framework and the confidence measure. Furthermore, we use the confidence measure to rank the detected outliers in terms of their departures from the background process. Our experimental results with sequences of low- and mid-level audio features extracted from sports video show that highlight events can be extracted effectively as outliers from a background process using the proposed framework. We proceed to show the effectiveness of the proposed framework in bringing out suspicious events from surveillance videos without any a priori knowledge. We show that such temporal segmentation into background and outliers, along with the ranking based on the departure from the background, can be used to generate content summaries of any desired length. Finally, we also show that the proposed framework can be used to systematically select key audio classes that are indicative of events of interest in the chosen domain.
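The inlier/outlier segmentation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation and rests on several assumptions: each subsequence (context) is modeled by a single 1-D Gaussian rather than a GMM or HMM, the distance between two contexts is a symmetric log-likelihood difference, the affinity is an exponential kernel whose `scale` defaults to the mean pairwise distance, and the two-way split uses the second eigenvector of the normalized graph Laplacian (a normalized-cut relaxation). The names `gaussian_loglik` and `detect_outlier_windows` are hypothetical.

```python
import numpy as np


def gaussian_loglik(x, mean, var):
    """Total log-likelihood of samples x under a 1-D Gaussian (assumed context model)."""
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)))


def detect_outlier_windows(series, win=50, scale=None):
    """Return indices of windows that fall in the smaller of two spectral clusters."""
    series = np.asarray(series, dtype=float)
    n = len(series) // win
    windows = series[: n * win].reshape(n, win)  # non-overlapping contexts

    # Fit one Gaussian per window; the small constant avoids zero variance.
    models = [(w.mean(), w.var() + 1e-6) for w in windows]

    # ll[i, j] = log-likelihood of window i under the model of window j.
    ll = np.array([[gaussian_loglik(w, m, v) for (m, v) in models] for w in windows])
    self_ll = np.diag(ll)

    # Symmetric likelihood-based distance, normalized by the window length.
    dist = (self_ll[:, None] + self_ll[None, :] - ll - ll.T) / win
    dist = np.maximum(dist, 0.0)

    # Kernel scale: an assumption, defaulting to the mean off-diagonal distance.
    if scale is None:
        scale = max(dist[~np.eye(n, dtype=bool)].mean(), 1e-12)
    affinity = np.exp(-dist / scale)

    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}.
    deg = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(n) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]

    # Eigenvalues come back in ascending order; column 1 is the second smallest.
    _, vecs = np.linalg.eigh(lap)
    y = d_inv_sqrt * vecs[:, 1]

    # Threshold at zero and treat the smaller side as the outlier cluster.
    side = y > 0
    if side.sum() > n - side.sum():
        side = ~side
    return np.flatnonzero(side).tolist()
```

Calling `detect_outlier_windows(series, win=50)` on a feature sequence that is mostly stationary, with a few windows drawn from a clearly different distribution, should return the indices of those windows. Treating the smaller cluster as the outliers encodes the observation that interesting events are sparse relative to the background.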

Highlights

The goals of multimedia content summarization are twofold.
We present the results of the proposed framework with two different content genres, mainly using low-level audio features and semantic audio classification labels at the “8 ms frame level” and “one-second level.”
A 2-component Gaussian mixture model (GMM) was used to model the PDF of the low-level audio features in the 8-second context.

Summary

INTRODUCTION

The goals of multimedia content summarization are twofold. One is to capture the essence of the content in a succinct manner, and the other is to provide top-down access into the content for browsing. Based on the detection of domain-specific key audio-visual objects (audio-visual markers) that are indicative of the “highlight” or “interesting” events, we proposed a hierarchical representation for unscripted content as shown in Figure 2 [15]. The rest of the representation units require the use of domain knowledge in the form of supervised audio-visual object detectors that are correlated with events of interest. This necessitates a separate analysis framework for each domain, in which the key audio-visual objects are chosen based on intuition.

PROPOSED FRAMEWORK

OUTLIER SUBSEQUENCE DETECTION IN TIME SERIES

Problem formulation

Segmentation using eigenvector analysis of affinity matrices

Proposed outlier subsequence detection in time series

Results with synthetic time series data

Performance of the normalized cut for Case 2

Comparison with other clustering approaches for Case 2

Performance of normalized cut for Case 3

Hierarchical clustering using normalized cut for Case 4

RANKING OUTLIERS FOR SUMMARIZATION

Kernel density estimation

Difference between

Confidence measure for outliers with GMM and HMM models for the contexts

Using confidence measures to rank outliers

EXPERIMENTAL RESULTS

Results with sports audio content

Outlier subsequence detection from the extracted program segments

Background with cars passing intersection normally

Results with surveillance audio content

Results with elevator surveillance audio

Results with traffic intersection surveillance audio

SYSTEMATIC CHOICE OF KEY AUDIO CLASSES

CONCLUSION

Journal: EURASIP Journal on Advances in Signal Processing	Publication Date: Jan 29, 2006
Citations: 45	License type: cc-by
