Abstract
Multimodal streams of sensory information are naturally parsed and integrated by humans using signal-level feature extraction and higher-level cognitive processes. Detection of attention-invoking audiovisual segments is formulated in this work on the basis of saliency models for the audio, visual, and textual information conveyed in a video stream. Aural or auditory saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color, and orientation. Textual or linguistic saliency is extracted from part-of-speech tagging on the subtitle information available with most movie distributions. The individual saliency streams, obtained from modality-dependent cues, are integrated into a multimodal saliency curve, modeling the time-varying perceptual importance of the composite video stream and signifying prevailing sensory events. The multimodal saliency representation forms the basis of a generic, bottom-up video summarization algorithm. Different fusion schemes are evaluated on a movie database of multimodal saliency annotations, with comparative results provided across modalities. The produced summaries, based on low-level features and content-independent fusion and selection, are of subjectively high aesthetic and informative quality.
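To make the pipeline described above concrete, the following Python sketch illustrates the general idea, not the paper's actual implementation: per-modality saliency streams are normalized and combined with a weighted linear fusion (one of the simpler schemes of the kind evaluated in the paper), and the resulting multimodal curve drives a top-fraction frame selection for the summary. The Teager-Kaiser energy operator is shown as a representative nonlinear energy-tracking operator for the audio cues; all function names, weights, and the selection rule are illustrative assumptions.

```python
import numpy as np

def teager_kaiser_energy(x):
    """Discrete Teager-Kaiser energy operator:
    Psi[x](n) = x(n)^2 - x(n-1) * x(n+1).
    A common nonlinear operator for tracking amplitude/frequency modulations
    in an audio waveform (illustrative of the aural saliency cues)."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]  # replicate edge values
    return psi

def normalize(s, eps=1e-12):
    """Min-max normalize a saliency stream to [0, 1]."""
    s = np.asarray(s, dtype=float)
    return (s - s.min()) / (s.max() - s.min() + eps)

def fuse_saliency(aural, visual, textual, weights=(1/3, 1/3, 1/3)):
    """Weighted linear fusion of per-frame saliency streams into a single
    multimodal saliency curve (one possible fusion scheme; assumed here)."""
    streams = [normalize(s) for s in (aural, visual, textual)]
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * si for wi, si in zip(w, streams))

def select_summary_frames(curve, ratio=0.2):
    """Pick the top `ratio` fraction of frames by multimodal saliency,
    returned in temporal order (a simple bottom-up selection rule)."""
    k = max(1, int(len(curve) * ratio))
    top = np.argsort(curve)[-k:]
    return np.sort(top)

# Toy usage on synthetic per-frame saliency values
rng = np.random.default_rng(0)
aural, visual, textual = rng.random((3, 500))
curve = fuse_saliency(aural, visual, textual, weights=(0.4, 0.4, 0.2))
summary_frames = select_summary_frames(curve, ratio=0.1)
```

In practice, the per-frame aural, visual, and textual saliency values would come from the modality-specific models summarized above; the weights and selection ratio here are arbitrary placeholders.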