Abstract

Video summarization is a timely and rapidly developing research field with broad commercial interest, due to the increasing availability of massive video data. Relevant algorithms face the challenge of achieving a careful balance between summary compactness, enjoyability, and content coverage. The specific case of stereoscopic 3D theatrical films has become increasingly important in recent years, but has not received corresponding research attention. In this paper, a multi-stage, multimodal summarization process for such stereoscopic movies is proposed, able to extract from a 3D film a short, representative video skim that conforms to narrative characteristics. At the initial stage, a novel, low-level video frame description method (the frame moments descriptor) is introduced, which compactly captures informative image statistics from luminance, color, optical flow, and stereoscopic disparity video data, at both a global and a local scale. Thus, scene texture, illumination, motion, and geometry properties are succinctly contained within a single frame feature descriptor, which can subsequently be employed as a building block in any key-frame extraction scheme, e.g., for intra-shot frame clustering. The computed key-frames are then used to construct a movie summary in the form of a video skim, which is post-processed in a manner that also considers the audio modality. The next stage of the proposed summarization pipeline essentially performs shot pruning, controlled by a user-provided shot retention parameter, removing segments from the skim based on the narrative prominence of movie characters in both the visual and the audio modalities. This novel process (multimodal shot pruning) is algebraically modeled as a multimodal matrix column subset selection problem, which is solved using an evolutionary computing approach. Subsequently, disorienting editing effects induced by summarization are dealt with through manipulation of the video skim.
At the last step, the skim is suitably post-processed in order to reduce stereoscopic video defects that may cause visual fatigue.
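To illustrate the kind of statistics the frame moments descriptor aggregates, the sketch below computes the first three moments (mean, standard deviation, skewness) of each input channel, e.g., luminance, disparity, or optical-flow components, both globally and over a grid of local blocks, and concatenates them into one vector. The specific moments, grid size, and channel set here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def channel_moments(channel):
    """First three moments (mean, std, skewness) of a 2-D channel."""
    flat = channel.astype(np.float64).ravel()
    mean = flat.mean()
    std = flat.std()
    # Guard against constant channels when standardizing for skewness.
    skew = 0.0 if std == 0 else float(np.mean(((flat - mean) / std) ** 3))
    return [mean, std, skew]

def frame_moments_descriptor(channels, grid=(4, 4)):
    """Concatenate global and local (grid-block) moments of every channel
    into a single 1-D frame descriptor."""
    desc = []
    for ch in channels:
        desc.extend(channel_moments(ch))            # global statistics
        h, w = ch.shape
        bh, bw = h // grid[0], w // grid[1]
        for i in range(grid[0]):                    # local statistics per block
            for j in range(grid[1]):
                block = ch[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
                desc.extend(channel_moments(block))
    return np.asarray(desc)

# Example: four hypothetical channels (luminance, disparity, flow-x, flow-y).
rng = np.random.default_rng(0)
frame = [rng.random((64, 64)) for _ in range(4)]
d = frame_moments_descriptor(frame)   # 4 channels x (3 + 16*3) = 204 values
```

The resulting fixed-length vectors could then feed any key-frame extraction scheme the abstract mentions, e.g., clustering the frames of a shot and keeping the frame nearest each cluster center.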

Highlights

  • In recent years, the emergence of massive digital video data and their easy global availability, e.g., through popular on-line and mobile Internet channels, has heavily impacted Western societies and accelerated the transformation of their culture into a visual one [1].

  • Image registration and spatiotemporal motion modelling are employed in videos depicting human actions, in order to summarize them with a single artificial image which is representative of an entire video sequence and expresses a still representation of the dominant motion [33].

  • In the final stage of the proposed video summarization pipeline, a previously developed depth jump cut detection and characterization algorithm is applied on the produced video skim and a depth continuity characterization is derived per frame [60].


Summary

INTRODUCTION

The emergence of massive digital video data and their easy global availability, e.g., through popular on-line and mobile Internet channels, has heavily impacted Western societies and accelerated the transformation of their culture into a visual one [1]. Key-frames are first extracted per shot; the retained ones are temporally expanded to key-segments, which are subsequently concatenated in order to form a stereoscopic video skim. The latter is post-processed in four ways. The retention percentage parameter is again employed in a proposed Multimodal Shot Pruning (MSP) process, which discards key-segments from the derived video skim, based on which shot they belong to and on pre-existing information about temporal speech (audio) and face (visual) appearance segments. This process is algebraically modeled as a multimodal matrix column subset selection problem, which is solved using an evolutionary computing approach.
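The multimodal column subset selection at the heart of MSP can be sketched as follows: columns of a matrix A stand for candidate key-segments (stacking, say, visual and audio prominence features), and a simple genetic search picks the k columns whose span best reconstructs A. The population size, mutation scheme, and fitness function below are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def css_error(A, cols):
    """Frobenius reconstruction error of projecting A onto selected columns."""
    C = A[:, cols]
    # Project A onto the column space of C via the Moore-Penrose pseudo-inverse.
    return np.linalg.norm(A - C @ np.linalg.pinv(C) @ A)

def evolutionary_css(A, k, pop_size=30, generations=100, seed=0):
    """Select k columns of A (key-segments to retain) by a simple genetic
    search minimizing the column-subset-selection reconstruction error."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    pop = [rng.choice(n, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: css_error(A, c))
        survivors = pop[: pop_size // 2]            # keep the fittest half
        children = []
        for parent in survivors:
            child = parent.copy()
            # Mutation: swap one selected column for an unselected one.
            out = rng.integers(k)
            candidates = np.setdiff1d(np.arange(n), child)
            child[out] = rng.choice(candidates)
            children.append(child)
        pop = survivors + children
    return min(pop, key=lambda c: css_error(A, c))

rng = np.random.default_rng(1)
A = rng.random((8, 20))    # rows: multimodal features, columns: key-segments
best = evolutionary_css(A, k=5)
```

In the paper's setting the retained column indices would map back to the key-segments that survive pruning, with k driven by the user-provided shot retention parameter.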

Video Summarization
Disparity Estimation and Stereoscopic Video Summarization
Statistical Stereoscopic Video Description for Key-Frame Extraction
Initial Video Skim Construction
Elimination of Disorienting Editing Effects
Elimination of Stereoscopic Video Defects
EVALUATION
Method
CONCLUSIONS