Abstract

In this work we propose methods that exploit context sensor data modalities for the task of detecting interesting events and extracting high-level contextual information about the recording activity in user generated videos. Indeed, most camera-enabled electronic devices contain various auxiliary sensors such as accelerometers, compasses, GPS receivers, etc. Data captured by these sensors during media acquisition have already been used to mitigate camera degradations such as shake and to provide basic tagging information such as location. However, exploiting the sensor modality for higher-level information extraction, such as the detection of interesting events, has been the subject of rather limited research, further constrained to specialized acquisition setups. In this work, we show how these sensor modalities allow inferring information (camera movements, content degradations) about each individual video recording. In addition, we consider a multi-camera scenario, where multiple user generated recordings of a common scene (e.g., music concerts) are available. For such scenarios we jointly analyze the multiple video recordings and their associated sensor modalities in order to extract higher-level semantics of the recorded media: based on the orientation of the cameras we identify the region of interest of the recorded scene, and by exploiting correlation in the motion of different cameras we detect generic interesting events and estimate their relative position. Furthermore, by also analyzing the audio content captured by multiple users we detect more specific interesting events. We show that the proposed multimodal analysis methods perform well on various recordings obtained at real live music performances.
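The cross-camera motion-correlation idea mentioned above can be sketched concretely. The snippet below is a minimal illustration, assuming each camera contributes a motion-magnitude time series (e.g., derived from accelerometer readings) already resampled to a common timeline; the function name, window length, and threshold are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def detect_correlated_events(motion, window=50, min_corr=0.6):
    """Flag time windows in which the cameras move in a correlated way.

    motion : array of shape (n_cameras, n_samples), each row a camera's
             motion-magnitude time series resampled to a shared timeline.
    Returns a list of (window_start_index, mean_pairwise_correlation)
    for windows whose mean pairwise correlation exceeds `min_corr`.
    """
    n_cams, n_samples = motion.shape
    events = []
    for start in range(0, n_samples - window + 1, window):
        segment = motion[:, start:start + window]
        # Pearson correlation between every pair of cameras in this window.
        corr = np.corrcoef(segment)
        pairwise = corr[np.triu_indices(n_cams, k=1)]
        mean_corr = float(np.nanmean(pairwise))
        if mean_corr > min_corr:
            events.append((start, mean_corr))
    return events

# Toy usage: three cameras with independent noise plus a shared movement
# (e.g., everyone panning towards the stage) around samples 300-350.
rng = np.random.default_rng(0)
motion = rng.normal(0.0, 0.1, size=(3, 600))
motion[:, 300:350] += np.sin(np.linspace(0.0, np.pi, 50))
print(detect_correlated_events(motion))  # roughly [(300, 0.9...)]
```

The underlying intuition is that many independent hand-held cameras rarely move in unison unless something in the scene draws their attention at the same time, which is why correlated motion across devices can serve as a cue for a generic interesting event.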

Highlights

  • In the past decade there has been an enormous growth in both the amount of user generated multimedia content and in the means for its sharing

  • In particular, the sensor data streams were stored in text/plain format, as there is no standardized format for storing this type of data together with the associated video content (a parsing sketch follows this list)

  • In this paper we presented a set of multimodal analysis methods for detecting interesting events and obtaining high-level contextual information in user generated videos
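Because the sensor streams are kept as plain text alongside the video, a small parser suffices to recover per-sensor time series for analysis. The sketch below is only an illustration and assumes a hypothetical whitespace-separated line format; the actual layout of the logs used for these recordings is not specified here.

```python
from collections import defaultdict

def load_sensor_log(path):
    """Parse a plain-text sensor log into per-sensor time series.

    Assumes a hypothetical line format (not necessarily the one used
    for the recordings described above):
        <timestamp_ms> <sensor_name> <v1> <v2> <v3>
    e.g. "1520 accelerometer 0.02 -0.11 9.79".
    """
    streams = defaultdict(list)
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 5:
                continue  # skip blank, malformed, or header lines
            timestamp_ms, sensor = int(parts[0]), parts[1]
            values = tuple(float(v) for v in parts[2:5])
            streams[sensor].append((timestamp_ms, values))
    return streams
```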

Summary

Introduction

In the past decade there has been an enormous growth in both the amount of user generated multimedia content (images, video and audio) and in the means for its sharing (over the Internet). This was enabled by rapid advances in the multimedia recording capabilities of mobile devices and by the growth of social networking services. In order to effectively retrieve the events or objects depicted in such content, indexing the multimedia content is a necessary preliminary step. Automatic techniques are preferable when the amount of media items to annotate is large. In both manual and automatic cases a preliminary step is required before indexing the media content: the analysis of the content itself in order to understand the semantics present therein.

