Abstract

In this work we propose methods that exploit context sensor data modalities for the task of detecting interesting events and extracting high-level contextual information about the recording activity in user generated videos. Indeed, most camera-enabled electronic devices contain various auxiliary sensors such as accelerometers, compasses, GPS receivers, etc. Data captured by these sensors during media acquisition have already been used to mitigate camera degradations such as shake and to provide basic tagging information such as location. However, exploiting the sensor modality for higher-level information extraction, such as the detection of interesting events, has been the subject of rather limited research, further constrained to specialized acquisition setups. In this work, we show how these sensor modalities allow inferring information (camera movements, content degradations) about each individual video recording. In addition, we consider a multi-camera scenario, where multiple user generated recordings of a common scene (e.g., music concerts) are available. For such scenarios we jointly analyze the multiple video recordings and their associated sensor modalities in order to extract higher-level semantics of the recorded media: based on the orientation of the cameras we identify the region of interest of the recorded scene, and by exploiting correlation in the motion of different cameras we detect generic interesting events and estimate their relative position. Furthermore, by also analyzing the audio content captured by multiple users we detect more specific interesting events. We show that the proposed multimodal analysis methods perform well on various recordings obtained at real live music performances.
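The cross-camera motion-correlation idea mentioned above can be sketched concretely. The snippet below is a minimal illustration, assuming each camera contributes a motion-magnitude time series (e.g., derived from accelerometer readings) already resampled to a common timeline; the function name, window length, and threshold are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def detect_correlated_events(motion, window=50, min_corr=0.6):
    """Flag time windows in which the cameras move in a correlated way.

    motion : array of shape (n_cameras, n_samples), each row a camera's
             motion-magnitude time series resampled to a shared timeline.
    Returns a list of (window_start_index, mean_pairwise_correlation)
    for windows whose mean pairwise correlation exceeds `min_corr`.
    """
    n_cams, n_samples = motion.shape
    events = []
    for start in range(0, n_samples - window + 1, window):
        segment = motion[:, start:start + window]
        # Pearson correlation between every pair of cameras in this window.
        corr = np.corrcoef(segment)
        pairwise = corr[np.triu_indices(n_cams, k=1)]
        mean_corr = float(np.nanmean(pairwise))
        if mean_corr > min_corr:
            events.append((start, mean_corr))
    return events

# Toy usage: three cameras with independent noise plus a shared movement
# (e.g., everyone panning towards the stage) around samples 300-350.
rng = np.random.default_rng(0)
motion = rng.normal(0.0, 0.1, size=(3, 600))
motion[:, 300:350] += np.sin(np.linspace(0.0, np.pi, 50))
print(detect_correlated_events(motion))  # roughly [(300, 0.9...)]
```

The underlying intuition is that many independent hand-held cameras rarely move in unison unless something in the scene draws their attention at the same time, which is why correlated motion across devices can serve as a cue for a generic interesting event.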

Highlights

  • In the past decade there has been an enormous growth in both the amount of user generated multimedia content and in the means for its sharing

  • In particular, the sensor data streams were stored in text/plain format, as there is no standardized format for storing this type of data together with the associated video content (a parsing sketch follows this list)

  • In this paper we presented a set of multimodal analysis methods for detecting interesting events and obtaining high-level contextual information in user generated videos
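Because the sensor streams are kept as plain text alongside the video, a small parser suffices to recover per-sensor time series for analysis. The sketch below is only an illustration and assumes a hypothetical whitespace-separated line format; the actual layout of the logs used for these recordings is not specified here.

```python
from collections import defaultdict

def load_sensor_log(path):
    """Parse a plain-text sensor log into per-sensor time series.

    Assumes a hypothetical line format (not necessarily the one used
    for the recordings described above):
        <timestamp_ms> <sensor_name> <v1> <v2> <v3>
    e.g. "1520 accelerometer 0.02 -0.11 9.79".
    """
    streams = defaultdict(list)
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 5:
                continue  # skip blank, malformed, or header lines
            timestamp_ms, sensor = int(parts[0]), parts[1]
            values = tuple(float(v) for v in parts[2:5])
            streams[sensor].append((timestamp_ms, values))
    return streams
```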

Summary

Introduction

In the past decade there has been an enormous growth in both the amount of user generated multimedia content (images, video and audio) and in the means for its sharing (over the Internet). This was enabled by rapid advances in the multimedia recording capabilities of mobile devices and by the growth of social networking services. In order to effectively retrieve the events or objects depicted in such content, indexing the multimedia content is a necessary preliminary step. Automatic techniques are preferable when the amount of media items to annotate is large. In both manual and automatic cases a preliminary step is required before indexing the media content: the analysis of the content itself in order to understand the semantics present therein.

