Abstract

This paper presents a multi-modal, context-aware approach to semantic video analysis. The examined video sequence is first segmented into shots, and color, motion and audio features are extracted for every resulting shot. Hidden Markov Models (HMMs) then perform, separately for each modality, an initial association of each shot with the semantic classes of interest. Subsequently, a graphical modeling-based approach jointly performs modality fusion and temporal context exploitation. The novelties of this work are the combined use of contextual information and multi-modal fusion, and a new representation for providing motion distribution information to HMMs. Specifically, an integrated Bayesian Network simultaneously fuses the individual modality analysis results and exploits temporal context, contrary to the usual practice of performing each task separately. Contextual information takes the form of temporal relations among the supported classes. Additionally, a new, computationally efficient method is presented for providing motion energy distribution-related information to HMMs; it supports incorporating motion characteristics from previous frames into the currently examined one. The final outcome of the overall framework is the association of a semantic class with every shot. Experimental results and a comparative evaluation are presented from applying the proposed approach to four datasets from the domains of tennis, news and volleyball broadcast video.
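
The idea of combining per-modality shot scores with temporal context can be illustrated with a deliberately simplified sketch. It is not the integrated Bayesian Network described above: it assumes the modalities are conditionally independent given the class (so their log-likelihoods are summed) and that temporal context is a first-order transition table between consecutive shots, decoded with the Viterbi algorithm. The function name `fuse_and_decode`, the input layout and all numbers are hypothetical.

```python
import math

def fuse_and_decode(shot_scores, transition, prior):
    """Assign one class index per shot.

    shot_scores[t][m][c]: strictly positive score of class c for shot t from
        modality m (e.g. a normalized per-modality HMM likelihood). Assumption.
    transition[i][j]: P(class j at shot t | class i at shot t-1). Assumption.
    prior[c]: P(class c) for the first shot. Assumption.
    """
    n = len(prior)

    def fused(shot):
        # Naive fusion: modalities treated as independent given the class.
        return [sum(math.log(m[c]) for m in shot) for c in range(n)]

    first = fused(shot_scores[0])
    score = [math.log(prior[c]) + first[c] for c in range(n)]
    backpointers = []
    for shot in shot_scores[1:]:
        e = fused(shot)
        new_score, ptr = [], []
        for j in range(n):
            # Best previous class for ending in class j at this shot.
            i = max(range(n), key=lambda k: score[k] + math.log(transition[k][j]))
            ptr.append(i)
            new_score.append(score[i] + math.log(transition[i][j]) + e[j])
        score = new_score
        backpointers.append(ptr)

    # Backtrack from the best final class.
    path = [max(range(n), key=lambda c: score[c])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Toy input: 3 shots, 2 modalities (say color and audio), 2 classes.
scores = [
    [[0.9, 0.1], [0.8, 0.2]],
    [[0.45, 0.55], [0.5, 0.5]],   # ambiguous shot, slightly favors class 1
    [[0.9, 0.1], [0.7, 0.3]],
]
sticky = [[0.9, 0.1], [0.1, 0.9]]     # classes tend to persist
uniform = [[0.5, 0.5], [0.5, 0.5]]    # no temporal context
with_context = fuse_and_decode(scores, sticky, [0.5, 0.5])
without_context = fuse_and_decode(scores, uniform, [0.5, 0.5])
```

In this toy input the middle shot is ambiguous on its own; with the uniform table it is labeled by its fused scores alone, whereas the "sticky" table lets the neighboring shots influence the decision.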

Highlights

  • The proposed approach was experimentally evaluated and compared with approaches from the literature using videos from the tennis, news and volleyball broadcast domains. These domains were selected mainly because their videos (a) contain a set of meaningful high-level semantic classes whose detection often requires multi-modal information, and (b) have a relatively well-defined temporal structure, i.e. the semantic classes they contain tend to occur in a particular order in time

  • Only a set of manually annotated video content is required by the employed Hidden Markov Models (HMMs) and Bayesian Network (BN) for parameter learning
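
As a rough illustration of learning temporal context from manually annotated content alone, the sketch below estimates class-to-class transition probabilities by counting consecutive shot labels. It is a simplified stand-in, not the parameter learning procedure of the HMMs or the BN. The function name `estimate_transitions`, the add-one smoothing and the tennis class names are assumptions made for the example.

```python
from collections import Counter

def estimate_transitions(label_sequences, classes, alpha=1.0):
    """Estimate P(next class | previous class) from annotated shot sequences.

    label_sequences: one list of shot labels per annotated video (hypothetical).
    alpha: additive smoothing so unseen transitions keep non-zero probability.
    """
    counts = Counter()
    for seq in label_sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[(prev, cur)] += 1

    table = {}
    for prev in classes:
        total = sum(counts[(prev, cur)] for cur in classes) + alpha * len(classes)
        table[prev] = {cur: (counts[(prev, cur)] + alpha) / total for cur in classes}
    return table

# Hypothetical annotations for two tennis videos.
classes = ["serve", "rally", "break"]
annotated = [
    ["serve", "rally", "break", "serve", "rally"],
    ["serve", "rally", "break"],
]
transitions = estimate_transitions(annotated, classes)
```

Each row of the resulting table is a probability distribution over the next class, so a well-defined temporal order in the annotations shows up as a few dominant entries per row.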

Summary

Introduction

Due to the continuously increasing amount of video content generated every day and the richness of the available means for sharing and distributing it, the need for efficient and advanced video manipulation methodologies has emerged as a challenging and imperative issue. The fundamental principle of shifting video manipulation techniques towards processing visual content at a semantic level has been widely adopted. An important issue in semantic video analysis is the number of modalities that are utilized. Approaches that use two or more modalities collaboratively exploit the possible correlations and interdependencies between their respective data [5]. They capture the semantic information contained in the video more efficiently, since video semantics are typically embedded in multiple forms that are complementary to each other [6]. Modality fusion generally enables the detection of more complex and higher-level semantic concepts and facilitates the generation of more accurate semantic descriptions.
