Conditional random field based image and video content analysis

Xiaofeng Wang

doi:10.32920/ryerson.14660733.v1

Abstract

Image and video content analysis is an interesting, meaningful and challenging topic. In recent years, much of the research effort in the multimedia field has focused on indexing and retrieval. The semantic gap between low-level features and high-level content is a bottleneck in most systems. To bridge this gap, new content analysis models need to be developed. In this thesis, algorithms based on a relatively new graphical model, the conditional random field (CRF) model, are developed for two closely related problems in content analysis: image labeling and video content analysis. The CRF model can represent spatial interactions in image labeling and temporal interactions in video content analysis. New feature functions are designed to better represent the feature distributions. Mixture feature functions are used in image labeling for databases of natural images, and the independent component analysis (ICA) mixture function is applied in sports video content analysis. With these new feature functions, the CRF model can exploit the spatial dependence of image parts and the temporal dependence of video frames more effectively. For image labeling with large databases, the content-based image retrieval method is successfully combined with the CRF image labeling model.

Highlights

Image and video content analysis is an interesting and challenging topic in the multimedia signal processing field.
Multimedia signal processing research is surging because of advances in consumer electronic devices and the Internet, and indexing and retrieval research dominates this area.

Summary

Introduction

Image and video content analysis is an interesting and challenging topic in the multimedia signal processing field. During the past two decades, content-based image and video retrieval has dominated multimedia signal processing research. Image and video search systems that use content-based information instead of keywords have taken center stage in multimedia research. The conditional random field (CRF) model was first proposed by Lafferty et al. for labeling 1D sequential data such as speech [52]. It is a discriminative probabilistic graphical model that addresses the limitations of the hidden Markov model (HMM). There are spatial interactions in image labeling and temporal interactions in video content analysis. New semantic content analysis algorithms are proposed for the automatic processing of images and videos. Natural images and sports videos exhibit strong spatial and temporal dependence, respectively, and modeling these dependencies with modern machine learning and pattern recognition algorithms is crucial to understanding this content well.
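As a rough illustration of what a CRF for 1D sequential data computes, the sketch below evaluates the conditional probability of a label sequence in a linear-chain CRF. It is a minimal toy under stated assumptions, not the thesis's model: the function names, the two-label setup and the score values are all hypothetical, and the unary scores stand in for weighted feature functions that may depend on the whole observation sequence.

```python
import itertools
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sequence_score(unary, pairwise, labels):
    """Unnormalized log-score of one label sequence.

    unary[t][y]    : score of label y at position t (hypothetical values;
                     in a CRF these may depend on the full observation).
    pairwise[a][b] : score of moving from label a to label b.
    """
    score = sum(unary[t][y] for t, y in enumerate(labels))
    score += sum(pairwise[a][b] for a, b in zip(labels, labels[1:]))
    return score


def log_partition(unary, pairwise):
    """log Z(x), computed with the forward recursion over positions."""
    n_labels = len(unary[0])
    alpha = list(unary[0])
    for t in range(1, len(unary)):
        alpha = [
            unary[t][j]
            + log_sum_exp([alpha[i] + pairwise[i][j] for i in range(n_labels)])
            for j in range(n_labels)
        ]
    return log_sum_exp(alpha)


def conditional_prob(unary, pairwise, labels):
    """p(labels | observation) = exp(score - log Z)."""
    return math.exp(
        sequence_score(unary, pairwise, labels) - log_partition(unary, pairwise)
    )


# Toy example: 3 positions, 2 labels, made-up scores.
unary = [[1.0, 0.2], [0.3, 0.9], [0.8, 0.1]]
pairwise = [[0.5, -0.5], [-0.5, 0.5]]  # favors neighbors sharing a label

# Probabilities over all 2**3 label sequences should sum to 1.
total = sum(
    conditional_prob(unary, pairwise, ys)
    for ys in itertools.product(range(2), repeat=3)
)
```

The model normalizes over label sequences only, conditioned on the observation. This is one way to see the contrast with an HMM, which models the joint distribution of observations and labels.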
