Abstract

This chapter presents a systematic and generic approach to scalable video genre classification and event detection, evaluated experimentally. The system addresses event detection in an input video through an ordered, sequential process. First, domain-knowledge-independent local descriptors are extracted uniformly from the input video sequence, and a video representation is built from them using a Bag-of-Words (BoW) model. The video's genre is then identified by applying k-nearest-neighbor (k-NN) classifiers to this representation, with various dissimilarity measures assessed and compared analytically. For high-level event detection, a hidden conditional random field (HCRF) structured prediction model is used to detect events of interest. Its input comes from mid-level view agents that characterize each frame of the video sequence as one of four view groups, namely close-up view, mid view, long view, and outer-field view. These mid-level view groups are obtained from the histogram-based video representation using an unsupervised approach based on probabilistic latent semantic analysis (PLSA). The framework demonstrates efficiency and generality in processing voluminous video collections and supports multiple video-analysis tasks. Its effectiveness is demonstrated through extensive experimentation, with results compared against benchmarks and state-of-the-art algorithms. Little human expertise and effort is required, owing to the domain-knowledge-independent video representation and annotation-free unsupervised view labeling. As a result, this systematic and scalable approach can be widely applied to processing massive video collections generically.
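
The chapter's abstract gives no implementation details; purely as a minimal sketch of the genre-classification stage described above, the Python snippet below quantizes local descriptors against a precomputed visual codebook into a BoW histogram and classifies with k-NN. The chi-square dissimilarity is shown only as one example of the measures the chapter compares, and all function and parameter names here are hypothetical.

    import numpy as np

    def bow_histogram(descriptors, codebook):
        # Assign each local descriptor to its nearest visual word (Euclidean distance)
        # and return an L1-normalized Bag-of-Words histogram for the video.
        dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
        words = dists.argmin(axis=1)
        hist = np.bincount(words, minlength=len(codebook)).astype(float)
        return hist / max(hist.sum(), 1.0)

    def chi_square(h1, h2, eps=1e-10):
        # Chi-square dissimilarity between two normalized histograms
        # (one example of the dissimilarity measures being evaluated).
        return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

    def knn_genre(query_hist, train_hists, train_labels, k=5):
        # Predict the query video's genre by majority vote among its k nearest
        # training videos under the chosen dissimilarity measure.
        d = np.array([chi_square(query_hist, h) for h in train_hists])
        nearest = d.argsort()[:k]
        votes = [train_labels[i] for i in nearest]
        return max(set(votes), key=votes.count)

In the full pipeline, the same histogram representation would also feed the PLSA-based view labeling and, downstream, the HCRF event detector; those stages are not sketched here.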
