Abstract
We exploit distribution-based criteria to optimize training set construction for large-scale video semantic classification. Due to the large gap between low-level features and higher-level semantics, as well as the high diversity of video data, it is difficult to represent the prototypes of semantic concepts with a training set of limited size. In video semantic classification, most learning-based approaches require a large training set to achieve good generalization capacity, which makes large amounts of labor-intensive manual labeling inevitable. However, it is observed that the generalization capacity of a classifier depends strongly on the geometrical distribution of the training data rather than on its size. We argue that a training set that captures most of the temporal and spatial distribution information of the whole dataset can achieve good performance even when its size is limited. To capture the geometrical distribution characteristics of a given video collection, we propose four metrics for constructing/selecting an optimal training set: salience, temporal dispersiveness, spatial dispersiveness, and diversity. Furthermore, based on these metrics, we propose a set of optimization rules for capturing as much of the distribution information of the whole dataset as possible with a training set of a given size. Experimental results demonstrate that these rules are effective for training set construction in video semantic classification and significantly outperform random training set selection.
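To make the selection idea concrete, the following is a minimal, hypothetical sketch of how a greedy procedure might trade off spatial and temporal dispersiveness when choosing k training samples. The function select_training_set, the weights alpha and beta, and the distance-based proxies are illustrative assumptions, not the metrics actually defined in the paper (salience and diversity, in particular, are not modeled here).

```python
import numpy as np

def select_training_set(features, timestamps, k, alpha=1.0, beta=0.1):
    """Greedily pick k samples that spread over feature space and time.

    Hypothetical sketch: spatial dispersiveness is approximated by the
    distance to the nearest already-selected sample in feature space,
    and temporal dispersiveness by the distance in time.
    """
    n = len(features)
    # seed with the sample closest to the feature-space mean
    selected = [int(np.argmin(np.linalg.norm(features - features.mean(0), axis=1)))]
    for _ in range(k - 1):
        best, best_score = None, -np.inf
        for i in range(n):
            if i in selected:
                continue
            # distance to the nearest selected sample in feature space
            d_feat = min(np.linalg.norm(features[i] - features[j]) for j in selected)
            # distance to the nearest selected sample in time
            d_time = min(abs(timestamps[i] - timestamps[j]) for j in selected)
            score = alpha * d_feat + beta * d_time
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

Such a greedy criterion favors candidates far from the already-selected set in both feature space and time, which reflects the intuition behind the spatial and temporal dispersiveness metrics.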
Highlights
Video content analysis is an elementary step for mining the semantic information in video collections, in which semantic classification (or annotation) of video segments is essential for further analysis, as well as important for enabling semantic-level video search.
We classify the shots in the video dataset into the following four semantic concepts: indoor, landscape, cityscape, and others.
Summary
Video content analysis is an elementary step for mining the semantic information in video collections, in which semantic classification (or annotation) of video segments is essential for further analysis, as well as important for enabling semantic-level video search. Most semantic concepts are clear and easy for humans to identify, but, due to the large gap between semantics and low-level features, the corresponding features are generally not well separated in feature space and are therefore difficult for a computer to identify. This remains an open problem in computer vision and visual content analysis. Zhong and Chang [3] propose a unified framework for scene detection and structure analysis by combining domain-specific knowledge with supervised machine learning methods. Most of these learning-based approaches require a large training set to achieve good generalization capacity, so a great deal of labor-intensive manual labeling is inevitable. Tang et al. [7] embed the temporal consistency of video data into graph-based semi-supervised learning (SSL).
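For readers unfamiliar with graph-based SSL, the following is a minimal sketch of label propagation with a temporal-consistency term, in the spirit of (but not identical to) the approach of Tang et al. [7]; the Gaussian affinity, the temporal_boost parameter, and the clamping scheme are assumptions made for illustration.

```python
import numpy as np

def propagate_labels(features, labels, n_iter=50, sigma=1.0, temporal_boost=0.5):
    """Label propagation on a similarity graph, with extra edge weight
    between temporally adjacent shots (indices assumed to follow shot order).

    labels: array of class ids, with -1 marking unlabeled shots.
    Illustrative only; the parameters and the temporal term are assumptions.
    """
    n = len(features)
    # Gaussian affinity in feature space
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    # temporal consistency: strengthen edges between neighboring shots
    for i in range(n - 1):
        W[i, i + 1] += temporal_boost
        W[i + 1, i] += temporal_boost
    np.fill_diagonal(W, 0)
    P = W / W.sum(1, keepdims=True)        # row-normalized transition matrix
    classes = np.unique(labels[labels >= 0])
    F = np.zeros((n, len(classes)))
    for c_idx, c in enumerate(classes):
        F[labels == c, c_idx] = 1.0        # one-hot scores for labeled shots
    Y = F.copy()
    for _ in range(n_iter):
        F = P @ F                          # diffuse scores over the graph
        F[labels >= 0] = Y[labels >= 0]    # clamp labeled shots to their labels
    return classes[F.argmax(1)]
```

The temporal term simply raises the affinity between consecutive shots, so label information diffuses preferentially along the timeline as well as through feature-space neighbors.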