Abstract

Video data are usually represented by high-dimensional features. The performance of video semantic recognition, however, may deteriorate because of the irrelevant and redundant components included in such high-dimensional representations. To improve the performance of video semantic recognition, we propose a new feature selection framework in this paper and validate it through applications to video semantic recognition. Two issues are considered in our framework. First, while labeled videos are precious, relevant labeled images are abundant and readily available on the Web. Therefore, a supervised transfer learning method is proposed to achieve cross-media analysis, in which discriminative features are selected by evaluating each feature's correlation with the classes of the videos and of the relevant images. Second, labeled videos are normally rare in real-world applications. Our framework therefore adds an unsupervised subspace learning step that retains the most valuable information and eliminates feature redundancy by leveraging both labeled and unlabeled videos. The cross-media analysis and embedded learning are learned simultaneously in a joint framework, which enables our algorithm to use the common knowledge of cross-media analysis and embedded learning as supplementary information to facilitate decision making. An efficient iterative algorithm, whose convergence is guaranteed, is proposed to optimize the resulting learning-based feature selection. Experiments on different databases demonstrate the effectiveness of the proposed algorithm.
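
The precise objective is developed in the section "The proposed framework of JCAEL". Purely as an illustration of how such a joint criterion is typically written, a hedged sketch is given below; the symbols and trade-off parameters here are assumptions, not the paper's notation.

```latex
% Illustrative form only: W is a shared projection whose row norms rank features,
% X_v / X_m are labeled video / image features with label matrices Y_v / Y_m,
% X collects all (labeled and unlabeled) videos, L is a graph Laplacian, and
% alpha, beta, lambda are trade-off parameters.
\min_{W}\;
  \underbrace{\lVert X_v^{\top} W - Y_v \rVert_F^{2}
   + \alpha \lVert X_m^{\top} W - Y_m \rVert_F^{2}}_{\text{supervised cross-media term}}
  \;+\;
  \underbrace{\beta\,\operatorname{Tr}\!\bigl(W^{\top} X L X^{\top} W\bigr)}_{\text{unsupervised embedded term}}
  \;+\;
  \lambda \lVert W \rVert_{2,1}
```

The first two terms correspond to evaluating feature correlation with the classes of videos and relevant images, the trace term to the unsupervised embedded learning over all videos, and the l2,1-norm to the row sparsity that drives feature selection.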

Highlights

  • Video semantic recognition [1] is a fundamental research problem in computer vision [2, 3] and multimedia analysis [4, 5]

  • Experimental results and discussion: we present video semantic recognition experiments that evaluate the performance of our jointing cross-media analysis and embedded learning (JCAEL) method for feature selection

  • Experimental datasets: to evaluate the contribution of cross-media analysis, we construct three pairs of video and image datasets: HMDB13←Extensive Images Databases (EID, image dataset), UCF10←Actions Images Databases (AID, image dataset), and UCF←PPMI4, where “←” denotes the direction of adaptation from images to videos

Summary

Introduction

Video semantic recognition [1] is a fundamental research problem in computer vision [2, 3] and multimedia analysis [4, 5]. (1) Because JCAEL can transfer knowledge learned from relevant images to videos to improve video feature selection, it can directly use labeled images to address the problem of insufficient label information. This merit ensures that our method is able to uncover the common discriminative features in videos and images of the same class, which provides better interpretability of the selected features. (2) Our method contains unsupervised embedded learning, which utilizes both labeled and unlabeled videos for feature selection. This advantage guarantees that JCAEL can exploit the variance and separability of all training videos to find the common irrelevant or noisy features and generate optimal feature subsets.
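
To illustrate how a joint criterion of this kind can be optimized iteratively and then used to rank features, the sketch below implements a generic l2,1-regularized solver with the standard reweighting trick. It is not the authors' algorithm or released code; the function name, arguments (alpha, beta, lam), and update rules are all assumptions.

```python
import numpy as np

def select_features(X_video, Y_video, X_image, Y_image, X_all, L,
                    alpha=1.0, beta=0.1, lam=0.5, n_iter=50):
    """Rank features by the row l2-norms of a shared projection W.

    X_video : (d, n_v) labeled video features   Y_video : (n_v, c) label matrix
    X_image : (d, n_i) labeled image features   Y_image : (n_i, c) label matrix
    X_all   : (d, n)   labeled + unlabeled video features
    L       : (n, n)   graph Laplacian over X_all (local structure of all videos)
    """
    d = X_video.shape[0]
    D = np.eye(d)  # reweighting matrix standing in for the l2,1-norm penalty
    for _ in range(n_iter):
        # With D fixed, W has a closed-form regularized least-squares update
        # coupling the video term, the image term, and the embedded term.
        A = (X_video @ X_video.T
             + alpha * (X_image @ X_image.T)
             + beta * (X_all @ L @ X_all.T)
             + lam * D)
        B = X_video @ Y_video + alpha * (X_image @ Y_image)
        W = np.linalg.solve(A, B)
        # With W fixed, refresh D from the current row norms (reweighting trick).
        row_norms = np.sqrt((W ** 2).sum(axis=1)) + 1e-12
        D = np.diag(1.0 / (2.0 * row_norms))
    # Features with large row norms are the most discriminative ones.
    return np.argsort(-row_norms)
```

Keeping the top-k indices returned by such a routine corresponds to selecting the k features whose rows of W carry the most energy across both media and all training videos.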

Notations
The proposed framework of JCAEL
Influence of cross-media analysis and embedded learning
Conclusions
