Abstract
As the amount of available video data has grown substantially, automatic video classification has become an urgent yet challenging task. Most video classification methods, especially those based on deep learning, focus on acquiring discriminative spatial visual features and motion patterns for video representation, and they have achieved very good results on action recognition problems. However, the performance of most of these methods degrades drastically on more generic video classification tasks, where the video contents are much more complex. Thus, in this paper, the mid-level semantics of videos are exploited to bridge the semantic gap between low-level features and high-level video semantics. A word weighting method from text classification, term frequency-inverse document frequency (TF-IDF), is introduced to the video domain. The visual objects in videos are regarded as the words in texts, and two new weighting methods are proposed to encode videos by weighting visual objects according to the characteristics of videos. In addition, the semantic similarities between video categories and visual objects are introduced from the text domain as privileged information to facilitate classifier training on the obtained semantic representations of videos. The proposed semantic encoding method (semantic stream) is then fused with the popular two-stream CNN model to produce the final classification results. Experiments are conducted on two large-scale complex video datasets, CCV and ActivityNet. The experimental results validate the effectiveness of the proposed methods.
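To make the text analogy concrete, the following is a minimal sketch of classic TF-IDF applied to visual objects, treating each video as a "document" and each detected object label as a "word". It is not one of the two weighting methods proposed in the paper, which the abstract does not specify. The function name `tfidf_encode` and the input format (a list of object labels per video, assumed to come from some object detector) are hypothetical and used for illustration only.

```python
import math
from collections import Counter


def tfidf_encode(videos):
    """Encode videos as classic TF-IDF vectors over detected object labels.

    `videos` is a hypothetical input: one list of object labels per video.
    This is plain TF-IDF, shown only as an analogy, not the paper's method.
    """
    n_videos = len(videos)
    # Fixed, sorted vocabulary of all object labels seen in any video.
    vocab = sorted({obj for video in videos for obj in video})
    # Document frequency: number of videos in which each object appears.
    df = Counter(obj for video in videos for obj in set(video))

    encoded = []
    for video in videos:
        counts = Counter(video)
        total = len(video) or 1  # guard against videos with no detections
        encoded.append([
            (counts[obj] / total) * math.log(n_videos / df[obj])
            for obj in vocab
        ])
    return vocab, encoded
```

Under this scheme, an object that appears in every video receives weight zero, while an object concentrated in a few videos is weighted more heavily, which is the intuition the abstract borrows from text classification.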