Large-scale video semantic recognition based on consistency of segment-level and video-level predictions

Rui Wang, Zheng Wang, Jingjing Chen, Zejia Weng, Yu-Gang Jiang

doi:10.1360/ssi-2020-0014

Abstract

Segment-level video semantic recognition, which is known to be an important task in video analysis, attempts to identify the semantic concepts in short video clips. Labeling video segments is difficult because the number of segments is extremely large and segments carry no network tags; consequently, only a portion of the video segments are labeled. Improving the accuracy of segment-level semantic recognition with limited semantic labels is therefore a key challenge in video semantic recognition. This paper proposes a video semantic recognition algorithm based on the consistency of video- and segment-level predictions. The proposed algorithm introduces a consistency constraint between the semantics of the complete video and the semantics of its segments, and it can be applied to filter the segment-level semantic results to improve recognition accuracy. The proposed algorithm achieved 82.62% mean average precision on the video segment semantic recognition task using the large-scale video dataset YouTube-8M and ranked second in the third YouTube-8M competition.
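
The filtering idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual filtering rule, so everything below is an assumption made for illustration: the function name `filter_segment_scores`, the `top_k` cutoff, and the choice to zero out segment scores for classes that the video-level prediction does not rank highly. It is not the authors' implementation.

```python
def filter_segment_scores(video_probs, segment_probs, top_k=2):
    """Zero out segment-level scores for classes outside the video-level top-k.

    video_probs:   per-class probabilities predicted for the whole video.
    segment_probs: one list of per-class probabilities per segment.
    top_k:         hypothetical cutoff; the source does not specify one.
    """
    # Rank classes by the video-level prediction and keep the top_k indices.
    ranked = sorted(range(len(video_probs)),
                    key=lambda c: video_probs[c],
                    reverse=True)
    allowed = set(ranked[:top_k])

    # Keep a segment score only when its class is consistent with the
    # video-level prediction; otherwise suppress it.
    return [
        [p if c in allowed else 0.0 for c, p in enumerate(segment)]
        for segment in segment_probs
    ]


# Toy example with three classes and two segments (made-up numbers).
video_probs = [0.9, 0.05, 0.6]
segment_probs = [[0.7, 0.8, 0.1],
                 [0.2, 0.1, 0.9]]
filtered = filter_segment_scores(video_probs, segment_probs, top_k=2)
```

In this toy case the second class scores high in the first segment but is unlikely at the video level, so its segment scores are suppressed. A soft variant, such as multiplying segment scores by the video-level probabilities, would express the same consistency assumption without a hard cutoff.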
