Attention based consistent semantic learning for micro-video scene recognition

Jie Guo, Xiushan Nie, Yuling Ma, Kashif Shaheed, Inam Ullah, Yilong Yin

doi:10.1016/j.ins.2020.05.064

Abstract

Micro-videos are one of the most popular multimedia forms in the mobile internet domain, and scene recognition is important for micro-video semantic analysis and understanding. Compared with traditional videos, scene recognition in micro-videos suffers from content inconsistency within the same scene, owing to the subjectivity of photographers. Moreover, frames within the same micro-video differ in their importance for semantic representation. These phenomena limit micro-video scene recognition. To address these issues, this paper proposes attention-based consistent semantic learning (ACSL) for micro-video scene recognition. ACSL consists of a two-branch framework combined with an attention mechanism that maintains semantic consistency within classes. Experiments on multiple datasets show that the proposed ACSL achieves better performance than other video scene recognition methods.

Full Text