Abstract

With the rapid development of the mobile Internet, micro-videos have become a prevailing format of user-generated content (UGC) on various social media platforms. Several studies have been conducted towards understanding high-level micro-video semantics, such as venue categorization, memorability, and popularity. However, these approaches support only tasks with a single output, and thus fall short on tasks with multiple outputs, most notably multi-label micro-video classification. To tackle this problem, in this paper we propose a dual multi-modal low-rank decomposition (DMLRD) method for multi-label micro-video classification. To learn more comprehensive micro-video representations, we first learn low-rank-regularized modality-specific and modality-shared components, considering the consistency and the complementarity among modalities simultaneously; this also alleviates, to a certain extent, the limited descriptive power of each individual modality caused by its inherent properties. To obtain unseen label representations, we then construct a sparsity-regularized multi-matrix normal estimation term that jointly encodes the latent relationship structures among labels and among dimensions. Experiments on two datasets demonstrate the effectiveness of the proposed method over state-of-the-art methods.
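The core decomposition idea, splitting multi-modal features into a modality-shared part and modality-specific parts that are both encouraged to be low-rank, can be pictured with a small sketch. The snippet below is only an illustration under stated assumptions, not the DMLRD objective or optimization from the paper: the function names (`svt`, `decompose`), the equal feature dimensionality across modalities, the regularization weights, and the alternating singular-value-thresholding updates are all choices made for the sake of the example.

```python
# Illustrative sketch only (assumptions noted in the text above); not the authors' DMLRD method.
import numpy as np

def svt(M, tau):
    # Singular value thresholding: the proximal operator of the nuclear norm,
    # which shrinks singular values and so pushes the result toward low rank.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.maximum(s - tau, 0.0)
    return (U * s) @ Vt

def decompose(X_modalities, tau_shared=5.0, tau_specific=5.0, n_iter=50):
    # Split each modality matrix X_m (samples x features; equal shapes assumed
    # here for simplicity) into one component Z shared across modalities plus a
    # modality-specific component S_m, with both parts regularized toward low rank.
    Z = np.zeros_like(X_modalities[0])
    S = [np.zeros_like(X) for X in X_modalities]
    for _ in range(n_iter):
        # refit each modality-specific part against the current shared part
        S = [svt(X - Z, tau_specific) for X in X_modalities]
        # refit the shared part to the average residual across modalities
        Z = svt(np.mean([X - Sm for X, Sm in zip(X_modalities, S)], axis=0), tau_shared)
    return Z, S

# Toy usage: three hypothetical modalities (e.g. visual, acoustic, textual)
# for 100 micro-videos with 64-dimensional features each.
rng = np.random.default_rng(0)
X = [rng.standard_normal((100, 64)) for _ in range(3)]
Z, S = decompose(X)
print(np.linalg.matrix_rank(Z), [np.linalg.matrix_rank(Sm) for Sm in S])
```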
